Introduction
The Cerrado biome covers 22% of the Brazilian territory, is characterised by its great biodiversity of species and recognised as the richest savannah in the world. However, of all the hotspots in the world, it has the lowest percentage of fully protected areas (8.21%), mainly due to agribusiness pressure and the clearing of production areas (Ministério do Meio Ambiente, 2022).
This predatory exploitation has created a scenario of vulnerability for forest species, in addition to the loss of products such as fruits, resins and oils, among others, which could be used in food production, pharmacological products and cosmetics through sustainable management (Strassburg et al., 2017). One species native to the Cerrado that has potential for sustainable management is Annona crassiflora Mart., popularly known as marolo or araticum (Cohen et al., 2010).
A. crassiflora is a fruit-bearing tree species of the Annonaceae family with a wide distribution in Brazilian regions. It has hermaphrodite flowers, an allogamous reproductive system favoured by insects (mainly beetles) and zoochoric seed dispersal (Cavalcante et al., 2009). The species provides ecological services and has economic potential, mainly through its fruits. They serve as a food source for animal species, mainly birds. They also have good sensory acceptance for human consumption, and high nutritional quality, containing bioactive compounds (Arruda et al., 2016). The fruits can be eaten fresh or processed, and are a source of food for communities in Cerrado areas, as well as being sold in local markets (Pereira et al., 2022). In addition, its wood contains alkaloids such as atherospermidine and liriodenine, which have antimicrobial activities against human hepatoma cells (Inoue et al., 2010).
Despite the potential of the species, the exploitation of native Cerrado areas and the fragmentation of this biome (MMA, 2022) have resulted in the spatial isolation of forest remnants, which may lead to a decrease in the number of A. crassiflora individuals. Spatial isolation hinders or prevents migration between populations, reducing gene flow and increasing differentiation between them (Santos and Oliveira, 2020). In addition, reduction in population size favours genetic drift, which randomly alters allele frequencies and promotes allelic fixation and loss. These factors reduce genetic diversity and alter the genetic structure of populations, making them more vulnerable to adverse events such as pests, diseases and climate change (Ellegren and Galtier, 2016).
In this context, molecular markers can detect genetic variation, assisting decision-making related to the conservation and management of natural populations (Filippos, 2016). Different types of molecular markers are available, but inter-simple sequence repeats (ISSRs), characterised as dominant, universal and multilocus, are recognised for revealing a high level of polymorphism distributed throughout the genome (Turchetto-Zolet et al., 2017). Furthermore, ISSRs have been successfully used to determine the diversity and genetic structure of the genus Annona (Gwinner et al., 2016; Samaradiwakara et al., 2020; Sá et al., 2022).
Therefore, the aim of this study was to assess the diversity and genetic structure of natural populations of A. crassiflora using ISSR molecular markers to provide information for use in conservation and genetic breeding programmes for the species.
Materials and methods
Sampling and collection sites
The sampling sites were selected in the state of Minas Gerais, covering the areas where A. crassiflora grows. Eight populations were identified, located in ecological reserves or pastures, each with an area of less than 50 ha. In each population, 24 individuals of the species were randomly sampled, for a total of 192 adult trees (Table 1). Climatic conditions varied with the location of the collection area, with mean annual temperatures ranging from 18.2°C (Carmo da Cachoeira) to 22.7°C (Januária), and annual precipitation from 972 mm (Grão Mogol) to 1582 mm (Carmo da Cachoeira) (Alvares et al., 2013).
DNA extraction
Leaf tissue samples from each selected individual were used to extract genomic DNA using the method of Moog and Bond (2003). The concentration and purity of the extracted DNA were then assessed by spectrophotometry in a NanoDrop 2000C (Thermo Scientific), using the A260/A280 ratio (1.80 ≤ A260/A280 ≤ 2.00) as a quality parameter (Aguilar et al., 2016).
ISSR genotyping
Polymerase chain reactions (PCRs) were performed using DNA aliquots from individuals at a final concentration of 10 ng/μl and 10 ISSR primers (A16; JOHN; MANNY; and UBCs 807; 834; 835; 840; 841; 855 and 857) developed by the University of British Columbia, Vancouver, Canada. The total reaction volume was 12 μl, containing 2 μl of genomic DNA, 1.2 μl of 10× PCR buffer (500 mM Tris-HCl pH 8.0, 200 mM KCl, 2.5 mg/ml BSA, 200 mM tartrazine and 1% Ficoll), 1.2 μl of dNTP + MgCl2 (2.5 mM dNTP, 25 mM MgCl2), 0.15 μl of Taq DNA polymerase (5 U/μl) and 2 μl of each primer (2 mM).
Amplifications were performed in a thermocycler (GeneAmp PCR System 9700) according to protocols previously used for other forest species (Silva Júnior et al., 2020; Vieira et al., 2022), including an initial denaturation step (5 min at 94°C), followed by 37 cycles of denaturation (15 s at 94°C), annealing (30 s at 47°C) and extension (1 min at 72°C). At the end of the last cycle, a final extension was performed (7 min at 72°C).
The amplified fragments were separated by electrophoresis on a 1.5% agarose gel with 1× TBE buffer (10.8 g/l Tris base; 5.5 g/l boric acid; 0.83 g/l EDTA) at 100 V for 4 h. The gels were stained with ethidium bromide (5 mg/ml) and photographed under ultraviolet light in a photodocumenter (UVP DigiDoc-It System) linked to UVP Doc-Itls image analysis software. The size of the fragments was estimated by comparison with a 100 bp ladder molecular weight marker (Ludwig Biotec).
Data analysis
As ISSR markers are dominant, a binary matrix was generated in which the presence of a fragment was scored as 1 (the locus is considered homozygous dominant or heterozygous) and its absence as 0 (homozygous recessive, no amplification). The binary matrix was then used to perform a bootstrap analysis to assess the optimal number of fragments required for the study. For each primer, the number of fragments obtained was counted and the total number of fragments (N F) was also determined. In addition, the informativeness of the ISSR markers was assessed through the polymorphic information content (PIC), which was calculated for each primer and for the entire set of primers. The Genes program (Cruz, 2016) was used for the optimal number of fragments and PIC analyses.
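The scoring and PIC steps described above can be illustrated with a minimal sketch. It assumes the two-state form commonly used for dominant markers, PIC = 1 − f² − (1 − f)², where f is the band frequency at a locus; the exact estimator implemented in the Genes program is not stated in the text and may differ. The function names and the toy matrix are hypothetical.

```python
from typing import List


def band_frequencies(matrix: List[List[int]]) -> List[float]:
    """Fraction of individuals showing a band (1) at each locus (column)."""
    n_individuals = len(matrix)
    n_loci = len(matrix[0])
    return [sum(row[j] for row in matrix) / n_individuals for j in range(n_loci)]


def pic_dominant(freq: float) -> float:
    """Assumed two-state PIC for a dominant locus: 1 - f^2 - (1 - f)^2."""
    return 1.0 - freq ** 2 - (1.0 - freq) ** 2


# Hypothetical binary matrix: rows are individuals, columns are ISSR fragments.
toy_matrix = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]

freqs = band_frequencies(toy_matrix)
pics = [pic_dominant(f) for f in freqs]
mean_pic = sum(pics) / len(pics)
```

Under this form PIC cannot exceed 0.5, which is reached when a band is present in exactly half of the individuals.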
Genetic dissimilarity (DG) between pairs of individuals was determined using the Jaccard coefficient. For each population and for the combined data, the pairs of individuals with minimum (IDGmin) and maximum (IDGmax) genetic dissimilarity were identified, together with the corresponding values (DGmin−max). The DG matrix between individuals was used to construct a dendrogram by the unweighted pair group method with arithmetic mean (UPGMA), with the cut-off point proposed by Mojena (1977), setting the coefficient k = 1.25. The consistency between the dissimilarity matrix and the clusters presented in the dendrogram was evaluated by the cophenetic correlation coefficient (CCC). The Genes program (Cruz, 2016) was used for these analyses. R software (R Core Team, 2016) was used to generate the circular dendrogram, using the packages vegan (Oksanen et al., 2018), cluster (Maechler et al., 2019), dendextend (Galili et al., 2020), factoextra (Kassambara and Mundt, 2017), ggpubr (Kassambara, 2020), cowplot (Wilke, 2019) and gridExtra (Auguie and Antonov, 2017).
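As an illustration of two of the quantities named above, the following sketch computes the Jaccard dissimilarity between two binary ISSR profiles and a Mojena-type cut-off from a list of dendrogram fusion levels. It is not the Genes implementation: the function names, the toy profiles and the fusion levels are hypothetical, and the use of the sample standard deviation in the cut-off is an assumption.

```python
from statistics import mean, stdev
from typing import List, Sequence


def jaccard_dissimilarity(x: Sequence[int], y: Sequence[int]) -> float:
    """1 - a / (a + b + c); shared absences (0, 0) are ignored."""
    shared = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # band in both
    mismatched = sum(1 for xi, yi in zip(x, y) if xi != yi)         # band in one only
    if shared + mismatched == 0:
        return 0.0  # convention chosen here when neither profile shows a band
    return 1.0 - shared / (shared + mismatched)


def mojena_cutoff(fusion_levels: List[float], k: float = 1.25) -> float:
    """Mean fusion level plus k sample standard deviations (assumed form)."""
    return mean(fusion_levels) + k * stdev(fusion_levels)


# Hypothetical profiles of two individuals over five fragments.
ind_a = [1, 1, 0, 1, 0]
ind_b = [1, 0, 0, 1, 1]
dg_ab = jaccard_dissimilarity(ind_a, ind_b)   # 2 shared, 2 mismatched
dg_aa = jaccard_dissimilarity(ind_a, ind_a)   # identical profiles

# Hypothetical fusion levels read from a dendrogram.
cutoff = mojena_cutoff([0.1, 0.2, 0.3, 0.4])
```

A dissimilarity of 0 means only that two individuals share the same banding pattern for the fragments scored, which is how the IDGmin = 0.00 pairs arise.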
Levels of genetic diversity were measured for populations and combined data using Popgene 3.2 software (Yeh and Boyle, 1997). The parameters calculated were the number of observed alleles (A O), number of effective alleles (A E), Nei diversity index (H*) (Nei, 1978) and Shannon index (I*) (Shannon and Weaver, 1949).
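The per-locus form of these parameters can be sketched as follows for a dominant marker. The sketch assumes Hardy-Weinberg equilibrium, so that the frequency of the null (recessive) allele is the square root of the frequency of band absence; this is a common convention for dominant data, but the exact estimators and corrections applied by Popgene are not given in the text. The function name is hypothetical.

```python
import math
from typing import Tuple


def locus_diversity(band_freq: float) -> Tuple[float, float, float]:
    """Return (A_E, H, I) for one dominant locus under assumed Hardy-Weinberg equilibrium."""
    q = math.sqrt(1.0 - band_freq)   # null allele frequency from band absence
    p = 1.0 - q                      # dominant allele frequency
    homozygosity = p * p + q * q
    a_e = 1.0 / homozygosity         # effective number of alleles
    h = 1.0 - homozygosity           # Nei gene diversity (no sample-size correction)
    i = -sum(x * math.log(x) for x in (p, q) if x > 0.0)  # Shannon index
    return a_e, h, i


# A band present in 75% of individuals gives p = q = 0.5.
a_e, h, i = locus_diversity(0.75)

# A band present in every individual is monomorphic.
a_e_mono, h_mono, i_mono = locus_diversity(1.0)
```

Population-level values would then be averages of these quantities over all loci.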
Genetic structuring of the populations and combined data was assessed by the analysis of molecular variance (AMOVA) with two hierarchical levels (Excoffier et al., 1992). In addition, considering populations pairwise, gene flow values (N m) were determined using Arlequin 3.5 software (Excoffier and Lischer, 2010) and geographic distances in kilometres using GPS TrackMaker software (Trackmaker, 2022); the correlation between the two was evaluated by the Mantel test with 1000 permutations (Mantel, 1967) using the Genes program (Cruz, 2016). Finally, a Bayesian approach was used to determine the number of genetic groups (K) using Structure 2.3 software (Falush et al., 2007). The number of groups (K) was set to vary between 1 and 11, with 20 runs for each K value and 10,000 Markov chain Monte Carlo (MCMC) iterations. Data were exported to Structure Harvester software (Earl and Vonholdt, 2012), and the best value of K was indicated by the ad hoc ΔK method (Evanno et al., 2005).
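The ΔK criterion mentioned above can be sketched from the log-likelihoods of replicate runs. The version below assumes the form ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd[L(K)], with L(K) taken as the mean log-likelihood over runs at each K; it is an illustration rather than the Structure Harvester code. The function name and the log-likelihood values are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(lnp: Dict[int, List[float]]) -> Dict[int, float]:
    """Delta K for every K that has both K - 1 and K + 1 available."""
    mean_l = {k: mean(values) for k, values in lnp.items()}
    result: Dict[int, float] = {}
    for k in sorted(lnp):
        if k - 1 not in mean_l or k + 1 not in mean_l:
            continue  # undefined at the ends of the tested range
        second_diff = abs(mean_l[k + 1] - 2.0 * mean_l[k] + mean_l[k - 1])
        sd = stdev(lnp[k])
        if sd > 0.0:
            result[k] = second_diff / sd
    return result


# Hypothetical log-likelihoods from two replicate runs per K.
toy_lnp = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -904.0],
    3: [-890.0, -894.0],
    4: [-888.0, -892.0],
}
dk = delta_k(toy_lnp)
best_k = max(dk, key=dk.get)
```

Because ΔK needs both neighbours, it cannot be evaluated for the smallest and largest K tested (here K = 1 and K = 4; in the study, K = 1 and K = 11).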
Results
Efficiency of ISSR markers
The analysis of the optimal number of fragments resulted in a power of 0.049 and a correlation of 0.948, where 41 polymorphic fragments would be sufficient to assess the diversity and genetic structure between and within populations of A. crassiflora. The efficiency of the ISSR markers was also evaluated by the PIC, with values ranging from 0.34 for the JOHN primer to 0.47 for the A16 primer, with an average of 0.39. The number of fragments per primer was also determined, ranging from 3 to 8, with a total of 61 fragments (Table 2).
N F, total number of fragments; PIC, polymorphic information content.
a H = A, T or C; R = A or G; V = A, C or G and Y = C or T.
Dissimilarity and genetic diversity
The DG calculated between individuals within populations showed that some pairs of individuals had identical ISSR profiles (IDGmin = 0.00) in the Morro da Garça (MG), Carmo da Cachoeira (CC), Grão Mogol (GM) and Januária (JAN) populations. The maximum value within populations (IDGmax = 0.55) was observed for the pairs CVB3 × CVB23 and MG10 × MG23, from the Curvelo B (CVB) and MG populations, respectively. For the combined data, the maximum genetic dissimilarity (IDGmax = 0.63) was observed between the pair of individuals CVB4 × CC21, from the CVB and CC populations, respectively (Table 3).
IDGmin, pair of individuals with minimal genetic dissimilarity; IDGmax, pair of individuals with maximum genetic dissimilarity; DGmin−max, minimum and maximum genetic dissimilarity; A O, number of alleles observed; A E, number of effective alleles; H*: Nei genetic diversity index; I*: Shannon index.
The cluster analysis of the DG between individuals, obtained by the UPGMA method, resulted in 11 groups, with a cut-off point of 0.317. The largest group, called G1, consisted of 70 individuals: 18 from the GM population, 17 from JAN, 10 from Montes Claros (MC), 8 from MG, 5 from Curvelo C (CVC), 5 from CC, 4 from CVB and 3 from Curvelo A (CVA). Other large groups were G2, G3, G4 and G5, with 40, 16, 30 and 19 individuals, respectively. The two smallest groups consisted of only one individual each, both from the MG population. The CCC was 87% (Fig. 1).
When the populations were evaluated individually, the highest values for the number of observed (A O) and effective (A E) alleles were found for the CVB population, while the lowest values were found for the JAN population. The Nei (H*) and Shannon (I*) genetic diversity indices determined for the individual populations were highest for the CVB population, and were also high for the CVA, MG and CC populations. For the combined data, there was an increase in the number of observed alleles (A O = 2.00); however, the number of effective alleles (A E = 1.61) remained lower than the values obtained for the CVB and MG populations. The genetic diversity parameters, on the other hand, were higher when calculated for the combined data, indicating high genetic diversity, with H* and I* values of 0.35 and 0.52, respectively (Table 3).
Genetic structure
The AMOVA gave a global estimate of ΦST equal to 0.1439, i.e. 14.39% of the total genetic variation was found between populations. Therefore, most of the genetic variation (85.61%) is within populations. The highest value determined for the average number of migrants (N m = 7.03) between paired populations was observed between the CVC and MG populations, which are geographically separated by approximately 31.11 km, indicating gene flow between them. The lowest value (N m = 2.15) was between the CVC and JAN populations, separated by 364.03 km (Table 4). The value of N m for the combined data was 2.03.
N m values are shown on the upper diagonal, and values of geographic distances between populations are shown on the lower diagonal.
The Mantel test revealed a negative (−0.5576) and significant correlation at 1% probability between the values of gene flow (N m) and geographic distances (km), indicating that the smaller the geographic distance, the greater the gene flow between populations.
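For readers unfamiliar with the procedure, a permutation-based Mantel test of this kind can be sketched as below. It correlates the off-diagonal entries of two symmetric matrices and builds a null distribution by permuting the population labels of one of them. This is a generic illustration, not the routine in the Genes program; the two-sided p-value convention is an assumption, and the function names and the four-population matrices are hypothetical (they are not the values in Table 4).

```python
import random
from statistics import mean
from typing import List, Tuple

Matrix = List[List[float]]


def upper_triangle(m: Matrix) -> List[float]:
    """Off-diagonal entries above the diagonal, row by row."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(a: Matrix, b: Matrix, permutations: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Return the observed correlation and a two-sided permutation p-value."""
    rng = random.Random(seed)
    n = len(a)
    flat_b = upper_triangle(b)
    observed = pearson(upper_triangle(a), flat_b)
    extreme = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)  # relabel populations in matrix a only
        permuted = [[a[order[i]][order[j]] for j in range(n)] for i in range(n)]
        r = pearson(upper_triangle(permuted), flat_b)
        if abs(r) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / (permutations + 1)


# Hypothetical symmetric matrices for four populations.
toy_nm = [
    [0.0, 7.0, 3.0, 2.2],
    [7.0, 0.0, 3.5, 2.4],
    [3.0, 3.5, 0.0, 4.0],
    [2.2, 2.4, 4.0, 0.0],
]
toy_dist = [
    [0.0, 30.0, 200.0, 360.0],
    [30.0, 0.0, 180.0, 340.0],
    [200.0, 180.0, 0.0, 150.0],
    [360.0, 340.0, 150.0, 0.0],
]
r_obs, p_value = mantel(toy_nm, toy_dist)
```

Whole rows and columns are permuted together, rather than individual cells, because pairwise values that share a population are not independent of one another.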
The Bayesian approach resulted in four genetic groups (K = 4) (Fig. 2(a)). The graphs for each population show that all four genetic groups are present in every population. However, in the GM and JAN populations one genetic group, indicated in dark blue, was predominant (Fig. 2(b)).
Discussion
Genetic dissimilarity
Most primers evaluated individually were moderately informative (PIC values between 0.25 and 0.45 according to Tatikonda et al., 2009). Furthermore, when evaluated together (Table 2), they were also classified as moderate and were therefore sufficient for genetic characterisation in A. crassiflora populations. However, it is important to note that although ISSR markers are useful for analysing genetic diversity, they have some limitations. These include their dominant nature, which prevents homozygous and heterozygous individuals from being distinguished, and their sensitivity to amplification conditions, which can lead to difficulties in reproducing results (Turchetto-Zolet et al., 2017). These limitations must be taken into account when interpreting the data, especially in studies requiring high genetic precision.
Analysis of DG between individuals of A. crassiflora revealed pairs with values equal to 0 (Table 3), suggesting possible self-fertilisation and/or asexual reproduction. The species is characterised as allogamous (Cavalcante et al., 2009), with self-fertilisation occurring at very low rates (<5%) in the absence of self-incompatibility. According to Kiill and Costa (2003), A. crassiflora is self-incompatible; however, it is dichogamous and the seeds resulting from geitonogamy have lower viability, which does not make self-fertilisation impossible. These results may indicate the possibility of asexual reproduction, as suggested by Pimenta et al. (2017) and Souza et al. (2020), since DG values equal to 0 represent clones.
Despite the presence of genetically similar individuals, there was high genetic variability, confirmed by the maximum genetic dissimilarity between individuals (IDGmax) within populations (Table 3) and by the formation of 11 groups in the UPGMA cluster analysis (Fig. 1). These groups mix individuals from different populations, indicating a sharing of alleles between them.
Genetic diversity
The higher values for observed alleles (A O) and effective alleles (A E) in the CVB and MG populations indicate better allele coverage and distribution compared to the other populations. The increase in these values for the combined data indicates the occurrence of private alleles between the populations. The A O value in the combined data is similar to that observed in diploid species, with an average of 1.61 for A E, indicating a good distribution of alleles (Silva Júnior et al., 2020).
For the H* and I* indices, which classify the level of genetic diversity, the values found for the combined data were higher than those observed in previous studies with the same species (Gwinner et al., 2016), genus (Samaradiwakara et al., 2020; Sá et al., 2022), family or species with similar characteristics (Vieira et al., 2022). In addition, the values of the Shannon index (I*) vary between 0 and 1, with values close to 1 indicating high genetic diversity (Lewontin, 1972). Therefore, there is high genetic diversity in CVB and MG populations, moderate in CVA, CVC and CC, and low in MC, GM and JAN. The high diversity in CVB and MG may be related to the smaller geographical distance and gene flow between them, according to the Mantel test. In contrast, the JAN population showed the lowest levels of diversity (H* = 0.19 and I* = 0.28), possibly due to geographical isolation, anthropogenic influences (Table 1) and genetic drift.
For the combined data, the H* and I* indices indicate high genetic diversity for the species, with values higher than those found in previous studies on A. crassiflora (H* = 0.17 and I* = 0.28; Gwinner et al., 2016) and the genus Annona (H* = 0.22 and I* = 0.32; Sá et al., 2022). Differences in genetic diversity indices between studies can be attributed to the number of individuals sampled and the conservation status of the populations.
The high and moderate genetic diversity in some populations indicates their ability to be maintained over generations in the face of disturbance (Ellegren and Galtier, 2016). All the populations evaluated can be used in breeding programmes to establish a base population, due to the high genetic diversity indices of the combined data. However, in order to conserve the populations with the lowest genetic diversity, it is essential to raise awareness among the local communities, especially where fruit is harvested, and to plant seedlings with genetically divergent material, including from the populations sampled in this study.
Genetic structure
The AMOVA confirms a greater genetic variation within the A. crassiflora populations (Table 4), suggesting a sharing of alleles between them. However, the ΦST value indicates moderate genetic differentiation (Wright, 1978), possibly influenced by populations with less genetic diversity and the presence of private alleles.
The N m values indicate significant gene flow between populations (Table 4), with values greater than 1 as suggested by Wright (1951), although this reflects past gene flow. A negative correlation between gene flow and geographic distance was also observed and confirmed by the Mantel test, showing that geographically closer populations have greater gene flow and are more genetically similar than would be expected by chance (Santo-Silva et al., 2016). This pattern is consistent with the characteristics of A. crassiflora, an allogamous species with predominantly beetle pollination and seed dispersal by zoochory (Cavalcante et al., 2009). Consequently, as the spatial distance between populations increases, pollen and seed dispersal by these agents becomes more restricted.
The Bayesian analysis confirmed the previous analyses and identified four groups in all populations (Fig. 2(a)), confirming gene flow. Most of these have a mixture of genetic groups, while GM and JAN have a majority group (Fig. 2(b)). For these populations, factors such as allelic loss and fixation and the consequent increase in homozygosity may reduce adaptive value through the expression of deleterious alleles, especially in the absence of sustainable management and conservation efforts (Ellegren and Galtier, 2016).
Conclusion
There is high genetic diversity for the combined data from all the sampled populations of A. crassiflora in the state of Minas Gerais, Brazil, but levels of diversity in the individual populations vary from high to low, possibly due to geographical isolation, anthropogenic disturbance and evolutionary factors. The greatest genetic variation has occurred within the populations due to gene flow between them, with the presence of four genetic groups. However, there is moderate genetic differentiation, with the dominance of one genetic group, particularly in the GM and JAN populations. It is recommended to raise the awareness of the local communities, especially in the areas where the fruit is harvested. In addition, knowledge of the genetic variability of A. crassiflora can be used to collect seeds and produce seedlings from genetically divergent mother trees, thus improving the maintenance of future populations in the face of natural selection, genetic drift and inbreeding. Finally, the establishment of germplasm banks, including permanent collections of pollen, seeds, tissue cultures and seed orchards, would benefit both conservation and breeding programmes.
Acknowledgements
The authors acknowledge the Universidade Federal de Lavras for training. The authors also acknowledge the Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG) and the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) for financial support. The authors thank the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brazil (CAPES) (Funding Code 001).