INTRODUCTION
Coevolution occurs at many scales and is driven by interactions between species that lead to changes in the evolutionary trajectory of each interacting species. Host–parasite coevolution examples are numerous (algae and virus, Bellec et al. Reference Bellec, Clerissi, Edern, Foulon, Simon, Grimsley and Desdevises2014; e.g. pocket gophers and chewing lice, Hafner et al. Reference Hafner, Sudman, Villablanca, Spradling, Demastes and Nadler1994; insects and fungi, Zhang et al. Reference Zhang, Zhang, Li, Ma, Wang, Xiang, Liu, An, Xu and Liu2014) and shaped evolutionary theory (Anderson and May, Reference Anderson and May1982; May and Anderson, Reference May and Anderson1983). However, unresolved evolutionary histories of several parasitic groups preclude analyses of coevolutionary relationships and the timing of events of the intimate relationship with their hosts.
The evolutionary relationships and time of divergence among major Protozoa groups are contentious (Adl et al. Reference Adl, Leander, Simpson, Archibald, Anderson, Bass, Bowser, Brugerolle, Farmer, Karpov, Kolisko, Lane, Lodge, Mann, Meisterfeld, Mendoza, Moestrup, Mozley-Standridge, Smirnov and Spiegel2007). Although all members of apicomplexans are parasitic and share specific features related to parasitism (e.g. an apical secretory structure mediating locomotion and cellular invasion), its extreme radiation (>6000 species known), adaptation to different niches in higher level eukaryotes (targeted hosts), lack of distinguishable morphological characters, genomic variation and complex life cycles involving multiple stages of infections make it difficult to recover deep evolutionary history and ancestry (Javaux et al. Reference Javaux, Knoll and Walter2001; Templeton et al. Reference Templeton, Iyer, Anantharaman, Enomoto, Abrahante, Subramanian, Hoffman, Abrahamsen and Aravind2004; Keeling et al. Reference Keeling, Burger, Durnford, Lang, Lee, Pearlman, Roger and Gray2005; Ginger, Reference Ginger2006; Adl et al. Reference Adl, Leander, Simpson, Archibald, Anderson, Bass, Bowser, Brugerolle, Farmer, Karpov, Kolisko, Lane, Lodge, Mann, Meisterfeld, Mendoza, Moestrup, Mozley-Standridge, Smirnov and Spiegel2007; Kuo et al. Reference Kuo, Wares and Kissinger2008; Wasmuth et al. Reference Wasmuth, Daub, Peregrín-Alvarez, Finney and Parkinson2009; De Baets and Littlewood, Reference De Baets, Littlewood, Kenneth De and Littlewood2015). Compelling evidence, however, has progressively emerging and our knowledge of the diversity, origin and evolution of parasitic protists have benefited from molecular methods (Gilabert and Wasmuth, Reference Gilabert and Wasmuth2013; Sierra et al. Reference Sierra, Cañas-Duarte, Burki, Schwelm, Fogelqvist, Dixelius, González-García, Gile, Slamovits, Klopp, Restrepo, Arzul and Pawlowski2016).
One of the most important infectious diseases in vertebrates is caused by the Apicomplexa protozoan Cryptosporidium. Different species of this unicellular organism have been found in all living vertebrate groups with some species shared within the same taxonomic Class (e.g. a wide range of mammals including humans, sheep, goats and cattle are the hosts of Cryptosporidium parvum). Species of Cryptosporidium are morphologically indistinguishable and their identification is mainly based on molecular characterization (Xiao et al. Reference Xiao, Escalante, Yang, Sulaiman, Escalante, Montali, Fayer and Lal1999; Fayer, Reference Fayer2010). The phylogenetic position has also been debated with the genus placed within the coccidian clade initially, whereas recent molecular studies confirmed a close affinity to the gregarines (Carreno et al. Reference Carreno, Matrin and Barta1999; Barta and Thompson, Reference Barta and Thompson2006).
To the best of our knowledge no molecular clock analysis has been applied to establish the timeline of Cryptosporidium evolution and test the congruence of its time of diversification to the origin of major groups of host vertebrates. The evolutionary origins and extent of host–parasite interactions can be inferred from time calibrated tree phylogenies. The symmetry in times of divergence between hosts and parasites can provide evidence of coevolution. So, linking results that yield similar dates of divergence from dated trees of host–parasite associations at least hints that a common history of interacting lineages is shared (De Vienne et al. Reference De Vienne, Refrégier, López-Villavicencio, Tellier, Hood and Giraud2013). Here, we use molecular data, a number of calibration points and cophylogeny to compare temporal phylogenies and interactions between Cryptosporidium and their hosts in order to understand the underlying macroevolutionary mechanisms driving evolution of Cryptosporidium diversity. Does the origin of Cryptosporidium follow that of the origin of modern Vertebrata clades?
METHODS
Taxon sampling
We assembled a dataset of DNA sequences deposited in GenBank corresponding to 18S ribosomal RNA (18S), actin gene (actin) and 70 kilodalton heat shock protein (hsp70). Our sampling includes data from 27 species within the NCBI taxonomy database. Sequence data of additional Apicomplexa species were downloaded from GenBank as a close outgroup. These lineages were from groups closely related to Cryptosporidium (e.g. gregarines, coccidia and hematozoa) and provide appropriate context for dating analyses (Table 1). Sequences of other lineages within Alveolata (Ciliophora) were retrieved and included within the analysis. Rhizaria and Stramenopiles species were used as a known outgroup to all these taxa. We obtained two or more sequences of the same species from different sources when available to minimize systematic errors. After comparison only one sequence for each species was retained for subsequent analysis. A list of specimens and GenBank accession numbers of the sequences included in this study are presented in Table 1.
Phylogenetic analysis
Alignment of individual datasets was performed with SATé-II program v2·2·7 using MAFFT aligner and MUSCLE merger (Liu et al. Reference Liu, Warnow, Holder, Nelesen, Yu, Stamatakis and Linder2012). Each gene alignment was checked by eye and further refined by hand prior to phylogenetic analysis. The substitution model was chosen in jModelTest v0·1·1 (Posada, Reference Posada2008) based on the Akaike Information Criterion (Posada, Reference Posada2008). Prior to concatenated analyses, single gene datasets were inspected for evidence of significant incongruence by comparing preliminary Maximum Likelihood (ML) trees using RAxML v8·2·4 (Stamatakis et al. Reference Stamatakis, Hoover and Rougemont2008; Stamatakis, Reference Stamatakis2014) and a general time reversible model with gamma distribution (GTR + Γ). We observed no significant conflict among individual phylogenies and all subsequent analyses were performed with concatenated data. A 4-way partition by gene strategy was used for the concatenated analysis. The partition scheme was as follow: the fragment of the 18S rRNA and first-, second and third-codon position for the protein-coding actin and hsp70 genes. RY-coding at the third codon position was used as a partition strategy. ML analyses were implemented in RAxML using a GTR + Γ model with bootstrapping automatically stopped employing the majority rule criterion. Bayesian phylogenetic analyses (BA) were implemented in MrBayes v3·2·6 (Ronquist and Huelsenbeck, Reference Ronquist and Huelsenbeck2003; Ronquist et al. Reference Ronquist, Teslenko, Van Der Mark, Ayres, Darling, Höhna, Larget, Liu, Suchard and Huelsenbeck2012) using 10 million generations sampled every 5000th generation, a burn in of 10%, and GTR + Γ + I model of evolution. Convergence and mixing were assessed using Tracer v1·6 (http://tree.bio.ed.ac.uk/software/tracer/) by examining log-likelihood values across generations and ensuring that post-burn-in samples yielded an effective sample size (ESS) of >200 for all parameters. RAxML and MrBayes analyses were performed via the CIPRES portal (Miller et al. Reference Miller, Pfeiffer and Schwartz2010). Trees were viewed using FigTree v1·4·2 (http://tree.bio.ed.ac.uk/software/figtree/).
Molecular dating of Cryptosporidium
Divergence times were estimated in BEAST v1·8·0 (Drummond and Rambaut, Reference Drummond and Rambaut2007) using the dataset partitioned as for the phylogenetics analyses and an uncorrelated relaxed Bayesian clock with rates among branches distributed according to a lognormal distribution (Drummond et al. Reference Drummond, Ho, Phillips and Rambaut2006). A relaxed clock model can account the variation in substitution rates among lineages (Thorne et al. Reference Thorne, Kishino and Painter1998) while a lognormal distribution accommodates greater flexibility regarding a cladogenetic event (Ho and Phillips, Reference Ho and Phillips2009). A Birth-Death process was implemented for the speciation model (Rooney, Reference Rooney2004). The XML file was generated using BEAUti v1·8·0 (Drummond et al. Reference Drummond, Suchard, Xie and Rambaut2012) with subsequent modifications by hand. The following dates and calibration priors were used according to mean date estimations in Parfrey et al. (Reference Parfrey, Lahr, Knoll and Katz2011). The root prior had a normal distribution of 1365–1577 Mya (95% range) and Rhizaria a normal distribution of 1017–1256 Mya (95% range). For comparison, we also used other calibration constraints as found in Parfrey et al. (Reference Parfrey, Lahr, Knoll and Katz2011) and Eme et al. (Reference Eme, Sharpe, Brown, Roger, Keeling and Koonin2014). First, a normal distribution of 1110–1315 Mya (95% range) for the root prior and 816–1016 (95% range) for the time of the most common ancestor (tmrca) in Rhizaria, secondly, a prior of 1371–1626 Mya (95% range) and 983–1266 (95% range) for Rhizaria, according to analysis (b) and (e) in Parfrey et al. (Reference Parfrey, Lahr, Knoll and Katz2011), respectively. Divergence estimations based on the CIR clock model with soft- (900–1580 Mya) and hard-bound (1500–1850 Mya) calibration constraints in Eme et al. (Reference Eme, Sharpe, Brown, Roger, Keeling and Koonin2014) were also included. We combined the results of three independent runs of 40 million generations each to ensure ESS were above 200. TreeAnnotator v1·8·0 (Drummond and Rambaut, Reference Drummond and Rambaut2007) was used to combine and summarize trees files, obtain a maximum clade credibility consensus tree, and calculate 95% credibility intervals. Chains were sampled every 4000th generation and a burn-in of 10% (4 million generations) was used. Convergence and diagnostics of the Markov process were evaluated by the stability of parameter estimates across generations using Tracer v1·6 (http://tree.bio.ed.ac.uk/software/tracer/). The tree with the times of divergences and Highest Posterior Density (HPD) intervals was visualized using FigTree v1·4·2 (http://tree.bio.ed.ac.uk/software/figtree/).
Dating of vertebrate evolution
The relationships and ages of major clades of vertebrates were based on those estimated by Hedges and Kumar (Reference Hedges and Kumar2009). For comparative analysis we also used molecular timescales for vertebrate evolution as found in Wiens (Reference Wiens2015).
Global fit tests
Global fit analyses and tanglegram visualization were performed on ML tree analyses of Cryptosporidium and their hosts (Table 1). Cytochrome b sequence data were used to generate a ML tree for the most predominant hosts that have been documented for Cryptosporidium species (Xiao et al. Reference Xiao, Sulaiman, Ryan, Zhou, Atwill, Tischler, Zhang, Fayer and Lal2002, Reference Xiao, Fayer, Ryan and Upton2004; Fayer, Reference Fayer2010; Šlapeta, Reference Šlapeta2013). Distance matrices were calculated using the ‘cophenetic’ and ‘dist.node’ commands within the ‘ape’ package in R (Paradis et al. Reference Paradis, Claude and Strimmer2004; R Development Core Team, 2014). A third rectangular matrix was generated for host-parasite links allowing multiple linkages between host and parasite species. We estimated the overall congruence between host and parasite topologies using the patristic distance matrices with the null hypothesis of independent evolution in ParaFit (Legendre et al. Reference Legendre, Desdevises and Bazin2002). The fit between the Cryptosporidium and host topologies was assessed using the distance-based analysis and a ‘cailliez’ correction (Cailliez, Reference Cailliez1983) with 999 permutations.
RESULTS
Phylogenetic analysis
The complete alignment of the three gene fragments contained 4653 bp comprising 1850 bp of 18S, 1056 bp of actin and 1747 bp of hsp70. Bayesian inference yielded a consensus tree that was topologically congruent with the ML tree, with ML bootstrap support and Bayesian posterior probabilities largely consistent among nodes (Fig. 1A and Supplementary Fig. S1). We identified three well-supported clades for internal groups within Cryptosporidium with similar levels of statistical support from ML and Bayesian analyses (Fig. 1A). The first well-supported split leads to a clade comprising only Cryptosporidium ‘struthionis’ (clade A), the second clade includes Cryptosporidium galli, Cryptosporidium fragile, Cyptosporidium serpentis, Cryptosporidium andersoni and Cryptosporidium muris (clade B) and a third large clade includes all other species (clade C).
Timing of diversification
Our study showed that the most recent common ancestor of the Cryptosporidium parasite lineage is found near to the Paleozoic/Proterozoic boundary about 590 (877–345) Mya (Fig. 1A) and represents a basal split to the clade composed by C. ‘struthionis’. The estimated time for the split of the other two major clades within Cryptosporidium occurred during the middle Paleozoic ~368 (560–218) Mya, but clade B lineage formation was around the late Jurassic 162 (291–76) Mya whereas clade C originated during the late Paleozoic 265 (409–153) Mya. Among representatives of Cryptosporidium within clade C there was evidence of several relatively early lineage-splitting events since the Paleogene (Fig. 1A). Differences in divergence times for the crown Cryptosporidium clade reported from all other analyses are relatively small with estimated times after 400 Mya and before 700 Mya but the width of the 95% HPD intervals overlapping among interval age estimations (Supplementary Figs S2–S5).
The molecular clock based on an analysis by Hedges and Kumar (Reference Hedges and Kumar2009) showed that the most common ancestor of extant vertebrates is found around 600 Mya. The ages of the Vertebrata origin estimated by Hedges and Kumar (Reference Hedges and Kumar2009) are older than those estimated by Wiens (Reference Wiens2015). These time trees differ in the crown age of Vertebrata by about 100 My. The phylogeny and time of divergences of the major vertebrate clades is also shown in Fig. 1B.
Global fit tests
The cophylogenetic analysis also revealed statistically significant patterns of association between hosts and parasites (Global test: ParaFitGlobal = 1·02, P-value = 0·01). Comparisons of host and parasite phylogenies based on distance and topology-based analyses provided support for a common macroevolutionary scenario between Cryptosporidium and their vertebrate hosts (Fig. 2).
DISCUSSION
Our comparison of the divergence times provides evidence for the origin of Cryptosporidium parasites close to the time of the most common ancestor for all vertebrates about 600 Mya. Different calibration points used in this study yield no significant differences for the root of extant Cryptosporidium clade. However, estimated ages for the crown group of Cryptosporidium are older [679 (1012–393) Mya] and younger [408 (703–180) Mya] when a CIR model and hard- and soft-bound is respectively implemented. These times of the origin of Cryptosporidium nevertheless overlap with interval age estimations reported for the origin of Vertebrata (Kumar and Hedges, Reference Kumar and Hedges1998; Blair and Hedges, Reference Blair and Hedges2005; Erwin et al. Reference Erwin, Laflamme, Tweedt, Sperling, Pisani and Peterson2011; Hedges et al. Reference Hedges, Marin, Suleski, Paymer and Kumar2015). The basal split between clades B and C about 400 Mya is also congruent with the age of the Actinopterygii clade where fish species that are hosts to Cryptosporidium molnari belong to. Analysis of the dated molecular phylogenies suggests that the origin of the clade C, which infects mainly mammalian hosts, is concordant with the age of the stem group of mammals during the Triassic (Close et al. Reference Close, Friedman, Lloyd and Benson2015). Yet much of the taxonomic diversity of Cryptosporidium originated in the Cretaceous, as did most of the terrestrial vertebrates groups (Cooper and Penny, Reference Cooper and Penny1997; Kumar and Hedges, Reference Kumar and Hedges1998). Taxonomic and ecological diversity in Cryptosporidium appears to have evolved during the Cretaceous and provided a launching pad for later diversification during the Tertiary period when mammalian and avian orders diversified after the K–Pg event (Dos Reis et al. Reference Dos Reis, Inoue, Hasegawa, Asher, Donoghue and Yang2012; O'Leary et al. Reference O'leary, Bloch, Flynn, Gaudin, Giallombardo, Giannini, Goldberg, Kraatz, Luo, Meng, Ni, Novacek, Perini, Randall, Rougier, Sargis, Silcox, Simmons, Spaulding, Velazco, Weksler, Wible and Cirranello2013; Jarvis et al. Reference Jarvis, Mirarab, Aberer, Li, Houde, Li, Ho, Faircloth, Nabholz, Howard, Suh, Weber, Da Fonseca, Li, Zhang, Li, Zhou, Narula, Liu, Ganapathy, Boussau, Bayzid, Zavidovych, Subramanian, Gabaldón, Capella-Gutiérrez, Huerta-Cepas, Rekepalli, Munch and Schierup2014; Claramunt and Cracraft, Reference Claramunt and Cracraft2015; Prum et al. Reference Prum, Berv, Dornburg, Field, Townsend, Lemmon and Lemmon2015). In this respect the evolution of these parasites mirrors the evolution of vertebrates, primarily in terms of the diversification of terrestrial Eutheria and Metatheria mammals and Palaeognathae and Neognathae birds (e.g. Jetz et al. Reference Jetz, Thomas, Joy, Hartmann and Mooers2012; Jarvis et al. Reference Jarvis, Mirarab, Aberer, Li, Houde, Li, Ho, Faircloth, Nabholz, Howard, Suh, Weber, Da Fonseca, Li, Zhang, Li, Zhou, Narula, Liu, Ganapathy, Boussau, Bayzid, Zavidovych, Subramanian, Gabaldón, Capella-Gutiérrez, Huerta-Cepas, Rekepalli, Munch and Schierup2014; Close et al. Reference Close, Friedman, Lloyd and Benson2015). Our analyses also find support for the evolution of Cryptosporidium hominis with our human ancestors. The split between C. hominis and C. cuniculus around 6 (1·4–14) Mya suggests an approximate date concordant with our hominini ancestor likely tracing the evolution of C. hominis parasite back to that speciation event (Langergraber et al. Reference Langergraber, Prüfer, Rowney, Boesch, Crockford, Fawcett, Inoue, Inoue-Muruyama, Mitani, Muller, Robbins, Schubert, Stoinski, Viola, Watts, Wittig, Wrangham, Zuberbühler, Pääbo and Vigilant2012).
The age congruencies regarding the coevolution of Cryptosporidium and vertebrates from our estimation of divergence times are supported by the global fit test of host-parasite cophylogenetic pattern. The cophylogenetic statistical analysis indicates a predominance of coevolution compared with host shifting despite some parasites infecting multiple hosts. Some Cryptosporidium species seem to be host-restricted to a single host (e.g. C. viatorum has been only found in humans) but others are distributed across different hosts (e.g. C. parvum is found in humans, cattle, sheep, goats) sometimes achieving high prevalence in one or more hosts (Xiao et al. Reference Xiao, Sulaiman, Ryan, Zhou, Atwill, Tischler, Zhang, Fayer and Lal2002, Reference Xiao, Fayer, Ryan and Upton2004; Fayer, Reference Fayer2010; Cacciò and Widmer, Reference Cacciò and Widmer2013; Šlapeta, Reference Šlapeta2013). Cryptosporidium species infecting closely related hosts within some subgroups is especially common within clade C. For instance, C. parvum, C. hominis and C. cuniculus seem to arise owing to movement and specialization to new mammal hosts (e.g. Koehler et al. Reference Koehler, Whipp, Haydon and Gasser2014). These species are not sufficiently specialized to individual hosts to prevent gene flow; therefore it is likely that shifting occurs because there are not ecological barriers for their populations to disperse among different closely related hosts. Such host shifting could be involved in coevolution of resistance factors by the host populations (Ricklefs et al. Reference Ricklefs, Outlaw, Svensson-Coelho, Medeiros, Ellis and Latta2014) but finer resolution analysis, preferably using whole-genome sequences over shorter timescales, are likely required to resolve these parasite-host population level questions.
Host shifting through different host-vertebrate combinations might indicate that the diversity of Cryptosporidium parasites has not been determined yet. Numerous diverse isolates have been characterized probably encompassing more species than those formally described so far (e.g. Alvarez-Pellitero et al. Reference Alvarez-Pellitero, Quiroga, Sitja-Bobadilla, Redondo, Palenzuela, Vazquez and Nieto2004; Li et al. Reference Li, Pereira, Larsen, Xiao, Phillips, Striby, Mccowan and Atwill2015; Ryan et al. Reference Ryan, Paparini, Tong, Yang, Gibson-Kueh, O'hara, Lymbery and Xiao2015). For example, the still undescribed strain Cryptosporidium ‘struthionis’ has been isolated from ostrich, yet close relatives strains have been found in coprolites of moa (Wood et al. Reference Wood, Wilmshurst, Rawlence, Bonner, Worthy, Kinsella and Cooper2013) and free-living in tidal-flat (Wilms et al. Reference Wilms, Sass, Köpke, Köster, Cypionka and Engelen2006) and ballast water (Pagenkopp et al. Reference Pagenkopp, Fleischer, Carney, Holzer and Ruiz2016). Cryptosporidium ‘struthionis’ is on a relatively long branch with seemingly phylogenetically deep origins. This long-branch would probably be broken with additional taxon sampling and sequence data (Bergsten, Reference Bergsten2005; Slack et al. Reference Slack, Delsuc, Mclenachan, Arnason and Penny2007). Future taxonomic work will impact our understanding of Cryptosporidium evolution dramatically and will stimulate comparative studies to address the growing number of questions regarding the evolution of protozoan parasites.
SUPPLEMENTARY MATERIAL
The supplementary material for this article can be found at http://dx.doi.org/10.1017/S0031182016001323.
ACKNOWLEDGEMENTS
The first author (JCGR) would like to thank to the New Zealand Ministry of Health for support. m EpiLab members provided useful discussions on different stages of the study. We are grateful with two anonymous reviewers for provided helpful comments that improved this manuscript.
FINANCIAL SUPPORT
This research received no specific grant from any funding agency, commercial or not-for-profit sectors.