INTRODUCTION
Cryptosporidiosis is a gastro-intestinal disease caused by the protozoan Cryptosporidium, typically presenting in humans as diarrhoea, abdominal pain, nausea, vomiting and low grade fever (Farthing, Reference Farthing and Petry2000). Clinical cases in livestock are mainly in neonates, but older animals can also be significant shedders of oocysts (Pritchard et al. Reference Pritchard, Marshall, Giles, Chalmers and Marshall2007; Wells et al. Reference Wells, Shaw, Hotchkiss, Gilray, Ayton, Green, Katzer, Wells and Innes2015). Diagnostic tests identify the genus, with species identification undertaken in specialist and reference or research laboratories (Chalmers and Katzer, Reference Chalmers and Katzer2013). Cryptosporidium parvum is one of the major causes of zoonotically-acquired human cryptosporidiosis, and in the UK C. parvum accounts for nearly half of all investigated cases of human cryptosporidiosis with an estimated 25% of non-travel-related, sporadic C. parvum cases acquired from direct contact with farm animals (Chalmers et al. Reference Chalmers, Smith, Elwin, Clifton-Hadley and Giles2011). Other routes of this faecal-oral infection include person-to-person spread, or via a vehicle such as drinking or recreational water, food and fomites (Casemore, Reference Casemore1990). To properly establish the burden of illness from potential exposures and to implement appropriate interventions, the ability to identify sources of contamination and routes of transmission by further differentiation of C. parvum isolates is desirable. However, there is currently no standardized genotyping scheme. Sequencing a hyper-variable region of the gene encoding a 60 kDa glycoprotein (GP60) is commonly used, including testing samples from patients and animals during zoonotic outbreak investigations (Chalmers and Giles, Reference Chalmers and Giles2010). GP60 family IIa is commonly found in cattle and in human cases and outbreaks involving animal contact (Brook et al. 
Reference Brook, Hart, French and Christley2009; Chalmers and Giles, Reference Chalmers and Giles2010; Chalmers et al. Reference Chalmers and Giles2010; Robertson et al. Reference Robertson, Björkman, Axén, Fayer, Caccio and Widmer2014). Subtype family IId is also commonly found in sheep and goats (Robertson et al. Reference Robertson, Björkman, Axén, Fayer, Caccio and Widmer2014) and has been found in human cases in outbreaks linked to open farms and a swimming pool (Cryptosporidium Reference Unit unpublished data). However, multi-locus analyses are more discriminatory (Feng et al. Reference Feng, Torres, Li, Wang, Bowman and Xiao2013), and multi-locus sequence typing (MLST) provides definitive detection of polymorphisms and has been used especially with loci containing variable-number of tandem-repeat (VNTR) units (Gatei et al. Reference Gatei, Hart, Gilman, Das, Cama and Xiao2006; Xiao and Ryan, Reference Xiao, Ryan, Fayer and Xiao2008; Widmer and Cacciò, Reference Widmer and Cacciò2015). However, MLST is expensive and time consuming. During outbreak investigations, rapid characterization of multiple isolates may be required to supplement epidemiological and environmental investigations, and for surveillance large numbers may need to be analysed. Multiple-locus VNTR analysis (MLVA) by slab gel or capillary electrophoretic (CE) sizing of amplified DNA fragments may provide a tool to enable initial characterization of outbreak isolates and linkage of cases with each other or suspected sources of contamination or infection. In one comparative study, fragment sizing C. parvum loci by CE provided better typability, discriminatory power, ease of use, and was more straightforward than sequencing repeat regions (Díaz et al. Reference Díaz, Hadfield, Quílez, Soilán, López, Panadero, Díez-Baños, Morrondo and Chalmers2012). Additionally, the presence of multiple genotypes in a sample is likely to be identified more readily than by Sanger sequencing. 
Although one study has provided direct statistical comparison of fragment sizing and sequencing of four loci and showed that both laboratory methods and data analyses influenced the inferences on the population structure of C. parvum (Widmer and Cacciò, Reference Widmer and Cacciò2015), the choice of loci and their underlying characteristics will undoubtedly affect the outcome of such analyses.
Examples of the utility of MLVA of C. parvum have been documented previously but few investigations have used the same sets of loci, primers, analytical platforms, or allele nomenclature, hindering both comparison of allelic profiles and performance (Robinson and Chalmers, Reference Robinson and Chalmers2012). One meta-analysis of three sets of data generated using different analytical platforms used the assumption that fragment sizes generated were comparable across platforms (Caccio et al. Reference Caccio, de Waele and Widmer2015). If MLVA is to be applied as a rapid tool to support outbreak investigations and have meaningful application across both human and animal health surveillance internationally, then there needs to be structured development to enable harmonized application in different laboratories using different analytical platforms and running conditions, accounting for the potential influence of sequence composition and DNA conformation (Pasqualotto et al. Reference Pasqualotto, Denningm and Anderson2007). Nadon et al. (Reference Nadon, Trees, Ng, Møller Nielsen, Reimer, Maxwell, Kubota and Gerner-Smidt2013) have identified, through consensus agreement, processes for the development of MLVA for bacterial surveillance and outbreak investigations, which should also be applicable to polyclonal samples such as Cryptosporidium spp. oocysts. These steps include: selection and naming of loci, assay design and validation, the need for calibration sets of samples, and standardized allele nomenclature (Nadon et al. Reference Nadon, Trees, Ng, Møller Nielsen, Reimer, Maxwell, Kubota and Gerner-Smidt2013). Specifically pertaining to the selection of loci, Nadon acknowledged that, while there is an inverse relationship between repeat unit length and detected variation, repeat units <5 bp may be hard to differentiate in capillary electrophoresis. 
However, 3 bp differences have been reported to be differentiated using platforms such as ABI 3730 (Life Technologies) (Hotchkiss et al. Reference Hotchkiss, Gilray, Brennan, Christley, Morrison, Jonsson, Innes and Katzer2015) and the QIAxcel (Qiagen) (Drumo et al. Reference Drumo, Widmer, Morrison, Tait, Grelloni, D'Avino, Pozio and Caccio2012; Caccio et al. Reference Caccio, de Waele and Widmer2015). Additionally, it was advised that insertions and deletions should be absent in repeat units, that only those loci with 100% conserved flanking sequences should be used, and that primers should be placed as close as possible to the repeat unit (Nadon et al. Reference Nadon, Trees, Ng, Møller Nielsen, Reimer, Maxwell, Kubota and Gerner-Smidt2013).
To investigate the suitability of selected loci for the potential application of MLVA to C. parvum surveillance and outbreak investigations, we undertook in silico and in vitro studies. Since human C. parvum outbreak investigations frequently involve animal sampling, this included inter-laboratory sample exchange between laboratories involved in both human and animal health investigations.
MATERIALS AND METHODS
Loci and their attributes
Cryptosporidium parvum VNTR loci containing repeat units >2 bp, identified previously as being potentially the most useful (Robinson and Chalmers, Reference Robinson and Chalmers2012) or used in previous studies (Caccio et al. Reference Caccio, de Waele and Widmer2015; Hotchkiss et al. Reference Hotchkiss, Gilray, Brennan, Christley, Morrison, Jonsson, Innes and Katzer2015), were selected: MSA, MSD, MSF, MM5, MM18, MM19, MS9-Mallon (hereafter referred to as MS9), GP60 and TP14.
To evaluate whether the loci met the standards for inter-laboratory surveillance and outbreak investigation proposed by Nadon et al. Reference Nadon, Trees, Ng, Møller Nielsen, Reimer, Maxwell, Kubota and Gerner-Smidt2013, sequences were selected to represent a broad range of alleles and aligned using BioEdit 7·0·9 (http://www.mbio.ncsu.edu/BioEdit/bioedit.html). These sequences were selected from our own archives and the National Center for Biotechnology Information's GenBank database (MM5: KP172504, KP172505, KP265906-KP265911; MM18: KP172508; MM19: KP172512-KP172515, KP265912, KP265914-KP265926; GP60: AB242224-AB242227, AB242229, AF403166-AF403168, AY149610, AY149612, AY149614-AY149616, AY382675, AY738185-AY738186, AY738188-AY738189, AY738191, AY738193-AY738195, AY873780-AY873782, DQ192502, DQ192508, DQ630514-DQ630516, DQ630519, DQ648531-DQ648537, DQ648541, DQ648544, EU140508, EU164810-EU164811; TP14: KM222505-KM222508). Individual sequences were checked for completeness (for the purpose of this study the primer sequences shown in Table 1 were retained) and quality (no ambiguous bases or suspected anomalies). The true fragment size of each allele was identified and the following attributes tabulated and assessed for suitability (Nadon et al. Reference Nadon, Trees, Ng, Møller Nielsen, Reimer, Maxwell, Kubota and Gerner-Smidt2013): chromosome location, repeat unit length, repeat unit heterogeneity of DNA and amino acid sequences, flanking region conservation and proximity to repeat unit.
a Forward primer overlaps first repeat.
b A primer cocktail (equal concentrations) was used to allow for polymorphisms in C. parvum primer sites.
Reproducibility of MLVA
To investigate the impact of the attributes of the loci and to pilot test the reproducibility of MLVA, providing a proof of concept for future inter-laboratory investigations, the nine loci were used in vitro in our three laboratories. These have remits either for investigation of human cryptosporidiosis and suspected animal sources (Cryptosporidium Reference Unit, CRU and Scottish Parasite Diagnostic and Reference Laboratory, SPDRL) or livestock cryptosporidiosis (Moredun Research Institute, MRI). A set of 14 DNA samples, extracted from the national collection of Cryptosporidium oocysts at the CRU as described previously (Chalmers et al. Reference Chalmers, Elwin, Thomas, Guy and Mason2009, Reference Chalmers, Smith, Elwin, Clifton-Hadley and Giles2011), was confirmed as containing C. parvum DNA by real-time polymerase chain reaction (PCR) of the Lib13 gene (Hadfield et al. Reference Hadfield, Robinson, Elwin and Chalmers2011) and GP60 subtypes were identified by sequencing (Alves et al. Reference Alves, Xiao, Sulaiman, Lal, Matos and Antunes2003; Sulaiman et al. Reference Sulaiman, Hira, Zhou, Al-Ali, Al-Shelahi, Shweiki, Iqbal, Khalid and Xiao2005). Isolates were selected to represent a range of GP60 subtypes. DNA was distributed by post. In house PCRs were used to amplify fragments corresponding to the variable regions of each locus as described below. The primer sets are described in Table 1. DNA from isolates representing a range of sequenced reference alleles was included in each PCR and sizing reaction.
At the CRU, all nine loci were investigated with previously validated single round PCRs (CRU unpublished data) using 1 µL template, except MM19 using 5 µL, in final reaction volumes of 20 µL containing 2·5 mM MgCl2, 200 µM dNTPs, 500 µg mL−1 non-acetylated bovine serum albumin and 1 unit of Hotstar DNA Taq polymerase in 1× PCR buffer. Primer concentrations were 500 nM for MSA, MSD, MSF, MS9 and MM5, 300 nM for MM18, TP14 and GP60, and 200 nM for MM19. An addition of 2 µL Q solution was included for MM18, TP14 and GP60. Standard PCR cycling conditions were 40 cycles of denaturation at 95 °C for 30 s, annealing for 30 s at 55 °C (except MM18 at 63 °C and MM19 at 61 °C) and extension at 72 °C for 60 s, followed by a final extension at 72 °C for 10 min. Fragment sizing of PCR products, diluted 1 in 10 in QX dilution buffer, was by capillary electrophoresis in a temperature-controlled room (25 °C) using a QIAxcel on programme OH700 with a 15 bp/600 bp QX DNA Alignment Marker and a 25–500 bp QX Size Marker (Qiagen, Crawley, UK).
At the MRI all nine loci, and at the SPDRL eight loci (GP60 was not used), were investigated with validated nested PCRs using 1 µL DNA or primary product diluted 1:100 as template in final reaction volumes of 20 µL as described previously (Hotchkiss et al. Reference Hotchkiss, Gilray, Brennan, Christley, Morrison, Jonsson, Innes and Katzer2015). Standard PCR cycling conditions were 30 cycles of 95 °C for 50 s, 50 °C for 50 s and 65 °C for 60 s. Fragment sizing of FAM-labelled (Eurofins Genomics, UK) PCR products was undertaken using capillary electrophoresis on two different analytical platforms: MRI used the ABI 3730 (Applied Biosystems; University of Dundee) with the Genescan ROX500 size standard (Applied Biosystems), and SPDRL used the ABI 3500XL with the GeneScan 600 LIZ size standard. Trace files were analysed at the MRI using STRand (http://www.vgl.ucdavis.edu/informatics/strand) and at the SPDRL using GeneMapper Software 5 (Applied Biosystems).
In all three laboratories, the peak sizes were compared and matched with those of the sequenced reference amplicons to enable an adjusted fragment size to be recorded, representing the true fragment size of the sequenced reference standard. Any samples that could not be aligned to a reference standard were sequenced to confirm the presence of a new allele. Sequences generated and/or newly used in this study were deposited in GenBank under accession numbers KT922174 to KT922224.
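The matching step described above can be sketched as follows. This is a minimal illustration only: the function name, the 1 bp tolerance and the example sizes are assumptions made for the sketch and are not taken from the protocols of the three laboratories.

```python
def adjusted_size(measured_bp, references, tolerance_bp=1.0):
    """Return the true (sequenced) size of the closest co-run reference standard.

    references: list of (measured size of the reference on this run,
                         true size from its sequence) tuples.
    tolerance_bp: hypothetical matching window; a real scheme would need
                  to validate this per locus and per platform.
    Returns None when no reference is close enough, i.e. the sample is a
    candidate new allele that should be confirmed by sequencing.
    """
    nearest_measured, true_size = min(
        references, key=lambda ref: abs(ref[0] - measured_bp)
    )
    if abs(nearest_measured - measured_bp) <= tolerance_bp:
        return true_size
    return None


# Hypothetical reference standards for one locus on one run.
run_references = [(201.4, 200), (207.3, 206)]
print(adjusted_size(201.1, run_references))
print(adjusted_size(204.3, run_references))
```

Because the reference standards are sized in the same run as the test samples, any systematic over- or under-sizing by the platform applies to both and is removed by the matching.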
Reproducibility of allele assignment based on fragment sizing
Alleles were compared between laboratories and primer sets in two ways: first, using the adjusted fragment sizes, but this did not permit ready comparison where different primers were used for four of the nine loci: MM19, MS9, TP14 and GP60 (Table 1); second, the adjusted fragment sizes were normalized by deducting from the larger products the difference between the larger and shorter sequenced products, as this was found to be consistent for the reference alleles.
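The normalization can be illustrated with a short sketch. The function names and all fragment sizes below are hypothetical; the only element taken from the text is the rule itself, namely that the difference between the longer and shorter sequenced products is deducted from the longer products, and that this difference must be consistent across reference alleles.

```python
def primer_set_difference(reference_pairs):
    """Size difference between products of the two primer sets.

    reference_pairs: (size with the longer-product primers,
                      size with the shorter-product primers)
                     for each sequenced reference allele.
    Raises ValueError if the difference is not the same for every
    reference allele, since normalization would then be unsafe.
    """
    differences = {longer - shorter for longer, shorter in reference_pairs}
    if len(differences) != 1:
        raise ValueError("primer-set difference is not consistent")
    return differences.pop()


def normalize(adjusted_size_bp, difference_bp):
    """Express a longer-product fragment size on the shorter-product scale."""
    return adjusted_size_bp - difference_bp


# Hypothetical sequenced reference alleles amplified with both primer sets.
difference = primer_set_difference([(310, 250), (322, 262)])
print(difference, normalize(334, difference))
```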
Standardized allele nomenclature
To determine if a standardized allele nomenclature could be generated that would circumvent the need for standardized primer sets, the copy number of repeats was calculated as the adjusted fragment size minus the offset size, divided by the repeat size. For complex loci with more than one repeat region it was assumed that the fragment was generated by the same combination of repeat unit copy numbers as the reference sequence for that allele. Each repeat region was then designated by a two-digit copy number: for the first repeat, one to nine copies were designated 01 to 09 and 10 or more copies by the two-digit integer, and likewise for the second repeat, so that an allele containing two copies of the first repeat and three of the second repeat would be named 0203.
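A minimal sketch of this calculation and naming rule is given below. The function names and the sizes in the usage lines are illustrative assumptions; the formula and the two-digit naming convention are those described in the text.

```python
from typing import Sequence


def copy_number(adjusted_size_bp: int, offset_bp: int, repeat_bp: int) -> int:
    """Copy number = (adjusted fragment size - offset size) / repeat size."""
    repeat_region_bp = adjusted_size_bp - offset_bp
    if repeat_region_bp < 0 or repeat_region_bp % repeat_bp != 0:
        raise ValueError("size is not the offset plus a whole number of repeats")
    return repeat_region_bp // repeat_bp


def allele_name(copy_numbers: Sequence[int]) -> str:
    """Join zero-padded two-digit copy numbers, one per repeat region."""
    return "".join(f"{n:02d}" for n in copy_numbers)


# Hypothetical simple locus: 150 bp fragment, 90 bp offset, 6 bp repeat.
print(allele_name([copy_number(150, 90, 6)]))
# Two copies of the first repeat and three of the second.
print(allele_name([2, 3]))
```

For a simple locus the name follows directly from the fragment size. For a complex locus the list of copy numbers cannot be derived from the size alone and has to be taken from a sequenced reference allele.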
Sensitivity
The number of alleles identified using single round PCRs was compared with those assigned using nested PCR.
RESULTS
Loci and their attributes
Comparison of the attributes of MLVA loci revealed variable performance for the nine C. parvum loci (Table 2). The loci were not distributed across all eight C. parvum chromosomes; one was on each of chromosomes one and three, there were two loci on each of chromosomes five and six, and three were on chromosome eight (Table 2).
a The nucleotide sequence for MSF was originally published in reverse orientation (Tanriverdi et al. Reference Tanriverdi, Markovics, Arslan, Itik, Shkap and Widmer2006).
DNA sequence analysis and alignment identified that all of the loci were within open reading frames and the repeat units encoded various amino acid residues (Table 2). Translation to the amino acid sequences and their subsequent alignment simplified identification of the true start and end points of the repeat units, and revealed that additional repeat units were present in six loci: consistently in MSA, MSD, MS9 and TP14 and more rarely in MM18 and GP60, the latter being well documented in GP60 family IIa (Table 2 and examples in Fig. 1). Heterogeneity of the DNA sequences within the repeat units was identified commonly, sometimes affecting the amino acids (MM18, MM19, first region in MSA, second region in MSD, first region in TP14) and sometimes not (GP60, MM5, two regions within MS9) (Table 2). Furthermore, insertions were found interspersed between copies of the repeat in MM18, interrupting the tandem nature of the repeats and changing the fragment size non-uniformly (Fig. 1). Only MSF contained a single repeat region with a homogenous repeat unit (Fig. 1).
The primer sets used varied in their proximity to the repeat unit (Table 1), but most generated amplicons <400 bp with the exception of the MRI/SPDRL primers for MS9 and the largest MM19 and GP60 alleles (Table 3). The regions flanking the repeat units were generally well conserved, with the major exception of GP60 (Table 2). In GP60, the region downstream of the repeat unit is highly polymorphic and allows for differentiation of isolates of the same species into allelic families based on sequence data (Strong et al. Reference Strong, Gut and Nelson2000). For example, the downstream regions of families IIa and IId are only 70% similar. In addition, at MM19 rare insertions were identified downstream of the repeat unit in two sequences found on GenBank: KP265923, which has a 6 bp insert [AG], and KP265925, which has a 36 bp insert [TGAGIEAGVGIG].
a Observed distribution in the sequenced reference standard.
b Calculated from the fragment size minus the offset size divided by the repeat size.
Reproducibility of allele assignment based on fragment sizing
Although this pilot study was too small for robust analysis of the relationship between real and measured fragment sizes, one observed trend was that the measured fragments at the MRI were more often larger than the sequenced size, and those from the SPDRL and CRU were more often smaller. Additionally, the size difference appeared to be more consistent at those loci with a generally lower GC content (MSD, MS9, MM5, GP60 and TP14), whereas for MM19 and MSF size differences tended to increase with fragment size and for MM18 and MSA there was no discernible relationship (data not shown). Nevertheless, assigning the correct allele was straightforward for most loci. For loci with short repeat units (3 bp in MM5, GP60 and TP14), however, the concentration of the PCR amplicon could affect the ability to align the test samples to the sequenced standards, especially on the QIAxcel. For alleles to be correctly assigned, it was essential that sequenced reference standards were included in the PCR and analysis.
The use of normalized fragment sizes permitted naming regardless of whether the same or different primer sets were used (Table 3). Allele assignation by the three laboratories was concordant with the exception of MS9 where interpretable results were not obtained from one laboratory (Table 4).
DAMP – did not amplify.
a MRI and SPDRL only, CRU DAMP.
b SPDRL and CRU only, MRI DAMP.
c MRI only, CRU DAMP.
d CRU and MRI only, SPDRL DAMP.
The primary purpose of investigating this set of 14 samples was to investigate whether the attributes identified in silico affected the reproducibility of allele assignation, but we also found that samples with the same GP60 sequenced allele were readily differentiated by the combination of loci investigated. The three GP60 IIdA17G1 samples differed from each other at three, six and five other loci, and the three IIdA18G1 samples differed at six, five, and three other loci (Table 4). Of the three GP60 family IIa samples, IIaA16G2R1 and IIaA17G1R1 could not be differentiated by 8 of the 9 loci and no amplicons were generated using MM18 for the IIaA17G1R1 sample. The IIaA16G3R1 sample could be differentiated using MM5, MM18, MM19 and TP14. In GP60 family IId, only TP14 was mono-allelic, with multiple alleles identified for the other loci (Table 4).
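Counting the loci at which two samples differ, as in the comparisons above, can be sketched as follows. The profiles shown are hypothetical and do not reproduce Table 4. Treating a locus that did not amplify (DAMP, represented here as None) as uninformative rather than as a difference is an assumption of the sketch.

```python
def differing_loci(profile_a, profile_b):
    """Return the loci at which two allelic profiles differ.

    Each profile maps locus name -> assigned allele (normalized fragment
    size in bp), or None where the locus did not amplify (DAMP).
    Loci with a missing allele in either sample are skipped.
    """
    shared = profile_a.keys() & profile_b.keys()
    return sorted(
        locus
        for locus in shared
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


# Hypothetical profiles for two samples sharing one GP60 subtype.
sample_1 = {"MSA": 409, "MSF": 166, "MM5": 252, "MM18": None}
sample_2 = {"MSA": 409, "MSF": 172, "MM5": 258, "MM18": 288}
print(differing_loci(sample_1, sample_2))
```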
Standardized allele nomenclature based on copy number of repeats
The calculation of the copy number of repeats was readily applied to the adjusted fragment sizes of MSF, MM5 and MM19, which are simple loci containing single repeat units (Tables 2 and 3). However, application of this nomenclature to the complex loci MSA, MSD, MS9, MM18, TP14 and GP60, which have multiple repeat units (Tables 2 and 3), was based on the assumption that the copy numbers of the different repeat units in the samples were the same as those in the sequenced reference alleles, which we consider misleading.
Sensitivity
Single round PCRs enabled full allelic profiles to be generated for 12 of the 14 samples, and only 5 alleles overall were not assigned, three in one sample and two in another (Table 4). However, one of these samples was not fully profiled by nested PCR either. Overall, nested PCRs provided only four more data points in the entire sample set compared with single round PCR (Table 4). Laboratory workflow was simplified by single round PCR.
DISCUSSION
We have investigated nine of the ten top ranking C. parvum loci identified on the basis of prior MLVA performance for variability (Robinson and Chalmers, Reference Robinson and Chalmers2012), that have been used in previous studies (Caccio et al. Reference Caccio, de Waele and Widmer2015; Hotchkiss et al. Reference Hotchkiss, Gilray, Brennan, Christley, Morrison, Jonsson, Innes and Katzer2015), by assessing their attributes in silico in terms of proposed guidelines (Nadon et al. Reference Nadon, Trees, Ng, Møller Nielsen, Reimer, Maxwell, Kubota and Gerner-Smidt2013) and in vitro through sample exchange. In silico analyses revealed that not all these loci met the proposed guideline criteria and may not be ideal MLVA choices for inter-laboratory surveillance and outbreak investigations. However, despite some of the apparent shortcomings, the in vitro study demonstrated that reproducible allele assignation was possible for all these loci in a meaningful way. This was achieved through the use of sequenced reference standards and normalization of fragment sizes, requiring inter-laboratory communication to define a baseline allowing for the use of different PCR protocols. Nested PCRs yielded only very slightly more information than single round PCRs; the latter provides greatly improved workflow in emergency response.
The five attributes used to assess the VNTR loci were: chromosome location; repeat unit length (⩾5 base pairs preferred); absence of insertions and deletions in the repeat units; homogeneity of the repeats (perfect homogenous repeats preferred); and conservation of flanking sequences (only loci with 100% conserved flanking sequences should be used) (Nadon et al. Reference Nadon, Trees, Ng, Møller Nielsen, Reimer, Maxwell, Kubota and Gerner-Smidt2013). First, the loci were found not to be distributed across all eight chromosomes; when selecting MLVA loci for epidemiological investigations, distribution across chromosomes is desirable as it ensures they are sufficiently distant to exclude physical linkage (Widmer and Sullivan, Reference Widmer and Sullivan2012). However, if more than eight markers are needed for high-resolution genotyping some clustering would be inevitable. The inclusion of linked loci can be valuable in population genetics, for example in studies of linkage disequilibrium. Secondly, seven of the nine loci contained repeat units that were longer than 5 bp. Although the capillary electrophoresis platforms used in this study were capable of differentiating 3 bp, which concurs with previous studies (Drumo et al. Reference Drumo, Widmer, Morrison, Tait, Grelloni, D'Avino, Pozio and Caccio2012; Caccio et al. Reference Caccio, de Waele and Widmer2015; Hotchkiss et al. Reference Hotchkiss, Gilray, Brennan, Christley, Morrison, Jonsson, Innes and Katzer2015), this was through judicious use of sequenced reference standards representing a range of alleles and maintaining optimal running conditions especially for the QIAxcel (CRU unpublished data). In practice, assigning alleles at loci with 3 bp repeats was more challenging than at loci with longer repeats, and the precision of analysis of MM5 has been reported previously to be impaired (Hotchkiss et al. Reference Hotchkiss, Gilray, Brennan, Christley, Morrison, Jonsson, Innes and Katzer2015). For a robust, standardized scheme ⩾5 bp would be more desirable.
The nine loci were all within open reading frames and all the repeat units coded for amino acids; identifying some repeat units from DNA sequences was open to interpretation, but was clarified by analysis of the amino acid sequences. Sequence variation was identified within the repeat units of eight of the nine loci, the only exception being MSF. This variation has not been reported previously for MSA, MSD, MS9 and TP14 and contrasts with the simple sequence repeats reported previously (summarized by Robinson and Chalmers, Reference Robinson and Chalmers2012). The variation seen in the amino acid sequences of the repeat units in MSA, MSD, TP14, MM18 and MM19 may have a biological effect.
Multiple repeat units were identified in six loci and although recognized previously in GP60 (Alves et al. Reference Alves, Xiao, Sulaiman, Lal, Matos and Antunes2003; Sulaiman et al. Reference Sulaiman, Hira, Zhou, Al-Ali, Al-Shelahi, Shweiki, Iqbal, Khalid and Xiao2005) this was identified for the first time in MSA, MSD, MS9, MM18 and TP14. The presence of multiple repeat units did not prevent allele assignation based on adjusted fragment sizes, although the size difference between alleles was not as predictable as for homogenous units. A standardized allele nomenclature based on calculation of the actual copy number of repeats that would also allow for the use of alternative primers (Larsson et al. Reference Larsson, Torpdahl, Petersen, Sørensen, Lindstedt and Nielsen2009) meant that assumptions were made about the distribution of the copy numbers within those loci that were more complex than originally thought. The practice of allocating the same copy number pattern for the different repeat units as that found in the sequenced reference allele (Nadon et al. Reference Nadon, Trees, Ng, Møller Nielsen, Reimer, Maxwell, Kubota and Gerner-Smidt2013) would lead to under-reporting of variation in the complex loci, biased by the selection of the reference sequence. For example, we identified that TP14 had two repeat units, the length of the first being 3 bp and the second 9 bp (Table 2; Fig. 1). The two alleles in this study were newly identified and therefore sequence data identified their configuration 2302 and 2602; however, had we found a 238 bp fragment this could have been assigned to reference sequence JF342563 which is configured with 2603, but another sequence, JQ954685, also has the same sized fragment but was configured 2902. We consider that the assumption is not helpful, and this strategy should not be pursued; the issue could be avoided altogether if only simple VNTR loci are used. 
However, these seem to be in the minority of those currently identified and further work is needed to identify more suitable loci.
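The TP14 ambiguity can be made explicit with a short enumeration. The two reference configurations quoted above imply the same combined repeat region of 105 bp (26 × 3 + 3 × 9 = 29 × 3 + 2 × 9). The sketch below lists every pair of copy numbers that gives that length; the function name and the requirement of at least one copy of each repeat are assumptions of the sketch.

```python
def configurations(repeat_region_bp, first_repeat_bp=3, second_repeat_bp=9):
    """List (first, second) copy-number pairs that sum to the same length.

    Only pairs with at least one copy of each repeat are returned.
    """
    pairs = []
    for second in range(1, repeat_region_bp // second_repeat_bp + 1):
        remainder = repeat_region_bp - second * second_repeat_bp
        if remainder >= first_repeat_bp and remainder % first_repeat_bp == 0:
            pairs.append((remainder // first_repeat_bp, second))
    return pairs


# Combined repeat region shared by the 2603 and 2902 configurations.
candidates = configurations(105)
print([f"{first:02d}{second:02d}" for first, second in candidates])
```

Fragment size alone therefore cannot distinguish 2603 from 2902, or from any of the other arithmetically possible configurations, which is why a nomenclature based on copy number is only safe for simple loci or where sequence data are available.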
The proximity of the (internal) primers to the repeat region partially determined the overall size of the amplicons, which determines the size markers to use and has been shown to affect the performance of the CE machine (Hotchkiss et al. Reference Hotchkiss, Gilray, Brennan, Christley, Morrison, Jonsson, Innes and Katzer2015). The resolution of the QIAxcel is optimal for fragments <300 bp especially with shorter repeat units (Qiagen). Thus the primers need to be designed taking this into account. Finally, most of the flanking regions were either homogenous or generally conserved, but where they were not, such as in GP60, heterogeneity may pose two problems: the fragment size could be affected not only by the VNTRs but also by variation in the flanking sequence, and some of the primer sites also included polymorphisms that require a primer cocktail to improve the sensitivity by allowing amplification of a range of variants. This heterogeneity is acknowledged by, and forms a critical part of, GP60 sequence nomenclature (Sulaiman et al. Reference Sulaiman, Hira, Zhou, Al-Ali, Al-Shelahi, Shweiki, Iqbal, Khalid and Xiao2005) but may affect fragment sizing.
Only MSF met all of the criteria and was the only true simple tandem repeat, providing a good example for identification of future loci. The attributes of the nine loci may go some way to clarify the arguments that have been raised against the use of fragment sizing for genotyping Cryptosporidium isolates. In one study, fragment sizing was compared with sequencing amplicons of MM5, MM19, MS9 and GP60 and showed that single locus distance matrices were weakly correlated, but that this correlation was not maintained when the data were combined in multi-locus genotypes (Widmer and Cacciò, Reference Widmer and Cacciò2015). The authors argued that the simplicity of genotyping using amplicon length data is potentially offset by its limited resolution (Widmer and Cacciò, Reference Widmer and Cacciò2015). However, we propose that the attributes of the loci investigated are critical to this, and that the comparison needs to be explored further using loci that are better suited to MLVA, since the repeat units of MM5, MM19, MS9 and GP60 are all polymorphic and we have demonstrated that MS9 contains four repeat units (Table 2). We agree that the development of, and adherence to, a set of guidelines for locus identification and standardization of genotyping analyses by any method is important.
The increasing availability of C. parvum whole genome sequences (Andersson et al. Reference Andersson, Sikora, Karlberg, Winiecka-Krusnell, Alm, Beser and Arrighi2015; Hadfield et al. Reference Hadfield, Pachebat, Swain, Robinson, Cameron, Alexander, Hegarty, Elwin and Chalmers2015) provides the means to identify new, appropriate loci for a robust MLVA scheme, and this work is underway. In addition, genome sequence data have contributed to our understanding of these loci, for example MSF was originally published in reverse orientation (Tanriverdi et al. Reference Tanriverdi, Markovics, Arslan, Itik, Shkap and Widmer2006). For many pathogens, especially culturable bacteria such as Shiga toxin-producing Escherichia coli O157, whole genome sequencing has superseded MLVA and other traditional typing methods (Dallman et al. Reference Dallman, Byrne, Ashton, Cowley, Perry, Adak, Petrovska, Ellis, Elson, Underwood, Green, Hanage, Jenkins, Grant and Wain2015). However, for Cryptosporidium lengthy processing is required to generate suitable DNA from clinical samples (Hadfield et al. Reference Hadfield, Pachebat, Swain, Robinson, Cameron, Alexander, Hegarty, Elwin and Chalmers2015) even when whole genome amplification is used (Andersson et al. Reference Andersson, Sikora, Karlberg, Winiecka-Krusnell, Alm, Beser and Arrighi2015), and routine application for timely Cryptosporidium surveillance and outbreak investigations is currently a distant reality.
We undertook a preliminary assessment of the reproducibility of MLVA applied to 14 DNA samples selected to provide a range of GP60 alleles from families IIa and IId. Even in this small study, where some samples with the same GP60 sequences were compared, different allelic profiles were generated concurring with previous findings that single locus analysis underestimates diversity in C. parvum (Widmer and Sullivan, Reference Widmer and Sullivan2012). While the use of GP60 sequencing has been useful in characterizing the aetiology of zoonotic C. parvum outbreaks (Chalmers and Giles, Reference Chalmers and Giles2010), a multilocus approach is needed to improve discrimination during outbreak investigations. Previously, in a study focussing on GP60 family IIa, MSA, MSD and MSF were monoallelic which is what we found here (Hotchkiss et al. Reference Hotchkiss, Gilray, Brennan, Christley, Morrison, Jonsson, Innes and Katzer2015). However, multiple alleles were found at these three loci in family IId, demonstrating that consideration of the host and parasite population is important in marker selection.
Concluding remarks
Although most loci were not ideal for MLVA according to the proposed guideline standards, it was possible to use different capillary electrophoresis platforms and assign reproducible allelic profiles to a set of samples, by using previously sequenced, co-amplified reference standards. If a centrally curated database and archive of all identified alleles were maintained then cloned, reference material could be circulated to participating laboratories. In this way, laboratories could use bespoke protocols and primer sets without compromising allele assignment. MLVA assays for Cryptosporidium are still in the development phase and there is no consensus on the number of markers or which they should be. While resolution might be increased by using more markers, the necessity depends on the epidemiological question being asked. From this proof of principle study it is not possible to comment on how many or which markers are desirable or essential. There is a need to re-define loci and a set of rules for selection, application and analysis for inter-laboratory schemes, as well as nomenclature for locus and allele naming. This could be achieved through a consensus meeting and it is proposed that this is enabled by COST Action FA1408: A European Network for Foodborne Parasites (Euro-FBP; www.euro-fbp.eu). Loci for investigation of both C. hominis and C. parvum should be considered. Full validation studies, supported by calibration samples, are needed to compare MLVA analysis between different laboratories following guidelines for validation of typing schemes (Struelens, Reference Struelens1996; van Belkum et al. Reference van Belkum, Tassios, Dijkshoorn, Haeggman, Cookson, Fry, Fussing, Green, Feil, Gerner-Smidt, Brisse and Struelens2007; Nadon et al. Reference Nadon, Trees, Ng, Møller Nielsen, Reimer, Maxwell, Kubota and Gerner-Smidt2013) and permitting analysis for typability, discriminatory power, reproducibility and epidemiological concordance. 
The cost of MLVA could be reduced by multiplexing loci with significantly different expected fragment sizes and different fluorescent labels. Finally, standardized nomenclature needs to be agreed, including consultation with end users including health professionals (Palm et al. Reference Palm, Johansson, Ozin, Friedrich, Grundmann, Larsson and Struelens2012).
ACKNOWLEDGEMENTS
We are grateful to Frank Katzer, Moredun Research Institute, for helpful comments on the manuscript.
FINANCIAL SUPPORT
The research leading to these results has received funding from the European Union Seventh Framework Programme (RMC, GR and SM [FP7/2007-2013] [FP7/2007-2011] under Grant agreement no: 311846); the Scottish Government (EH and JG) under SPASE workstrand 3.2.3.