In 1987, Hudson proposed an estimator for the scaled recombination parameter C=4Nc, where N is the population size and c is the recombination rate between the two most distant of a set of segregating sites. This work came shortly after Kreitman (1983) published the first set of population genetic data at the DNA sequence level. Kreitman had been able to sequence 2·7 kilobases of the Drosophila melanogaster genome in 11 samples. It was felt at that time that population genetics was entering a new era, although Hudson cautioned that sufficiently large data sets for his new estimator ‘may require prohibitively large research efforts’.
Hudson's estimator is based on the variance of the number of site differences between pairs of haplotypes and on an estimate of the scaled mutation rate θ=4Nμ. Brown et al. (1980) had already shown that the variance of the number of differences is a convenient single-statistic summary of all the pairwise linkage disequilibria among a set of loci. The need for such a statistic continues, as there is still doubt about how well two-locus associations capture the full multilocus structure. Hudson provided an elegant derivation of the expected value of his statistic as a function of the unknown value C. His method-of-moments approach to estimation has the great virtue of simplicity, although it would not be expected to perform as well as the maximum-likelihood methods that he (Hudson, 1993) and others (e.g. Kuhner et al., 2000; Wall, 2000; Fearnhead and Donnelly, 2001) developed later. Likelihood methods exploit all the information in a data set rather than just the information in a summary statistic, and will do well provided the underlying evolutionary model is appropriate for the data being analysed. Writing 10 years after Hudson, Wakeley (1997) kept the same moment approach but modified Hudson's method to improve its performance.
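The moment approach described above has two steps: summarize the sample by the mean and variance of the pairwise differences, then choose the value of C whose expected variance matches the observed one. The Python sketch below illustrates that structure only, under stated assumptions. The function names are hypothetical; θ is estimated here by the mean number of pairwise differences, which is one common choice and not necessarily the estimator used in the 1987 paper; the variance divides by the number of pairs, also an assumption. Hudson's actual expression for the expected variance as a function of θ and C is not reproduced: `toy_expected_s2` is a purely illustrative placeholder that would have to be replaced by the published formula.

```python
from itertools import combinations
from statistics import mean, pvariance


def pairwise_differences(haplotypes):
    """Count differing sites for every pair of equal-length haplotype strings."""
    if len({len(h) for h in haplotypes}) != 1:
        raise ValueError("haplotypes must all have the same length")
    return [sum(a != b for a, b in zip(h1, h2))
            for h1, h2 in combinations(haplotypes, 2)]


def moment_statistics(haplotypes):
    """Return (theta_hat, s2_k) for a sample of haplotypes.

    theta_hat: mean number of pairwise differences (assumed estimator of theta
               under the infinite-sites model).
    s2_k:      variance of the pairwise differences, using the number of
               pairs n(n-1)/2 as the divisor (an assumption).
    """
    k = pairwise_differences(haplotypes)
    return mean(k), pvariance(k)


def solve_for_c(s2_k, theta_hat, expected_s2, lo=0.0, hi=1000.0, tol=1e-8):
    """Moment step: find C such that expected_s2(theta_hat, C) == s2_k.

    expected_s2 stands in for the theoretical expected variance given theta
    and C, and is assumed to decrease in C. Estimates are clamped to [lo, hi].
    """
    def gap(c):
        return expected_s2(theta_hat, c) - s2_k

    if gap(lo) <= 0:      # observed variance at least the no-recombination value
        return lo
    if gap(hi) >= 0:      # observed variance too small to match even at C = hi
        return hi
    while hi - lo > tol:  # bisection: gap > 0 at lo, gap < 0 at hi
        mid = (lo + hi) / 2
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def toy_expected_s2(theta, c):
    """Hypothetical, purely illustrative stand-in; NOT Hudson's formula."""
    return theta + theta ** 2 / (1 + c)


theta_hat, s2_k = moment_statistics(["000000", "000000", "111111"])
print(theta_hat, s2_k, round(solve_for_c(s2_k, theta_hat, toy_expected_s2), 4))
```

Because the expected variance is assumed to fall as C grows (recombination breaks up the associations among sites that inflate the variance), a simple bisection suffices; a sample whose variance exceeds the no-recombination expectation yields an estimate of zero.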
Since 1983 the human genome has been sequenced, as have the genomes of several other species. A ‘1000 genomes’ project (http://www.1000genomes.org) is now under way for humans, and new sequencing techniques will soon make it possible for population geneticists to obtain large samples of DNA sequence data. In 1987 Hudson wished for more extensive DNA sequence data, but he could not have foreseen the remarkable explosion of intermediate data: single-nucleotide polymorphisms (SNPs). Human geneticists are now generating profiles of 1 million SNPs for samples of thousands of individuals. By 2002 Hudson had produced a simulation procedure for SNP data (Hudson, 2002), and this has been used in studies such as Li and Stephens (2003) to detect recombination-rate ‘hotspots’.
Hudson's 1987 paper has the hallmarks of a classic. It introduced a new and simple method for estimating recombination rates from population samples rather than from pedigree data. More sophisticated methods have since been introduced, including composite-likelihood methods (Hudson, 2001) and others reviewed by Hellenthal and Stephens (2006), but the original method is still useful in evolutionary studies (e.g. Meiklejohn et al., 2004).
Acknowledgements
This work was supported in part by National Institutes of Health (NIH) grant GM 075091. The assistance of Dr T. Bhangale was very helpful.