Hostname: page-component-586b7cd67f-r5fsc Total loading time: 0 Render date: 2024-11-28T00:23:59.206Z Has data issue: false hasContentIssue false

Estimating the recombination parameter: a commentary on ‘Estimating the recombination parameter of a finite population model without selection’ by Richard R. Hudson

Published online by Cambridge University Press:  29 October 2008

B. S. WEIR*
Affiliation:
Department of Biostatistics, University of Washington, Seattle, WA 98195-7232, USA
*
Rights & Permissions [Opens in a new window]

Abstract

Type
Article Commentary
Copyright
Copyright © 2008 Cambridge University Press

In 1987, Hudson proposed an estimator for the scaled recombination parameter C=4Nc, where N is the population size and c is the recombination rate between the two most distant of a set of segregating sites. This work came shortly after Kreitman (1983) published the first set of population genetic data at the DNA sequence level. Kreitman had been able to sequence 2·7 kilobases of the Drosophila melanogaster genome in 11 samples. It was felt at that time that population genetics was entering a new era, although Hudson cautioned that sufficiently large data sets for his new estimator ‘may require prohibitively large research efforts’.

Hudson's estimator is based on the variance of the number of site differences between pairs of haplotypes and an estimate of the scaled mutation rate θ=4Nμ. The variance of the number of differences had already been shown by Brown et al. (Reference Brown, Feldman and Nevo1980) to be a convenient single-statistic summary of all the pairwise linkage disequilibria among a set of loci. The need for such a statistic continues as there is still doubt as to how well two-locus associations capture the full multilocus structure. Hudson provided an elegant derivation of the expected value of his statistic as a function of the unknown value C. His method of moments approach to estimation has the great virtue of simplicity although it would not be expected to behave as well as the maximum-likelihood methods that he (Hudson, Reference Hudson, Takahata and Clark1993) and others (e.g. Kuhner et al., Reference Kuhner, Yamato and Felsenstein2000; Wall, Reference Wall2000; Fearnhead and Donnelly, Reference Fearnhead and Donnelly2001) developed later. Likelihood methods exploit all the information in a data set rather than just the information in a summary statistic and will do well provided the underlying evolutionary model is appropriate for the data being addressed. Writing 10 years after Hudson, Wakeley kept the same moment approach but provided modifications to Hudson's method that improved its performance.

Since 1983 the human genome has been sequenced, as have the genomes of several other species. There is now a ‘1000 genomes’ project (http://www.1000genomes.org) under way for humans, and new sequencing techniques will make it possible very soon for population geneticists to obtain large samples of DNA sequence data. In 1987, Hudson wished for more extensive DNA sequence data but he could not have foreseen the remarkable explosion of intermediate data – single-nucleotide polymorphisms (SNPs). Human geneticists are now generating 1 million SNP profiles for samples of thousands of individuals. By 2002, Hudson had produced a simulation procedure for SNP data (Hudson, Reference Hudson2002), and this has been used in studies such as Li and Stephens (Reference Li and Stephens2003) to detect recombination rate ‘hotspots’.

Hudson's 1987 paper has the hallmarks of a classic paper. It introduced a new and simple method for estimating recombination rates from population samples rather than from pedigree data. More sophisticated methods have since been introduced, including composite-likelihood (Hudson, Reference Hudson2001) and others reviewed by Hellenthal and Stephens (Reference Hellenthal and Stephens2006), but the original method still has utility in evolutionary studies (e.g. Meikeljohn et al., Reference Meikeljohn, Kim, Hartl and Parsch2004).

Acknowledgements

This work was supported in part by National Institutes of Health (NIH) grant GM 075091. The assistance of Dr T. Bhangale was very helpful.

References

Brown, A. H. D., Feldman, M. W. & Nevo, E. (1980). Multilocus structure of natural populations of Hordeum spontaneum. Genetics 96, 523536.CrossRefGoogle ScholarPubMed
Fearnhead, P. & Donnelly, P. (2001). Estimating recombination rates from population genetic data. Genetics 159, 12991318.CrossRefGoogle ScholarPubMed
Hellenthal, G. & Stephens, M. (2006). Insights into recombination from population genetic variation. Current Opinion in Genetics and Development 16, 565572.CrossRefGoogle ScholarPubMed
Hudson, R. R. (1993) The how and why of generating gene genealogies. In Mechanisms of Molecular Evolution (ed. Takahata, N. and Clark, A. G.), pp. 2336. Sunderland, MA: Sinauer Associates.Google Scholar
Hudson, R. R. (2001). Two-locus sampling distributions and their application. Genetics 159, 18051817.CrossRefGoogle ScholarPubMed
Hudson, R. R. (2002). Generating samples under a Wright–Fisher neutral model of genetic variation. Bioinformatics 18, 337338.CrossRefGoogle Scholar
Kreitman, M. (1983). Nucleotide polymorphism at the alcohol dehydrogenase locus of Drosophila melanogaster. Nature 304, 412417.CrossRefGoogle ScholarPubMed
Kuhner, M. K., Yamato, J. & Felsenstein, J. (2000). Maximum likelihood estimation of recombination rates from population data. Genetics 156, 13931401.CrossRefGoogle ScholarPubMed
Li, N. & Stephens, M. (2003). Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165, 22132233.CrossRefGoogle ScholarPubMed
Meikeljohn, C. D., Kim, Y., Hartl, D. L. & Parsch, J. (2004). Identification of a locus under complex positive selection in Drosophila simulans by haplotype mapping and composite-likelihood estimation. Genetics 168, 265279.CrossRefGoogle Scholar
Wakeley, J. (1997). Using the variance of pairwise differences to estimate the recombination rate. Genetical Research 69, 4548.CrossRefGoogle ScholarPubMed
Wall, J. D. (2000). A comparison of estimators of the population recombination rate. Molecular Biology and Evolution 17, 65163.CrossRefGoogle ScholarPubMed