
Computational Inference Beyond Kingman's Coalescent

Published online by Cambridge University Press: 30 January 2018

Jere Koskela*
Affiliation:
University of Warwick
Paul Jenkins*
Affiliation:
University of Warwick
Dario Spanò*
Affiliation:
University of Warwick
* Postal address: Mathematics Institute, University of Warwick, Coventry CV4 7AL, UK. Email address: [email protected]
∗∗ Postal address: Department of Statistics, University of Warwick, Coventry CV4 7AL, UK.

Abstract


Full likelihood inference under Kingman's coalescent is a computationally challenging problem to which importance sampling (IS) and the product of approximate conditionals (PAC) methods have been applied successfully. Both methods can be expressed in terms of families of intractable conditional sampling distributions (CSDs), and rely on principled approximations for accurate inference. Recently, more general Λ- and Ξ-coalescents have been observed to provide better modelling fits to some genetic data sets. We derive families of approximate CSDs for finite sites Λ- and Ξ-coalescents, and use them to obtain ‘approximately optimal’ IS and PAC algorithms for Λ-coalescents, yielding substantial gains in efficiency over existing methods.
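
For orientation, the link between the PAC method and CSDs mentioned in the abstract can be sketched with the chain rule. The notation here is an assumption made for illustration and is not taken from this page: $h_1, \dots, h_n$ denote the sampled haplotypes, $\theta$ the model parameters, $\pi_\theta(\cdot \mid h_1, \dots, h_k)$ the exact CSD of the next haplotype given the first $k$, and $\hat{\pi}_\theta$ a tractable approximation to it.

```latex
% Exact likelihood as a product of conditional sampling distributions (CSDs);
% the k = 0 factor is the marginal distribution of a single haplotype.
L(\theta) = \prod_{k=0}^{n-1} \pi_\theta\left(h_{k+1} \mid h_1, \dots, h_k\right)

% PAC likelihood: each intractable CSD is replaced by an approximation.
\hat{L}_{\mathrm{PAC}}(\theta) = \prod_{k=0}^{n-1} \hat{\pi}_\theta\left(h_{k+1} \mid h_1, \dots, h_k\right)
```

The exact product does not depend on the order in which the haplotypes are listed, whereas the approximate product generally does, so in practice the PAC likelihood is typically averaged over several random orderings of the sample.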

Type
Research Article
Copyright
© Applied Probability Trust 
