Asymptotics of the allele frequency spectrum and the number of alleles

Ross A. Maller; Soudabeh Shemehsavar

doi:10.1017/jpr.2024.84

Asymptotics of the allele frequency spectrum and the number of alleles

Part of: Applications Stochastic processes Distribution theory

Published online by Cambridge University Press: 22 November 2024

Ross A. Maller

and

Soudabeh Shemehsavar

Show author details

Ross A. Maller*: Affiliation:
The Australian National University
Soudabeh Shemehsavar*: Affiliation:
Murdoch University and University of Tehran
*: *Postal address: Research School of Finance, Actuarial Studies and Statistics, Australian National University, Canberra, ACT, 0200, Australia. Email address: [email protected]
**Postal address: College of Science, Technology, Engineering and Mathematics, and Centre for Healthy Ageing, Health Future Institute, Murdoch University, and School of Mathematics, Statistics & Computer Sciences, University of Tehran. Email address: [email protected]

Article contents

Abstract
References

Get access

Rights & Permissions

Abstract

We derive large-sample and other limiting distributions of components of the allele frequency spectrum vector, $\mathbf{M}_n$, joint with the number of alleles, $K_n$, from a sample of n genes. Models analysed include those constructed from gamma and $\alpha$-stable subordinators by Kingman (thus including the Ewens model), the two-parameter extension by Pitman and Yor, and a two-parameter version constructed by omitting large jumps from an $\alpha$-stable subordinator. In each case the limiting distribution of a finite number of components of $\mathbf{M}_n$ is derived, joint with $K_n$. New results include that in the Poisson–Dirichlet case, $\mathbf{M}_n$ and $K_n$ are asymptotically independent after centering and norming for $K_n$, and it is notable, especially for statistical applications, that in other cases the limiting distribution of a finite number of components of $\mathbf{M}_n$, after centering and an unusual $n^{\alpha/2}$ norming, conditional on that of $K_n$, is normal.

Keywords

Allele frequency spectrum generalised Poisson–Dirichlet laws Ewens sampling formula Kingman’s Poisson–Dirichlet distributions Pitman sampling formula gene and species distributions

MSC classification

Primary: 60G51: Processes with independent increments; L'evy processes 60G52: Stable processes 60G55: Point processes

Secondary: 60G57: Random measures 62E20: Asymptotic distribution theory 62P10: Applications to biology and medical sciences

Type: Original Article
Information: Journal of Applied Probability , Volume 62 , Issue 2 , June 2025 , pp. 516 - 540

DOI: https://doi.org/10.1017/jpr.2024.84 [Opens in a new window]
Copyright: © The Author(s), 2024. Published by Cambridge University Press on behalf of Applied Probability Trust

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Article purchase

Temporarily unavailable

References

Arratia, R. and Baxendale, P. (2015). Bounded size bias coupling: a Gamma function bound, and universal Dickman-function behavior. Prob. Theory Related Fields 162, 411–429.CrossRef Google Scholar

Arratia, R., Barbour, A. and Tavaré, S. (2003). Logarithmic Combinatorial Structures: A Probabilistic Approach (EMS Monographs in Mathematics). European Mathematical Society, Zurich.CrossRef Google Scholar

Basdevant, A. and Goldschmidt, C. (2008). Asymptotics of the allele frequency spectrum associated with the Bolthausen–Sznitman coalescent. Electron. J. Prob. 13, 486–512.CrossRef Google Scholar

Berestycki, J., Berestycki, N. and Schweinsberg, J. (2007). Beta-coalescents and continuous stable random trees. Ann. Prob. 35, 1835–1887.CrossRef Google Scholar

Cereda, G. and Corradi, F. (2023). Learning the two parameters of the Poisson–Dirichlet distribution with a forensic application. Scand. J. Statist. 50, 120–141.CrossRef Google Scholar

Chegini, S. and Zarepour, M. (2023). Random discrete probability measures based on negative binomial process. Available at arXiv:2307.00176.Google Scholar

Covo, S. (2009). On approximations of small jumps of subordinators with particular emphasis on a Dickman-type limit. J. Appl. Prob. 46, 732–755.CrossRef Google Scholar

Covo, S. (2009). One-dimensional distributions of subordinators with upper truncated Lévy measure, and applications. Adv. Appl. Prob. 41, 367–392.CrossRef Google Scholar

Dolera, E. and Favaro, S. (2020). A Berry–Esseen theorem for Pitman’s

$\alpha$ -diversity. Ann. Appl. Prob. 30, 847–869.CrossRef Google Scholar

Ewens, W. (1972). The sampling theory of selectively neutral alleles. Theoret. Pop. Biol. 3, 87–112.CrossRef Google Scholar PubMed

Favaro, S. and Feng, S. (2014). Asymptotics for the number of blocks in a conditional Ewens–Pitman sampling model. Electron. J. Prob. 19, 21, 1–15.CrossRef Google Scholar

Feng, S. (2007). Large deviations associated with Poisson–Dirichlet distribution and Ewens sampling formula. Ann. Appl. Prob. 17, 1570–1595.CrossRef Google Scholar

Feng, S. (2010). The Poisson–Dirichlet Distribution and Related Topics: Models and Asymptotic Behaviours (Probability and its Applications). Springer.CrossRef Google Scholar

Freund, F. and and Möhle, M. (2009). On the number of allelic types for samples taken from exchangeable coalescents with mutation. Adv. Appl. Prob. 41, 1082–1101.CrossRef Google Scholar

Gnedenko, B. V. and Kolmogorov, A. N. (1968). Limit Distributions for Sums of Independent Random Variables. Addison-Wesley.Google Scholar

Gregoire, G. (1984). Negative binomial distributions for point processes. Stoch. Process. Appl. 16, 179–188.CrossRef Google Scholar

Griffiths, R. C. (1979). On the distribution of allele frequencies in a diffusion model. Theoret. Pop. Biol. 15, 140–158.CrossRef Google Scholar

Griffiths, R. C. (2003). The frequency spectrum of a mutation, and its age, in a general diffusion model. Theoret. Pop. Biol. 64, 241–251.CrossRef Google Scholar

Grote, M. N. and Speed, T. P. (2002). Approximate Ewens formulae for symmetric overdominance selection. Ann. Appl. Prob. 12, 637–663.CrossRef Google Scholar

Handa, K. (2009). The two-parameter Poisson–Dirichlet point process. Bernoulli 15, 1082–1116.CrossRef Google Scholar

Hansen, J. (1994). Order statistics for decomposable combinatorial structures. Random Structures Algorithms 5I, 517–533.CrossRef Google Scholar

Hensley, D. (1982). The convolution powers of the Dickman function. J. London Math. Soc. s2-33, 395–406.Google Scholar

Ipsen, Y. F. and Maller, R. A. (2017). Negative binomial construction of random discrete distributions on the infinite simplex. Theory Stoch. Process. 22, 34–46.Google Scholar

Ipsen, Y. F., Maller, R. A. and Shemehsavar, S. (2020). Limiting distributions of generalised Poisson–Dirichlet distributions based on negative binomial processes. J. Theoret. Prob. 33, 1974–2000.CrossRef Google Scholar

Ipsen, Y. F., Maller, R. A. and Shemehsavar, S. (2020). Size biased sampling from the Dickman subordinator. Stoch. Process. Appl. 130, 6880–6900.CrossRef Google Scholar

Ipsen, Y. F., Maller, R. A. and Shemehsavar, S. (2021). A generalised Dickman distribution and the number of species in a negative binomial process model. Adv. Appl. Prob. 53, 370–399.CrossRef Google Scholar

Ipsen, Y. F., Shemehsavar, S. and Maller, R. A. (2018). Species sampling models generated by negative binomial processes. Available at arXiv:1904.13046.Google Scholar

James, L. F. (2008). Large sample asymptotics for the two-parameter Poisson–Dirichlet process. Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh, vol. 3, pp. 187–199. Institute of Mathematical Statistics.Google Scholar

Joyce, P., Krone, S. M. and Kurtz, T.G. (2002). Gaussian limits associated with the Poisson–Dirichlet distribution and the Ewens sampling formula. Ann. Appl. Prob. 12, 101–124.CrossRef Google Scholar

Keith, T. P., Brooks, L. D., Lewontin, R. C., Martinez-Cruzado, J. C. and Rigby, D. L. (1985). Nearly identical allelic distributions of xanthine dehydrogenase in two populations of Drosophila pseudoobscural . Mol. Biol. Evol. 2, 206–216.Google Scholar

Kingman, J. F. C. (1975). Random discrete distributions. J. R. Statist. Soc. B 37, 1–22.CrossRef Google Scholar

Kingman, J. F. C. (1982). The coalescent. Stoch. Process. Appl. 13, 235–248.CrossRef Google Scholar

Koriyama, T., Matsuda, T. and Komaki, F. (2023). Asymptotic analysis of parameter estimation for the Ewens–Pitman partition. Available at arXiv:2207.01949v3.Google Scholar

Lijoi, A., Mena, R. H. and Prunster, I. (2005). Mixture modeling with normalized inverse-Gaussian priors. J. Amer. Statist. Assoc. 100, 1278–1291.CrossRef Google Scholar

Maller, R. A. and Shemehsavar, S. (2023). Generalized Poisson–Dirichlet distributions based on the Dickman subordinator. Theory Prob. Appl. 67, 593–612.CrossRef Google Scholar

Mas-Sandoval, A., Pope, N. S., Nielsen, K. N., Altinkaya, I., Fumagalli, M. and Korneliussen, T. S. (2022). Fast and accurate estimation of multidimensional site frequency spectra from low-coverage high-throughput sequencing data. Gigascience 11, giac032.CrossRef Google Scholar PubMed

Möhle, M. (2015). The Mittag–Leffler process and a scaling limit for the block counting process of the Bolthausen–Sznitman coalescent. ALEA 12, 35–53.Google Scholar

Perman, M. (1993). Order statistics for jumps of normalised subordinators. Stoch. Process. Appl. 46, 267–281.CrossRef Google Scholar

Perman, M., Pitman, J. and Yor, M. (1992). Size-biased sampling of Poisson point processes and excursions. Prob. Theory Related Fields 92, 21–39.CrossRef Google Scholar

Pitman, J. (1995). Exchangeable and partially exchangeable random partitions. Prob. Theory Related Fields 102, 145–158.CrossRef Google Scholar

Pitman, J. (1997). Partition structures derived from Brownian motion and stable subordinators. Bernoulli 3, 79–96.CrossRef Google Scholar

Pitman, J. (2006). Combinatorial Stochastic Processes. Springer, Berlin.Google Scholar

Pitman, J. and Yor, M. (1997). The two-parameter Poisson–Dirichlet distribution derived from a stable subordinator. Ann. Prob. 25, 855–900.CrossRef Google Scholar

Ruggiero, M., Walker, S. G. and Favaro, S. (2013). Alpha-diversity processes and normalized inverse-Gaussian diffusions. Ann. Appl. Prob. 23, 386–425.CrossRef Google Scholar

Watterson, G. A. (1974). The sampling theory of selectively neutral alleles. Adv. Appl. Prob. 6, 463–468.CrossRef Google Scholar

Zhang, J. and Dassios, A. (2024). Truncated two-parameter Poisson–Dirichlet approximation for Pitman–Yor process hierarchical models. Scand. J. Statist. 51, 590–611.CrossRef Google Scholar

Zhou, M., Favaro, S. and Walker, S. G. (2017). Frequency of frequencies distributions and size-dependent exchangeable random partitions. J. Amer. Statist. Assoc. 112, 1623–1635.CrossRef Google Scholar

Maller and Shemehsavar supplementary material 1

Maller and Shemehsavar supplementary material

File 360.1 KB

Maller and Shemehsavar supplementary material 2

Maller and Shemehsavar supplementary material

File 53.3 KB

Maller and Shemehsavar supplementary material 3

Maller and Shemehsavar supplementary material

File 131.2 KB

Article contents

Asymptotics of the allele frequency spectrum and the number of alleles

Abstract

Keywords

MSC classification

Access options

Article purchase

Temporarily unavailable

References

Maller and Shemehsavar supplementary material 1

Maller and Shemehsavar supplementary material 2

Maller and Shemehsavar supplementary material 3

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests