Hostname: page-component-745bb68f8f-grxwn Total loading time: 0 Render date: 2025-01-25T02:47:01.313Z Has data issue: false hasContentIssue false

Asymptotic normality for $\boldsymbol{m}$-dependent and constrained $\boldsymbol{U}$-statistics, with applications to pattern matching in random strings and permutations

Published online by Cambridge University Press:  28 March 2023

Svante Janson*
Affiliation:
Uppsala University
*
*Postal address: Department of Mathematics, Uppsala University, PO Box 480, SE-751 06 Uppsala, Sweden. Email address: [email protected]

Abstract

We study (asymmetric) $U$-statistics based on a stationary sequence of $m$-dependent variables; moreover, we consider constrained $U$-statistics, where the defining multiple sum only includes terms satisfying some restrictions on the gaps between indices. Results include a law of large numbers and a central limit theorem, together with results on rate of convergence, moment convergence, functional convergence, and a renewal theory version.

Special attention is paid to degenerate cases where, after the standard normalization, the asymptotic variance vanishes; in these cases non-normal limits occur after a different normalization.

The results are motivated by applications to pattern matching in random strings and permutations. We obtain both new results and new proofs of old results.

Type
Original Article
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of Applied Probability Trust

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

Alsmeyer, G. and Hoefs, V. (2001). Markov renewal theory for stationary $m$ -block factors. Markov Process. Relat. Fields 7, 325348.Google Scholar
Baldi, P. and Rinott, Y. (1989). On normal approximations of distributions in terms of dependency graphs. Ann. Prob. 17, 16461650.10.1214/aop/1176991178CrossRefGoogle Scholar
Barbour, A. D., Karoński, M. and Ruciński, A. (1989). A central limit theorem for decomposable random variables with applications to random graphs. J. Combinatorial Theory B 47, 125145.10.1016/0095-8956(89)90014-2CrossRefGoogle Scholar
Bender, E. A. and Kochman, F. (1993). The distribution of subword counts is usually normal. Europ. J. Combinatorics 14, 265275.10.1006/eujc.1993.1030CrossRefGoogle Scholar
Billingsley, P. (1956). The invariance principle for dependent random variables. Trans. Amer. Math. Soc. 83, 250268.10.1090/S0002-9947-1956-0090923-6CrossRefGoogle Scholar
Blom, G. (1976). Some properties of incomplete $U$ -statistics. Biometrika 63, 573580.10.1093/biomet/63.3.573CrossRefGoogle Scholar
Bóna, M. (2007). The copies of any permutation pattern are asymptotically normal. Preprint. Available at https://arxiv.org/0712.2792.Google Scholar
Bóna, M. (2008). Generalized descents and normality. Electron. J. Combinatorics 15, paper no. 21, 8 pp.10.37236/896CrossRefGoogle Scholar
Bóna, M. (2010). On three different notions of monotone subsequences. In Permutation Patterns, Cambridge University Press, pp. 89114.10.1017/CBO9780511902499.005CrossRefGoogle Scholar
Bourdon, J. and Vallée, B. (2002). Generalized pattern matching statistics. In Mathematics and Computer Science II: Algorithms, Trees, Combinatorics and Probabilities, Birkhäuser, Basel, pp. 249–265.Google Scholar
Bourdon, J. and Vallée, B. (2006). Pattern matching statistics on correlated sources. In LATIN 2006: Theoretical Informatics, Springer, Berlin, pp. 224237.10.1007/11682462_24CrossRefGoogle Scholar
Bradley, R. C. (2007). Introduction to Strong Mixing Conditions, Vol. 1–3. Kendrick Press, Heber City, UT.Google Scholar
Cameron, R. H. and Martin, W. T. (1945). Transformations of Wiener integrals under a general class of linear transformations. Trans. Amer. Math. Soc. 58, 184219.10.1090/S0002-9947-1945-0013240-1CrossRefGoogle Scholar
Chen, L. H. Y., Goldstein, L. and Shao, Q.-M. (2011). Normal Approximation by Stein’s Method. Springer, Berlin. 10.1007/978-3-642-15007-4CrossRefGoogle Scholar
Dehling, H. and Wendler, M. (2010). Central limit theorem and the bootstrap for $U$ -statistics of strongly mixing data. J. Multivariate Anal. 101, 126137.10.1016/j.jmva.2009.06.002CrossRefGoogle Scholar
Dynkin, E. B. and Mandelbaum, A. (1983). Symmetric statistics, Poisson point processes, and multiple Wiener integrals. Ann. Statist. 11, 739745.10.1214/aos/1176346241CrossRefGoogle Scholar
Euler, L. (1743). De summis serierum reciprocarum ex potestatibus numerorum naturalium ortarum dissertatio altera, in qua eaedem summationes ex fonte maxime diverso derivantur. Miscellanea Berolinensia 7, 172–192. Reprinted in Opera Omnia, Ser. 1, Vol. 14, Teubner, Leipzig, 1925, pp. 138–155.Google Scholar
Even-Zohar, C. (2020). Patterns in random permutations. Combinatorica 40, 775804.10.1007/s00493-020-4212-zCrossRefGoogle Scholar
Even-Zohar, C., Lakrec, T. and Tessler, R. J. (2021). Spectral analysis of word statistics. Sém. Lothar. Combin. 85B, paper no. 81, 12 pp.Google Scholar
Fang, X. (2016). A multivariate CLT for bounded decomposable random vectors with the best known rate. J. Theoret. Prob. 29, 15101523.10.1007/s10959-015-0619-7CrossRefGoogle Scholar
Fisher, N. I. and Lee, A. J. (1982). Nonparametric measures of angular–angular association. Biometrika 69, 315321.10.2307/2335405CrossRefGoogle Scholar
Flajolet, P. and Sedgewick, R. (2009). Analytic Combinatorics. Cambridge University Press.10.1017/CBO9780511801655CrossRefGoogle Scholar
Flajolet, P., Szpankowski, W. and Vallée, B. (2006). Hidden word statistics. J. Assoc. Comput. Mach. 53, 147183.10.1145/1120582.1120586CrossRefGoogle Scholar
Gut, A. (2009). Stopped Random Walks, 2nd edn. Springer, New York.10.1007/978-0-387-87835-5CrossRefGoogle Scholar
Gut, A. (2013). Probability: A Graduate Course, 2nd edn. Springer, New York.10.1007/978-1-4614-4708-5CrossRefGoogle Scholar
Han, F. and Qian, T. (2018). On inference validity of weighted U-statistics under data heterogeneity. Electron. J. Statist. 12, 26372708.10.1214/18-EJS1462CrossRefGoogle Scholar
Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Statist. 19, 293325.10.1214/aoms/1177730196CrossRefGoogle Scholar
Hoeffding, W. (1961). The strong law of large numbers for $U$ -statistics. Tech. Rep., University of North Carolina, Institute of Statistics, Mimeograph Series No. 302. Available at https://repository.lib.ncsu.edu/handle/1840.4/2128.Google Scholar
Hofer, L. (2018). A central limit theorem for vincular permutation patterns. Discrete Math. Theoret. Comput. Sci. 19, paper no. 9, 26 pp.Google Scholar
Hsing, T. and Wu, W. B. (2004). On weighted $U$ -statistics for stationary processes. Ann. Prob. 32, 16001631.10.1214/009117904000000333CrossRefGoogle Scholar
Jacquet, P. and Szpankowski, W. (2015). Analytic Pattern Matching: From DNA to Twitter. Cambridge University Press.10.1017/CBO9780511843204CrossRefGoogle Scholar
Jammalamadaka, S. R. and Janson, S. (1986). Limit theorems for a triangular scheme of $U$ -statistics with applications to inter-point distances. Ann. Prob. 14, 13471358.10.1214/aop/1176992375CrossRefGoogle Scholar
Janson, S. (1983). Renewal theory for $M$ -dependent variables. Ann. Prob. 11, 558568.10.1214/aop/1176993500CrossRefGoogle Scholar
Janson, S. (1988). Normal convergence by higher semi-invariants with applications to sums of dependent random variables and random graphs. Ann. Prob. 16, 305312.10.1214/aop/1176991903CrossRefGoogle Scholar
Janson, S. (1997). Gaussian Hilbert Spaces. Cambridge University Press.10.1017/CBO9780511526169CrossRefGoogle Scholar
Janson, S. (2015). On degenerate sums of $m$ -dependent variables. J. Appl. Prob. 52, 11461155.10.1239/jap/1450802758CrossRefGoogle Scholar
Janson, S. (2018). Renewal theory for asymmetric $U$ -statistics. Electron. J. Prob. 23, paper no. 129, 27 pp.10.1214/18-EJP252CrossRefGoogle Scholar
Janson, S. (2022). The number of occurrences of patterns in a random tree or forest permutation. Preprint. Available at https://arxiv.org/2203.04182.Google Scholar
Janson, S., Nakamura, B. and Zeilberger, D. (2015). On the asymptotic statistics of the number of occurrences of multiple permutation patterns. J. Combinatorics 6, 117143.10.4310/JOC.2015.v6.n1.a8CrossRefGoogle Scholar
Janson, S. and Szpankowski, W. (2021). Hidden words statistics for large patterns. Electron. J. Combinatorics 28, paper no. P2.36, 26 pp.10.37236/9452CrossRefGoogle Scholar
Kallenberg, O. (2002). Foundations of Modern Probability, 2nd edn. Springer, New York.10.1007/978-1-4757-4015-8CrossRefGoogle Scholar
Major, P. (1994). Asymptotic distributions for weighted $U$ -statistics. Ann. Prob. 22, 15141535.10.1214/aop/1176988610CrossRefGoogle Scholar
Malevich, T. L. and Abdalimov, B. (1982). Refinement of the central limit theorem for $U$ -statistics of $m$ -dependent variables. Teor. Veroyat. Primen. 27, 369–373 (in Russian). English translation: Theory Prob. Appl. 27, 391396.10.1137/1127045CrossRefGoogle Scholar
Miller, R. G. Jr. and Sen, P. K. (1972). Weak convergence of $U$ -statistics and von Mises’ differentiable statistical functions. Ann. Math. Statist. 43, 3141.10.1214/aoms/1177692698CrossRefGoogle Scholar
Nicodème, P., Salvy, B. and Flajolet, P. (2002). Motif statistics. Theoret. Comput. Sci. 287, 593617.10.1016/S0304-3975(01)00264-XCrossRefGoogle Scholar
Olver, F. W. J., Lozier, D. W., Boisvert, R. F. and Clark, C. W. (eds) (2010). NIST Handbook of Mathematical Functions. Cambridge University Press.Google Scholar
O’Neil, K. A. and Redner, R. A. (1993). Asymptotic distributions of weighted $U$ -statistics of degree 2. Ann. Prob. 21, 11591169.Google Scholar
Orey, S. (1958). A central limit theorem for $m$ -dependent random variables. Duke Math. J. 25, 543546.10.1215/S0012-7094-58-02548-1CrossRefGoogle Scholar
Peligrad, M. (1996). On the asymptotic normality of sequences of weak dependent random variables. J. Theoret. Prob. 9, 703715.10.1007/BF02214083CrossRefGoogle Scholar
Pike, J. (2011). Convergence rates for generalized descents. Electron. J. Combinatorics 18, paper no. 236, 14 pp.10.37236/723CrossRefGoogle Scholar
Raič, M. (2004). A multivariate CLT for decomposable random vectors with finite second moments. J. Theoret. Prob. 17, 573603.10.1023/B:JOTP.0000040290.44087.68CrossRefGoogle Scholar
Régnier, M. and Szpankowski, W. (1998). On pattern frequency occurrences in a Markovian sequence. Algorithmica 22, 631649.10.1007/PL00009244CrossRefGoogle Scholar
Revuz, D. and Yor, M. (1999). Continuous Martingales and Brownian Motion, 3rd edn. Springer, Berlin.10.1007/978-3-662-06400-9CrossRefGoogle Scholar
Rinott, Y. (1994). On normal approximation rates for certain sums of dependent random variables. J. Comput. Appl. Math. 55, 135143.10.1016/0377-0427(94)90016-7CrossRefGoogle Scholar
Rinott, Y. and Rotar, V. (1997). On coupling constructions and rates in the CLT for dependent summands with applications to the antivoter model and weighted $U$ -statistics. Ann. Appl. Prob. 7, 10801105.10.1214/aoap/1043862425CrossRefGoogle Scholar
Rubin, H. and Vitale, R. A. (1980). Asymptotic distribution of symmetric statistics. Ann. Statist. 8, 165170.10.1214/aos/1176344898CrossRefGoogle Scholar
Sen, P. K. (1960). On some convergence properties of $U$ -statistics. Calcutta Statist. Assoc. Bull. 10, 118.10.1177/0008068319600101CrossRefGoogle Scholar
Sen, P. K. (1963). On the properties of $U$ -statistics when the observations are not independent. I. Estimation of non-serial parameters in some stationary stochastic process. Calcutta Statist. Assoc. Bull. 12, 6992.10.1177/0008068319630301CrossRefGoogle Scholar
Sen, P. K. (1972). Limiting behavior of regular functionals of empirical distributions for stationary mixing processes. Z. Wahrscheinlichkeitsth. 25, 7182.10.1007/BF00533337CrossRefGoogle Scholar
Sen, P. K. (1974). Weak convergence of generalized $U$ -statistics. Ann. Prob. 2, 90102.10.1214/aop/1176996754CrossRefGoogle Scholar
Shapiro, C. P. and Hubert, L. (1979). Asymptotic normality of permutation statistics derived from weighted sums of bivariate functions. Ann. Statist. 7, 788794.10.1214/aos/1176344728CrossRefGoogle Scholar
Szpankowski, W. (2001). Average Case Analysis of Algorithms on Sequences. John Wiley, New York.10.1002/9781118032770CrossRefGoogle Scholar
Utev, S. A. (1990). Central limit theorem for dependent random variables. In Probability Theory and Mathematical Statistics: Proc. Fifth Vilnius Conference, June 25–July 1, 1990, Vol. II, Mokslas, Vilnius, pp. 519528.Google Scholar
Yoshihara, K. (1976). Limiting behavior of $U$ -statistics for stationary, absolutely regular processes. Z. Wahrscheinlichkeitsth. 35, 237252.10.1007/BF00532676CrossRefGoogle Scholar
Yoshihara, K. (1992). Limiting behavior of $U$ -statistics for strongly mixing sequences. Yokohama Math. J. 39, 107113.Google Scholar
Zhou, Z. (2014). Inference of weighted $V$ -statistics for nonstationary time series and its applications. Ann. Statist. 42, 87114.10.1214/13-AOS1184CrossRefGoogle Scholar