Moderate Deviations for Word Counts in Biological Sequences

Sarah Behrens; Matthias Löwe

doi:10.1239/jap/1261670686

Published online by Cambridge University Press: 14 July 2016

Sarah Behrens*
Affiliation: Max Planck Institute for Molecular Genetics
Matthias Löwe**
Affiliation: University of Münster
* Postal address: Max Planck Institute for Molecular Genetics, Department for Computational Molecular Biology, Ihnestraße 63-73, 14195 Berlin, Germany. Email address: [email protected]
** Postal address: Fachbereich Mathematik und Informatik, Universität Münster, Einsteinstr. 62, 48149 Münster, Germany. Email address: [email protected]

Abstract

We derive a moderate deviation principle for word counts (which is extended to counts of multiple patterns) in biological sequences under different models: independent and identically distributed letters, homogeneous Markov chains of order 1 and m, and, in view of the codon structure of DNA sequences, Markov chains with three different transition matrices. This enables us to approximate P-values for the number of word occurrences in DNA and protein sequences in a new manner.
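The simplest of the models named above, independent and identically distributed letters, can be illustrated with a minimal sketch. It counts overlapping occurrences of a word and computes the exact mean and variance of that count when the letter probabilities are known, then standardizes the observed count. This is not the authors' method and does not reproduce their moderate deviation approximation; the function names (`count_overlapping`, `iid_mean_var`, `z_score`) and the example sequence are assumptions made for illustration only.

```python
from math import prod, sqrt


def count_overlapping(seq, word):
    """Count occurrences of `word` in `seq`, allowing overlaps."""
    l = len(word)
    return sum(1 for i in range(len(seq) - l + 1) if seq[i:i + l] == word)


def word_prob(word, p):
    """Probability of `word` at a fixed position under i.i.d. letters."""
    return prod(p[c] for c in word)


def iid_mean_var(word, n, p):
    """Mean and variance of the overlapping count of `word` in a random
    sequence of length n with i.i.d. letters drawn from the known
    distribution p (a dict mapping letter -> probability)."""
    l = len(word)
    m = n - l + 1                      # number of possible start positions
    if m <= 0:
        return 0.0, 0.0
    mu = word_prob(word, p)
    mean = m * mu
    var = m * mu * (1 - mu)            # sum of the indicator variances
    for d in range(1, l):              # shifts at which two copies overlap
        pairs = m - d                  # position pairs at distance d
        if pairs <= 0:
            break
        # Two occurrences at distance d are compatible only if the word
        # overlaps itself at that shift.
        self_overlap = word[d:] == word[:l - d]
        joint = mu * word_prob(word[l - d:], p) if self_overlap else 0.0
        var += 2 * pairs * (joint - mu * mu)
    # Indicators at distance >= l involve disjoint letters, so they are
    # independent and contribute no covariance.
    return mean, var


def z_score(seq, word, p):
    """Standardized deviation of the observed count from its mean."""
    mean, var = iid_mean_var(word, len(seq), p)
    return (count_overlapping(seq, word) - mean) / sqrt(var)


if __name__ == "__main__":
    uniform = {c: 0.25 for c in "ACGT"}   # assumed letter distribution
    seq = "ACGTAAAACGTAAAAT"               # made-up example sequence
    print(count_overlapping(seq, "AA"), iid_mean_var("AA", len(seq), uniform))
    print(z_score(seq, "AA", uniform))
```

Self-overlapping words such as `AA` have a larger variance than non-overlapping words of the same probability, which is why the covariance terms cannot be dropped. In practice the letter probabilities are usually estimated from the sequence itself, which changes the variance; the sketch assumes they are known.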

Keywords

Moderate deviations; Markov chain; word counts; motifs; biological sequence analysis

MSC classification

Secondary:
60F99: None of the above, but in this section
92D20: Protein sequences, DNA sequences
60J10: Markov chains (discrete-time Markov processes on discrete state spaces)
60J20: Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.)
60G50: Sums of independent random variables; random walks

Type: Research Article
Information: Journal of Applied Probability, Volume 46, Issue 4, December 2009, pp. 1020–1037

DOI: https://doi.org/10.1239/jap/1261670686
Copyright: Copyright © Applied Probability Trust 2009
