Moderate Deviations for Word Counts in Biological Sequences
Published online by Cambridge University Press: 14 July 2016
Abstract
We derive a moderate deviation principle for word counts in biological sequences, and extend it to counts of multiple patterns, under several models: independent and identically distributed letters; homogeneous Markov chains of order 1 and of order m; and, to reflect the codon structure of DNA sequences, Markov chains with three different transition matrices. This gives a new way to approximate P-values for the number of word occurrences in DNA and protein sequences.
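To make the P-value idea concrete, here is a minimal sketch under stated assumptions; it is not the paper's method. It assumes the simplest model (i.i.d. letters with a known letter distribution) and uses only the leading-order, Gaussian-type form of a moderate deviation bound, P(N - E N >= x) ≈ exp(-x^2 / (2 n sigma^2)), where sigma^2 is the limiting variance per position of the overlapping word count. The function names `count_occurrences`, `word_prob`, `asymptotic_variance` and `md_pvalue` are hypothetical illustrations. A moderate deviation principle fixes only the logarithmic rate, so the result should be read as an order-of-magnitude approximation.

```python
import math


def count_occurrences(seq: str, word: str) -> int:
    """Number of (possibly overlapping) occurrences of word in seq."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def word_prob(word: str, p: dict) -> float:
    """Probability of seeing word at a fixed position under i.i.d. letters with law p."""
    prob = 1.0
    for letter in word:
        prob *= p[letter]
    return prob


def asymptotic_variance(word: str, p: dict) -> float:
    """Limit of Var(N_n)/n for overlapping counts of word under i.i.d. letters.

    sigma^2 = mu - (2k - 1) mu^2 + 2 * sum over self-overlap shifts d of
              mu * P(last d letters of word).
    """
    k = len(word)
    mu = word_prob(word, p)
    var = mu - (2 * k - 1) * mu ** 2
    for d in range(1, k):
        # Occurrences at positions i and i + d are compatible only if the
        # suffix of length k - d equals the prefix of length k - d.
        if word[d:] == word[:k - d]:
            var += 2 * mu * word_prob(word[k - d:], p)
    return var


def md_pvalue(seq: str, word: str, p: dict):
    """Hypothetical upper-tail P-value via the leading-order moderate deviation rate."""
    n = len(seq) - len(word) + 1              # number of possible start positions
    observed = count_occurrences(seq, word)
    expected = n * word_prob(word, p)
    sigma2 = asymptotic_variance(word, p)
    excess = observed - expected
    if excess <= 0:
        # Not an over-representation; this crude sketch simply reports 1.0.
        return observed, expected, 1.0
    return observed, expected, math.exp(-excess ** 2 / (2 * n * sigma2))


if __name__ == "__main__":
    uniform = {a: 0.25 for a in "ACGT"}
    obs, expected, pval = md_pvalue("ACGT" * 25, "AC", uniform)
    print(obs, expected, pval)
```

The self-overlap loop matters: a word like "AA" can overlap itself, which inflates the variance of its count compared with a non-overlapping word like "AC" of the same length. The Markov and codon-structured models in the abstract would replace `word_prob` and `asymptotic_variance` with Markov-chain analogues, which this sketch does not attempt.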
MSC classification
Secondary:
60F99: None of the above, but in this section
92D20: Protein sequences, DNA sequences
60J10: Markov chains (discrete-time Markov processes on discrete state spaces)
60J20: Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.)
60G50: Sums of independent random variables; random walks
Type: Research Article
Copyright © Applied Probability Trust 2009