Published online by Cambridge University Press: 14 July 2016
We derive a moderate deviation principle for word counts in biological sequences, and extend it to counts of multiple patterns, under several models: independent and identically distributed letters; homogeneous Markov chains of order 1 and of order m; and, to reflect the codon structure of DNA sequences, Markov chains with three different transition matrices. This yields a new way to approximate P-values for the number of word occurrences in DNA and protein sequences.
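To make the central quantity concrete, the following is a minimal sketch, not the method of the paper, of the statistic whose tail probabilities such a principle would approximate: the number of (possibly overlapping) occurrences of a word in a sequence, together with its expectation under the simplest model named above, independent and identically distributed letters. The function names `count_occurrences` and `expected_count_iid` and the uniform letter distribution are assumptions made purely for illustration.

```python
from math import prod


def count_occurrences(sequence: str, word: str) -> int:
    """Count occurrences of `word` in `sequence`, allowing overlaps."""
    k = len(word)
    return sum(
        1 for i in range(len(sequence) - k + 1) if sequence[i:i + k] == word
    )


def expected_count_iid(n: int, word: str, letter_probs: dict) -> float:
    """Expected number of occurrences of `word` in an i.i.d. sequence of length n.

    Each of the n - k + 1 starting positions matches with probability equal to
    the product of the letter probabilities, so the expectation follows by
    linearity.
    """
    k = len(word)
    if n < k:
        return 0.0
    return (n - k + 1) * prod(letter_probs[c] for c in word)


# Hypothetical uniform distribution over the DNA alphabet (an assumption).
uniform_dna = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

observed = count_occurrences("ACGTACGT", "ACG")
expected = expected_count_iid(8, "ACG", uniform_dna)
```

A P-value for a word asks how likely it is, under the chosen model, for the count to deviate from this expectation by at least as much as the observed count does. The sketch only computes the two quantities being compared, not the deviation probability itself.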