Markov Additive Processes and Repeats in Sequences

John L. Spouge

doi:10.1239/jap/1183667418

Markov Additive Processes and Repeats in Sequences

Part of: Stochastic processes Special processes

Published online by Cambridge University Press: 14 July 2016

John L. Spouge

Show author details

John L. Spouge*: Affiliation:
National Library of Medicine
*: ∗ Postal address: National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA. Email address: [email protected]

Article contents

Abstract
References

Rights & Permissions

Abstract

Core share and HTML view are not available for this content. However, as you have access to this content, a full PDF is available via the ‘Save PDF’ action button.

Computer analysis of biological sequences often detects deviations from a random model. In the usual model, sequence letters are chosen independently, according to some fixed distribution over the relevant alphabet. Real biological sequences often contain simple repeats, however, which can be broadly characterized as multiple contiguous copies (usually inexact) of a specific word. This paper quantifies inexact simple repeats as local sums in a Markov additive process (MAP). The maximum of the local sums has an asymptotic distribution with two parameters (λ and k), which are given by general MAP formulas. The general MAP formulas are usually computationally intractable, but an essential simplification in the case of repeats permits λ and k to be computed from matrices whose dimension equals the size of the relevant alphabet. The simplification applies to some MAPs where the summand distributions do not depend on consecutive pairs of Markov states as usual, but on pairs with a fixed time-lag larger than one.

Keywords

Markov additive process biological sequence analysis repeats

MSC classification

Primary: 60K15: Markov renewal processes, semi-Markov processes 60K20: Applications of Markov renewal processes (reliability, queueing networks, etc.) 60G51: Processes with independent increments; L'evy processes 60G70: Extreme value theory; extremal processes

Type: Research Article
Information: Journal of Applied Probability , Volume 44 , Issue 2 , June 2007 , pp. 514 - 527

DOI: https://doi.org/10.1239/jap/1183667418 [Opens in a new window]
Copyright: Copyright © Applied Probability Trust 2007

References

Achaz, G. et al. (2007). Repseek, a tool to retrieve approximate repeats from large DNA sequences. Bioinformatics 23, 119–121.CrossRef Google Scholar PubMed

Arratia, R., Martin, D., Reinert, G. and Waterman, M. S. (1996). Poisson process approximation for sequence repeats, and sequencing by hybridization. J. Comput. Biol. 3, 425–463.CrossRef Google Scholar PubMed

Asmussen, S. (2003). Applied Probability and Queues. Springer, New York.Google Scholar

Biggins, J. D. (1987). A note on repeated sequences in Markov chains. Adv. Appl. Prob. 19, 739–742.CrossRef Google Scholar

Biggins, J. D. and Cannings, C. (1987). Markov renewal processes, counters and repeated sequences in Markov chains. Adv. Appl. Prob. 19, 521–545.Google Scholar

Boeva, V., Regnier, M., Papatsenko, D. and Makeev, V. (2006). Short fuzzy tandem repeats in genomic sequences, identification, and possible role in regulation of gene expression. Bioinformatics 22, 676–684.Google Scholar

Dembo, A. and Karlin, S. (1991). Strong limit theorems of empirical distributions for large segmental exceedances of partial sums of Markov variables. Ann. Prob. 19, 1756–1767.Google Scholar

Karlin, S. and Altschul, S. F. (1990). Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Nat. Acad. Sci. USA 87, 2264–2268.Google Scholar

Karlin, S. and Dembo, A. (1992). Limit distributions of maximal segmental score among Markov dependent partial sums. Adv. Appl. Prob. 24, 113–140.CrossRef Google Scholar

Karlin, S. and Ghandour, G. (1985). Comparative statistics for DNA and protein sequences. Multiple sequence analysis. Proc. Nat. Acad. Sci. USA 82, 6186–6190.Google Scholar

Karlin, S. and Ghandour, G. (1985). Comparative statistics for DNA and protein sequences. Single sequence analysis. Proc. Nat. Acad. Sci. USA 82, 5800–5804.Google Scholar

Karlin, S. et al. (1983). New approaches for computer analysis of nucleic-acid sequences. Proc. Nat. Acad. Sci. USA 80, 5660–5664.Google Scholar

Kingman, J. F. C. (1961). A convexity property of positive matrices. Quart. J. Math. Oxford 12, 283–284.Google Scholar

Lancaster, P. and Tismenetsky, M. (1985). The Theory of Matrices. Academic Press, New York.Google Scholar

Regnier, M. (2000). A unified approach to word occurrence probabilities. Discrete Appl. Math. 104, 259–280.Google Scholar

Ross, S. (1996). Stochastic Processes. John Wiley, New York.Google Scholar

Ruzzo, W. L. and Tompa, M. (1999). A linear time algorithm for finding all maximal scoring subsequences. In Proc. Seventh Internat. Conf. Intelligent Systems Molec. Biol., Heidelberg, pp. 234–241.Google Scholar

Shpaer, E. G. et al. (1996). Sensitivity and selectivity in protein similarity searches: a comparison of Smith–Waterman in hardware to BLAST and FASTA. Genomics 38, 179–191.Google Scholar

Smit, A., Hubley, R. and Green, P. (1996). RepeatMasker. Available at http://www.repeatmasker.org.Google Scholar

States, D. J. and Agarwal, P. (1996). Compact encoding strategies for DNA sequence similarity search. In Proc. Internat. Conf. Intelligent Systems Molec. Biol., Heidelberg, pp. 211–217.Google Scholar

Wootton, J. C. (1994). Sequences with ‘unusual’ amino acid composition. Current Opinion Structural Biol. 4, 413–421.Google Scholar

Wootton, J. C. and Federhen, S. (1993). Statistics of local complexity in amino acid sequences and sequence databases. Comput. Chem. 17, 149–163.Google Scholar

Wootton, J. C. and Federhen, S. (1996). Analysis of compositionally biased regions in sequence databases. Meth. Enzymol. 266, 554–571.Google Scholar

Article contents

Markov Additive Processes and Repeats in Sequences

Abstract

Keywords

MSC classification

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests