A stationary distribution associated to a set of laws whose initial states are grouped into classes. An application in genomics

Servet Martínez

doi:10.1017/jpr.2016.2

Part of: Markov processes

Published online by Cambridge University Press: 21 June 2016

Affiliation: Universidad de Chile
Postal address: Departamento Ingeniería Matemática and Centro Modelamiento Matemático, Universidad de Chile, UMI 2807 CNRS, Casilla 170-3, Correo 3, Santiago, Chile. Email address: [email protected]

Abstract

Let $I$ be a finite set and $S$ be a nonempty strict subset of $I$ which is partitioned into classes, and let $C(s)$ be the class containing $s \in S$. Let $(P_s : s \in S)$ be a family of distributions on $I^{\mathbb{N}}$, where each $P_s$ applies to sequences starting with the symbol $s$. To this family, we associate a class of distributions $P(\pi)$ on $I^{\mathbb{N}}$ which depends on a probability vector $\pi$. Our main results assume that, for each $s \in S$, $P_s$ regenerates with distribution $P_{s'}$ when it encounters $s' \in S \setminus C(s)$. Using semiregenerative theory, we determine a simple condition on $\pi$ for $P(\pi)$ to be time-stationary. We give a similar result for the following more complex model. Once a symbol $s' \in S \setminus C(s)$ has been encountered, there is a decision to be made: either a new region of type $C(s')$ governed by $P_{s'}$ starts, or the region continues to be a $C(s)$ region. This decision is modeled as a random event whose probability depends on $s$ and $s'$. The aim in studying these kinds of models is to attain a deeper statistical understanding of bacterial DNA sequences. Here $I$ is the set of codons and the classes $(C(s) : s \in S)$ identify codons that initiate similar genomic regions. In particular, there are two classes corresponding to the start and stop codons, which delimit coding and noncoding regions in bacterial DNA sequences. In addition, the random decision to continue the current region or begin a new region of a different class reflects the well-known fact that not every appearance of a start codon marks the beginning of a new coding region.
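The switching mechanism described in the abstract can be illustrated with a small simulation. This is only a toy sketch under loudly stated assumptions, not the paper's construction of $P(\pi)$ and not its stationarity condition: each law $P_s$ is assumed (hypothetically) to emit i.i.d. symbols after $s$ with probabilities `weights[s]`; the alphabet, classes and weights are made up; and `switch_prob(s, s2)` is a hypothetical name for the probability of starting a new region in the second model. Passing `switch_prob=None` gives the first model, where every encounter of $s' \in S \setminus C(s)$ regenerates with $P_{s'}$. In the genomics reading, the alphabet would be the 64 codons and the classes would include the start and stop codons.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def simulate(
    start: str,
    alphabet: Sequence[str],
    classes: Dict[str, str],             # C(s) for every s in S (keys of the dict = S)
    weights: Dict[str, Sequence[float]], # toy assumption: P_s emits i.i.d. symbols with these weights
    length: int,
    switch_prob: Optional[Callable[[str, str], float]] = None,  # hypothetical name
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Return (symbols, law) where law[i] is the s in S whose law P_s governs position i."""
    if start not in classes:
        raise ValueError("start must belong to S")
    rng = random.Random(seed)
    current = start
    symbols, law = [start], [start]
    while len(symbols) < length:
        sym = rng.choices(alphabet, weights=weights[current])[0]
        # Only a symbol of S from a *different* class can trigger a new region.
        if sym in classes and classes[sym] != classes[current]:
            # First model: always regenerate. Second model: regenerate with prob switch_prob.
            if switch_prob is None or rng.random() < switch_prob(current, sym):
                current = sym  # new region of type C(sym), governed by P_sym
        symbols.append(sym)
        law.append(current)
    return symbols, law


# Hypothetical toy instance: S = {x, y, z} with classes {x, y} and {z}.
alphabet = ["a", "b", "x", "y", "z"]
classes = {"x": "A", "y": "A", "z": "B"}
weights = {
    "x": [0.4, 0.4, 0.05, 0.05, 0.1],
    "y": [0.3, 0.5, 0.1, 0.0, 0.1],
    "z": [0.45, 0.45, 0.05, 0.05, 0.0],
}
seq, law = simulate("x", alphabet, classes, weights, length=30)
seq2, law2 = simulate("x", alphabet, classes, weights, length=30,
                      switch_prob=lambda s, t: 0.5)
```

In the first run every occurrence of `z` inside an `{x, y}` region (or of `x`/`y` inside a `z` region) starts a new region; in the second run such an occurrence starts a new region only half of the time, mimicking a start codon that does not open a new coding region.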

Keywords

Markov chain; stationary distribution; regenerative process; Palm theory; genomics

MSC classification

Primary:
60J10: Markov chains (discrete-time Markov processes on discrete state spaces)
60J20: Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.)
92D10: Genetics
92D20: Protein sequences, DNA sequences

Type: Research Papers
Information: Journal of Applied Probability, Volume 53, Issue 2, June 2016, pp. 315–326

DOI: https://doi.org/10.1017/jpr.2016.2
Copyright: © Applied Probability Trust 2016
