
A stationary distribution associated to a set of laws whose initial states are grouped into classes. An application in genomics

Published online by Cambridge University Press: 21 June 2016

Servet Martínez*
Affiliation:
Universidad de Chile
* Postal address: Departamento Ingeniería Matemática and Centro Modelamiento Matemático, Universidad de Chile, UMI 2807 CNRS, Casilla 170-3, Correo 3, Santiago, Chile. Email address: [email protected]

Abstract

Let $I$ be a finite set and let $S$ be a nonempty strict subset of $I$ that is partitioned into classes, and let $C(s)$ be the class containing $s \in S$. Let $(P_s : s \in S)$ be a family of distributions on $I^{\mathbb{N}}$, where each $P_s$ applies to sequences starting with the symbol $s$. To this family we associate a class of distributions $P(\pi)$ on $I^{\mathbb{N}}$ that depends on a probability vector $\pi$. Our main results assume that, for each $s \in S$, $P_s$ regenerates with distribution $P_{s'}$ when it encounters $s' \in S \setminus C(s)$. Using semiregenerative theory, we determine a simple condition on $\pi$ for $P(\pi)$ to be time stationary. We give a similar result for the following more complex model. Once a symbol $s' \in S \setminus C(s)$ has been encountered, a decision must be made: either a new region of type $C(s')$ governed by $P_{s'}$ starts, or the region continues to be a $C(s)$ region. This decision is modeled as a random event whose probability depends on $s$ and $s'$. The aim in studying these models is to attain a deeper statistical understanding of bacterial DNA sequences. Here $I$ is the set of codons and the classes $(C(s) : s \in S)$ identify codons that initiate similar genomic regions. In particular, there are two classes, corresponding to the start and stop codons, which delimit coding and noncoding regions in bacterial DNA sequences. In addition, the random decision to continue the current region or begin a new region of a different class reflects the well-known fact that not every appearance of a start codon marks the beginning of a new coding region.
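To make the region-switching mechanism concrete, here is a minimal simulation sketch of the two models described in the abstract. It rests on assumptions that the abstract does not state. Each P_s is taken to be a first-order Markov chain with a hypothetical transition table `trans[s]`. The vector π is treated as a distribution over S that picks the symbol opening the first region. The hypothetical function `switch_prob(s, s2)` gives the probability that encountering s2 ∈ S outside C(s) starts a new region. With `switch_prob` identically 1, every such encounter triggers regeneration, which corresponds to the first model. The sketch only generates sequences; it does not implement the paper's stationarity condition on π.

```python
import random
from collections.abc import Callable, Hashable
from typing import Optional

Symbol = Hashable
# Hypothetical Markov assumption: trans[s][x][y] is the probability of moving
# from symbol x to symbol y while the current region was opened by symbol s.
TransTable = dict[Symbol, dict[Symbol, dict[Symbol, float]]]


def simulate_regions(
    length: int,
    pi: dict[Symbol, float],                  # assumed distribution on S
    classes: dict[Symbol, Hashable],          # s -> class label C(s), s in S
    trans: TransTable,
    switch_prob: Callable[[Symbol, Symbol], float],
    seed: Optional[int] = None,
) -> list[tuple[Symbol, Symbol]]:
    """Return a list of (symbol, region-opening symbol) pairs."""
    if length < 1:
        return []
    rng = random.Random(seed)
    starts = list(pi)
    # The first region's opening symbol is drawn from pi.
    region = rng.choices(starts, weights=[pi[s] for s in starts])[0]
    x = region
    path = [(x, region)]
    while len(path) < length:
        row = trans[region][x]
        targets = list(row)
        x = rng.choices(targets, weights=[row[y] for y in targets])[0]
        # x is in S but outside C(region): a candidate region boundary.
        if x in classes and classes[x] != classes[region]:
            # Start a new region governed by the law for x with probability
            # switch_prob(region, x); otherwise keep the current region.
            if rng.random() < switch_prob(region, x):
                region = x
        path.append((x, region))
    return path
```

For example, with four symbols `"a"`, `"b"`, `"c"`, `"d"`, classes `{"a": "start", "b": "stop"}` and `switch_prob=lambda s, t: 1.0`, each appearance of `"b"` inside an `"a"` region opens a `"b"` region, and vice versa. Lowering `switch_prob` below 1 lets some boundary symbols pass without opening a new region, mimicking start codons that do not begin a coding region.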

Type: Research Papers
Copyright © Applied Probability Trust 2016

