Forbidden Factors and Fragment Assembly

F. Mignosi; A. Restivo; M. Sciortino

doi:10.1051/ita:2001132

Forbidden Factors and Fragment Assembly

Published online by Cambridge University Press: 15 July 2002

F. Mignosi ,

A. Restivo and

M. Sciortino

Show author details

F. Mignosi: Affiliation:
University of Palermo, Dipartimento di Matematica ed Applicazioni, Via Archirafi 34, 90123 Palermo, Italy; ([email protected])
A. Restivo: Affiliation:
University of Palermo, Dipartimento di Matematica ed Applicazioni, Via Archirafi 34, 90123 Palermo, Italy; ([email protected])
M. Sciortino: Affiliation:
University of Palermo, Dipartimento di Matematica ed Applicazioni, Via Archirafi 34, 90123 Palermo, Italy; ([email protected])

Article contents

Abstract
References

Get access

Abstract

In this paper methods and results related to the notion of minimal forbidden words are applied to the fragment assembly problem. The fragment assembly problem can be formulated, in its simplest form, as follows: reconstruct a word w from a given set I of substrings (fragments) of a word w. We introduce an hypothesis involving the set of fragments I and the maximal length m(w) of the minimal forbidden factors of w. Such hypothesis allows us to reconstruct uniquely the word w from the set I in linear time. We prove also that, if w is a word randomly generated by a memoryless source with identical symbol probabilities, m(w) is logarithmic with respect to the size of w. This result shows that our reconstruction algorithm is suited to approach the above problem in several practical applications e.g. in the case of DNA sequences.

Keywords

Factor automaton minimal forbidden factor fragment assembly.

Type: Research Article
Information: RAIRO - Theoretical Informatics and Applications , Volume 35 , Issue 6: A tribute to Aldo de Luca , November 2001 , pp. 565 - 577

DOI: https://doi.org/10.1051/ita:2001132 [Opens in a new window]
Copyright: © EDP Sciences, 2001

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Article purchase

Temporarily unavailable

References

A.V. Aho, J.E. Hopcroft and J.D. Ullman, Data Structures and Algorithms. Addison Wesley, Reading, Mass (1983).

Béal, M.-P., Mignosi, F., Restivo, A. and Sciortino, M., Forbidden Words in Symbolic Dynamics. Adv. in Appl. Math. 25 (2000) 163-193. CrossRef

Blumer, A., Blumer, J., Ehrenfeucht, A., Haussler, D., Chen, M.T. and Seiferas, J., The smallest automaton recognizing the subwords of a text. Theoret. Comput. Sci. 40 (1985) 31-55. CrossRef

A. Carpi, A. de Luca and S. Varricchio, Words, univalent factors, and boxes. Acta Inform. (to appear).

M. Crochemore and C. Hancart, Automata for matching patterns, in Handbook of Formal Languages, Vol. 2, Chap. 9, edited by G. Rosenberg and A. Salomaan. Springer (1997) 399-462.

Crochemore, M., Mignosi, F. and Restivo, A., Automata and forbidden words. Inform. Process. Lett. 67 (1998) 111-117. CrossRef

M. Crochemore, F. Mignosi, A. Restivo and S. Salemi, Data compression using antidictionaries, in Proc. of the IEEE, Special Issue on Lossless Data Compression, Vol. 88, edited by J.A. Storer (2000) 1756-1768.

A. Frieze and B.V. Halldórsson, Optimal Sequencing by Hybridization in Rounds, in Proc. of RECOMB 2001, edited by T. Lengauer, D. Sankoff, S. Istrail, P. Pevzner and M. Waterman. ACM Press (2001) 141-148

D. Gusfield, Algorithms on strings, trees, and sequences: Computer science and computational biology. Cambridge University Press (1997).

Idury, R. and Waterman, M., A new algorithm for DNA sequence assembly. J. Comput. Biol. 2 (1995) 291-306. CrossRef

F. Mignosi and A. Restivo, Periodicity, in M. Lothaire, Algebraic Combinatorics on Words, Chap. 8. Cambridge University Press (to appear) 237-274. Also available at url: http://www-igm.univ-mlv.fr/ berstel/Lothaire/index.html

F. Mignosi, A. Restivo and M. Sciortino, Forbidden Factors and Fragment Assembly. Lecture Notes in Comput. Sci. (2001). Proceedings of DLT'01.

Mignosi, F., Restivo, A. and Sciortino, M., Words and Forbidden Factors. Theoret. Comput. Sci. 273 (2002) 99-117. CrossRef

F. Mignosi, A. Restivo, M. Sciortino and J. Storer, On Sequence Assembly, Technical Report cs-00-210. Brandeis University (2000).

S. Muthukrishnan and S.C. Sahinalp, Approximate nearest neighbors and sequence comparison with block operations. ACM Press (2000). Proceedings of STOC 2000.

Myers, G., Whole-Genome DNA Sequencing, IEEE Comput. Engrg. Sci. 3 (1999) 33-43. CrossRef

Peltola, H., Soderlund, H. and Ukkonen, E., SEQAID: A DNA Sequence Assembly Program Based on a Mathematical Model. Nucl. Acids Res. 12 (1984) 307-321. CrossRef

M. Peltola, H. Soderlund, J. Tarhio and E. Ukkonen, Algorithms for some string matching problems arising in molecular genetics, in Proc. of the 9th IFIP World Computer Congress (1983) 59-64.

P.A. Pevzner, H. Tang and M. Waterman, A New Approach Fragment Assembly in DNA Sequencing, in Proc. of RECOMB 2001, edited by T. Lengauer, D. Sankoff, S. Istrail, P. Pevzner and M. Waterman. ACM Press (2001) 141-148.

R. Shamir and D. Tsur, Large Scale Sequencing by Hybridization, in Proc. of RECOMB 2001, edited by T. Lengauer, D. Sankoff, S. Istrail, P. Pevzner and M. Waterman. ACM Press (2001) 269-278.

M.S. Waterman, Introduction to computational biology: Maps, sequences and genomes. Chapman & Hall (1995).

Article contents

Forbidden Factors and Fragment Assembly

Abstract

Keywords

Access options

Article purchase

Temporarily unavailable

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests