Published online by Cambridge University Press: 15 July 2002
In this paper methods and results related to the notion of minimal forbidden words are applied to the fragment assembly problem. The fragment assembly problem can be formulated, in its simplest form, as follows: reconstruct a word w from a given set I of substrings (fragments) of a word w. We introduce an hypothesis involving the set of fragments I and the maximal length m(w) of the minimal forbidden factors of w. Such hypothesis allows us to reconstruct uniquely the word w from the set I in linear time. We prove also that, if w is a word randomly generated by a memoryless source with identical symbol probabilities, m(w) is logarithmic with respect to the size of w. This result shows that our reconstruction algorithm is suited to approach the above problem in several practical applications e.g. in the case of DNA sequences.