Published online by Cambridge University Press: 14 July 2016
Given a finite collection of strings of letters from a fixed alphabet, it is of interest, in the contexts of data compression and DNA sequencing, to find the length of the shortest string which contains each of the given strings as a consecutive substring. In order to analyze the average behavior of the optimal superstring length, substrings of specified lengths are considered with the letters selected independently at random. An asymptotic expression is obtained for the savings from compression, i.e. the difference between the uncompressed (concatenated) length and the optimal superstring length.
Research supported by NSF grant DMS-9206139.