
Efficient validation and construction of border arrays and validation of string matching automata

Published online by Cambridge University Press:  04 December 2008

Jean-Pierre Duval
Affiliation:
University of Rouen, LITIS EA 4108, Avenue de l'Université, Technopôle du Madrillet, 76801 Saint-Étienne-du-Rouvray Cedex, France; Jean-Pierre.Duval@univ_rouen.fr; Thierry.Lecroq@univ_rouen.fr; Arnaud.Lefebvre@univ_rouen.fr
Thierry Lecroq
Affiliation:
University of Rouen, LITIS EA 4108, Avenue de l'Université, Technopôle du Madrillet, 76801 Saint-Étienne-du-Rouvray Cedex, France; Jean-Pierre.Duval@univ_rouen.fr; Thierry.Lecroq@univ_rouen.fr; Arnaud.Lefebvre@univ_rouen.fr
Arnaud Lefebvre
Affiliation:
University of Rouen, LITIS EA 4108, Avenue de l'Université, Technopôle du Madrillet, 76801 Saint-Étienne-du-Rouvray Cedex, France; Jean-Pierre.Duval@univ_rouen.fr; Thierry.Lecroq@univ_rouen.fr; Arnaud.Lefebvre@univ_rouen.fr

Abstract

We present an on-line, linear time and space algorithm to check whether an integer array f is the border array of at least one string w built on a bounded or unbounded size alphabet Σ. First, we show a bijection between the border array of a string w and the skeleton of the DFA recognizing Σ*w, called a string matching automaton (SMA). Different strings can have the same border array, but the originality of the presented method is that the correspondence between a border array and a skeleton of SMA is independent of the underlying strings. This makes it possible to design algorithms for validating and generating border arrays that outperform existing ones. The validating algorithm lowers the delay (the maximal number of comparisons on one element of the array) from O(|w|) to 1 + min{|Σ|, 1 + log₂|w|} compared to existing algorithms. We then give results on the number of distinct border arrays depending on the alphabet size. We also present an algorithm that checks in linear time whether a given directed unlabeled graph G is the skeleton of an SMA on an alphabet of size s. Along the way, the algorithm can build one string w for which G is the SMA skeleton.
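To make the notion of a border array concrete, the following Python sketch computes the border array of a string and checks whether an integer array is the border array of some string. It is not the on-line algorithm of the paper and does not achieve the delay bound stated in the abstract. It assumes an unbounded alphabet, builds a candidate string greedily (copying a letter when the border is non-empty, using a fresh symbol otherwise), and then recomputes the border array for comparison. The names `border_array` and `witness_string` are illustrative, and arrays are 0-indexed, so `f[i]` is the length of the longest proper border of the prefix of length `i + 1`.

```python
from typing import List, Optional, Sequence


def border_array(w: Sequence) -> List[int]:
    """Return f where f[i] is the length of the longest proper border of w[0..i]."""
    f = [0] * len(w)
    k = 0  # length of the border currently being extended
    for i in range(1, len(w)):
        # Fall back through shorter borders until one can be extended by w[i].
        while k > 0 and w[i] != w[k]:
            k = f[k - 1]
        if w[i] == w[k]:
            k += 1
        f[i] = k
    return f


def witness_string(f: Sequence[int]) -> Optional[List[int]]:
    """Return a string (as a list of integer letters) whose border array is f,
    or None if no such string exists. Assumes an unbounded alphabet."""
    w: List[int] = []
    fresh = 0  # next unused letter
    for i, b in enumerate(f):
        # A proper border of a prefix of length i + 1 has length at most i.
        if not isinstance(b, int) or b < 0 or b > i:
            return None
        if b > 0:
            # A border of length b forces the last letter to equal letter b.
            w.append(w[b - 1])
        else:
            # No border: a fresh letter cannot extend any existing border.
            w.append(fresh)
            fresh += 1
    # The candidate satisfies only the forced equalities; verify it fully.
    return w if border_array(w) == list(f) else None
```

For example, `border_array("abaababa")` yields `[0, 0, 1, 1, 2, 3, 2, 3]`, and `witness_string` applied to that array returns `[0, 1, 0, 0, 1, 0, 1, 0]`, the same string over the letters 0 and 1. The array `[0, 1, 1]` is rejected, since a border of length 1 at the second position forces the string `aaa`, whose border array is `[0, 1, 2]`. The sketch may use more letters than necessary, so it says nothing about validity on a bounded alphabet of a given size.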

Type
Research Article
Copyright
© EDP Sciences, 2008

