Strengths and weaknesses of finite-state technology: a case study in morphological grammar development

SHULY WINTNER

doi:10.1017/S1351324907004676

Published online by Cambridge University Press: 01 October 2008

Affiliation: Department of Computer Science, University of Haifa, 31905 Haifa, Israel. E-mail: [email protected]

Abstract

Finite-state technology is considered the preferred model for representing the phonology and morphology of natural languages. The attractiveness of this technology for natural language processing stems from four sources: modularity of the design, due to the closure properties of regular languages and relations; the compact representation that is achieved through minimization; efficiency, which is a result of linear recognition time with finite-state devices; and reversibility, resulting from the declarative nature of such devices. However, when wide-coverage morphological grammars are considered, finite-state technology does not scale up well, and the benefits of this technology can be overshadowed by the limitations it imposes as a programming environment for language processing. This paper investigates the strengths and weaknesses of existing technology, focusing on various aspects of large-scale grammar development. Using a real-world case study, we compare a finite-state implementation with an equivalent Java program with respect to ease of development, modularity, maintainability of the code, and space and time efficiency. We identify two main problems, abstraction and incremental development, which are currently not addressed sufficiently well by finite-state technology, and which we believe should be the focus of future research and development.
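As a small illustration of the reversibility property mentioned in the abstract, the sketch below encodes a toy transducer as a list of arcs and applies it in both directions. It is not the implementation discussed in the paper: the `Arc` and `Transducer` names, the one-word lexicon, and the tags `+N`, `+Sg` and `+Pl` are all hypothetical, chosen only for this example.

```python
from collections import namedtuple

# Hypothetical toy encoding: each arc maps one upper (lexical) symbol
# to one lower (surface) symbol; "" stands for epsilon on either side.
Arc = namedtuple("Arc", "src upper lower dst")


class Transducer:
    def __init__(self, arcs, start, finals):
        self.arcs = list(arcs)
        self.start = start
        self.finals = set(finals)

    def invert(self):
        """Swap the two sides of every arc; states are unchanged."""
        swapped = [Arc(a.src, a.lower, a.upper, a.dst) for a in self.arcs]
        return Transducer(swapped, self.start, self.finals)

    def apply(self, symbols):
        """Return all outputs for the input sequence.

        Assumes the machine has no epsilon cycles on the input side,
        which holds for the acyclic toy machine below.
        """
        results = set()
        stack = [(self.start, 0, ())]
        while stack:
            state, pos, out = stack.pop()
            if pos == len(symbols) and state in self.finals:
                results.add(out)
            for arc in self.arcs:
                if arc.src != state:
                    continue
                emitted = out + ((arc.lower,) if arc.lower else ())
                if arc.upper == "":
                    # Epsilon input: move without consuming a symbol.
                    stack.append((arc.dst, pos, emitted))
                elif pos < len(symbols) and symbols[pos] == arc.upper:
                    stack.append((arc.dst, pos + 1, emitted))
        return results


# Lexical side: c a t +N +Sg / +Pl; surface side: "cat" / "cats".
toy = Transducer(
    arcs=[
        Arc(0, "c", "c", 1),
        Arc(1, "a", "a", 2),
        Arc(2, "t", "t", 3),
        Arc(3, "+N", "", 4),
        Arc(4, "+Sg", "", 5),
        Arc(4, "+Pl", "s", 5),
    ],
    start=0,
    finals=[5],
)

# Generation: lexical form to surface form.
print(toy.apply(["c", "a", "t", "+N", "+Pl"]))
# Analysis: the same machine, inverted, maps surface form to lexical form.
print(toy.invert().apply(list("cats")))
```

The same declarative arc table serves both generation and analysis; only the direction of application changes. A production toolkit would additionally determinize and minimize the machine, which this sketch does not attempt.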

Type: Papers
Information: Natural Language Engineering, Volume 14, Issue 4, October 2008, pp. 457–469

DOI: https://doi.org/10.1017/S1351324907004676
Copyright: Copyright © Cambridge University Press 2007
