
Russian morphology: An engineering approach

Published online by Cambridge University Press:  12 September 2008

Andrei Mikheev
Affiliation:
HCRC, Language Technology Group, University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, Scotland, UK e-mail: [email protected]
Liubov Liubushkina
Affiliation:
Institute for Informatics Problems (IPI RAN), Russian Academy of Sciences, 30/6 Vavilova str., Moscow 117311, Russia e-mail: [email protected]

Abstract

Morphological analysis, which is at the heart of natural language processing, requires computationally effective morphological processors. This paper describes an approach to organizing an inflectional morphological model and its application to the Russian language. The main objective of our morphological processor is not the classification of word constituents, but rather the efficient computational recognition of the morpho-syntactic features of words and the generation of words according to requested morpho-syntactic features. Another major concern that the processor aims to address is the ease of extending the lexicon. The templated word-paradigm model used in the system has an engineering flavour: paradigm formation rules are of a bottom-up (word-specific) nature rather than general observations about the language, and word formation units are segments of words rather than proper morphemes. This approach allows us to handle both general cases and exceptions uniformly, and requires extremely simple data structures and control mechanisms, which can be easily implemented as finite-state automata. The morphological processor described in this paper is fully implemented for a substantial subset of Russian (more than 1,500,000 word-tokens – 95,000 word paradigms) and provides an extensive list of morpho-syntactic features together with stress positions for the words in its lexicon. Special dictionary management tools were built for browsing, debugging and extending the lexicon. The actual implementation was done in C and C++, and the system is available for the MS-DOS, MS-Windows and UNIX platforms.
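To make the idea of a templated word-paradigm model concrete, the following is a minimal sketch in Python, not the authors' C/C++ implementation. It assumes a toy lexicon in which each entry is an invariant word segment linked to a paradigm template, and each template maps word-final segments to feature tuples. All names (`PARADIGMS`, `LEXICON`, `analyse`, `generate`), the paradigm labels, and the tiny data set are hypothetical, and plain dictionary lookup stands in for the finite-state machinery mentioned in the abstract.

```python
# Hypothetical toy data: each paradigm template maps a word-final segment
# to the morpho-syntactic features it signals (case, number).
PARADIGMS = {
    "noun_stol": {
        "": ("nom", "sg"),
        "а": ("gen", "sg"),
        "у": ("dat", "sg"),
        "ы": ("nom", "pl"),
    },
    # The fleeting vowel of "день" needs no special rule: the word is cut
    # after "д", so the units are segments rather than proper morphemes.
    "noun_den": {
        "ень": ("nom", "sg"),
        "ня": ("gen", "sg"),
        "ню": ("dat", "sg"),
        "ни": ("nom", "pl"),
    },
}

# Invariant segment of each word -> name of its paradigm template.
LEXICON = {"стол": "noun_stol", "д": "noun_den"}


def analyse(word):
    """Return (segment, paradigm, features) for every valid cut of the word."""
    results = []
    for cut in range(len(word) + 1):
        segment, ending = word[:cut], word[cut:]
        paradigm = LEXICON.get(segment)
        if paradigm is not None and ending in PARADIGMS[paradigm]:
            results.append((segment, paradigm, PARADIGMS[paradigm][ending]))
    return results


def generate(segment, features):
    """Return every form of the lexicon entry that carries the features."""
    template = PARADIGMS[LEXICON[segment]]
    return [segment + ending for ending, f in template.items() if f == features]
```

In this sketch, `analyse("дня")` and `analyse("стола")` go through exactly the same lookup, which illustrates how a regular noun and an irregular one can be treated uniformly. Extending the lexicon amounts to adding one segment and, if needed, one template.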

Type
Articles
Copyright
Copyright © Cambridge University Press 1995

