
Detecting light verb constructions across languages

Published online by Cambridge University Press:  15 July 2019

István Nagy T.
Affiliation:
Black Swan Hungary, Budapest, Hungary
Anita Rácz
Affiliation:
Institute of Informatics, University of Szeged, Szeged, Hungary
Veronika Vincze*
Affiliation:
MTA-SZTE Research Group on Artificial Intelligence, Szeged, Hungary
*Corresponding author. Email: [email protected]

Abstract

Light verb constructions (LVCs) are verb and noun combinations in which the verb has lost its meaning to some degree and the noun is used in one of its original senses, typically denoting an event or an action. They exhibit special linguistic features, especially when regarded in a multilingual context. In this paper, we focus on the automatic detection of LVCs in raw text in four different languages, namely, English, German, Spanish, and Hungarian. First, we analyze the characteristics of LVCs from a linguistic point of view based on parallel corpus data. Then, we provide a standardized (i.e., language-independent) representation of LVCs that can be used in machine learning experiments. Next, we experiment with identifying LVCs in different languages: we exploit language adaptation techniques, which demonstrate that data from an additional language can be successfully employed to improve the performance of supervised LVC detection for a given language. As there are several annotated corpora from several domains for English and Hungarian, we also investigate the effect of simple domain adaptation techniques in reducing the gap between domains. Furthermore, we combine domain adaptation techniques with language adaptation techniques for these two languages. Our results show that both out-of-domain and additional language data can improve performance. We believe that our language adaptation method may have practical implications in several fields of natural language processing, especially in machine translation.

Type
Article
Copyright
© Cambridge University Press 2019
