Detecting light verb constructions across languages

István Nagy T.; Anita Rácz; Veronika Vincze

doi:10.1017/S1351324919000330

Published online by Cambridge University Press: 15 July 2019

István Nagy T.: Black Swan Hungary, Budapest, Hungary
Anita Rácz: Institute of Informatics, University of Szeged, Szeged, Hungary
Veronika Vincze*: MTA-SZTE Research Group on Artificial Intelligence, Szeged, Hungary
*Corresponding author. Email: [email protected]

Abstract

Light verb constructions (LVCs) are verb and noun combinations in which the verb has lost its meaning to some degree and the noun is used in one of its original senses, typically denoting an event or an action. They exhibit special linguistic features, especially when regarded in a multilingual context. In this paper, we focus on the automatic detection of LVCs in raw text in four languages: English, German, Spanish, and Hungarian. First, we analyze the characteristics of LVCs from a linguistic point of view based on parallel corpus data. Then, we provide a standardized (i.e., language-independent) representation of LVCs that can be used in machine learning experiments. Next, we experiment with identifying LVCs in different languages: we exploit language adaptation techniques and show that data from an additional language can be successfully employed to improve the performance of supervised LVC detection for a given language. As several annotated corpora from several domains are available for English and Hungarian, we also investigate the effect of simple domain adaptation techniques in reducing the gap between domains. Furthermore, we combine domain adaptation techniques with language adaptation techniques for these two languages. Our results show that both out-of-domain and additional language data can improve performance. We believe that our language adaptation method may have practical implications in several fields of natural language processing, especially in machine translation.

Keywords

Semantics; Machine learning; Lexicography; Multilinguality

Type: Article
Information: Natural Language Engineering, Volume 26, Issue 3, May 2020, pp. 319–348

DOI: https://doi.org/10.1017/S1351324919000330
Copyright: © Cambridge University Press 2019
