
22 - Construction Grammar and Language Models

from Part VI - Constructional Applications

Published online by Cambridge University Press: 30 January 2025

Mirjam Fried, Univerzita Karlova
Kiki Nikiforidou, University of Athens, Greece

Summary

Recent progress in deep learning and natural language processing has given rise to powerful models that are trained primarily on a cloze-like task, that is, predicting words masked out of running text, and that show some evidence of encoding substantial linguistic information, including some constructional knowledge. This discovery presents an exciting opportunity for a synergistic relationship between computational methods and Construction Grammar research. In this chapter, we explore three distinct approaches to this interplay: (i) computational methods for text analysis, (ii) computational Construction Grammar, and (iii) deep learning models, with a particular focus on language models. We touch on the first two approaches as a contextual foundation before providing an accessible yet comprehensive overview of deep learning models, one that also addresses reservations construction grammarians may have. We then examine experiments that explore the emergence of constructionally relevant information within these models, as well as the aspects of Construction Grammar that may pose challenges for such models. The chapter aims to foster collaboration between researchers in natural language processing and Construction Grammar and, in doing so, to pave the way for new insights and advances in both fields.
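
As a concrete illustration of what a cloze-like task means in practice (a minimal sketch added for exposition, not taken from the chapter itself): masked language models such as BERT are trained to predict words that have been blanked out of running text, and off-the-shelf tooling lets one query this objective directly. The example below assumes the Hugging Face transformers library and the public bert-base-uncased checkpoint; neither is prescribed by the chapter.

    # A minimal cloze-style query against a masked language model.
    # Assumes: pip install torch transformers
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="bert-base-uncased")

    # A ditransitive frame: which verbs does the model rank highest
    # for the blanked-out slot?
    for prediction in fill_mask("She [MASK] him a letter yesterday."):
        print(f"{prediction['token_str']!r}: {prediction['score']:.3f}")

Completions such as "wrote", "sent", or "gave" would suggest that the model has learned which verbs fit the ditransitive frame, which is the kind of constructional sensitivity the experiments discussed in the chapter investigate.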
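
Experiments on the emergence of constructionally relevant information are often implemented as probing studies, which can likewise be sketched in a few lines: freeze the model, extract sentence embeddings, and train a simple classifier to test whether two constructions are linearly separable in the representation space. The sentences, labels, and model choice below are illustrative assumptions, not the chapter's actual stimuli or setup.

    # A minimal probing sketch: can a linear classifier separate
    # ditransitive from to-dative sentences using frozen BERT embeddings?
    import torch
    from sklearn.linear_model import LogisticRegression
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    sentences = [
        "She gave him the book.",         # ditransitive
        "He sent her a long letter.",     # ditransitive
        "She gave the book to him.",      # to-dative
        "He sent a long letter to her.",  # to-dative
    ]
    labels = [0, 0, 1, 1]  # toy labels for the two constructions

    with torch.no_grad():
        batch = tokenizer(sentences, padding=True, return_tensors="pt")
        hidden = model(**batch).last_hidden_state
        # Mean-pool token vectors into one vector per sentence,
        # ignoring padding positions via the attention mask.
        mask = batch["attention_mask"].unsqueeze(-1)
        embeddings = (hidden * mask).sum(1) / mask.sum(1)

    probe = LogisticRegression().fit(embeddings.numpy(), labels)
    print(probe.score(embeddings.numpy(), labels))  # training accuracy only

A real probing study would of course use many controlled sentence pairs and held-out data; four sentences serve only to make the pipeline concrete.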

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2025


