Hostname: page-component-745bb68f8f-b95js Total loading time: 0 Render date: 2025-01-12T23:21:52.578Z Has data issue: false hasContentIssue false

On generalization of the sense retrofitting model

Published online by Cambridge University Press:  31 March 2023

Yang-Yin Lee*
Affiliation:
Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Ting-Yu Yen
Affiliation:
Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Hen-Hsen Huang
Affiliation:
Institute of Information Science, Academia Sinica, Taipei, Taiwan
Yow-Ting Shiue
Affiliation:
Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Hsin-Hsi Chen
Affiliation:
Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
*
*Corresponding author: E-mail: [email protected]

Abstract

With the aid of recently proposed word embedding algorithms, the study of semantic relatedness has progressed rapidly. However, word-level representations are still lacking for many natural language processing tasks. Various sense-level embedding learning algorithms have been proposed to address this issue. In this paper, we present a generalized model derived from existing sense retrofitting models. In this generalization, we take into account semantic relations between the senses, relation strength, and semantic strength. Experimental results show that the generalized model outperforms previous approaches on four tasks: semantic relatedness, contextual word similarity, semantic difference, and synonym selection. Based on the generalized sense retrofitting model, we also propose a standardization process on the dimensions with four settings, a neighbor expansion process from the nearest neighbors, and combinations of these two approaches. Finally, we propose a Procrustes analysis approach that inspired from bilingual mapping models for learning representations that outside of the ontology. The experimental results show the advantages of these approaches on semantic relatedness tasks.

Type
Article
Copyright
© The Author(s), 2023. Published by Cambridge University Press

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Footnotes

These authors contributed equally to this work.

References

Artetxe, M., Labaka, G. and Agirre, E. (2016). Learning principled bilingual mappings of word embeddings while preserving monolingual invariance. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), Austin, TX, USA. Association for Computational Linguistics, pp. 2289–2294.CrossRefGoogle Scholar
Azzini, A., da Costa Pereira, C., Dragoni, M. and Tettamanzi, A.G. (2011). A neuro-evolutionary corpus-based method for word sense disambiguation. IEEE Intelligent Systems 27(6), 2635.CrossRefGoogle Scholar
Bengio, Y., Delalleau, O. and Le Roux, N. (2006). Label propagation and quadratic criterion. In Semi-Supervised Learning.CrossRefGoogle Scholar
Bian, J., Gao, B. and Liu, T.-Y. (2014). Knowledge-powered deep learning for word embedding. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), Nancy, France. Springer, pp. 132148.CrossRefGoogle Scholar
Bojanowski, P., Grave, E., Joulin, A. and Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5, 135146.CrossRefGoogle Scholar
Bollegala, D., Alsuhaibani, M., Maehara, T. and Kawarabayashi, K.-i. (2016). Joint word representation learning using a corpus and a semantic lexicon. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI), Phoenix, AZ, USA. AAAI Press, pp. 2690–2696.CrossRefGoogle Scholar
Bolukbasi, T., Chang, K.-W., Zou, J.Y., Saligrama, V. and Kalai, A.T. (2016). Man is to computer programmer as woman is to homemaker? debiasing word embeddings. In Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS), vol. 29, Barcelona, Spain. Curran Associates, Inc.Google Scholar
Brunet, M.-E., Alkalay-Houlihan, C., Anderson, A. and Zemel, R. (2019). Understanding the origins of bias in word embeddings. In Proceedings of the 36th International Conference on Machine Learning (ICML), Long Beach, CA, USA. PMLR, pp. 803–811.Google Scholar
Bruni, E., Tran, N.-K. and Baroni, M. (2014). Multimodal distributional semantics. Journal of Artificial Intelligence Research 49, 147.CrossRefGoogle Scholar
Bullinaria, J.A. and Levy, J.P. (2007). Extracting semantic representations from word co-occurrence statistics: A computational study. Behavior Research Methods 39(3), 510526.CrossRefGoogle ScholarPubMed
Caliskan, A., Bryson, J.J. and Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science 356(6334), 183186.CrossRefGoogle ScholarPubMed
Camacho-Collados, J. and Pilehvar, M.T. (2018). From word to sense embeddings: A survey on vector representations of meaning. Journal of Artificial Intelligence Research 63, 743788.CrossRefGoogle Scholar
Camacho-Collados, J., Pilehvar, M.T. and Navigli, R. (2015). Nasari: A novel approach to a semantically-aware representation of items. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Denver, CO, USA. Association for Computational Linguistics, pp. 567–577.CrossRefGoogle Scholar
Dai, Z., Yang, Z., Yang, Y., Carbonell, J.G., Le, Q. and Salakhutdinov, R. (2019). Transformer-xl: Attentive language models beyond a fixed-length context. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), Florence, Italy. Association for Computational Linguistics, pp. 29782988.CrossRefGoogle Scholar
Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K. and Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science 41(6), 391407.3.0.CO;2-9>CrossRefGoogle Scholar
Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K. (2019). Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (NAACL-HLT), Minneapolis, MN, USA. Association for Computational Linguistics, pp. 41714186.Google Scholar
Dolan, B., Quirk, C. and Brockett, C. (2004). Unsupervised construction of large paraphrase corpora: Exploiting massively parallel news sources. In Proceedings of the 20th International Conference on Computational Linguistics (COLING), Geneva, Switzerland. COLING, pp. 350–356.CrossRefGoogle Scholar
Dragoni, M. and Petrucci, G. (2017). A neural word embeddings approach for multi-domain sentiment analysis. IEEE Transactions on Affective Computing 8(4), 457470.CrossRefGoogle Scholar
Ethayarajh, K. (2019). How contextual are contextualized word representations? comparing the geometry of bert, elmo, and gpt-2 embeddings. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China. Association for Computational Linguistics, pp. 55–65.CrossRefGoogle Scholar
Ettinger, A., Resnik, P. and Carpuat, M. (2016). Retrofitting sense-specific word vectors using parallel text. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), San Diego, CA, USA. Association for Computational Linguistics, pp. 13781383.CrossRefGoogle Scholar
Faruqui, M., Dodge, J., Jauhar, S.K., Dyer, C., Hovy, E. and Smith, N.A. (2015). Retrofitting word vectors to semantic lexicons. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Denver, CO. Association for Computational Linguistics, pp. 16061615.CrossRefGoogle Scholar
Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G. and Ruppin, E. (2002). Placing search in context: The concept revisited. ACM Transactions on Information Systems 20(1), 116131.Google Scholar
Ganitkevitch, J., Van Durme, B. and Callison-Burch, C. (2013). PPDB: The paraphrase database. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Atlanta, GA, USA. Association for Computational Linguistics, pp. 758764.Google Scholar
Glavaš, G. and Vulić, I. (2018). Explicit retrofitting of distributional word vectors. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) ACL, Melbourne, Australia. Association for Computational Linguistics, pp. 3445.CrossRefGoogle Scholar
Hamilton, W.L., Leskovec, J. and Jurafsky, D. (2016). Diachronic word embeddings reveal statistical laws of semantic change. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (ACL), Berlin, Germany. Association for Computational Linguistics, pp. 1489–1501.CrossRefGoogle Scholar
Huang, E.H., Socher, R., Manning, C.D. and Ng, A.Y. (2012). Improving word representations via global context and multiple word prototypes. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (ACL), Jeju Island, Korea. Association for Computational Linguistics, pp. 873–882.Google Scholar
Iacobacci, I., Pilehvar, M.T. and Navigli, R. (2015). Sensembed: Learning sense embeddings for word and relational similarity. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (ACL-IJCNLP), Beijing, China. Association for Computational Linguistics, pp. 95105.CrossRefGoogle Scholar
Jarmasz, M. and Szpakowicz, S. (2004). Roget’s thesaurus and semantic similarity. In Recent Advances in Natural Language Processing III: Selected Papers from RANLP, 2003, 111.Google Scholar
Jauhar, S.K., Dyer, C. and Hovy, E. (2015). Ontologically grounded multi-sense representation learning for semantic vector space models. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Denver, CO, USA. Association for Computational Linguistics, pp. 683–693.CrossRefGoogle Scholar
Jonauskaite, D., Sutton, A., Cristianini, N. and Mohr, C. (2021). English colour terms carry gender and valence biases: A corpus study using word embeddings. PLoS ONE 16(6), e0251559.CrossRefGoogle ScholarPubMed
Joulin, A., Bojanowski, P., Mikolov, T., Jégou, H. and Grave, E. (2018). Loss in translation: Learning bilingual word mapping with a retrieval criterion. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), Brussels, Belgium. Association for Computational Linguistics, pp. 2979–2984.CrossRefGoogle Scholar
Kipfer, B.A. (1993). Roget’s 21st Century Thesaurus in Dictionary Form: The Essential Reference for Home, School, or Office. Laurel.Google Scholar
Krebs, A. and Paperno, D. (2016). Capturing discriminative attributes in a distributional space: Task proposal. In Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP, Berlin, Germany. Association for Computational Linguistics, pp. 51–54.CrossRefGoogle Scholar
Landauer, T.K. and Dumais, S.T. (1997). A solution to plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review 104(2), 211240.CrossRefGoogle Scholar
Leacock, C. and Chodorow, M. (1998). Combining local context and WordNet similarity for word sense identification. WordNet: An Electronic Lexical Database 49(2), 265283.Google Scholar
Lee, Y.-Y., Ke, H., Huang, H.-H. and Chen, H.-H. (2016). Combining word embedding and lexical database for semantic relatedness measurement. In Proceedings of the 25th International Conference Companion on World Wide Web (WWW), Montréal, Québec, Canada. International World Wide Web Conferences Steering Committee, pp. 73–74.CrossRefGoogle Scholar
Lee, Y.-Y., Yen, T.-Y., Huang, H.-H. and Chen, H.-H. (2017). Structural-fitting word vectors to linguistic ontology for semantic relatedness measurement. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM), Singapore, Singapore. Association for Computing Machinery, pp. 2151–2154.CrossRefGoogle Scholar
Lee, Y.-Y., Yen, T.-Y., Huang, H.-H., Shiue, Y.-T. and Chen, H.-H. (2018). Gensense: A generalized sense retrofitting model. In Proceedings of the 27th International Conference on Computational Linguistics (COLING), Santa Fe, NM, USA. Association for Computational Linguistics, pp. 16621671.Google Scholar
Lengerich, B.J., Maas, A.L. and Potts, C. (2017). Retrofitting distributional embeddings to knowledge graphs with functional relations. In Proceedings of the 27th International Conference on Computational Linguistics (COLING), Santa Fe, NM, USA. Association for Computational Linguistics.Google Scholar
Li, J. and Jurafsky, D. (2015). Do multi-sense embeddings improve natural language understanding? In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), Lisbon, Portugal. Association for Computational Linguistics, pp. 1722–1732.Google Scholar
Lin, D. (1998). An information-theoretic definition of similarity. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML), vol. 98, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc., pp. 296304.Google Scholar
Lin, D. and Pantel, P. (2001). Dirt – discovery of inference rules from text. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD), San Francisco, CA, USA. Association for Computing Machinery, pp. 323–328.CrossRefGoogle Scholar
Liu, X., Nie, J.-Y. and Sordoni, A. (2016). Constraining word embeddings by prior knowledge–application to medical information retrieval. In Information Retrieval Technology. Beijing, China: Springer International Publishing, pp. 155167.CrossRefGoogle Scholar
Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L. and Stoyanov, V. (2019). Roberta: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.Google Scholar
Loureiro, D. and Jorge, A. (2019). Language modelling makes sense: Propagating representations through wordnet for full-coverage word sense disambiguation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), Florence, Italy. Association for Computational Linguistics, pp. 5682–5691.CrossRefGoogle Scholar
Luong, T., Socher, R. and Manning, C. (2013). Better word representations with recursive neural networks for morphology. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning (CoNLL), Sofia, Bulgaria. Association for Computational Linguistics, pp. 104–113.Google Scholar
Mancini, M., Camacho-Collados, J., Iacobacci, I. and Navigli, R. (2017). Embedding words and senses together via joint knowledge-enhanced training. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL), Vancouver, Canada. Association for Computational Linguistics, pp. 100–111.CrossRefGoogle Scholar
Maneewongvatana, S. and Mount, D.M. (1999). It’s okay to be skinny, if your friends are fat. In Center for Geometric Computing 4th Annual Workshop on Computational Geometry, vol. 2, pp. 1–8.Google Scholar
Mikolov, T., Chen, K., Corrado, G. and Dean, J. (2013a). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.Google Scholar
Mikolov, T., Le, Q.V. and Sutskever, I. (2013b). Exploiting similarities among languages for machine translation. arXiv preprint arXiv:1309.4168.Google Scholar
Miller, G.A. (1998). WordNet: An Electronic Lexical DatabaseCambridge, MA: MIT Press.Google Scholar
Mrkšic, N., OSéaghdha, D., Thomson, B., Gašic, M., Rojas-Barahona, L., Su, P.-H., Vandyke, D., Wen, T.-H. and Young, S. (2016). Counter-fitting word vectors to linguistic constraints. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), San Diego, California. Association for Computational Linguistics, pp. 142–148.CrossRefGoogle Scholar
Pavlick, E., Rastogi, P., Ganitkevitch, J., Van Durme, B. and Callison-Burch, C. (2015). PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers) (ACL-IJCNLP), Beijing, China. Association for Computational Linguistics, pp. 425–430.Google Scholar
Pennington, J., Socher, R. and Manning, C. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar. Association for Computational Linguistics, pp. 15321543.CrossRefGoogle Scholar
Peters, M.E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K. and Zettlemoyer, L. (2018). Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (NAACL), New Orleans, Louisiana. Association for Computational Linguistics.Google Scholar
Qiu, L., Tu, K. and Yu, Y. (2016). Context-dependent sense embedding. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP), Austin, Texas. Association for Computational Linguistics, pp. 183191.CrossRefGoogle Scholar
Quirk, C., Brockett, C. and Dolan, W.B. (2004). Monolingual machine translation for paraphrase generation. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP), Barcelona, Spain. Association for Computational Linguistics, pp. 142149.Google Scholar
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D. and Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog 1(8), 9.Google Scholar
Radinsky, K., Agichtein, E., Gabrilovich, E. and Markovitch, S. (2011). A word at a time: Computing word relatedness using temporal semantic analysis. In Proceedings of the 20th International Conference on World Wide Web (WWW). New York, NY, USA: Association for Computing Machinery, pp. 337–346.CrossRefGoogle Scholar
Reisinger, J. and Mooney, R.J. (2010). Multi-prototype vector-space models of word meaning. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), Los Angeles, CA, USA. Association for Computational Linguistics, pp. 109–117.Google Scholar
Remus, S. and Biemann, C. (2018). Retrofitting word representations for unsupervised sense aware word similarities. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC), Miyazaki, Japan. European Language Resources Association.Google Scholar
Sanh, V., Debut, L., Chaumond, J. and Wolf, T. (2019). Distilbert, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108.Google Scholar
Santos, J., Consoli, B. and Vieira, R. (2020). Word embedding evaluation in downstream tasks and semantic analogies. In Proceedings of the Twelfth Language Resources and Evaluation Conference (LREC), Marseille, France. European Language Resources Association, pp. 4828–4834.Google Scholar
Scarlini, B., Pasini, T. and Navigli, R. (2020). Sensembert: Context-enhanced sense embeddings for multilingual word sense disambiguation. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) 34(05), 87588765.CrossRefGoogle Scholar
Shi, W., Chen, M., Zhou, P. and Chang, K.-W. (2019). Retrofitting contextualized word embeddings with paraphrases. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China. Association for Computational Linguistics.CrossRefGoogle Scholar
Smith, S.L., Turban, D.H., Hamblin, S. and Hammerla, N.Y. (2017). Offline bilingual word vectors, orthogonal transformations and the inverted softmax. In 5th International Conference on Learning Representations (ICLR), Toulon, France. OpenReview.net.Google Scholar
Sun, F., Guo, J., Lan, Y., Xu, J. and Cheng, X. (2016). Inside out: Two jointly predictive models for word representations and phrase representations. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), vol. 30, Phoenix, AZ, USA. AAAI Press.CrossRefGoogle Scholar
Turney, P.D. (2001). Mining the web for synonyms: PMI-IR versus LSA on TOEFL. In Proceedings of the 12th European Conference on Machine Learning (ECML). Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 491–502.CrossRefGoogle Scholar
Wiedemann, G., Remus, S., Chawla, A. and Biemann, C. (2019). Does BERT make any sense? Interpretable word sense disambiguation with contextualized embeddings. In Proceedings of the 15th Conference on Natural Language Processing (KONVENS), Erlangen, Germany. German Society for Computational Linguistics & Language Technology.Google Scholar
Wu, Z. and Palmer, M. (1994). Verb semantics and lexical selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL), Las Cruces, NM, USA. Association for Computational Linguistics, pp. 133138.CrossRefGoogle Scholar
Xing, C., Wang, D., Liu, C. and Lin, Y. (2015). Normalized word embedding and orthogonal transform for bilingual word translation. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Denver, CO, USA. Association for Computational Linguistics, pp. 1006–1011.CrossRefGoogle Scholar
Yen, T.-Y., Lee, Y.-Y., Huang, H.-H. and Chen, H.-H. (2018). That makes sense: Joint sense retrofitting from contextual and ontological information. In Companion Proceedings of the Web Conference 2018 (WWW), Lyon, France. International World Wide Web Conferences Steering Committee, pp. 15–16.CrossRefGoogle Scholar
Yin, Z. and Shen, Y. (2018). On the dimensionality of word embedding. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS), vol. 31, Montréal, Canada. Curran Associates, Inc., pp. 895–906.Google Scholar
Yu, M. and Dredze, M. (2014). Improving lexical embeddings with semantic knowledge. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (ACL), Baltimore, MD, USA. Association for Computational Linguistics, pp. 545550.CrossRefGoogle Scholar