
From unified phrase representation to bilingual phrase alignment in an unsupervised manner

Published online by Cambridge University Press: 01 August 2022

Jingshu Liu*
Affiliation:
LS2N – UMR CNRS 6004, Université de Nantes, Nantes, France; Dictanova, 6 rue René Viviani, 44200 Nantes, France
Emmanuel Morin
Affiliation:
LS2N – UMR CNRS 6004, Université de Nantes, Nantes, France; Dictanova, 6 rue René Viviani, 44200 Nantes, France
Sebastian Peña Saldarriaga
Affiliation:
Dictanova, 6 rue René Viviani, 44200 Nantes, France
Joseph Lark
Affiliation:
Dictanova, 6 rue René Viviani, 44200 Nantes, France
*Corresponding author. E-mail: [email protected]

Abstract

Significant advances have been achieved in bilingual word-level alignment, yet phrase-level alignment remains a challenge. Moreover, the need for parallel data is a critical drawback for the alignment task. This work proposes a system that alleviates both problems: a unified phrase representation model that takes cross-lingual word embeddings as input, and an unsupervised training algorithm inspired by recent work on neural machine translation. The system consists of a sequence-to-sequence architecture in which a short-sequence encoder constructs cross-lingual representations of phrases of any length, and an LSTM network then decodes them with respect to their contexts. After training with comparable corpora and existing key phrase extraction, our encoder provides cross-lingual phrase representations that can be compared without further transformation. Experiments on five data sets show that our method obtains state-of-the-art results on the bilingual phrase alignment task and improves the alignment of phrases of different lengths by a mean of 8.8 MAP points.
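
To make the evaluation setting concrete, the following minimal sketch shows how phrase vectors that already live in a shared cross-lingual space can be compared directly and scored with MAP. It is an illustration only, not the authors' implementation: the two-dimensional vectors, the English and French phrases, and the function names are invented for the example, and MAP is assumed here to be the standard mean of per-query average precision, which reduces to the reciprocal rank when each source phrase has a single reference translation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(query_vec, candidates):
    """Return target phrases sorted by decreasing cosine similarity to the query."""
    scored = [(cosine(query_vec, vec), phrase) for phrase, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [phrase for _, phrase in scored]


def average_precision(ranking, gold):
    """Average precision of one ranked list against a set of reference translations."""
    hits = 0
    precision_sum = 0.0
    for rank, phrase in enumerate(ranking, start=1):
        if phrase in gold:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(gold) if gold else 0.0


def mean_average_precision(source_vecs, target_vecs, gold):
    """Mean of the per-query average precision over all source phrases."""
    scores = [
        average_precision(rank_candidates(vec, target_vecs), gold[phrase])
        for phrase, vec in source_vecs.items()
    ]
    return sum(scores) / len(scores)


# Hypothetical toy data: real phrase vectors would come from a trained encoder.
source_vecs = {"wind turbine": [0.9, 0.1], "blade": [0.4, 0.6]}
target_vecs = {"éolienne": [0.8, 0.2], "pale": [0.2, 0.8], "rotor": [0.6, 0.5]}
gold = {"wind turbine": {"éolienne"}, "blade": {"pale"}}

map_score = mean_average_precision(source_vecs, target_vecs, gold)
print(f"MAP = {map_score:.2f}")
```

In this toy setting the reference translation of "wind turbine" is ranked first and that of "blade" second, so the two average precisions are 1 and 0.5. Because both languages share one vector space, no mapping step is applied between encoding and comparison.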

Type
Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press
