
Improved feature decay algorithms for statistical machine translation

Published online by Cambridge University Press: 22 September 2020

Alberto Poncelas*
Affiliation:
ADAPT Centre, Dublin City University, Glasnevin, Dublin 9, Ireland
Gideon Maillette de Buy Wenniger
Affiliation:
ADAPT Centre, Dublin City University, Glasnevin, Dublin 9, Ireland
Andy Way
Affiliation:
ADAPT Centre, Dublin City University, Glasnevin, Dublin 9, Ireland
*Corresponding author. E-mail: [email protected]

Abstract

In machine-learning applications, data selection is crucial to achieving good runtime performance. When the test set is accessible while the model is being built, training instances can be selected so that they are the most relevant to that test set. Feature Decay Algorithms (FDA) are a data selection technique that has demonstrated excellent performance in a number of tasks. The method maximizes the diversity of the n-grams in the training set by devaluing those that have already been included. We focus on this method to investigate in greater depth how to select better training data instances. We give an overview of FDA and propose improvements in terms of speed and quality. Using German-to-English parallel data, we first propose a novel approach that decreases the execution time of FDA when multiple computation units are available. In addition, we obtain improvements in translation quality by extending FDA with information from the parallel corpus that is generally ignored.
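The selection idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it covers neither the parallel variant nor the extension that uses the parallel corpus. The function names (`ngrams`, `fda_select`), the initial feature value of 1.0, the exponential decay factor of 0.5, the bigram limit, and the normalization by sentence length are all assumptions chosen for illustration.

```python
from collections import Counter


def ngrams(tokens, max_n=2):
    """Return all n-grams (as tuples) of order 1..max_n in a token list."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def fda_select(candidates, test_set, k, max_n=2, decay=0.5):
    """Greedily pick up to k candidate sentences that cover test-set n-grams.

    Hypothetical sketch: each test-set n-gram starts with value 1.0 and its
    value is multiplied by `decay` every time a selected sentence contains it.
    """
    # Features are the distinct n-grams observed in the test set.
    features = {g for sent in test_set for g in ngrams(sent.split(), max_n)}
    selected_counts = Counter()  # how often each feature was already selected
    pool = list(candidates)
    selected = []

    def score(sentence):
        tokens = sentence.split()
        if not tokens:
            return 0.0
        # Sum the current (decayed) values of the matching features,
        # normalized by sentence length (an assumed normalization).
        total = sum(decay ** selected_counts[g]
                    for g in ngrams(tokens, max_n) if g in features)
        return total / len(tokens)

    while pool and len(selected) < k:
        best = max(pool, key=score)
        pool.remove(best)
        selected.append(best)
        # Devalue the features that the chosen sentence already covers.
        for g in ngrams(best.split(), max_n):
            if g in features:
                selected_counts[g] += 1
    return selected


if __name__ == "__main__":
    test_set = ["the cat sat on the mat"]
    candidates = ["the cat sat", "the cat sat down", "on the mat today"]
    print(fda_select(candidates, test_set, k=2))
```

With these toy inputs, the sketch selects "the cat sat" first. The near-duplicate "the cat sat down" then scores lower than "on the mat today", because the n-grams it shares with the first pick have been devalued, so the second pick adds n-grams that are not yet covered.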

Type
Article
Copyright
© The Author(s), 2020. Published by Cambridge University Press

