Backward and trigger-based language models for statistical machine translation

Published online by Cambridge University Press: 24 July 2013

DEYI XIONG
Affiliation:
School of Computer Science and Technology, Soochow University, Suzhou 215006, China
MIN ZHANG
Affiliation:
School of Computer Science and Technology, Soochow University, Suzhou 215006, China

Abstract

The language model is one of the most important knowledge sources for statistical machine translation. In this article, we present two extensions to standard n-gram language models in statistical machine translation: a backward language model that augments the conventional forward language model, and a mutual information trigger model that captures long-distance dependencies beyond the scope of standard n-gram language models. We introduce algorithms to integrate the two proposed models into two kinds of state-of-the-art phrase-based decoders. Our experimental results on Chinese-to-English, Spanish-to-English and Vietnamese-to-English translation show that both models significantly improve translation quality, as measured by BLEU and METEOR, over a competitive baseline.
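
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a bigram order, add-one smoothing, and sentence-level co-occurrence counts for the triggers; the function names and the toy corpus are hypothetical and chosen only for illustration. The backward model conditions each word on the word that follows it, and the trigger table stores pointwise mutual information for ordered word pairs at any distance within a sentence.

```python
import math
from collections import Counter


def train_backward_bigrams(sentences):
    """Count bigrams over reversed sentences, so that each word is
    conditioned on its successor in the original word order."""
    contexts, bigrams = Counter(), Counter()
    for sent in sentences:
        rev = ["</s>"] + sent[::-1] + ["<s>"]
        contexts.update(rev[:-1])            # every token that acts as a context
        bigrams.update(zip(rev, rev[1:]))    # (following word, current word)
    return contexts, bigrams


def backward_logprob(sent, contexts, bigrams, vocab_size):
    """Add-one smoothed backward bigram log-probability of a sentence
    (smoothing choice is an assumption made for this sketch)."""
    rev = ["</s>"] + sent[::-1] + ["<s>"]
    return sum(
        math.log((bigrams[(ctx, w)] + 1) / (contexts[ctx] + vocab_size))
        for ctx, w in zip(rev, rev[1:])
    )


def train_pmi_triggers(sentences):
    """PMI(x, y) = log(P(x, y) / (P(x) P(y))) for ordered pairs where x
    precedes y in the same sentence; probabilities are estimated as
    fractions of sentences (an assumption made for this sketch)."""
    word_counts, pair_counts = Counter(), Counter()
    for sent in sentences:
        word_counts.update(set(sent))
        pairs = {
            (sent[i], sent[j])
            for i in range(len(sent))
            for j in range(i + 1, len(sent))
        }
        pair_counts.update(pairs)
    n = len(sentences)
    return {
        (x, y): math.log(c * n / (word_counts[x] * word_counts[y]))
        for (x, y), c in pair_counts.items()
    }


# Toy corpus, purely illustrative.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["a", "cat", "ran"],
]
vocab_size = len({w for s in corpus for w in s} | {"<s>"})

contexts, bigrams = train_backward_bigrams(corpus)
pmi = train_pmi_triggers(corpus)

fluent = backward_logprob(["the", "cat", "sat"], contexts, bigrams, vocab_size)
scrambled = backward_logprob(["sat", "cat", "the"], contexts, bigrams, vocab_size)
print(fluent > scrambled)
print(pmi[("the", "sat")] > pmi[("cat", "sat")])
```

In a decoder, scores like these would typically enter the log-linear model as additional features alongside the forward language model; how the article integrates them into phrase-based decoding is not reproduced here.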

Type
Articles
Copyright
Copyright © Cambridge University Press 2013 
