
Neural text normalization with adapted decoding and POS features

Published online by Cambridge University Press: 20 August 2019

T. Ruzsics*
Affiliation:
URPP Language and Space, University of Zurich, Zurich, Switzerland
M. Lusetti
Affiliation:
Institute of Romance Studies, University of Zurich, Zurich, Switzerland
A. Göhring
Affiliation:
Institute of Romance Studies, University of Zurich, Zurich, Switzerland; Institute of Computational Linguistics, University of Zurich, Zurich, Switzerland
T. Samardžić
Affiliation:
URPP Language and Space, University of Zurich, Zurich, Switzerland
E. Stark
Affiliation:
Institute of Romance Studies, University of Zurich, Zurich, Switzerland
*Corresponding author. Email: [email protected]

Abstract

Text normalization is the task of mapping noncanonical language, typical of speech transcription and computer-mediated communication, to a standardized written form. This task is especially important for languages such as Swiss German, which has strong regional variation and no written standard. In this paper, we propose a novel solution for normalizing Swiss German WhatsApp messages using the encoder–decoder neural machine translation (NMT) framework. We enhance a plain character-level NMT model by integrating a word-level language model and linguistic features in the form of part-of-speech (POS) tags. The two components address two specific issues: the former is intended to improve the fluency of the predicted sequences, whereas the latter aims at resolving cases of word-level ambiguity. Our systematic comparison shows that the proposed solution improves over a plain NMT system and also over a comparable character-level statistical machine translation system, considered the state of the art in this task until recently. We perform a thorough analysis of the compared systems’ output, showing that our two components indeed produce the intended, complementary improvements.
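To illustrate how a word-level language model can complement a character-level model at decoding time, the following minimal sketch reranks candidate normalizations by a weighted sum of two log-scores. It is not the system described in the paper: the unigram language model, the `lm_weight` parameter, the function names, the candidate strings, and their character-model scores are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def train_unigram_lm(sentences):
    """Return a function giving add-one-smoothed unigram log-probabilities."""
    counts = Counter(word for s in sentences for word in s.split())
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot for unseen words

    def logprob(word):
        return math.log((counts[word] + 1) / (total + vocab_size))

    return logprob


def rerank(candidates, lm_logprob, lm_weight=1.0):
    """Pick the candidate maximizing char-model score + weighted LM score.

    `candidates` is a list of (text, char_logprob) pairs, where
    char_logprob stands in for the score a character-level model
    would assign (hypothetical values here).
    """
    def combined(item):
        text, char_logprob = item
        lm_score = sum(lm_logprob(w) for w in text.split())
        return char_logprob + lm_weight * lm_score

    return max(candidates, key=combined)[0]


# Toy standardized-text corpus for the word-level LM (illustrative only).
lm = train_unigram_lm(["ich habe es gesehen", "ich habe das gemacht"])

# Two hypothetical outputs of a character-level model with made-up scores.
candidates = [("ich hab es", -1.0), ("ich habe es", -1.2)]

best = rerank(candidates, lm)
print(best)
```

In this toy setup the character-level score alone slightly prefers the first candidate, while the word-level score favors the candidate whose words are attested in the standardized corpus; setting `lm_weight=0.0` falls back to the character-level ranking.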

Type
Article
Copyright
© Cambridge University Press 2019 


Footnotes

* This research is funded by the Swiss National Science Foundation, project “What’s Up, Switzerland? Language, Individuals and Ideologies in Mobile Messaging” (Sinergia: CRSII1_160714).
