Neural text normalization with adapted decoding and POS features

T. Ruzsics; M. Lusetti; A. Göhring; T. Samardžić; E. Stark

doi:10.1017/S1351324919000391

Neural text normalization with adapted decoding and POS features

Published online by Cambridge University Press: 20 August 2019

T. Ruzsics ,

M. Lusetti ,

and

T. Ruzsics*: Affiliation:
URPP Language and Space, University of Zurich, Zurich, Switzerland
M. Lusetti: Affiliation:
Institute of Romance Studies, University of Zurich, Zurich, Switzerland
A. Göhring: Affiliation:
Institute of Romance Studies, University of Zurich, Zurich, Switzerland Institute of Computational Linguistics, University of Zurich, Zurich, Switzerland
T. Samardžić: Affiliation:
URPP Language and Space, University of Zurich, Zurich, Switzerland
E. Stark: Affiliation:
Institute of Romance Studies, University of Zurich, Zurich, Switzerland
*: *Corresponding author. Email: [email protected]

Article contents

Abstract
Footnotes
References

Get access

Rights & Permissions

Abstract

Text normalization is the task of mapping noncanonical language, typical of speech transcription and computer-mediated communication, to a standardized writing. This task is especially important for languages such as Swiss German, with strong regional variation and no written standard. In this paper, we propose a novel solution for normalizing Swiss German WhatsApp messages using the encoder–decoder neural machine translation (NMT) framework. We enhance the performance of a plain character-level NMT model with the integration of a word-level language model and linguistic features in the form of part-of-speech (POS) tags. The two components are intended to improve the performance by addressing two specific issues: the former is intended to improve the fluency of the predicted sequences, whereas the latter aims at resolving cases of word-level ambiguity. Our systematic comparison shows that our proposed solution results in an improvement over a plain NMT system and also over a comparable character-level statistical machine translation system, considered the state of the art in this task till recently. We perform a thorough analysis of the compared systems’ output, showing that our two components produce indeed the intended, complementary improvements.

Keywords

Text normalization Neural machine translation Swiss German

Type: Article
Information: Natural Language Engineering , Volume 25 , Special Issue 5: Natural Language Processing for Similar Languages, Varieties and Dialects , September 2019 , pp. 585 - 605

DOI: https://doi.org/10.1017/S1351324919000391 [Opens in a new window]
Copyright: © Cambridge University Press 2019

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Article purchase

Temporarily unavailable

Footnotes

This research is funded by the Swiss National Science Foundation, project “What’s Up, Switzerland? Language, Individuals and Ideologies in Mobile Messaging” (Sinergia: CRSII1_160714).

References

Bahdanau, D., Cho, K., and Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473.Google Scholar

Bollmann, M. and Søgaard, A. (2016). Improving historical spelling normalization with bi-directional LSTMs and multi-task learning. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. The COLING 2016 Organizing Committee, pp. 131–139.Google Scholar

Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. (2014). Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar. Association for Computational Linguistics, pp. 1724–1734.CrossRef Google Scholar

Gesmundo, A. and Samardžić, T. (2012). Lemmatisation as a tagging task. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Jeju Island, Korea: Association for Computational Linguistics, pp. 368–372.Google Scholar

Gulcehre, C., Firat, O., Xu, K., Cho, K., and Bengio, Y. (2016). On integrating a language model into neural machine translation. Computer Speech and Language, 45, 137–148.CrossRef Google Scholar

Heafield, K. (2011). KenLM: faster and smaller language model queries. In Proceedings of the EMNLP 2011 Sixth Workshop on Statistical Machine Translation, Edinburgh, Scotland, United Kingdom, pp. 187–197.Google Scholar

Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Comput. 9(8), 1735–1780.CrossRef Google Scholar PubMed

Honnet, P.-E., Popescu-Belis, A., Musat, C., and Baeriswyl, M. (2017). Machine translation of low-resource spoken dialects: Strategies for normalizing Swiss German. ArXiv e-prints, 1710.11035.Google Scholar

Kalchbrenner, N. and Blunsom, P. (2013). Recurrent continuous translation models. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, Washington, USA: Association for Computational Linguistics, pp. 1700–1709.Google Scholar

Koehn, P. and Hoang, H. (2007). Factored translation models. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).Google Scholar

Luong, M.-T., Pham, H., and Manning, C.D. (2015). Effective approaches to attention-based neural machine translation. In Empirical Methods in Natural Language Processing (EMNLP), Lisbon, Portugal. Association for Computational Linguistics, pp. 1412–1421.Google Scholar

Lusetti, M., Ruzsics, T., Göhring, A., Samardžić, T., and Stark, E. (2018). Encoder-decoder methods for text normalization. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018). Association for Computational Linguistics, pp. 18–28.Google Scholar

Neubig, G., Dyer, C., Goldberg, Y., Matthews, A., Ammar, W., Anastasopoulos, A., Ballesteros, M., Chiang, D., Clothiaux, D., Cohn, T., Duh, K., Faruqui, M., Gan, C., Garrette, D., Ji, Y., Kong, L., Kuncoro, A., Kumar, G., Malaviya, C., Michel, P., Oda, Y., Richardson, M., Saphra, N., Swayamdipta, S., and Yin, P. (2017). Dynet: The dynamic neural network toolkit. arXiv preprint arXiv:1701.03980.Google Scholar

Rash, F. (1998). The German language in Switzerland: Multilingualism, Diglossia and Variation. Lang, Bern.Google Scholar

Ruef, B. and Ueberwasser, S. (2013). The taming of a dialect: Interlinear glossing of Swiss German text messages. In Zampieri, M. and Diwersy, S. (eds), Non-standard Data Sources in Corpus-Based Research, Aachen, Germany, pp. 61–68.Google Scholar

Ruzsics, T. and Samardžić, T. (2017). Neural sequence-to-sequence learning of internal word structure. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), Vancouver, Canada: Association for Computational Linguistics, pp. 184–194.CrossRef Google Scholar

Samardžić, T., Scherrer, Y., and Glaser, E. (2015). Normalising orthographic and dialectal variants for the automatic processing of Swiss German. In Proceedings of The 4th Biennial Workshop on Less-Resourced Languages. ELRA.Google Scholar

Schmid, H. (1994). Probabilistic part-of-speech tagging using decision trees. In International Conference on New Methods in Language Processing, pages 44–49, Manchester, UK.Google Scholar

Sennrich, R. and Haddow, B. (2016). Linguistic input features improve neural machine translation. In Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers. Association for Computational Linguistics, pp. 83–91.CrossRef Google Scholar

Stark, E., Ueberwasser, S., and Göhring, A. (2014). Corpus “What’s up, Switzerland?”. Technical report, University of Zurich, Switzerland.Google Scholar

Stark, E., Ueberwasser, S., and Ruef, B. (2009–2015). Swiss SMS corpus, University of Zurich. https://sms.linguistik.uzh.ch.Google Scholar

Sutskever, I., Vinyals, O., and Le, Q.V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pp. 3104–3112.Google Scholar

Tjong Kim Sang, E., Bollmann, M., Boschker, R., Casacuberta, F., Dietz, F., Dipper, S., Domingo, M., van der Goot, R., van Koppen, M., Ljubešić, N., Östling, R., Petran, F., Pettersson, E., Scherrer, Y., Schraagen, M., Sevens, L., Tiedemann, J., Vanallemeersch, T., and Zervanou, K. (2017). The CLIN27 shared task: Translating historical text to contemporary language for improving automatic linguistic annotation. Computational Linguistics in the Netherlands Journal 7, 53–64.Google Scholar

Ueberwasser, S. and Stark, E. (2017). What’s up, Switzerland? A corpus-based research project in a multilingual country. Linguistik Online, 84(5), https://doi.org/10.13092/lo.84.3849.CrossRef Google Scholar

Article contents

Neural text normalization with adapted decoding and POS features

Abstract

Keywords

Access options

Article purchase

Temporarily unavailable

Footnotes

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests