Reinforcement learning approaches to natural language generation in interactive systems

doi:10.1017/CBO9780511844492.007

7 - Reinforcement learning approaches to natural language generation in interactive systems

from Part III - Handling uncertainty

Published online by Cambridge University Press: 05 July 2014

Oliver Lemon, Srinivasan Janarthanam and Verena Rieser

Edited by Amanda Stent and Srinivas Bangalore

Oliver Lemon: Heriot-Watt University
Srinivasan Janarthanam: Heriot-Watt University
Verena Rieser: Heriot-Watt University
Amanda Stent: AT&T Research, Florham Park, New Jersey
Srinivas Bangalore: AT&T Research, Florham Park, New Jersey

Summary

In this chapter we will describe a new approach to generating natural language in interactive systems – one that shares many features with more traditional planning approaches but that uses statistical machine learning models to develop adaptive natural language generation (NLG) components for interactive applications. We employ statistical models of users, of generation contexts, and of natural language itself. This approach has several potential advantages: the ability to train models on real data, the availability of precise mathematical methods for optimization, and the capacity to adapt robustly to previously unseen situations. Rather than emulating human behavior in generation (which can be sub-optimal), these methods can find strategies for NLG that improve on human performance. Recently, some very encouraging test results have been obtained with real users of systems developed using these methods.

In this chapter we will explain the motivations behind this approach, and will present several case studies, with reference to recent empirical results in the areas of information presentation and referring expression generation, including new work on the generation of temporal referring expressions. Finally, we provide a critical outlook for future work on statistical approaches to adaptive NLG.

Type: Chapter
Information: Natural Language Generation in Interactive Systems, pp. 151-179

DOI: https://doi.org/10.1017/CBO9780511844492.007

Publisher: Cambridge University Press

Print publication year: 2014
