
Instance-based natural language generation

Published online by Cambridge University Press:  12 May 2010

S. VARGES
Department of Information Engineering and Computer Science, University of Trento, Via Sommarive 14, 38050 Povo (TN), Italy
C. MELLISH
Department of Computing Science, University of Aberdeen, King's College, Aberdeen AB24 3UE, UK

Abstract

We investigate the use of instance-based ranking methods for surface realization in natural language generation. Our approach to instance-based natural language generation (IBNLG) employs two components: a rule system that ‘overgenerates’ a number of realization candidates from a meaning representation and an instance-based ranker that scores the candidates according to their similarity to examples taken from a training corpus. We develop an efficient search technique for identifying the optimal candidate based on a novel extension of the A* algorithm. The rule system is produced automatically from a semantically annotated fragment of the Penn Treebank II containing management succession texts. We detail the annotation scheme and grammar induction algorithm and evaluate the efficiency and output of the generator. We also discuss issues such as input coverage (completeness) and fluency that are relevant to surface generation in general.
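The overgenerate-and-rank idea described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the authors' system: the candidate list is hard-coded rather than produced by an induced rule system, the similarity measure (cosine over word bigram counts) is a stand-in chosen for brevity, every candidate is scored exhaustively instead of being searched with the A*-based technique, and the sentences and the names `ngrams`, `cosine` and `rank` are hypothetical.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n=2):
    """Return a multiset of word n-grams for a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def rank(candidates, instances):
    """Score each candidate by its best similarity to any training instance."""
    instance_vecs = [ngrams(s.split()) for s in instances]
    scored = [
        (max(cosine(ngrams(c.split()), v) for v in instance_vecs), c)
        for c in candidates
    ]
    # Highest-scoring candidate first.
    return sorted(scored, reverse=True)


# Hypothetical training instances (assumed, illustrative only).
instances = [
    "John Smith was named chief executive of Acme Corp.",
    "Acme Corp. appointed Mary Jones as president",
]
# Hypothetical overgenerated candidates for one meaning representation.
candidates = [
    "Jane Doe was named chief executive of Widget Inc.",
    "chief executive Jane Doe Widget Inc. named was",
    "Widget Inc. named was Jane Doe of chief executive",
]

ranked = rank(candidates, instances)
best_score, best = ranked[0]
print(best)
```

In this toy setting the fluent candidate shares the most bigrams with a training instance and is ranked first. Exhaustive scoring like this grows with the number of candidates, which is the cost an efficient search for the optimal candidate is meant to avoid.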

Type: Papers
Copyright © Cambridge University Press 2010

