
Products of weighted logic programs

Published online by Cambridge University Press: 28 January 2011

SHAY B. COHEN
Affiliation:
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
ROBERT J. SIMMONS
Affiliation:
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
NOAH A. SMITH
Affiliation:
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

Abstract

Weighted logic programming, a generalization of bottom-up logic programming, is a well-suited framework for specifying dynamic programming algorithms. In this setting, proofs correspond to elements of the algorithm's output space, such as paths through a graph or grammatical derivations, and are given a real-valued score (often interpreted as a probability) that depends on the real weights of the base axioms used in the proof. The desired output is a function over all possible proofs, such as a sum of scores or an optimal score. We describe the product transformation, which can merge two weighted logic programs into a new one. The resulting program optimizes a product of proof scores from the original programs, constituting a scoring function known in machine learning as a “product of experts.” Through the addition of intuitive constraining side conditions, we show that several important dynamic programming algorithms can be derived by applying the product transformation to simpler weighted logic programs. In addition, we show how the computation of Kullback–Leibler divergence, an information-theoretic measure, can be interpreted using the product transformation.
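
As a rough illustration of the idea sketched in the abstract, and not the paper's own construction, the Python sketch below pairs up the rules of two small weighted "reachability" programs so that paired rule weights multiply; the best proof in the resulting program then optimizes a product of the two original proof scores, in the spirit of a product of experts. All identifiers here (edges1, edges2, product_edges, best_score) are hypothetical and are introduced only for this toy example.

# A minimal, hypothetical sketch of a "product" of two weighted programs,
# specialised to weighted graph reachability. Names and encodings are
# illustrative only and do not come from the paper.

from collections import defaultdict

# Each "program" is a rule table: (state, symbol) -> list of (next_state, weight in (0, 1]).
edges1 = {("q0", "a"): [("q1", 0.5)], ("q1", "b"): [("q2", 0.5)]}
edges2 = {("r0", "a"): [("r1", 0.5)], ("r1", "b"): [("r2", 0.25)]}

def product_edges(e1, e2):
    # Pair up rules that fire on the same symbol; the paired rule's weight is the product.
    prod = defaultdict(list)
    for (u1, sym1), outs1 in e1.items():
        for (u2, sym2), outs2 in e2.items():
            if sym1 != sym2:
                continue
            for v1, w1 in outs1:
                for v2, w2 in outs2:
                    prod[((u1, u2), sym1)].append(((v1, v2), w1 * w2))
    return prod

def best_score(edges, start, goal):
    # Max-product score of any path from start to goal, computed by
    # label-correcting relaxation; terminates because all weights are at most 1.
    best = {start: 1.0}
    agenda = [start]
    while agenda:
        u = agenda.pop()
        for (src, _sym), outs in edges.items():
            if src != u:
                continue
            for v, w in outs:
                score = best[u] * w
                if score > best.get(v, 0.0):
                    best[v] = score
                    agenda.append(v)
    return best.get(goal, 0.0)

prod = product_edges(edges1, edges2)
print(best_score(edges1, "q0", "q2"))                 # 0.25    (expert 1 alone)
print(best_score(edges2, "r0", "r2"))                 # 0.125   (expert 2 alone)
print(best_score(prod, ("q0", "r0"), ("q2", "r2")))   # 0.03125 (product of experts)

In general such a product program optimizes the product of scores jointly over shared proofs; its optimum coincides with the product of the two separately optimized scores only in degenerate cases like this toy example, where each program admits a single path.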

Type
Regular Papers
Copyright
Copyright © Cambridge University Press 2011

