
Extracting paraphrase patterns from bilingual parallel corpora

Published online by Cambridge University Press: 16 September 2009

SHIQI ZHAO
Affiliation:
Harbin Institute of Technology, No. 27 Jiaohua Street, Nangang District, Harbin 150001, China
HAIFENG WANG
Affiliation:
Toshiba (China) Research and Development Center, No. 1, East Chang An Ave., Dongcheng District, Beijing 100738
TING LIU
Affiliation:
Harbin Institute of Technology, No. 27 Jiaohua Street, Nangang District, Harbin 150001, China
SHENG LI
Affiliation:
Harbin Institute of Technology, No. 27 Jiaohua Street, Nangang District, Harbin 150001, China

Abstract

Paraphrase patterns are semantically equivalent patterns, which are useful in both paraphrase recognition and generation. This paper presents a pivot approach for extracting paraphrase patterns from bilingual parallel corpora, whereby the paraphrase patterns in English are extracted using the patterns in another language as pivots. We use log-linear models to compute the paraphrase likelihood between pattern pairs and exploit feature functions based on maximum likelihood estimation (MLE), lexical weighting (LW), and monolingual word alignment (MWA). Using the presented method, we extract more than 1 million pairs of paraphrase patterns from about 2 million pairs of bilingual parallel sentences. The precision of the extracted paraphrase patterns is above 78%. Experimental results show that the presented method significantly outperforms a well-known method called discovery of inference rules from text (DIRT). Additionally, the log-linear model with the proposed feature functions is effective. We analyze the extracted paraphrase patterns in detail. In particular, we find that they can be classified into five types, which are useful in multiple natural language processing (NLP) applications.
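
To make the pivot idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that pattern alignments are already available as (English pattern, pivot pattern) pairs, and it uses only two MLE-style feature functions, log p(c|e1) and log p(e2|c), combined log-linearly and summed over pivots c. The function names, the weights, the toy patterns, and the pivot labels "c1" and "c2" are all hypothetical; the LW and MWA features mentioned in the abstract are omitted.

```python
import math
from collections import defaultdict


def mle_tables(aligned_pairs):
    """Relative-frequency estimates of p(c|e) and p(e|c) from aligned pattern pairs."""
    pair_count = defaultdict(int)
    e_count = defaultdict(int)
    c_count = defaultdict(int)
    for e, c in aligned_pairs:
        pair_count[(e, c)] += 1
        e_count[e] += 1
        c_count[c] += 1
    p_c_given_e = {(e, c): n / e_count[e] for (e, c), n in pair_count.items()}
    p_e_given_c = {(e, c): n / c_count[c] for (e, c), n in pair_count.items()}
    return p_c_given_e, p_e_given_c


def paraphrase_score(e1, e2, p_c_given_e, p_e_given_c, weights=(1.0, 1.0)):
    """Sum over shared pivots c of exp(w1 * log p(c|e1) + w2 * log p(e2|c))."""
    total = 0.0
    pivots = {c for (e, c) in p_c_given_e if e == e1}
    for c in pivots:
        if (e2, c) not in p_e_given_c:
            continue  # e2 was never aligned to this pivot
        h1 = math.log(p_c_given_e[(e1, c)])
        h2 = math.log(p_e_given_c[(e2, c)])
        total += math.exp(weights[0] * h1 + weights[1] * h2)
    return total


# Hypothetical toy alignments: (English pattern, pivot-language pattern).
pairs = [
    ("consider X", "c1"),
    ("consider X", "c1"),
    ("take X into consideration", "c1"),
    ("consider X", "c2"),
    ("think about X", "c2"),
]
p_ce, p_ec = mle_tables(pairs)

print(round(paraphrase_score("consider X", "take X into consideration", p_ce, p_ec), 4))  # 0.2222
print(round(paraphrase_score("consider X", "think about X", p_ce, p_ec), 4))  # 0.1667
```

With both weights set to 1, the score reduces to the sum over pivots of p(c|e1) * p(e2|c); other weights change the relative influence of each feature.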

Type: Papers
Copyright: © Cambridge University Press 2009

