Extracting paraphrase patterns from bilingual parallel corpora

SHIQI ZHAO; HAIFENG WANG; TING LIU; SHENG LI

doi:10.1017/S1351324909990155

Published online by Cambridge University Press: 16 September 2009

SHIQI ZHAO: Affiliation: Harbin Institute of Technology, No. 27 Jiaohua Street, Nangang District, Harbin 150001, China
HAIFENG WANG: Affiliation: Toshiba (China) Research and Development Center, No. 1, East Chang An Ave., Dongcheng District, Beijing 100738
TING LIU: Affiliation: Harbin Institute of Technology, No. 27 Jiaohua Street, Nangang District, Harbin 150001, China
SHENG LI: Affiliation: Harbin Institute of Technology, No. 27 Jiaohua Street, Nangang District, Harbin 150001, China

Abstract

Paraphrase patterns are semantically equivalent patterns that are useful in both paraphrase recognition and paraphrase generation. This paper presents a pivot approach for extracting paraphrase patterns from bilingual parallel corpora, whereby paraphrase patterns in English are extracted using patterns in another language as pivots. We use log-linear models to compute the paraphrase likelihood between pattern pairs and exploit feature functions based on maximum likelihood estimation (MLE), lexical weighting (LW), and monolingual word alignment (MWA). Using the presented method, we extract more than 1 million pairs of paraphrase patterns from about 2 million pairs of bilingual parallel sentences. The precision of the extracted paraphrase patterns is above 78%. Experimental results show that the presented method significantly outperforms a well-known method called discovery of inference rules from text (DIRT), and that the log-linear model with the proposed feature functions is effective. We also analyze the extracted paraphrase patterns in detail. In particular, we find that they can be classified into five types, which are useful in multiple natural language processing (NLP) applications.
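
The abstract does not give the scoring formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the pivot formulation commonly used in this line of work, p(e2|e1) = sum over pivots c of p(c|e1) * p(e2|c), estimated by relative frequency, and a generic log-linear combination of feature values. The function names, the toy counts, the pivot identifiers, and the weights are all hypothetical.

```python
import math
from collections import defaultdict


def pivot_mle_scores(pair_counts):
    """Pivot-based paraphrase scores from aligned pattern counts.

    pair_counts maps (english_pattern, pivot_pattern) -> co-occurrence count.
    Returns {(e1, e2): sum_c p(c|e1) * p(e2|c)} for all e1 != e2 that share
    at least one pivot pattern (assumed formulation, see lead-in).
    """
    english_totals = defaultdict(float)
    pivot_totals = defaultdict(float)
    by_pivot = defaultdict(list)
    for (english, pivot), count in pair_counts.items():
        english_totals[english] += count
        pivot_totals[pivot] += count
        by_pivot[pivot].append((english, count))

    scores = defaultdict(float)
    for pivot, entries in by_pivot.items():
        for e1, n1 in entries:
            p_pivot_given_e1 = n1 / english_totals[e1]
            for e2, n2 in entries:
                if e1 != e2:
                    p_e2_given_pivot = n2 / pivot_totals[pivot]
                    scores[(e1, e2)] += p_pivot_given_e1 * p_e2_given_pivot
    return dict(scores)


def log_linear_score(features, weights):
    """Generic log-linear combination: sum_i weight_i * log(feature_i)."""
    return sum(weights[name] * math.log(value) for name, value in features.items())


# Toy, made-up counts; "pivot_a" and "pivot_b" stand in for pivot-language patterns.
toy_counts = {
    ("X considers Y", "pivot_a"): 3,
    ("X takes Y into consideration", "pivot_a"): 1,
    ("X considers Y", "pivot_b"): 1,
}
toy_scores = pivot_mle_scores(toy_counts)

# Hypothetical feature values and weights for a single candidate pair.
combined = log_linear_score({"mle": 0.5, "lw": 0.25}, {"mle": 1.0, "lw": 2.0})
```

In this sketch the score is asymmetric: a rare pattern that always aligns to one pivot scores highly toward the frequent pattern sharing that pivot, but not the reverse. In practice the weights of a log-linear model would be tuned on held-out data rather than set by hand.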

Type: Papers
Information: Natural Language Engineering, Volume 15, Special Issue 4: Textual Entailment, October 2009, pp. 503–526

DOI: https://doi.org/10.1017/S1351324909990155
Copyright: Copyright © Cambridge University Press 2009

References

Bannard, C., and Callison-Burch, C. 2005. Paraphrasing with bilingual parallel corpora. In Proceedings of ACL, pp. 597–604, Ann Arbor, MI.

Bar-Haim, R., Dagan, I., Greental, I., Szpektor, I., and Friedman, M. 2007. Semantic inference at the lexical-syntactic level for textual entailment recognition. In Proceedings of the Workshop on Textual Entailment and Paraphrasing, pp. 131–6, Prague, Czech Republic.

Barzilay, R. 2003. Information Fusion for Multidocument Summarization: Paraphrasing and Generation. Ph.D. Thesis, Columbia University, New York.

Barzilay, R., and Lee, L. 2003. Learning to paraphrase: an unsupervised approach using multiple-sequence alignment. In Proceedings of HLT-NAACL, pp. 16–23, Edmonton, Canada.

Barzilay, R., and McKeown, K. R. 2001. Extracting paraphrases from a parallel corpus. In Proceedings of ACL, pp. 50–7, Toulouse, France.

Callison-Burch, C., Koehn, P., and Osborne, M. 2006. Improved statistical machine translation using paraphrases. In Proceedings of HLT-NAACL, pp. 17–24, New York.

Chambers, N., Cer, D., Grenager, T., Hall, D., Kiddon, C., MacCartney, B., de Marneffe, M., Ramage, D., Yeh, E., and Manning, C. D. 2007. Learning alignments and leveraging natural logics. In Proceedings of the Workshop on Textual Entailment and Paraphrasing, pp. 165–70, Prague, Czech Republic.

Clark, P., Murray, W. R., Thompson, J., Harrison, P., Hobbs, J., and Fellbaum, C. 2007. On the role of lexical and world knowledge in RTE3. In Proceedings of the Workshop on Textual Entailment and Paraphrasing, pp. 54–9, Prague, Czech Republic.

Hermjakob, U., Echihabi, A., and Marcu, D. 2002. Natural language based reformulation resource and web exploitation for question answering. In Proceedings of TREC, Gaithersburg, MD.

Ibrahim, A., Katz, B., and Lin, J. 2003. Extracting structural paraphrases from aligned monolingual corpora. In Proceedings of IWP, pp. 57–64, Sapporo, Japan.

Iftene, A., and Balahur-Dobrescu, A. 2007. Hypothesis transformation and semantic variability rules used in recognizing textual entailment. In Proceedings of the Workshop on Textual Entailment and Paraphrasing, pp. 125–30, Prague, Czech Republic.

Iordanskaja, L., Kittredge, R., and Polguère, A. 1991. Lexical selection and paraphrase in a meaning-text generation model. In Paris, C. L., Swartout, W. R., and Mann, W. C. (eds.), Natural Language Generation in Artificial Intelligence and Computational Linguistics, Kluwer, Norwell, MA, pp. 293–312.

Kauchak, D., and Barzilay, R. 2006. Paraphrasing for automatic evaluation. In Proceedings of HLT-NAACL, pp. 455–62, New York.

Koehn, P., Axelrod, A., Mayne, A. B., Callison-Burch, C., Osborne, M., and Talbot, D. 2005. Edinburgh system description for the 2005 IWSLT speech translation evaluation. In Proceedings of IWSLT, Pittsburgh, PA.

Koehn, P., Och, F. J., and Marcu, D. 2003. Statistical phrase-based translation. In Proceedings of HLT-NAACL, pp. 127–33, Edmonton, Canada.

Lepage, Y., and Denoual, E. 2005. Automatic generation of paraphrases to be used as translation references in objective evaluation measures of machine translation. In Proceedings of IWP, pp. 57–64, Jeju, Korea.

Lin, D., and Pantel, P. 2001. Discovery of inference rules for question answering. Natural Language Engineering 7 (4): 343–60.

Liu, T., Ma, J., Zhu, H., and Li, S. 2006. Dependency parsing based on dynamic local optimization. In Proceedings of CoNLL-X, pp. 211–15, New York.

Marsi, E., Krahmer, E., and Bosma, W. 2007. Dependency-based paraphrasing for recognizing textual entailment. In Proceedings of the Workshop on Textual Entailment and Paraphrasing, pp. 83–8, Prague, Czech Republic.

McKeown, K. R., Barzilay, R., Evans, D., Hatzivassiloglou, V., Klavans, J. L., Nenkova, A., Sable, C., Schiffman, B., and Sigelman, S. 2002. Tracking and summarizing news on a daily basis with Columbia's Newsblaster. In Proceedings of HLT, pp. 280–5, San Diego, CA.

Nivre, J., Hall, J., Nilsson, J., Chanev, A., Eryigit, G., Kübler, S., Marinov, S., and Marsi, E. 2007. MaltParser: a language-independent system for data-driven dependency parsing. Natural Language Engineering 13 (2): 95–135.

Och, F., and Ney, H. 2000. Improved statistical alignment models. In Proceedings of ACL, pp. 440–7, Hong Kong, China.

Ouangraoua, A., Ferraro, P., Tichit, L., and Dulucq, S. 2007. Local similarity between quotiented ordered trees. Journal of Discrete Algorithms 5 (1): 23–35.

Pang, B., Knight, K., and Marcu, D. 2003. Syntax-based alignment of multiple translations: extracting paraphrases and generating new sentences. In Proceedings of HLT-NAACL, pp. 181–8, Edmonton, Canada.

Pantel, P., Bhagat, R., Coppola, B., Chklovski, T., and Hovy, E. 2007. ISP: Learning inferential selectional preferences. In Proceedings of HLT-NAACL, pp. 564–71, Rochester, NY.

Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. 1992. Numerical recipes in C: the art of scientific computing. Cambridge, UK: Cambridge University Press, pp. 412–20.

Quirk, C., Brockett, C., and Dolan, W. 2004. Monolingual machine translation for paraphrase generation. In Proceedings of EMNLP, pp. 142–9, Barcelona, Spain.

Ravichandran, D., and Hovy, E. 2002. Learning surface text patterns for a question answering system. In Proceedings of ACL, pp. 41–7, Philadelphia, PA.

Roth, D., and Sammons, M. 2007. Semantic and logical inference model for textual entailment. In Proceedings of the Workshop on Textual Entailment and Paraphrasing, pp. 107–12, Prague, Czech Republic.

Shinyama, Y., Sekine, S., and Sudo, K. 2002. Automatic paraphrase acquisition from news articles. In Proceedings of HLT, pp. 40–6, San Diego, CA.

Szpektor, I., and Dagan, I. 2007. Learning canonical forms of entailment rules. In Proceedings of RANLP, Borovets, Bulgaria.

Szpektor, I., Shnarch, E., and Dagan, I. 2007. Instance-based evaluation of entailment rule acquisition. In Proceedings of ACL, pp. 456–63, Prague, Czech Republic.

Szpektor, I., Tanev, H., Dagan, I., and Coppola, B. 2004. Scaling web-based acquisition of entailment relations. In Proceedings of EMNLP, pp. 41–8, Barcelona, Spain.

Zhao, S.-Q., Wang, H.-F., Liu, T., and Li, S. 2008. Pivot approach for extracting paraphrase patterns from bilingual corpora. In Proceedings of ACL-08:HLT, pp. 780–8, Columbus, OH.

Zhao, S.-Q., Zhou, M., and Liu, T. 2007. Learning question paraphrases for QA from Encarta logs. In Proceedings of IJCAI, pp. 1795–1800, Hyderabad, India.
