
Experiments with three approaches to recognizing lexical entailment

Published online by Cambridge University Press:  28 January 2014

P. D. TURNEY
Affiliation: National Research Council Canada, Ottawa, Ontario K1A 0R6, Canada
S. M. MOHAMMAD
Affiliation: National Research Council Canada, Ottawa, Ontario K1A 0R6, Canada

Abstract

Inference in natural language often involves recognizing lexical entailment (RLE), that is, identifying whether one word entails another. For example, buy entails own. Two general strategies for RLE have been proposed. One strategy is to manually construct an asymmetric similarity measure for context vectors (directional similarity); the other is to treat RLE as a problem of learning to recognize semantic relations using supervised machine-learning techniques (relation classification). In this paper, we experiment with two recent state-of-the-art representatives of these two general strategies. The first approach is an asymmetric similarity measure (an instance of the directional similarity strategy), designed to capture the degree to which the contexts of a word, a, form a subset of the contexts of another word, b. The second approach (an instance of the relation classification strategy) represents a word pair, a:b, with a feature vector that is the concatenation of the context vectors of a and b, and then applies supervised learning to a training set of labeled feature vectors. In addition, we introduce a third approach that is a new instance of the relation classification strategy. The third approach represents a word pair, a:b, with a feature vector in which the features are the differences in the similarities of a and b to a set of reference words. All three approaches use vector space models of semantics, based on word–context matrices. We perform an extensive evaluation of the three approaches using three different datasets. The proposed new approach (similarity differences) performs significantly better than the other two approaches on some datasets, and there is no dataset for which it is significantly worse.
Along the way, we address some of the concerns raised in past research regarding the treatment of RLE as a problem of semantic relation classification, and we suggest that it is beneficial to make connections between the research in lexical entailment and the research in semantic relation classification.
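The three word-pair representations described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Context vectors are modeled as sparse dicts from context words to weights. The directional measure is a simple inclusion score (the share of a's context weight that falls on contexts b also has), standing in for the more elaborate asymmetric measure the paper evaluates. The similarity-difference features use plain cosine similarity against hypothetical reference-word vectors. All vectors, weights, and function names below are invented for illustration.

```python
import math
from typing import Dict, List

# Assumption: a context vector is a sparse dict mapping context words to weights
# (for example, PPMI values taken from one row of a word-context matrix).
Vector = Dict[str, float]


def cosine(u: Vector, v: Vector) -> float:
    """Symmetric cosine similarity between two sparse context vectors."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def inclusion(a: Vector, b: Vector) -> float:
    """Directional similarity: fraction of a's context weight on contexts that b also has.

    A simple inclusion score used as a stand-in for illustration; it is not
    the specific asymmetric measure evaluated in the paper.
    """
    total = sum(a.values())
    if total == 0.0:
        return 0.0
    shared = sum(w for c, w in a.items() if b.get(c, 0.0) > 0.0)
    return shared / total


def concat_features(a: Vector, b: Vector, vocab: List[str]) -> List[float]:
    """Relation classification: concatenate the context vectors of a and b over a fixed vocabulary."""
    return [a.get(c, 0.0) for c in vocab] + [b.get(c, 0.0) for c in vocab]


def sim_diff_features(a: Vector, b: Vector, refs: List[Vector]) -> List[float]:
    """Relation classification: one feature per reference word r, equal to sim(a, r) - sim(b, r)."""
    return [cosine(a, r) - cosine(b, r) for r in refs]


# Toy data, invented for illustration only.
vocab = ["eat", "bark", "pet", "run"]
dog = {"eat": 1.0, "bark": 2.0, "pet": 1.0}
animal = {"eat": 2.0, "bark": 1.0, "pet": 1.0, "run": 2.0}
refs = [{"eat": 1.0}, {"run": 1.0}]  # hypothetical reference-word vectors

print(inclusion(dog, animal))  # 1.0: every context of dog also occurs with animal
print(inclusion(animal, dog))  # about 0.667: "run" occurs with animal but not with dog
print(concat_features(dog, animal, vocab))  # 8 features: 4 for dog, then 4 for animal
print(sim_diff_features(dog, animal, refs))  # 2 features, one per reference vector
```

The inclusion score is asymmetric by design, which is the signal the directional strategy relies on. The two feature functions only build inputs: in the relation classification strategy, such vectors, labeled as entailing or not, would be passed to a supervised learner trained on labeled word pairs.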

Type: Articles
Copyright © Her Majesty the Queen in Right of Canada, as represented by the National Research Council Canada 2014
