Sparsity and normalization in word similarity systems

Published online by Cambridge University Press:  19 August 2015

JEAN MARK GAWRON
Affiliation:
Department of Linguistics, San Diego State University, San Diego, CA, USA
KELLEN STEPHENS
Affiliation:
Department of Linguistics, San Diego State University, San Diego, CA, USA

Abstract

We investigate the problem of improving performance in distributional word similarity systems trained on sparse data, focusing on a family of similarity functions we call Dice-family functions (Dice 1945), including the similarity function introduced in Lin (1998) and Curran (2004), as well as a generalized version of the Dice coefficient used in data mining applications (Strehl 2000, 55). We propose a generalization of the Dice-family functions which uses a weight parameter α to make the similarity functions asymmetric. We show that this generalized family of functions (α systems) all belong to the class of asymmetric models first proposed in Tversky (1977), and in a multi-task evaluation of ten word similarity systems, we show that α systems have the best performance across word ranks. In particular, we show that α-parameterization substantially improves the correlations of all Dice-family functions with human judgements on three word sets, including the Miller–Charles/Rubenstein–Goodenough word set (Miller and Charles 1991; Rubenstein and Goodenough 1965).
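As a rough illustration of the relationship the abstract describes, the sketch below implements the classic set-based ratio model of Tversky (1977) and shows that the Dice coefficient is its symmetric special case. It is not the authors' α-parameterized function, whose exact definition is not given in the abstract and which operates on weighted distributional features; the function names, the weights `alpha` and `beta`, and the toy feature sets are all assumptions made for illustration.

```python
def tversky(a: set, b: set, alpha: float, beta: float) -> float:
    """Tversky ratio model on plain feature sets (illustrative only).

    alpha weights features found only in `a`, beta weights features
    found only in `b`. Unequal weights make the score asymmetric.
    """
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0


def dice(a: set, b: set) -> float:
    """Set-based Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# Hypothetical context features for two words.
dog = {"bark", "tail", "fur", "pet"}
wolf = {"tail", "fur", "wild"}

symmetric = tversky(dog, wolf, 0.5, 0.5)   # coincides with Dice
dog_to_wolf = tversky(dog, wolf, 0.9, 0.1)
wolf_to_dog = tversky(wolf, dog, 0.9, 0.1)
```

With equal weights of 0.5 the score matches Dice, and with equal weights of 1 it matches the Jaccard coefficient. With unequal weights, swapping the argument order changes the result, which is the kind of asymmetry a single weight parameter can introduce.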

Type
Articles
Copyright
Copyright © Cambridge University Press 2015 

References

Agirre, E., Alfonseca, E., Hall, K., Kravalova, J., Pasca, M., and Soroa, A. 2009. A study on similarity and relatedness using distributional and wordnet-based approaches. In Proceedings of NAACL-HLT 09, Stroudsburg, PA. Association for Computational Linguistics, pp. 19–27.
Agirre, E. and Soroa, A. 2009. Personalizing pagerank for word sense disambiguation. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, Stroudsburg, PA. Association for Computational Linguistics, pp. 33–41.
Bordag, S. 2008. A comparison of co-occurrence and similarity measures as simulations of context. In Proceedings of the 9th International Conference on Computational Linguistics and Intelligent Text Processing, Berlin: Springer, pp. 52–63.
Bouma, G. 2009. Normalized (pointwise) mutual information in collocation extraction. In Proceedings of the Biennial GSCL Conference, Tubingen. Gunter Narr Verlag, pp. 31–40.
Bullinaria, J. A. and Levy, J. P. 2012. Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD. Behavior Research Methods 44 (3): 890–907.
Burnard, L. 1995. Users Reference Guide British National Corpus: Version 1.0. Oxford: Oxford University Computing Services.
Church, K. W. and Hanks, P. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics 16 (1): 22–29.
Curran, J. R. 2004. From Distributional to Semantic Similarity. PhD thesis, University of Edinburgh. College of Science and Engineering. School of Informatics.
Dagan, I., Lee, L. and Pereira, F. 1997. Similarity-based methods for word sense disambiguation. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics, Stroudsburg, PA. Association for Computational Linguistics, pp. 56–63.
Dagan, I., Lee, L. and Pereira, F. C. N. 1999. Similarity-based models of word cooccurrence probabilities. Machine Learning 34 (1): 43–69.
Dagan, I. 2000. Contextual word similarity. In Dale, R., Moisl, H. L., and Somers, H. L. (eds.), Handbook of Natural Language Processing, pp. 459–475. New York: Marcel Dekker.
Dice, L. R. 1945. Measures of the amount of ecologic association between species. Ecology 26 (3): 297–302.
Eisler, H. and Ekman, G. 1959. A mechanism of subjective similarity. Nordisk Psykologi 11 (1): 1–10.
Evert, S. 2008. Corpora and collocations. In Lüdeling, A. and Kytö, M. (eds.), Corpus Linguistics: An International Handbook. Berlin: Mouton de Gruyter.
Ferreira da Silva, J., and Pereira Lopes, G. 1999. A local maxima method and a fair dispersion normalization for extracting multi-word units from corpora. In Proceedings of the 6th Meeting on Mathematics of Language, University of Pennsylvania, Philadelphia, PA. Association for the Mathematics of Language, pp. 369–381.
Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., and Ruppin, E. 2002. Placing search in context: the concept revisited. ACM Transactions on Information Systems 20 (1): 116–131.
Freitag, D., Blume, M., Byrnes, J., Chow, E., Kapadia, S., Rohwer, R. and Wang, Z. 2005. New experiments in distributional representations of synonymy. In Proceedings of the 9th Conference on Computational Natural Language Learning, Stroudsburg, PA. Association for Computational Linguistics, pp. 25–32.
Gabrilovich, E. and Markovitch, S. 2009. Wikipedia-based semantic interpretation for natural language processing. Journal of Artificial Intelligence Research 34 (2): 443–498.
Gawron, J. M. 2011. Frame semantics. In Maienborn, C., von Heusinger, K., and Portner, P. (eds.), Semantics: An International Handbook of Natural Language Meaning, vol. 23. HSK Handbooks of Linguistics and Communication Science Series. Berlin: Mouton de Gruyter.
Grefenstette, G. 1994. Explorations in Automatic Thesaurus Discovery. New York: Springer Science and Business Media.
Hassan, S. and Mihalcea, R. 2011. Semantic relatedness using salient semantic analysis. In Proceedings of AAAI Conference Artificial Intelligence, Palo Alto, CA. AAAI Press, pp. 884–889.
Haveliwala, T. H. 2003. Topic-sensitive pagerank: a context-sensitive ranking algorithm for web search. IEEE Transactions on Knowledge and Data Engineering 15 (4): 784–796.
Heylen, K., Peirsman, Y., Geeraerts, D. and Speelman, D. 2008. Modelling word similarity: an evaluation of automatic synonymy extraction algorithms. In Proceedings of the 6th International Language Resources and Evaluation (LREC-2008), Marrakech, Morocco. European Language Resources Association, pp. 3243–3249.
Hughes, T. and Ramage, D. 2007. Lexical semantic relatedness with random graph walks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing/Conference on Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic. Association for Computational Linguistics, pp. 581–589.
Jaccard, P. 1912. The distribution of the flora in the alpine zone. New Phytologist 11 (2): 37–50.
Jiang, J. J. and Conrath, D. W. 1997. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of International Conference on Research in Computational Linguistics (ROCLING-10), Stroudsburg, PA. Association for Computational Linguistics.
Jimenez, S., Becerra, C. and Gelbukh, A. 2012. Soft cardinality: a parameterized similarity function for text comparison. In Proceedings of the 1st Joint Conference on Lexical and Computational Semantics, Stroudsburg, PA. Association for Computational Linguistics, pp. 449–453.
Landauer, T. K. and Dumais, S. T. 1994. Latent semantic analysis and the measurement of knowledge. In Kaplan, R., and Burstein, J. C. B. (eds.), Educational Testing Service Conference on Natural Language Processing Techniques and Technology in Assessment and Education, Ewing, NJ: Educational Testing Service.
Leacock, C., Miller, G. A. and Chodorow, M. 1998. Using corpus statistics and wordnet relations for sense identification. Computational Linguistics 24 (1): 147–165.
Lee, L. 1997. Similarity-Based Approaches to Natural Language Processing. PhD thesis, Harvard University.
Lee, L. 1999. Measures of distributional similarity. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics, Stroudsburg, PA. Association for Computational Linguistics, pp. 25–32.
Lee, L. 2001. On the effectiveness of the skew divergence for statistical language analysis. In Proceedings of the 8th International Workshop on Artificial Intelligence and Statistics, Fort Lauderdale, FL. Society for Artificial Intelligence and Statistics, pp. 65–72.
Lin, D. 1998. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning, Madison, Wisconsin. International Machine Learning Society, pp. 296–304.
Manning, C. D. and Schütze, H. 1999. Foundations of Statistical Natural Language Processing. Cambridge: MIT Press.
McHale, M. 1998. A comparison of WordNet and Roget's taxonomy for measuring semantic similarity. In Workshop on Usage of WordNet in Natural Language Processing Systems, Stroudsburg, PA. COLING-ACL. Available from http://xxx.lanl.gov/abs/cmp-lg/9809003.
Miller, G. A. and Charles, W. G. 1991. Contextual correlates of semantic similarity. Language and Cognitive Processes 6 (1): 1–28.
Nida, E. A. 1975. Componential Analysis of Meaning: An Introduction to Semantic Structures. The Hague: Mouton.
Nivre, J. 2003. An efficient algorithm for projective dependency parsing. In Proceedings of the 5th International Conference on Computational Natural Language Learning (CONLL-2003), Stroudsburg, PA. Association of Computational Linguistics, pp. 149–160.
Pilehvar, M. T., Jurgens, D. and Navigli, R. 2013. Align, disambiguate and walk: a unified approach for measuring semantic similarity. In Proceedings of the 51st Annual Meeting of the ACL, Stroudsburg, PA. Association for Computational Linguistics, pp. 1341–1351.
Resnik, P. 1995. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI), Montreal, Canada. International Joint Conferences on Artificial Intelligence, pp. 448–453.
Rosch, E. 1975. Cognitive reference points. Cognitive Psychology 7 (4): 532–547.
Rubenstein, H. and Goodenough, J. B. 1965. Contextual correlates of synonymy. Communications of the ACM 8 (10): 627–633.
Schütze, H. 1993. Part-of-speech induction from scratch. In Proceedings of the 31st annual meeting on Association for Computational Linguistics, Stroudsburg, PA. Association for Computational Linguistics, pp. 251–258.
Sjoberg, L. 1972. A cognitive theory of similarity. Goteborg Psychological Reports 10. Department of Psychology. University of Goteborg.
Strehl, A. 2000. Relation-Based Clustering and Cluster Ensembles for High-Dimensional Data-Mining. PhD thesis, University of Texas, Austin, TX.
Turney, P., Littman, M. L., Bigham, J. and Shnayder, V. 2003. Combining independent modules to solve multiple-choice synonym and analogy problems. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), Borovets, Bulgaria. INCOMA, Ltd, pp. 482–489.
Turney, P. D. 2008. A uniform approach to analogies, synonyms, antonyms, and associations. In Proceedings of the 22nd International Conference on Computational Linguistics, Proceedings of the Conference (COLING-2008), Manchester, UK. ACL-COLING, pp. 905–912.
Tversky, A. 1977. Features of similarity. Psychological Review 84 (4): 327–352.
van Rijsbergen, C. J. 1979. Information retrieval. Oxford: Butterworth-Heinemann.
Weeds, J. and Weir, D. 2005. Co-occurrence retrieval: a flexible framework for lexical distributional similarity. Computational Linguistics 31 (4): 439–475.
Yang, D. and Powers, D. M. 2005. Measuring semantic similarity in the taxonomy of wordnet. In Proceedings of 28th Australasian Computer Science Conference, Newcastle, NSW, Australia. Australian Computer Society, pp. 315–322.
Yih, W.-T. and Qazvinian, V. 2012. Measuring word relatedness using heterogeneous vector space models. In Proceedings of the 2012 Conference of NAACL, Stroudsburg, PA. Association for Computational Linguistics, pp. 616–620.