
A survey of author name disambiguation techniques: 2010–2016

Published online by Cambridge University Press: 05 December 2017

Ijaz Hussain
Affiliation:
Department of Computer Science, COMSATS Institute of Information Technology, Islamabad 45550, Pakistan
Sohail Asghar
Affiliation:
Department of Computer Science, COMSATS Institute of Information Technology, Islamabad 45550, Pakistan

Abstract

The content and quality of service of digital libraries are badly affected by author name ambiguity in citations, which is considered one of the hardest problems facing digital library researchers. Several techniques have been proposed in the literature to address it. In this paper, we review recently presented author name disambiguation techniques and outline open challenges and future research directions. We analyze recent advances in the field and classify the techniques as supervised, unsupervised, semi-supervised, graph-based or heuristic-based, according to the problem formulation each mainly uses for author name disambiguation. A few earlier surveys have reviewed author name disambiguation techniques, but they highlighted only the methodology adopted and did not critically review the techniques' shortcomings. This survey provides a detailed review of the author name disambiguation techniques available in the literature, compares them at an abstract level and discusses their limitations.

Type
Adaptive and Learning Agents
Copyright
© Cambridge University Press, 2017 
