Hostname: page-component-586b7cd67f-rdxmf Total loading time: 0 Render date: 2024-11-27T12:03:00.393Z Has data issue: false hasContentIssue false

Community structure: A comparative evaluation of community detection methods

Published online by Cambridge University Press:  03 January 2020

Vinh Loc Dao
Affiliation:
IMT Atlantique, Lab-STICC CNRS UMR 6285, F-29238, Brest, France, (e-mails: [email protected]; [email protected])
Cécile Bothorel*
Affiliation:
IMT Atlantique, Lab-STICC CNRS UMR 6285, F-29238, Brest, France, (e-mails: [email protected]; [email protected])
Philippe Lenca
Affiliation:
IMT Atlantique, Lab-STICC CNRS UMR 6285, F-29238, Brest, France, (e-mails: [email protected]; [email protected])
*
*Corresponding author. Email: [email protected]

Abstract

Discovering community structure in complex networks is a mature field since a tremendous number of community detection methods have been introduced in the literature. Nevertheless, it is still very challenging for practitioners to determine which method would be suitable to get insights into the structural information of the networks they study. Many recent efforts have been devoted to investigating various quality scores of the community structure, but the problem of distinguishing between different types of communities is still open. In this paper, we propose a comparative, extensive, and empirical study to investigate what types of communities many state-of-the-art and well-known community detection methods are producing. Specifically, we provide comprehensive analyses on computation time, community size distribution, a comparative evaluation of methods according to their optimization schemes as well as a comparison of their partitioning strategy through validation metrics. We process our analyses on a very large corpus of hundreds of networks from five different network categories and propose ways to classify community detection methods, helping a potential user to navigate the complex landscape of community detection.

Type
Research Article
Copyright
© Cambridge University Press 2020

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

Agreste, S., Meo, P. D., Fiumara, G., Piccione, G., Piccolo, S., Rosaci, D., Sarne, G. M. L., & Vasilakos, A. V. (2017). An empirical comparison of algorithms to find communities in directed graphs and their application in web data analytics. IEEE Transactions on Big Data, 3(3), 289306.CrossRefGoogle Scholar
Aldecoa, R., & Marn, I. (2011). Deciphering network community structure by surprise. PLoS ONE, 6(9), e24195.CrossRefGoogle ScholarPubMed
Ames, B. P. W. (2013). Guaranteed clustering and biclustering via semidefinite programming. Mathematical Programming, 147(1–2), 429465.CrossRefGoogle Scholar
Ana, L. N. F., & Jain, A. K. (2003). Robust data clustering. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Proceedings.CrossRefGoogle Scholar
Arifin, S., Zulkardi, , Putri, R. I. I., Hartono, Y., & Susanti, E. (2017). Developing Ill-defined problem-solving for the context of South Sumatera. Journal of Physics: Conference Series, 943(dec), 012038.Google Scholar
Blondel, V. D, Guillaume, J.-L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.CrossRefGoogle Scholar
Bohlin, L., Edler, D., Lancichinetti, A., & Rosvall, M. (2014). Community detection and visualization of networks with the map equation framework. In Measuring scholarly impact (pp. 334). Springer International Publishing.Google Scholar
Chakraborty, T., Dalmia, A., Mukherjee, A., & Ganguly, N. (2017). Metrics for community analysis: A survey. ACM Computing Surveys, 50(4), 137.CrossRefGoogle Scholar
Clauset, A., Newman, M. E. J., & Moore, C. (2004). Finding community structure in very large networks. Physical Review E, 70(6).CrossRefGoogle ScholarPubMed
Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74(368), 829836.CrossRefGoogle Scholar
Coscia, M., Giannotti, F., & Pedreschi, D. (2011). A classification for community discovery methods in complex networks. Statistical Analysis and Data Mining, 4(5), 512546.CrossRefGoogle Scholar
Danon, L., Daz-Guilera, A., Duch, J., & Arenas, A. (2005). Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment, 2005(09), P09008P09008.CrossRefGoogle Scholar
Dao, V. L., Bothorel, C., & Lenca, P. (2017). Community structures evaluation in complex networks: A descriptive approach. In The 3rd international winter school and conference on network science, NetSciX, 1119.CrossRefGoogle Scholar
Dao, V. L., Bothorel, C., & Lenca, P. (2018a). An empirical characterization of community structures in complex networks using a bivariate map of quality metrics. Arxiv e-prints, June.Google Scholar
Dao, V. L., Bothorel, C., & Lenca, P. (2018b). Estimating the similarity of community detection methods based on cluster size distribution. In The 7th international conference on complex networks and their applications. Springer International Publishing.CrossRefGoogle Scholar
Erdös, P., & Rényi, A. (1959). On random graphs, I. Publicationes mathematicae (debrecen), 6, 290297.Google Scholar
Fortunato, S., & Barthelemy, M. (2006). Resolution limit in community detection. Proceedings of the National Academy of Sciences, 104(1), 3641.CrossRefGoogle ScholarPubMed
Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3–5), 75174.CrossRefGoogle Scholar
Fortunato, S., & Hric, D. (2016). Community detection in networks: A user guide. Physics Reports, 659(Nov.), 144.CrossRefGoogle Scholar
Ghasemian, A., Hosseinmardi, H., & Clauset, A. (2018). Evaluating overfit and underfit in models of network community structure. Arxiv e-prints, Feb.Google Scholar
Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 78217826.CrossRefGoogle ScholarPubMed
Holland, P. W, Laskey, K. B., & Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks, 5(2), 109137.CrossRefGoogle Scholar
Hric, D., Darst, R. K., & Fortunato, S. (2014). Community detection in networks: Structural communities versus ground truth. Physical Review E, 90(6).CrossRefGoogle ScholarPubMed
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193218.CrossRefGoogle Scholar
Jebabli, M., Cherifi, H., Cherifi, C., & Hamouda, A. (2015). Overlapping community detection versus ground-truth in amazon co-purchasing network. In 2015 11th international conference on signal-image technology & internet-based systems (SITIS) (pp. 328336). IEEE.CrossRefGoogle Scholar
Jebabli, M., Cherifi, H., Cherifi, C., & Hamouda, A. (2018). Community detection algorithm evaluation with ground-truth data. Physica A: Statistical Mechanics and Its Applications, 492, 651706.CrossRefGoogle Scholar
Jerome, K. (2013). The koblenz network collection. In Proceedings conference on world wide web companion (pp. 13431350).Google Scholar
Jr.Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the american statistical association, 58(301), 236244.CrossRefGoogle Scholar
Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1), 7986.CrossRefGoogle Scholar
Kuncheva, L. I., & Hadjitodorov, S. T. (2004). Using diversity in cluster ensembles. In IEEE international conference on systems, man and cybernetics.CrossRefGoogle Scholar
Lambiotte, R. (2010). Multi-scale modularity in complex networks. In 8th international symposium on modeling and optimization in mobile, ad hoc, and wireless networks, May (pp. 546553).Google Scholar
Lancichinetti, A., Kivelä, M., Saramäki, J., & Fortunato, S. (2010). Characterizing the community structure of complex networks. PLoS ONE, 5(8), e11976.CrossRefGoogle ScholarPubMed
Lancichinetti, A., Radicchi, F., Ramasco, J. J., & Fortunato, S. (2011). Finding statistically significant communities in networks. PLoS ONE, 6(4), e18961.CrossRefGoogle ScholarPubMed
Leskovec, J., & Krevl, A. (2014). SNAP Datasets: Stanford large network dataset collection. June.Google Scholar
Leskovec, J., Lang, K. J., Dasgupta, A., & Mahoney, M. W. (2008). Statistical properties of community structure in large social and information networks. Proceeding of the 17th international conference on world wide web – WWW 08.CrossRefGoogle Scholar
Li, Z., Zhang, S., Wang, R.-S., Zhang, X.-S., & Chen, L. (2008). Quantitative function for community detection. Physical Review E, 77(3).CrossRefGoogle ScholarPubMed
Meilă, M. (2003). Comparing clusterings by the variation of information. Learning Theory and Kernel Machines, 173187.CrossRefGoogle Scholar
Meo, P. D., Ferrara, E., Fiumara, G., & Provetti, A. (2014). Mixing local and global information for community detection in large networks. Journal of Computer and System Sciences, 80(1), 7287.CrossRefGoogle Scholar
Miyauchi, A., & Kawase, Y. (2016). Z-score-based modularity for community detection in networks. PLOS ONE, 11(1), e0147805.CrossRefGoogle ScholarPubMed
Newman, M. E. J. (2006a). Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(Sep), 036104.CrossRefGoogle ScholarPubMed
Newman, M. E. J. (2006b). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23), 85778582.CrossRefGoogle ScholarPubMed
Newman, M. E. J. (2010). Networks: An Introduction. Oxford University Press.CrossRefGoogle Scholar
Newman, M. E. J., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69(2).Google ScholarPubMed
Orman, G. K., Labatut, V., & Cherifi, H. (2012). Comparative evaluation of community detection algorithms: A topological approach. Journal of Statistical Mechanics: Theory and Experiment, 2012(08), P08001.CrossRefGoogle Scholar
Papadopoulos, S., Kompatsiaris, Y., Vakali, A., & Spyridonos, P. (2011). Community detection in social media. Data Mining and Knowledge Discovery, 24(3), 515554.CrossRefGoogle Scholar
Peel, L., Larremore, D. B., & Clauset, A. (2017). The ground truth about metadata and community detection in networks. Science Advances, 3(5), e1602548.CrossRefGoogle ScholarPubMed
Pons, P., & Latapy, M. (2005). Computing communities in large networks using random walks. In Yolum, P., Güngör, T., Gürgen, F., & Özturan, C. (Eds.) Computer and information sciences – ISCIS 2005 (pp. 284293). Berlin, Heidelberg: Springer.CrossRefGoogle Scholar
Pons, P., & Latapy, M. (2011). Post-processing hierarchical community structures: Quality improvements and multi-scale view. Theoretical Computer Science, 412(8–10), 892900.CrossRefGoogle Scholar
Porter, M. A., Onnela, J.-P., & Mucha, P. J. (2009). Communities in Networks. Notices of the American Mathematical Society, Feb.Google Scholar
Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., & Parisi, D. (2004). Defining and identifying communities in networks. Proceedings of the National Academy of Sciences, 101(9), 26582663.CrossRefGoogle ScholarPubMed
Raghavan, U. N., Albert, R., & Kumara, S. (2007). Near linear time algorithm to detect community structures in large-scale networks. Physical Review E, 76(3).CrossRefGoogle ScholarPubMed
Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336), 846850.CrossRefGoogle Scholar
Reichardt, J., & Bornholdt, S. (2006). Statistical mechanics of community detection. Physical Review E, 74(1).CrossRefGoogle ScholarPubMed
Riolo, M. A., Cantwell, G. T., Reinert, G., & Newman, M. E. J. (2017). Efficient method for estimating the number of communities in a network. Physical Review E, 96(3).CrossRefGoogle ScholarPubMed
Rossi, R. A., & Ahmed, N. K. (2015). The network data repository with interactive graph analytics and visualization. In Proceedings of the twenty-ninth aaai conference on artificial intelligence.Google Scholar
Rosvall, M., & Bergstrom, C. T. (2007). An information-theoretic framework for resolving community structure in complex networks. Proceedings of the National Academy of Sciences, 104(18), 73277331.CrossRefGoogle ScholarPubMed
Rosvall, M., Axelsson, D., & Bergstrom, C. T. (2009). The map equation. European Physical Journal Special Topics, 178(Nov.), 1323.CrossRefGoogle Scholar
Schaub, M. T., Delvenne, J.-C., Rosvall, M., & Lambiotte, R. (2017). The many facets of community detection in complex networks. Applied Network Science, 2(1).CrossRefGoogle ScholarPubMed
Silverman, B. W. (1986). Density estimation for statistics and data analysis. London, New York: Chapman and Hall.CrossRefGoogle Scholar
Traag, V. A., Dooren, P. Van, & Nesterov, Y. (2011). Narrow scope for resolution-limit-free community detection. Physical Review E, 84(1).CrossRefGoogle ScholarPubMed
Traag, V. A., Aldecoa, R., & Delvenne, J.-C. (2015). Detecting communities using asymptotical surprise. Physical Review E, 92(2).CrossRefGoogle ScholarPubMed
Vinh, N. X., Epps, J., & Bailey, J. (2010). Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. J. Mach. Learn. Res., 11(Dec.), 28372854.Google Scholar
Wasserman, S. (1994). Social network analysis: Methods and applications. Cambridge, New York: Cambridge University Press.CrossRefGoogle Scholar
Xie, J., & Szymanski, B. K. (2012). Towards linear time overlapping community detection in social networks. In Advances in knowledge discovery and data mining (pp. 2536). Berlin, Heidelberg: Springer.CrossRefGoogle Scholar
Yang, J., & Leskovec, J. (2013). Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42(1), 181213.CrossRefGoogle Scholar