On spectral embedding performance and elucidating network structure in stochastic blockmodel graphs

Published online by Cambridge University Press: 18 October 2019

Joshua Cape*
Affiliation:
Department of Statistics, University of Michigan, 1085 South University Avenue, 323 West Hall, Ann Arbor, MI 48109, USA
Minh Tang
Affiliation:
Department of Statistics, North Carolina State University, 2311 Stinson Drive, 5109 SAS Hall, Campus Box 8203, Raleigh, NC 27695, USA
Carey E. Priebe
Affiliation:
Department of Applied Mathematics and Statistics, Johns Hopkins University, 3400 North Charles Street, Whitehead Hall 100, Baltimore, MD 21218, USA
*Corresponding author.

Abstract

Statistical inference on graphs often proceeds via spectral methods involving low-dimensional embeddings of matrix-valued graph representations such as the graph Laplacian or adjacency matrix. In this paper, we analyze the asymptotic information-theoretic relative performance of Laplacian spectral embedding and adjacency spectral embedding for block assignment recovery in stochastic blockmodel graphs by way of Chernoff information. We investigate the relationship between spectral embedding performance and underlying network structure (e.g., homogeneity, affinity, core-periphery, and (un)balancedness) via a comprehensive treatment of the two-block stochastic blockmodel and the class of K-blockmodels exhibiting homogeneous balanced affinity structure. Our findings support the claim that, for a particular notion of sparsity, loosely speaking, “Laplacian spectral embedding favors relatively sparse graphs, whereas adjacency spectral embedding favors not-too-sparse graphs.” We also provide evidence in support of the claim that “adjacency spectral embedding favors core-periphery network structure.”
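The abstract compares adjacency spectral embedding (ASE) and Laplacian spectral embedding (LSE) through the Chernoff information between the block-conditional distributions of embedded vertices. The Python sketch below illustrates that pipeline for a two-block stochastic blockmodel. It is not the authors' code, and it rests on several assumptions of our own:

- The helper names `sample_sbm`, `ase`, `lse`, `chernoff_gaussian` and `plug_in_chernoff` are hypothetical.
- LSE is taken to use the normalized matrix D^{-1/2} A D^{-1/2}.
- The Gaussian Chernoff information is computed from its standard closed form, maximized over a grid of t.
- The plug-in step fits a Gaussian to each block's embedded rows at a finite number of vertices. The paper's analysis concerns limiting distributions instead.

```python
# Illustrative sketch; helper names are hypothetical, not taken from the paper.
import numpy as np


def sample_sbm(B, sizes, rng):
    """Draw a symmetric, hollow adjacency matrix from a K-block stochastic blockmodel.

    B     : (K, K) symmetric array of block connection probabilities.
    sizes : number of vertices assigned to each block.
    """
    B = np.asarray(B, dtype=float)
    labels = np.repeat(np.arange(len(sizes)), sizes)
    P = B[np.ix_(labels, labels)]                   # P[i, j] = B[z_i, z_j]
    upper = np.triu(rng.random(P.shape) < P, k=1)   # independent Bernoulli edges above the diagonal
    A = (upper | upper.T).astype(float)             # symmetrize; the diagonal stays zero
    return A, labels


def _scaled_top_eigvecs(M, d):
    """Eigenvectors of symmetric M for the d largest |eigenvalues|, scaled by sqrt(|eigenvalue|)."""
    vals, vecs = np.linalg.eigh(M)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))


def ase(A, d):
    """Adjacency spectral embedding: one d-dimensional row per vertex."""
    return _scaled_top_eigvecs(A, d)


def lse(A, d):
    """Laplacian spectral embedding built from D^{-1/2} A D^{-1/2} (an assumed convention)."""
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])  # isolated vertices get zero rows/columns
    L = inv_sqrt[:, None] * A * inv_sqrt[None, :]
    return _scaled_top_eigvecs(L, d)


def chernoff_gaussian(mu1, S1, mu2, S2, num_t=999):
    """Chernoff information between N(mu1, S1) and N(mu2, S2), maximized over a grid in (0, 1).

    C = sup_t  t(1-t)/2 * dmu^T S_t^{-1} dmu + 1/2 * log(|S_t| / (|S1|^t |S2|^(1-t))),
    where S_t = t*S1 + (1-t)*S2.
    """
    dmu = np.asarray(mu1, dtype=float) - np.asarray(mu2, dtype=float)
    logdet1 = np.linalg.slogdet(S1)[1]
    logdet2 = np.linalg.slogdet(S2)[1]
    best = 0.0
    for t in np.linspace(0.0, 1.0, num_t + 2)[1:-1]:  # interior grid points only
        St = t * S1 + (1.0 - t) * S2
        quad = dmu @ np.linalg.solve(St, dmu)
        logdet_t = np.linalg.slogdet(St)[1]
        val = 0.5 * t * (1.0 - t) * quad + 0.5 * (logdet_t - t * logdet1 - (1.0 - t) * logdet2)
        best = max(best, val)
    return best


def plug_in_chernoff(X, labels):
    """Hypothetical plug-in for two blocks: fit a Gaussian to each block's rows of X.

    Requires embedding dimension d >= 2 so that np.cov returns a matrix.
    """
    X0, X1 = X[labels == 0], X[labels == 1]
    return chernoff_gaussian(X0.mean(axis=0), np.cov(X0, rowvar=False),
                             X1.mean(axis=0), np.cov(X1, rowvar=False))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B = np.array([[0.5, 0.2], [0.2, 0.3]])            # illustrative parameters, not from the paper
    A, labels = sample_sbm(B, [300, 200], rng)
    c_ase = plug_in_chernoff(ase(A, 2), labels)
    c_lse = plug_in_chernoff(lse(A, 2), labels)
    print(f"ASE {c_ase:.3f}  LSE {c_lse:.3f}  ratio {c_ase / c_lse:.3f}")
```

Spectral embeddings are identified only up to an orthogonal transformation. Chernoff information is unchanged when the same invertible linear map is applied to both distributions, so that ambiguity does not affect the plug-in value. The printed numbers depend on the seed and on the illustrative B. They are not results reported in the paper.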

Type
Original Article
Copyright
© Cambridge University Press 2019 

Footnotes

This work is partially supported by the XDATA and D3M programs of the Defense Advanced Research Projects Agency (DARPA) and by the Acheson J. Duncan Fund for the Advancement of Research in Statistics at Johns Hopkins University. Part of this work was done during visits by JC and CEP to the Isaac Newton Institute for Mathematical Sciences at the University of Cambridge under EPSRC grant EP/K032208/1. JC thanks Zachary Lubberts for productive discussions.