Hostname: page-component-586b7cd67f-dlnhk Total loading time: 0 Render date: 2024-11-30T23:09:21.741Z Has data issue: false hasContentIssue false

Seeded PageRank solution paths

Published online by Cambridge University Press:  01 July 2016

D. F. GLEICH
Affiliation:
Department of Computer Science, Purdue University, West Lafayette IN, USA email: [email protected]
K. KLOSTER
Affiliation:
Department of Mathematics, Purdue University, West Lafayette IN, USA email: [email protected]

Abstract

We study the behaviour of network diffusions based on the PageRank random walk from a set of seed nodes. These diffusions are known to reveal small, localized clusters (or communities), and also large macro-scale clusters by varying a parameter that has a dual-interpretation as an accuracy bound and as a regularization level. We propose a new method that quickly approximates the result of the diffusion for all values of this parameter. Our method efficiently generates an approximate solution path or regularization path associated with a PageRank diffusion, and it reveals cluster structures at multiple size-scales between small and large. We formally prove a runtime bound on this method that is independent of the size of the network, and we investigate multiple optimizations to our method that can be more practical in some settings. We demonstrate that these methods identify refined clustering structure on a number of real-world networks with up to 2 billion edges.

Type
Papers
Copyright
Copyright © Cambridge University Press 2016 

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

[1] Andersen, R. & Chung, F. (2007) Detecting sharp drops in pagerank and a simplified local partitioning algorithm. In Cai, Jin-Yi, Cooper, S. Barry, and Zhu, Hong (Eds). Theory and Applications of Models of Computation, Springer-verlag, Berlin Heidelberg, pp. 112.Google Scholar
[2] Andersen, R., Chung, F. & Lang, K. (2006) Local graph partitioning using PageRank vectors. In: FOCS.CrossRefGoogle Scholar
[3] Boldi, P., Bonchi, F., Castillo, C., Donato, D., Gionis, A. & Vigna, S. (2008) The query-flow graph: Model and applications. In: Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM '08, New York, NY, USA, ACM, pp. 609618.CrossRefGoogle Scholar
[4] Boldi, P., Rosa, M., Santini, M. & Vigna, S. (March 2011) Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks. In: Proceedings of the 20th WWW2011, pp. 587–596.CrossRefGoogle Scholar
[5] Boldi, P., Santini, M. & Vigna, S. (2009) PageRank: Functional dependencies. ACM Trans. Inf. Syst. 27 (4), 123.CrossRefGoogle Scholar
[6] Brezinski, C., Redivo-Zaglia, M. & Serra-Capizzano, S. (March 2005) Extrapolation methods for pagerank computations. Comptes Rendus Mathematique 340 (5), 393397.CrossRefGoogle Scholar
[7] Chierichetti, F., Kumar, R., Lattanzi, S., Mitzenmacher, M., Panconesi, A. & Raghavan, P. (2009) On compressing social networks. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09, New York, NY, USA, ACM, pp. 219228.CrossRefGoogle Scholar
[8] Delvenne, J.-C., Yaliraki, S. N. & Barahona, M. (June 2010) Stability of graph communities across time scales. Proc. Natl. Acad. Sci. 107 (29), 1275512760.CrossRefGoogle ScholarPubMed
[9] Efron, B., Hastie, T., Johnstone, I. & Tibshirani, R. (2004) Least angle regression. Ann. Statist. 32 (2), 407499.CrossRefGoogle Scholar
[10] Ghosh, R., Teng, S.-H., Lerman, K. & Yan, X. (2014) The interplay between dynamics and networks: Centrality, communities, and cheeger inequality. In: KDD, pp. 1406–1415.CrossRefGoogle Scholar
[11] Gleich, D. F. (August 2015) PageRank beyond the web. SIAM Rev. 57 (3), 321363.CrossRefGoogle Scholar
[12] Gleich, D. F. & Mahoney, M. M. (2014) Algorithmic anti-differentiation: A case study with min-cuts, spectral, and flow. In: ICML, pp. 1018–1025.Google Scholar
[13] Gleich, D. F. & Mahoney, M. W. (2015) Using local spectral methods to robustify graph-based learning algorithms. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '15, New York, NY, USA, ACM, pp. 359368.CrossRefGoogle Scholar
[14] Gleich, D. F. & Seshadhri, C. (2012) Vertex neighborhoods, low conductance cuts, and good seeds for local community methods. In: KDD, pp. 597–605.CrossRefGoogle Scholar
[15] Gutierrez-Bunster, T., Stege, U., Thomo, A. & Taylor, J. (2014) How do biological networks differ from social networks? (an experimental study). In: ASONAM, pp. 744–751.CrossRefGoogle Scholar
[16] Hastie, T., Tibshirani, R. & Friedman, J. (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction, New York, Springer.CrossRefGoogle Scholar
[17] Hocking, T., Vert, J.-P., Joulin, A. & Bach, F. R. (2011) Clusterpath: An algorithm for clustering using convex fusion penalties. In: ICML, pp. 745–752.Google Scholar
[18] Jeub, L. G. S., Balachandran, P., Porter, M. A., Mucha, P. J. & Mahoney, M. W. (January 2015) Think locally, act locally: Detection of small, medium-sized, and large communities in large networks. Phys. Rev. E 91, 012821.CrossRefGoogle ScholarPubMed
[19] Kloster, K. & Gleich, D. F. (2014) Heat kernel based community detection. In: KDD, pp. 1386–1395.CrossRefGoogle Scholar
[20] Kwak, H., Lee, C., Park, H. & Moon, S. (2010) What is Twitter, a social network or a news media? In: WWW '10: Proceedings of the 19th International Conference on World Wide Web, New York, NY, USA, ACM, pp. 591600.CrossRefGoogle Scholar
[21] Langville, A. N. & Meyer, C. D. (2006) Google's PageRank and Beyond: The Science of Search Engine Rankings, Princeton, NJ, Princeton University Press.CrossRefGoogle Scholar
[22] Leskovec, J., Lang, K. J., Dasgupta, A. & Mahoney, M. W. (September 2009) Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Math. 6 (1), 29123.CrossRefGoogle Scholar
[23] Lindsten, F., Ohlsson, H. & Ljung, L. (2011) Just Relax and Come Clustering! A Convexification of k-Means Clustering, Technical Report, Linköpings Universitet.Google Scholar
[24] Mahoney, M. W., Orecchia, L. & Vishnoi, N. K. (August 2012) A local spectral method for graphs: With applications to improving graph partitions and exploring data graphs locally. J. Mach. Learn. Res 13, 23392365.Google Scholar
[25] Mislove, A., Marcon, M., Gummadi, K. P., Druschel, P. & Bhattacharjee, B. (2007) Measurement and analysis of online social networks. In: Proceedings of the 7th ACM SIGCOMM Conference on Internet Measurement, IMC '07, New York, NY, USA, ACM, pp. 2942.CrossRefGoogle Scholar
[26] Newman, M. E. J. (September 2006) Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 74 (3), 036104.CrossRefGoogle ScholarPubMed
[27] Poole, K. T. (2011) Vote View. URL: http://voteview.com/. [Accessed 11/10/2015].Google Scholar
[28] Schaeffer, S. E. (2007) Graph clustering. Comput. Sci. Rev. 1 (1), 2764.CrossRefGoogle Scholar
[29] C. (The Cooperative Association for Internet Data Analyais). (2005) Network datasets. Accessed 2005. URL: http://www.caida.org/tools/measurement/skitter/router_topology/ Google Scholar
[30] Whang, J. J., Gleich, D. F. & Dhillon, I. S. (2013) Overlapping community detection using seed set expansion. In: CIKM, pp. 2099–2108.CrossRefGoogle Scholar
[31] Wilson, C., Boe, B., Sala, A., Puttaswamy, K. P. & Zhao, B. Y. (2009) User interactions in social networks and their implications. In: EuroSys, pp. 205–218.CrossRefGoogle Scholar
[32] Xie, J., Kelley, S. & Szymanski, B. K. (2013) Overlapping community detection in networks: The state-of-the-art and comparative study. ACM Comput. Surv. 45 (4), 43:1–43:35.CrossRefGoogle Scholar
[33] Yang, J. & Leskovec, J. (December 2012) Defining and evaluating network communities based on ground-truth. In: IEEE 12th International Conference on Data Mining (ICDM), pp. 745–754.CrossRefGoogle Scholar
[34] Zhou, D., Bousquet, O., Lal, T. N., Weston, J. & Schölkopf, B. (2003) Learning with local and global consistency. In: Advances in Neural Information Processing Systems (NIPS), Vol. 16, pp. 321–328.Google Scholar