How the initialization affects the stability of the k-means algorithm

Published online by Cambridge University Press:  04 September 2012

Sébastien Bubeck
Affiliation:
Centre de Recerca Matemàtica, Barcelona, Spain
Marina Meilă
Affiliation:
University of Washington, Department of Statistics, Seattle, U.S.A.
Ulrike von Luxburg
Affiliation:
Max Planck Institute for Biological Cybernetics, Tübingen, Germany

Abstract

We investigate the role of the initialization for the stability of the k-means clustering algorithm. As opposed to other papers, we consider the actual k-means algorithm (also known as Lloyd's algorithm). In particular, we leverage the property that this algorithm can get stuck in local optima of the k-means objective function. We are interested in the actual clustering, not only in the cost of the solution. We analyze when different initializations lead to the same local optimum, and when they lead to different local optima. This enables us to prove that it is reasonable to select the number of clusters based on stability scores.
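
The dependence on initialization described above can be illustrated with a minimal sketch. It is not taken from the paper: the one-dimensional toy data, the two initializations, and the helper names `lloyd`, `cost` and `partition` are all assumptions chosen for illustration. The sketch runs Lloyd's algorithm from two different sets of initial centers and shows that they end in different fixed points with different clusterings and costs.

```python
def lloyd(points, centers, max_iter=100):
    """Run Lloyd's algorithm on 1-D points from the given initial centers."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center
        # (ties are broken in favor of the lowest center index).
        new_labels = [
            min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: the algorithm has reached a fixed point
        labels = new_labels
        # Update step: each center moves to the mean of its assigned points.
        for j in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = sum(members) / len(members)
    return labels, centers


def cost(points, labels, centers):
    """k-means objective: sum of squared distances to the assigned centers."""
    return sum((p - centers[l]) ** 2 for p, l in zip(points, labels))


def partition(labels):
    """Clustering as a set of index sets, so label names do not matter."""
    groups = {}
    for i, l in enumerate(labels):
        groups.setdefault(l, set()).add(i)
    return {frozenset(g) for g in groups.values()}


# Hypothetical toy data: three well-separated pairs of points, k = 3.
points = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]

good = lloyd(points, [0.5, 10.5, 20.5])  # one initial center per pair
bad = lloyd(points, [0.0, 1.0, 15.0])    # two initial centers in the first pair

print(cost(points, *good))  # 1.5
print(cost(points, *bad))   # 101.0
print(partition(good[0]) == partition(bad[0]))  # False
```

In this toy setting, the second initialization splits the first pair and merges the other two, and Lloyd's algorithm never leaves that configuration. Comparing the resulting partitions, rather than only the costs, mirrors the abstract's focus on the actual clustering.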

Type
Research Article
Copyright
© EDP Sciences, SMAI, 2012
