
Order-Constrained Solutions in K-Means Clustering: Even Better Than Being Globally Optimal

Published online by Cambridge University Press: 01 January 2025

Douglas Steinley* (University of Missouri-Columbia)
Lawrence Hubert (University of Illinois, Urbana-Champaign)

*Requests for reprints should be sent to Douglas Steinley, Department of Psychological Sciences, University of Missouri-Columbia, 210 McAlester Hall, Columbia, MO 65211, USA. E-mail: [email protected]

Abstract

This paper proposes an order-constrained K-means cluster analysis strategy, and implements that strategy through an auxiliary quadratic assignment optimization heuristic that identifies an initial object order. A subsequent dynamic programming recursion is applied to optimally subdivide the object set subject to the order constraint. We show that although the usual K-means sum-of-squared-error criterion is not guaranteed to be minimal, a true underlying cluster structure may be more accurately recovered. Also, substantive interpretability seems generally improved when constrained solutions are considered. We illustrate the procedure with several data sets from the literature.
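
To make the second stage of the procedure concrete, the Python sketch below illustrates the kind of dynamic-programming recursion that optimally splits an already-ordered object set into K contiguous clusters with minimal total within-cluster sum of squares. It is a minimal illustration under stated assumptions, not the authors' implementation: the object order is assumed to have been fixed in advance (e.g., by the quadratic assignment heuristic), and the function names, the brute-force cost table, and the toy data are our own illustrative choices.

import numpy as np


def block_sse(X, a, b):
    # Within-block sum of squared deviations from the block mean,
    # for the ordered objects a..b inclusive (rows of X).
    block = X[a:b + 1]
    return float(((block - block.mean(axis=0)) ** 2).sum())


def order_constrained_kmeans(X, K):
    # Optimal split of the ordered rows of X into K contiguous clusters,
    # minimizing total within-cluster sum of squares by dynamic programming.
    n = X.shape[0]
    cost = np.full((n, n), np.inf)
    for a in range(n):
        for b in range(a, n):
            cost[a, b] = block_sse(X, a, b)
    # D[k, j]: minimal criterion value for objects 0..j split into k + 1 clusters.
    D = np.full((K, n), np.inf)
    cut = np.zeros((K, n), dtype=int)
    D[0, :] = cost[0, :]
    for k in range(1, K):
        for j in range(k, n):
            for i in range(k - 1, j):
                value = D[k - 1, i] + cost[i + 1, j]
                if value < D[k, j]:
                    D[k, j], cut[k, j] = value, i
    # Backtrack through the stored cut points to recover cluster labels.
    labels = np.empty(n, dtype=int)
    j = n - 1
    for k in range(K - 1, -1, -1):
        i = cut[k, j] if k > 0 else -1
        labels[i + 1:j + 1] = k
        j = i
    return labels, D[K - 1, n - 1]


# Example: three well-separated groups laid out along the assumed order.
X = np.array([[0.0], [0.2], [0.1], [5.0], [5.1], [9.8], [10.0], [10.2]])
labels, sse = order_constrained_kmeans(X, K=3)
print(labels, sse)  # labels [0 0 0 1 1 2 2 2]

Because the order constraint restricts the search to contiguous blocks, the recursion finds the globally optimal constrained partition in polynomial time, even though the resulting sum-of-squared-error value need not match the unconstrained K-means minimum.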

Type: Theory and Methods
Copyright: © 2008 The Psychometric Society

