Hostname: page-component-745bb68f8f-v2bm5 Total loading time: 0 Render date: 2025-01-08T12:09:11.840Z Has data issue: false hasContentIssue false

Optimal Partitioning of a Data Set Based on the p-Median Model

Published online by Cambridge University Press:  01 January 2025

Michael J. Brusco*
Affiliation:
Florida State University
Hans-Friedrich Köhn
Affiliation:
University of Missouri-Columbia
*
Requests for reprints should be sent to Michael J. Brusco, Department of Marketing, Florida State University, Tallahassee, FL 32306-1110, USA. E-mail: [email protected]

Abstract

Although the K-means algorithm for minimizing the within-cluster sums of squared deviations from cluster centroids is perhaps the most common method for applied cluster analyses, a variety of other criteria are available. The p-median model is an especially well-studied clustering problem that requires the selection of p objects to serve as cluster centers. The objective is to choose the cluster centers such that the sum of the Euclidean distances (or some other dissimilarity measure) of objects assigned to each center is minimized. Using 12 data sets from the literature, we demonstrate that a three-stage procedure consisting of a greedy heuristic, Lagrangian relaxation, and a branch-and-bound algorithm can produce globally optimal solutions for p-median problems of nontrivial size (several hundred objects, five or more variables, and up to 10 clusters). We also report the results of an application of the p-median model to an empirical data set from the telecommunications industry.

Type
Theory and Methods
Copyright
Copyright © 2007 The Psychometric Society

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

Agmon, S. (1954). The relaxation method for linear inequalities. Canadian Journal of Mathematics, 6, 382392.CrossRefGoogle Scholar
Anderson, E. (1935). The irises of the Gaspé peninsula. Bulletin of the American Iris Society, 59, 25.Google Scholar
Avella, P., Sassano, A., & Vasil’ev, I. (2003). Computational study of large-scale p -median problems. Technical Report, Dipartimento di Informatica e Sistemistica, Università di Roma “La Sapienza.”Google Scholar
Beltran, C., Tadonki, C., & Vial, J. (2006). Solving the p-median problem with a semi-Lagrangian relaxation. Computational Optimization and Applications, June 5, 2006, DOI: 10.1007/s10589-006-6513-6.CrossRefGoogle Scholar
Brusco, M.J. (2006). A repetitive branch-and-bound algorithm for minimum within-cluster sums of squares partitioning. Psychometrika, 71, 347363.CrossRefGoogle ScholarPubMed
Brusco, M.J., & Stahl, S. (2005). Branch-and-bound applications in combinatorial data analysis, New York: Springer.Google Scholar
Brusco, M.J., Cradit, J.D., & Tashchian, A. (2003). Multicriterion clusterwise regression for joint segmentation settings: An application to customer value. Journal of Marketing Research, 40, 225234.CrossRefGoogle Scholar
Christofides, N., & Beasley, J.E. (1982). A tree search algorithm for the p-median problem. European Journal of Operational Research, 10, 196204.CrossRefGoogle Scholar
Cornuejols, G., Fisher, M.L., & Nemhauser, G.L. (1977). Location of bank accounts to optimize float: An analytic study of exact and approximate algorithms. Management Science, 23, 789810.CrossRefGoogle Scholar
Du Merle, O., & Vial, J.-P. (2002). Proximal-ACCPM, a cutting plane method for column generation and Lagrangian relaxation: Application to the p -median problem. Technical Report 2002.23, HEC Genève, University of Genève.Google Scholar
Du Merle, O., Hansen, P., Jaumard, B., & Mladenović, N. (2000). An interior point algorithm for minimum sum-of-squares clustering. SIAM Journal on Scientific Computing, 21, 14851505.CrossRefGoogle Scholar
Erlenkotter, D. (1977). Facility location with price-sensitive demands: Private, public, and quasi-public. Management Science, 24, 378386.CrossRefGoogle Scholar
Fisher, R.A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, 179188.CrossRefGoogle Scholar
Fisher, M.L. (1981). The Lagrangian relaxation method for solving integer programming problems. Management Science, 27, 118.CrossRefGoogle Scholar
Forgy, E.W. (1965). Cluster analyses of multivariate data: Efficiency versus interpretability of classifications. Biometrics, 21, 768.Google Scholar
Grötschel, M., & Holland, O. (1991). Solution of large-scale symmetric traveling salesman problems. Mathematical Programming, 51, 141202.CrossRefGoogle Scholar
Hair, J.F., Anderson, R.E., Tatham, R.L., & Black, W.C. (1998). Multivariate data analysis, (5th ed.). Upper Saddle River: Prentice-Hall.Google Scholar
Hakimi, S.L. (1964). Optimum locations of switching centers and the absolute centers and medians of a graph. Operations Research, 12, 450459.CrossRefGoogle Scholar
Hanjoul, P., &Peeters, D. (1985). A comparison of two dual-based procedures for solving the p-median problem. European Journal of Operational Research, 20, 387396.CrossRefGoogle Scholar
Hansen, P., & Jaumard, B. (1997). Cluster analysis and mathematical programming. Mathematical Programming, 79, 191215.CrossRefGoogle Scholar
Hansen, P., Mladenoviĉ, N., & Perez-Brito, D. (2001). Variable neighborhood decomposition search. Journal of Heuristics, 7, 335350.CrossRefGoogle Scholar
Hartigan, J.A. (1975). Clustering algorithms, New York: Wiley.Google Scholar
Hartigan, J.A., & Wong, M.A. (1979). Algorithm AS136: A k-means clustering program. Applied Statistics, 28, 100128.CrossRefGoogle Scholar
Heinz, G., Peterson, L.J., Johnson, R.W., & Kerk, C.J. (2003). Exploring relationships in body dimensions. Journal of Statistics Education, 11. Available at: www.amstat.org/publications/jse/v11n2/datasets.heinz.html.Google Scholar
Held, M., & Karp, R.M. (1970). The traveling salesman problem and minimum spanning trees. Operations Research, 18, 11381162.CrossRefGoogle Scholar
Held, M., Wolfe, P., & Crowder, H.P. (1974). Validation of subgradient optimization. Mathematical Programming, 6, 6288.CrossRefGoogle Scholar
Hubert, L.J. (1987). Assignment methods in combinatorial data analysis, New York: Marcel Dekker.Google Scholar
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2, 193218.CrossRefGoogle Scholar
Hubert, L.J., & Baker, F.B. (1978). Applications of combinatorial programming to data analysis: The traveling salesman and related problems. Psychometrika, 43, 8191.CrossRefGoogle Scholar
Hubert, L.J., & Schultz, J.V. (1976). Quadratic assignment as a general data analysis strategy. British Journal of Mathematical and Statistical Psychology, 29, 190241.CrossRefGoogle Scholar
Hubert, L., Arabie, P., & Meulman, J. (2001). Combinatorial data analysis: Optimization by dynamic programming, Philadelphia: SIAM.CrossRefGoogle Scholar
Hubert, L., Arabie, P., & Meulman, J. (2006). The structural representation of proximity matrices with MATLAB, Philadelphia: SIAM.CrossRefGoogle Scholar
Johnson, S.C. (1967). Hierarchical clustering schemes. Psychometrika, 32, 241254.CrossRefGoogle ScholarPubMed
Klastorin, T. (1985). The p-median problem for cluster analysis: A comparative test using the mixture model approach. Management Science, 31, 8495.CrossRefGoogle Scholar
Lin, S., & Kernighan, B.W. (1973). An effective heuristic algorithm for the traveling salesman problem. Operations Research, 21, 498516.CrossRefGoogle Scholar
MacQueen, J.B. (1967). Some methods for classification and analysis of multivariate observations. In Le Cam, L.M., Neyman, J. (Eds.), Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (pp. 281297). Berkeley: University of California Press.Google Scholar
Motzkin, T., & Schoenberg, I.J. (1954). The relaxation method for linear inequalities. Canadian Journal of Mathematics, 6, 393404.CrossRefGoogle Scholar
Mulvey, J.M., & Crowder, H.P. (1979). Cluster analysis: An application of Lagrangian relaxation. Management Science, 25, 329340.CrossRefGoogle Scholar
Narula, S.C., Ogbu, U.I., & Samuelson, H.M. (1977). An algorithm for the p-median problem. Operations Research, 25, 709713.CrossRefGoogle Scholar
Rao, M.R. (1971). Cluster analysis and mathematical programming. Journal of the American Statistical Association, 66, 622626.CrossRefGoogle Scholar
Sokal, R.R., & Sneath, P.H.A. (1963). Principles of numerical taxonomy, San Francisco: Freeman.Google Scholar
Späth, H. (1980). Cluster analysis algorithms for data reduction and classification of objects, New York: Wiley.Google Scholar
Steinley, D. (2004). Properties of the Hubert–Arabie adjusted Rand index. Psychological Methods, 9, 386396.CrossRefGoogle ScholarPubMed
Steinley, D. (2006). K-Means clustering: A half-century synthesis. British Journal of Mathematical and Statistical Psychology, 59, 134.CrossRefGoogle ScholarPubMed
Steinley, D. (2006). Profiling local optima in K-means clustering: Developing a diagnostic technique. Psychological Methods, 11, 178192.CrossRefGoogle ScholarPubMed
Teitz, M.B., & Bart, P. (1968). Heuristic methods for estimating the generalized vertex median of a weighted graph. Operations Research, 16, 955961.CrossRefGoogle Scholar
Ward, J.H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236244.CrossRefGoogle Scholar