Variable Neighborhood Search Heuristics for Selecting a Subset of Variables in Principal Component Analysis

Michael J. Brusco; Renu Singh; Douglas Steinley

doi:10.1007/s11336-009-9130-3

Published online by Cambridge University Press: 01 January 2025

Affiliations:
Michael J. Brusco*: Florida State University
Renu Singh: South Carolina State University
Douglas Steinley: University of Missouri
* Requests for reprints should be sent to Michael J. Brusco, Department of Marketing, College of Business, Florida State University, Tallahassee, FL 32306-1110, USA. E-mail: [email protected]

Abstract

The selection of a subset of variables from a pool of candidates is an important problem in several areas of multivariate statistics. Within the context of principal component analysis (PCA), a number of authors have argued that subset selection is crucial for identifying those variables that are required for correct interpretation of the components. In this paper, we adapt the variable neighborhood search (VNS) paradigm to develop two heuristics for variable selection in PCA. The performances of these heuristics were compared to those obtained by a branch-and-bound algorithm, as well as forward stepwise, backward stepwise, and tabu search heuristics. In the first experiment, which considered candidate pools of 18 to 30 variables, the VNS heuristics matched the optimal subset obtained by the branch-and-bound algorithm more frequently than their competitors. In the second experiment, which considered candidate pools of 54 to 90 variables, the VNS heuristics provided better solutions than their competitors for a large percentage of the test problems. An application to a real-world data set is provided to demonstrate the importance of variable selection in the context of PCA.
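The abstract does not state the selection criterion or the details of the two heuristics, so the following is only a minimal sketch of the basic VNS scheme (shake, local search, then move or widen the neighborhood) applied to choosing a fixed-size subset of variables. The function names, the pairwise-interchange local search, the parameters `k_max` and `max_iters`, and the user-supplied `score` callable are all assumptions made for illustration; this is not the authors' implementation.

```python
import random


def local_search(subset, pool, score):
    """First-improvement pairwise interchange (assumed local search):
    swap one selected variable for one unselected variable while the score rises."""
    current = set(subset)
    best = score(current)
    improved = True
    while improved:
        improved = False
        for out_var in sorted(current):
            for in_var in sorted(pool - current):
                candidate = (current - {out_var}) | {in_var}
                value = score(candidate)
                if value > best:
                    current, best = candidate, value
                    improved = True
                    break
            if improved:
                break
    return current, best


def shake(subset, pool, k, rng):
    """Draw a random point in the k-th neighborhood:
    exchange k selected variables for k unselected ones."""
    k = min(k, len(subset), len(pool) - len(subset))
    removed = rng.sample(sorted(subset), k)
    added = rng.sample(sorted(pool - subset), k)
    return (subset - set(removed)) | set(added)


def vns_select(n_vars, subset_size, score, k_max=3, max_iters=50, seed=0):
    """Basic VNS for maximizing score(subset) over subsets of a fixed size.
    `score` is a hypothetical callable taking a set of variable indices."""
    rng = random.Random(seed)
    pool = set(range(n_vars))
    start = set(rng.sample(sorted(pool), subset_size))
    current, best = local_search(start, pool, score)
    for _ in range(max_iters):
        k = 1
        while k <= k_max:
            candidate, value = local_search(shake(current, pool, k, rng), pool, score)
            if value > best:
                # Improvement found: accept it and return to the smallest neighborhood.
                current, best = candidate, value
                k = 1
            else:
                # No improvement: widen the neighborhood.
                k += 1
    return sorted(current), best
```

In a PCA setting, `score` might return the proportion of total variance accounted for by the selected variables, but that choice is left open here because the abstract does not specify it.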

Keywords

combinatorial optimization; principal component analysis; variable selection; variable neighborhood search; tabu search

Type: Theory and Methods
Information: Psychometrika, Volume 74, Issue 4, December 2009, pp. 705–726

DOI: https://doi.org/10.1007/s11336-009-9130-3
Copyright: © 2009 The Psychometric Society

Footnotes

Dr. Steinley was supported by grant K25AA017456 from the National Institute on Alcohol Abuse and Alcoholism. We appreciate the thoughtful and constructive comments of the Editor, Associate Editor, and three anonymous reviewers.
