Hostname: page-component-745bb68f8f-b95js Total loading time: 0 Render date: 2025-01-08T12:12:22.677Z Has data issue: false hasContentIssue false

Variable Neighborhood Search Heuristics for Selecting a Subset of Variables in Principal Component Analysis

Published online by Cambridge University Press:  01 January 2025

Michael J. Brusco*
Affiliation:
Florida State University
Renu Singh
Affiliation:
South Carolina State University
Douglas Steinley
Affiliation:
University of Missouri
*
Requests for reprints should be sent to Michael J. Brusco, Department of Marketing, College of Business, Florida State University, Tallahassee, FL 32306-1110, USA. E-mail: [email protected]

Abstract

The selection of a subset of variables from a pool of candidates is an important problem in several areas of multivariate statistics. Within the context of principal component analysis (PCA), a number of authors have argued that subset selection is crucial for identifying those variables that are required for correct interpretation of the components. In this paper, we adapt the variable neighborhood search (VNS) paradigm to develop two heuristics for variable selection in PCA. The performances of these heuristics were compared to those obtained by a branch-and-bound algorithm, as well as forward stepwise, backward stepwise, and tabu search heuristics. In the first experiment, which considered candidate pools of 18 to 30 variables, the VNS heuristics matched the optimal subset obtained by the branch-and-bound algorithm more frequently than their competitors. In the second experiment, which considered candidate pools of 54 to 90 variables, the VNS heuristics provided better solutions than their competitors for a large percentage of the test problems. An application to a real-world data set is provided to demonstrate the importance of variable selection in the context of PCA.

Type
Theory and Methods
Copyright
Copyright © 2009 The Psychometric Society

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Footnotes

Dr. Steinley was supported by grant K25AA017456 from the National Institute on Alcohol Abuse and Alcoholism. We appreciate the thoughtful and constructive comments of the Editor, Associate Editor, and three anonymous reviewers.

References

Aaker, J.L. (1997). Dimensions of brand personality. Journal of Marketing Research, 34, 347356.CrossRefGoogle Scholar
Aarts, E., Korst, J. (1989). Simulated annealing and Boltzmann machines: a stochastic approach to combinatorial optimization and neural computing, New York: Wiley.Google Scholar
Balas, E. (1965). An additive algorithm for solving linear programs with zero-one variables. Operations Research, 13, 517546.CrossRefGoogle Scholar
Beale, E.M.L., Kendall, M.G., Mann, D.W. (1967). The discarding of variables in multivariate analysis. Biometrika, 54, 357366.CrossRefGoogle ScholarPubMed
Bentler, P.M. (1977). Factor simplicity index and transformations. Psychometrika, 42, 277295.CrossRefGoogle Scholar
Brusco, M.J. (2002). A branch-and-bound method for fitting anti-Robinson structures to symmetric dissimilarity matrices. Psychometrika, 67(3), 459471.CrossRefGoogle Scholar
Brusco, M.J. (2006). A repetitive branch-and-bound algorithm for minimum within-cluster sums of squares partitioning. Psychometrika, 71, 347363.CrossRefGoogle ScholarPubMed
Brusco, M.J., Cradit, J.D. (2001). A variable-selection heuristic for k-means clustering. Psychometrika, 66, 249270.CrossRefGoogle Scholar
Brusco, M.J., Stahl, S. (2005). Optimal least-squares unidimensional scaling: improved branch-and-bound procedures and a comparison to dynamic programming. Psychometrika, 70, 253270.CrossRefGoogle Scholar
Brusco, M.J., Steinley, D. (2007). A comparison of heuristic procedures for minimum within-cluster sums of squares partitioning. Psychometrika, 72, 583600.CrossRefGoogle Scholar
Brusco, M.J., Steinley, D., & Cradit, J.D. (2009, in press). An exact algorithm for hierarchically well-formulated subsets in second-order polynomial regression. Technometrics.CrossRefGoogle Scholar
Burke, E.K., Curtois, T., Post, G., Qu, R., Veltman, B. (2008). A hybrid heuristic ordering and variable neighborhood search for the nurse rostering problem. European Journal of Operational Research, 188, 330341.CrossRefGoogle Scholar
Cadima, J., Cerdeira, J.O., Minhoto, M. (2004). Computational aspects of algorithms for variable selection in the context of principal components. Computational Statistics & Data Analysis, 47, 225236.CrossRefGoogle Scholar
Cadima, J., Jolliffe, I.T. (1995). Loadings and correlations in the interpretation of principal components. Journal of Applied Statistics, 22, 203214.CrossRefGoogle Scholar
Cadima, J., Jolliffe, I.T. (2001). Variable selection and the interpretation of principal subspaces. Journal of Agricultural, Biological and Environmental Statistics, 6, 6279.CrossRefGoogle Scholar
Derpich, I., Vera, J.R. (2006). Improving the efficiency of branch and bound algorithm for integer programming based on “flatness” information. European Journal of Operational Research, 174, 92101.CrossRefGoogle Scholar
Diehr, G. (1985). Evaluation of a branch and bound algorithm for clustering. SIAM Journal for Scientific and Statistical Computing, 6, 268284.CrossRefGoogle Scholar
Dray, S. (2008). On the number of principal components: a test of dimensionality based on measurements of similarity between matrices. Computational Statistics & Data Analysis, 52, 22282237.CrossRefGoogle Scholar
Drezner, Z., Marcoulides, G.A., Salhi, S. (1999). Tabu search model selection in multiple regression analysis. Communications in Statistics, 28, 349367.CrossRefGoogle Scholar
Duarte Silva, A.P. (2001). Efficient variable screening for multivariate analysis. Journal of Multivariate Analysis, 76, 3562.CrossRefGoogle Scholar
Duarte Silva, A.P. (2002). Discarding variables in principal component analysis: algorithms for all-subsets comparisons. Computational Statistics, 17, 251271.CrossRefGoogle Scholar
Escoufier, Y. (1973). Le traitement des variable vectorielles. Biometrics, 29, 750760.CrossRefGoogle Scholar
Fueda, K., Iizuka, M., Mori, Y. (2009). Variable selection in multivariate methods using global score estimation. Computational Statistics, 24, 127144.CrossRefGoogle Scholar
Furnival, G.M., Wilson, R.W. (1974). Regression by leaps and bounds. Technometrics, 16, 499512.CrossRefGoogle Scholar
Garcia, C.G., Pérez-Brito, D., Campos, V., Marti, R. (2006). Variable neighborhood search for the linear ordering problem. Computers & Operations Research, 33, 35493565.CrossRefGoogle Scholar
Geoffrion, A.M., Marsten, R.E. (1972). Integer programming algorithms: a framework and state-of-the-art survey. Management Science, 18, 465491.CrossRefGoogle Scholar
Glover, F., Laguna, M. (1993). Tabu search. In Reeves, C. (Eds.), Modern heuristic techniques for combinatorial problems (pp. 70141). Oxford: Blackwell.Google Scholar
Goel, A., Gruhn, V. (2008). A general vehicle routing problem. European Journal of Operational Research, 191, 650660.CrossRefGoogle Scholar
Goldberg, D.E. (1989). Genetic algorithms in search, optimization, and machine learning, New York: Addison–Wesley.Google Scholar
Hansen, P., Mladenović, N. (1997). Variable neighborhood search for the p-median. Location Science, 5, 207226.CrossRefGoogle Scholar
Hansen, P., Mladenović, N. (2003). Variable neighborhood search. In Glover, F.W., Kochenberger, G.A. (Eds.), Handbook of metaheuristics (pp. 145184). Norwell: Kluwer Academic.CrossRefGoogle Scholar
Hogarty, K.Y., Kromrey, J.D., Ferron, J.M., Hines, C.V. (2004). Selection of variables in exploratory factor analysis: an empirical comparison of a stepwise and traditional approach. Psychometrika, 69, 593611.CrossRefGoogle Scholar
Iizuka, M., Mori, Y., Tarumi, T., Tanaka, Y. (2003). Computer intensive trials to determine the number of variables in PCA. Journal of the Japanese Society of Computational Statistics, 15, 337345.CrossRefGoogle Scholar
Jolliffe, I.T. (1972). Discarding variables in a principal component analysis, I: artificial data. Applied Statistics, 21, 160173.CrossRefGoogle Scholar
Jolliffe, I.T. (1973). Discarding variables in a principal component analysis, II: real data. Applied Statistics, 22, 2131.CrossRefGoogle Scholar
Jolliffe, I.T. (2002). Principal component analysis, (2nd ed.). New York: Springer.Google Scholar
Kaiser, H.F. (1958). The varimax criterion for analytic rotation in factor analysis. Psychometrika, 23, 187200.CrossRefGoogle Scholar
Kaiser, H.F. (1974). An index of factorial simplicity. Psychometrika, 39, 3136.CrossRefGoogle Scholar
Kano, Y., Harada, A. (2000). Stepwise variable selection in factor analysis. Psychometrika, 65, 722.CrossRefGoogle Scholar
Krzanowski, W.J. (1987). Selection of variables to preserve multivariate data structure using principal components. Applied Statistics, 36, 2233.CrossRefGoogle Scholar
Land, A.H., Doig, A. (1960). An automatic method of solving discrete programming problems. Econometrica, 28, 497520.CrossRefGoogle Scholar
Lejeune, M.A. (2006). A variable neighborhood decomposition search method for supply chain management planning problems. European Journal of Operational Research, 175, 959976.CrossRefGoogle Scholar
Lorenzo-Seva, U., ten Berge, J.M.F. (2006). Tucker’s congruence coefficient as a meaningful index of factor similarity. Methodology, 2, 5764.CrossRefGoogle Scholar
MathWorks, Inc. (2005). Using MATLAB (version 7), Natick: The MathWorks, Inc.Google Scholar
McCabe, G.P. (1975). Computations for variable selection in discriminant analysis. Technometrics, 17, 103109.CrossRefGoogle Scholar
McCabe, G.P. (1984). Principal variables. Technometrics, 26, 137144.CrossRefGoogle Scholar
McKay, R.J., Campbell, N.A. (1982). Variable selection techniques in discriminant analysis: I. Description. British Journal of Mathematical and Statistical Psychology, 35, 129.CrossRefGoogle Scholar
McKay, R.J., Campbell, N.A. (1982). Variable selection techniques in discriminant analysis: II. Allocation. British Journal of Mathematical and Statistical Psychology, 35, 3041.CrossRefGoogle Scholar
Miller, A.J. (2002). Subset selection in regression, (2nd ed.). London: Chapman and Hall.CrossRefGoogle Scholar
Mladenović, N., Hansen, P. (1997). Variable neighborhood search. Computers & Operations Research, 24, 10971100.CrossRefGoogle Scholar
Mori, Y., Iizuka, M., Tarumi, T., Tanaka, Y. (2007). Variable selection in principal component analysis. In Härdle, W., Mori, Y., Vieu, P. (Eds.), Statistical methods for biostatistics and related fields (pp. 265284). Berlin: Springer.CrossRefGoogle Scholar
Peres-Neto, P.R., Jackson, D.A., Somers, K.M. (2005). How many principal components?: stopping rules for determining the number of non-trivial axes revisited. Computational Statistics Data Analysis, 49, 974997.CrossRefGoogle Scholar
Ramsay, J.O., ten Berge, J.M.F., Styan, G.P.H. (1984). Matrix correlation. Psychometrika, 49, 403423.CrossRefGoogle Scholar
Robert, P., Escoufier, Y. (1976). A unifying tool for linear multivariate statistical methods. Applied Statistics, 25, 257265.CrossRefGoogle Scholar
Steinley, D., Brusco, M.J. (2008). Selection of variables in cluster analysis: an empirical comparison of eight procedures. Psychometrika, 73, 125144.CrossRefGoogle Scholar
Tanaka, Y., Mori, Y. (1997). Principal component analysis based on a subset of variables: variable selection and sensitivity analysis. American Journal of Mathematical and Management Sciences, 17, 6189.CrossRefGoogle Scholar
van der Linden, W.J., Boekkooi-Timminga, E. (1988). A zero-one programming approach to Gulliksen’s matched random subtests method. Applied Psychological Measurement, 12, 201209.CrossRefGoogle Scholar