
Choosing Imputation Models

Published online by Cambridge University Press: 10 December 2021

Moritz Marbach*
Affiliation: The Bush School of Government & Public Service, Texas A&M University, 4220 TAMU, College Station, TX 77843-4220, USA. Email: [email protected]
*Corresponding author: Moritz Marbach

Abstract

Imputing missing values is an important preprocessing step in data analysis, but the literature offers little guidance on how to choose between imputation models. This letter suggests adopting the imputation model that generates a density of imputed values most similar to that of the observed values for an incomplete variable after balancing all other covariates. We recommend stable balancing weights as a practical approach to balance covariates whose distribution is expected to differ if the values are not missing completely at random. After balancing, discrepancy statistics can be used to compare the density of imputed and observed values. We illustrate the application of the suggested approach using simulated and real-world survey data from the American National Election Study, comparing popular imputation approaches including random forests, hot-deck, predictive mean matching, and multivariate normal imputation. An R package implementing the suggested approach accompanies this letter.
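The two steps described in the abstract (balance the covariates, then compare imputed and observed values) can be illustrated with a minimal sketch. It is not the accompanying R package, and every name in it (`solve_linear`, `balancing_weights`, `weighted_ks`, the simulated data) is hypothetical. As a simplifying assumption, the weights below are minimum-variance weights with exact mean balance, obtained in closed form; stable balancing weights as usually formulated also require non-negative weights and allow approximate balance, which this sketch omits, so some weights may be negative. The discrepancy statistic used here is a weighted Kolmogorov-Smirnov distance, one possible choice among several.

```python
import math
import random


def solve_linear(a, b):
    """Solve the small square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [p - f * q for p, q in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def balancing_weights(x_obs, x_mis):
    """Minimum-variance weights for the observed rows whose weighted covariate
    means equal the covariate means of the rows with a missing value.
    Simplification: exact balance, no non-negativity constraint."""
    n, k = len(x_obs), len(x_obs[0])
    # Constraint rows: ones (weights sum to 1) plus one row per covariate.
    c = [[1.0] * n] + [[row[j] for row in x_obs] for j in range(k)]
    d = [1.0] + [sum(row[j] for row in x_mis) / len(x_mis) for j in range(k)]
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in c] for ci in c]
    lam = solve_linear(gram, d)
    # Minimum-norm solution w = C' (C C')^{-1} d.
    return [sum(lam[i] * c[i][r] for i in range(k + 1)) for r in range(n)]


def weighted_ks(obs, weights, imp):
    """Largest gap between the weighted ECDF of the observed values and the
    unweighted ECDF of the imputed values."""
    gap = 0.0
    for t in sorted(set(obs) | set(imp)):
        f_obs = sum(w for v, w in zip(obs, weights) if v <= t)
        f_imp = sum(1 for v in imp if v <= t) / len(imp)
        gap = max(gap, abs(f_obs - f_imp))
    return gap


# Simulated data (an assumption for illustration): y depends on x, and
# larger x makes y more likely to be missing, so values are not MCAR.
random.seed(1)
x = [random.gauss(0, 1) for _ in range(600)]
y = [2 * xi + random.gauss(0, 1) for xi in x]
miss = [random.random() < 1 / (1 + math.exp(-xi)) for xi in x]
x_obs = [[xi] for xi, m in zip(x, miss) if not m]
x_mis = [[xi] for xi, m in zip(x, miss) if m]
y_obs = [yi for yi, m in zip(y, miss) if not m]

w = balancing_weights(x_obs, x_mis)
target = sum(r[0] for r in x_mis) / len(x_mis)

# Candidate 1: mean imputation.
mean_y = sum(y_obs) / len(y_obs)
imp_mean = [mean_y] * len(x_mis)

# Candidate 2: stochastic regression imputation fitted on the observed cases.
mx = sum(r[0] for r in x_obs) / len(x_obs)
sxx = sum((r[0] - mx) ** 2 for r in x_obs)
slope = sum((r[0] - mx) * (yi - mean_y) for r, yi in zip(x_obs, y_obs)) / sxx
resid = [yi - mean_y - slope * (r[0] - mx) for r, yi in zip(x_obs, y_obs)]
sd = math.sqrt(sum(e * e for e in resid) / (len(resid) - 2))
imp_reg = [mean_y + slope * (r[0] - mx) + random.gauss(0, sd) for r in x_mis]

ks_mean = weighted_ks(y_obs, w, imp_mean)
ks_reg = weighted_ks(y_obs, w, imp_reg)
print(f"mean imputation: {ks_mean:.3f}  regression imputation: {ks_reg:.3f}")
```

Under this sketch, the candidate with the smaller discrepancy would be preferred. In the simulated setup the regression-based candidate should score lower than mean imputation, because mean imputation places all imputed values at a single point.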

Type
Letter
Copyright
© The Author(s) 2021. Published by Cambridge University Press on behalf of the Society for Political Methodology

Footnotes

Edited by Daniel Hopkins

Supplementary material: Link

Marbach Dataset

Supplementary material: PDF

Marbach supplementary material
