
Multiple Imputation for Bounded Variables

Published online by Cambridge University Press: 01 January 2025

Marco Geraci*
Affiliation: University of South Carolina
Alexander McLain
Affiliation: University of South Carolina

*Correspondence should be made to Marco Geraci, Department of Epidemiology and Biostatistics, Arnold School of Public Health, University of South Carolina, 915 Greene Street, Columbia, SC 29208, USA.

Abstract

Missing data are a common issue in statistical analyses. Multiple imputation is a technique that has been applied in countless research studies and has a strong theoretical basis. Most of the statistical literature on multiple imputation has focused on unbounded continuous variables, with mostly ad hoc remedies for variables with bounded support. These approaches can be unsatisfactory when applied to bounded variables as they can produce misleading inferences. In this paper, we propose a flexible quantile-based imputation model suitable for distributions defined over singly or doubly bounded intervals. Proper support of the imputed values is ensured by applying a family of transformations with singly or doubly bounded range. Simulation studies demonstrate that our method is able to deal with skewness, bimodality, and heteroscedasticity and has superior properties as compared to competing approaches, such as log-normal imputation and predictive mean matching. We demonstrate the application of the proposed imputation procedure by analysing data on mathematical development scores in children from the Millennium Cohort Study, UK. We also show a specific advantage of our methods using a small psychiatric dataset. Our methods are relevant in a number of fields, including education and psychology.
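The support-preserving idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a plain logit-type transformation as a stand-in for the paper's family of transformations, uses no covariates, and replaces the quantile regression model with the empirical quantile function of the observed values. It also ignores parameter uncertainty, so it is not a proper multiple imputation. All function names and the simulated scores are hypothetical. The sketch relies on two standard facts: quantiles are equivariant under monotone transformations, and evaluating a quantile function at a uniform random level is inverse-transform sampling.

```python
import math
import random


def to_unbounded(y, lower, upper):
    """Map y in (lower, upper) to the real line (logit-type transform, assumed here)."""
    p = (y - lower) / (upper - lower)
    return math.log(p / (1.0 - p))


def to_bounded(z, lower, upper):
    """Inverse of to_unbounded: map a real z back into the bounded interval.

    Mathematically the result lies strictly inside (lower, upper); for extreme z,
    floating-point rounding may return a boundary value.
    """
    if z >= 0:
        p = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        p = e / (1.0 + e)
    return lower + (upper - lower) * p


def empirical_quantile(sorted_values, u):
    """Linearly interpolated empirical quantile at level u in [0, 1]."""
    pos = u * (len(sorted_values) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def impute_bounded(observed, n_missing, lower, upper, rng):
    """Draw n_missing imputations that respect the bounds.

    Each draw picks a random quantile level u, evaluates the empirical quantile
    on the transformed (unbounded) scale, and back-transforms the result.
    """
    z = sorted(to_unbounded(y, lower, upper) for y in observed)
    return [
        to_bounded(empirical_quantile(z, rng.random()), lower, upper)
        for _ in range(n_missing)
    ]


# Hypothetical skewed test scores on (0, 100), simulated for illustration only.
rng = random.Random(0)
observed = [rng.betavariate(2, 5) * 100 for _ in range(200)]

# Five imputed data sets, each filling 20 missing scores.
imputations = [impute_bounded(observed, 20, 0.0, 100.0, rng) for _ in range(5)]
```

Because the back-transformation maps every real number into the interval, no imputed value can fall outside the bounds, whatever model produces the draw on the transformed scale. A conditional version would replace `empirical_quantile` with a fitted quantile regression evaluated at each subject's covariates.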

Type
Original Paper
Copyright
Copyright © 2018 The Psychometric Society

Footnotes

Electronic supplementary material: The online version of this article (https://doi.org/10.1007/s11336-018-9616-y) contains supplementary material, which is available to authorized users.

Supplementary material

Geraci and McLain supplementary material: Online Resource 1 (File, 193.5 KB)
Geraci and McLain supplementary material: Online Resource 2 (File, 159.9 KB)