Hostname: page-component-745bb68f8f-b6zl4 Total loading time: 0 Render date: 2025-01-08T11:54:06.620Z Has data issue: false hasContentIssue false

A Nondegenerate Penalized Likelihood Estimator for Variance Parameters in Multilevel Models

Published online by Cambridge University Press:  01 January 2025

Yeojin Chung*
Affiliation:
School of Business Administration, Kookmin University
Sophia Rabe-Hesketh
Affiliation:
Graduate School of Education, University of California, Berkeley Institute of Education, University of London
Vincent Dorie
Affiliation:
Department of Statistics, Columbia University
Andrew Gelman
Affiliation:
Department of Statistics, Columbia University
Jingchen Liu
Affiliation:
Department of Statistics, Columbia University
*
Requests for reprints should be sent to Yeojin Chung, School of Business Administration, Kookmin University, Seoul, South Korea. E-mail: [email protected]

Abstract

Group-level variance estimates of zero often arise when fitting multilevel or hierarchical linear models, especially when the number of groups is small. For situations where zero variances are implausible a priori, we propose a maximum penalized likelihood approach to avoid such boundary estimates. This approach is equivalent to estimating variance parameters by their posterior mode, given a weakly informative prior distribution. By choosing the penalty from the log-gamma family with shape parameter greater than 1, we ensure that the estimated variance will be positive. We suggest a default log-gamma(2,λ) penalty with λ→0, which ensures that the maximum penalized likelihood estimate is approximately one standard error from zero when the maximum likelihood estimate is zero, thus remaining consistent with the data while being nondegenerate. We also show that the maximum penalized likelihood estimator with this default penalty is a good approximation to the posterior median obtained under a noninformative prior.

Our default method provides better estimates of model parameters and standard errors than the maximum likelihood or the restricted maximum likelihood estimators. The log-gamma family can also be used to convey substantive prior information. In either case—pure penalization or prior information—our recommended procedure gives nondegenerate estimates and in the limit coincides with maximum likelihood as the number of groups increases.

Type
Original Paper
Copyright
Copyright © 2013 The Psychometric Society

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

Alderman, D., & Powers, D. (1980). The effects of special preparation on SAT-verbal scores. American Educational Research Journal, 17(2), 239251.CrossRefGoogle Scholar
Bates, D., & Maechler, M. (2010). lme4: Linear mixed-effects models using S4 classes. R. package version 0.999375-37.Google Scholar
Bell, W. (1999). Accounting for uncertainty about variances in small area estimation. In Bulletin of the International Statistical Institute, 52nd session, Helsinki.Google Scholar
Borenstein, M., Hedges, L., Higgins, J., & Rothstein, H. (2009). Introduction to meta-analysis. Chichester: Wiley.CrossRefGoogle Scholar
Box, G., & Cox, D. (1964). An analysis of transformations. Journal of the Royal Statistical Society. Series B, 26(2), 211252.CrossRefGoogle Scholar
Browne, W., & Draper, D. (2006). A comparison of Bayesian and likelihood methods for fitting multilevel models. Bayesian Analysis, 1(3), 473514.CrossRefGoogle Scholar
Ciuperca, G., Ridolfi, A., & Idier, J. (2003). Penalized maximum likelihood estimator for normal mixtures. Skandinavian Journal of Statistics, 30(1), 4559.CrossRefGoogle Scholar
Crainiceanu, C., & Ruppert, D. (2004). Likelihood ratio tests in linear mixed models with one variance component. Journal of the Royal Statistical Society. Series B, 66(1), 165185.CrossRefGoogle Scholar
Crainiceanu, C., Ruppert, D., & Vogelsang, T. (2003). Some properties of likelihood ratio tests in linear mixed models (Technical report). Available at http://www.orie.cornell.edu/~davidr/papers.Google Scholar
Curcio, D., & Verde, P. (2011). Comment on: Efficacy and safety of tigecycline: a systematic review and meta-analysis. Journal of Antimicrobial Chemotherapy, 66(12), 28932895.CrossRefGoogle ScholarPubMed
DerSimonian, R., & Laird, N. (1986). Meta-analysis in clinical trials. Controlled Clinical Trials, 7(3), 177188.CrossRefGoogle ScholarPubMed
Dorie, V. (2013). Mixed methods for mixed models: Bayesian point estimation and classical uncertainty measures in multilevel models. PhD thesis, Columbia University.Google Scholar
Dorie, V., Liu, J., & Gelman, A. (2013). Bridging between point estimation and Bayesian inference for generalized linear models (Technical report). Department of Statistics, Columbia University.Google Scholar
Draper, D. (1995). Assessment and propagation of model uncertainty. Journal of the Royal Statistical Society. Series B, 57(1), 4597.CrossRefGoogle Scholar
Drum, M., & McCullagh, P. (1993). [Regression models for discrete longitudinal responses]: comment. Statistical Science, 8(3), 300301.CrossRefGoogle Scholar
Fay, R.E., & Herriot, R.A. (1979). Estimates of income for small places: an application of James–Stein procedures to census data. Journal of the American Statistical Association, 74(366), 269277.CrossRefGoogle Scholar
Fu, J., & Gleser, L. (1975). Classical asymptotic properties of a certain estimator related to the maximum likelihood estimator. Annals of the Institute of Statistical Mathematics, 27(1), 213233.CrossRefGoogle Scholar
Galindo-Garre, F., & Vermunt, J. (2006). Avoiding boundary estimates in latent class analysis by Bayesian posterior mode estimation. Behaviormetrika, 33(1), 4359.CrossRefGoogle Scholar
Galindo-Garre, F., Vermunt, J., & Bergsma, W. (2004). Bayesian posterior mode estimation of logit parameters with small samples. Sociological Methods & Research, 33(1), 88117.CrossRefGoogle Scholar
Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models. Bayesian Analysis, 1(3), 515533.CrossRefGoogle Scholar
Gelman, A., Carlin, J., Stern, H., & Rubin, D. (2004). Bayesian data analysis (2nd ed.). London: Chapman & Hall/CRC.Google Scholar
Gelman, A., Jakulin, A., Pittau, M.G., & Su, Y.S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 13601383.CrossRefGoogle Scholar
Gelman, A., & Meng, X. (1996). Model checking and model improvement. In Markov chain Monte Carlo in practice (pp. 189201). London: Chapman & Hall.Google Scholar
Gelman, A., Shor, B., Bafumi, J., & Park, D. (2007). Rich state, poor state, red state, blue state: what’s the matter with Connecticut?. Quarterly Journal of Political Science, 2(4), 345367.CrossRefGoogle Scholar
Greenland, S. (2000). When should epidemiologic regressions use random coefficients?. Biometrics, 56(3), 915921.CrossRefGoogle ScholarPubMed
Hardy, R., & Thompson, S. (1998). Detecting and describing heterogeneity in meta-analysis. Statistics in Medicine, 17(8), 841856.3.0.CO;2-D>CrossRefGoogle ScholarPubMed
Harville, D.A. (1974). Bayesian inference for variance components using only error contrasts. Biometrika, 61(2), 383385.CrossRefGoogle Scholar
Harville, D.A. (1977). Maximum likelihood approaches to variance components estimation and related problems. Journal of the American Statistical Association, 72(358), 320338.CrossRefGoogle Scholar
Higgins, J.P.T., Thompson, S.G., & Spiegelhalter, D.J. (2009). A re-evaluation of random-effects meta-analysis. Journal of the Royal Statistical Society. Series A, 172(1), 137159.CrossRefGoogle ScholarPubMed
Huber, P.J. (1967). The behavior of maximum likelihood estimation under nonstandard condition. In LeCam, L.M., & Neyman, J. (Eds.), Proceedings of the fifth Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 221233). Berkeley: University of California Press.Google Scholar
Kenward, M., & Roger, J.H. (1997). Small-sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53(3), 983997.CrossRefGoogle ScholarPubMed
Laird, N.M., & Ware, J.H. (1982). Random effects models for longitudinal data. Biometrics, 38(4), 963974.CrossRefGoogle ScholarPubMed
Li, H., & Lahiri, P. (2010). An adjusted maximum likelihood method for solving small area estimation problems. Journal of Multivariate Analysis, 101(4), 882892.CrossRefGoogle Scholar
Longford, N.T. (2000). On estimating standard errors in multilevel analysis. Journal of the Royal Statistical Society. Series D, 49(3), 389398.Google Scholar
Maris, E. (1999). Estimating multiple classification latent class models. Psychometrika, 64(2), 187212.CrossRefGoogle Scholar
Miller, J. (1977). Asymptotic properties of maximum likelihood estimates in the mixed model of the analysis of variance. The Annals of Statistics, 5(4), 746762.CrossRefGoogle Scholar
Mislevy, R.J. (1986). Bayes modal estimation in item response models. Psychometrika, 51(2), 177195.CrossRefGoogle Scholar
Morris, C. (2006). Mixed model prediction and small area estimation (with discussions). Test, 15(1), 7276.Google Scholar
Morris, C., & Tang, R. (2011). Estimating random effects via adjustment for density maximization. Statistical Science, 26(2), 271287.CrossRefGoogle Scholar
Neyman, J., & Scott, E.L. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16(1), 132.CrossRefGoogle Scholar
O’Hagan, A. (1976). On posterior joint and marginal modes. Biometrika, 63(2), 329333.CrossRefGoogle Scholar
Overton, R. (1998). A comparison of fixed-effects and mixed (random-effects) models for meta-analysis tests of moderator variable effects. Psychological Methods, 3(3), 354.CrossRefGoogle Scholar
Patterson, H.D., & Thompson, R. (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58(3), 545554.CrossRefGoogle Scholar
Rabe-Hesketh, S., & Skrondal, A. (2012). Multilevel and longitudinal modeling using Stata (3rd ed.). College Station: Stata Press.Google Scholar
Rabe-Hesketh, S., Skrondal, A., & Pickles, A. (2005). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics, 128(2), 301323.CrossRefGoogle Scholar
Raudenbush, S., & Bryk, A. (1985). Empirical Bayes meta-analysis. Journal of Educational Statistics, 10(2), 7598.CrossRefGoogle Scholar
Rubin, D.B. (1981). Estimation in parallel randomized experiments. Journal of Educational Statistics, 6(4), 377401.CrossRefGoogle Scholar
Self, S.G., & Liang, K.Y. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Association, 82(398), 605610.CrossRefGoogle Scholar
Snijders, T., & Bosker, R. (1993). Standard errors and sample sizes for two-level research. Journal of Educational and Behavioral Statistics, 18(3), 237259.CrossRefGoogle Scholar
Stram, D.O., & Lee, J.W. (1994). Variance components testing in the logitudinal mixed effects model. Biometrics, 50(4), 11711177.CrossRefGoogle Scholar
Swallow, W., & Monahan, J. (1984). Monte Carlo comparison of ANOVA, MIVQUE, REML, and ML estimators of variance components. Technometrics, 26(1), 4757.CrossRefGoogle Scholar
Swaminathan, H., & Gifford, J.A. (1985). Bayesian estimation in the two-parameter logistic model. Psychometrika, 50(3), 349364.CrossRefGoogle Scholar
Tsutakawa, R.K., & Lin, H.Y. (1986). Bayesian estimation of item response curves. Psychometrika, 51(2), 251267.CrossRefGoogle Scholar
Verbeke, G., & Molenberghs, G. (2000). Linear mixed models for longitudinal data. Berlin: Springer.Google Scholar
Vermunt, J., & Magidson, J. (2005). Technical guide for Latent Gold 4.0: basic and advanced (Technical report). Statistical Innovations Inc., Belmont, Massachusetts.Google Scholar
Viechtbauer, W. (2005). Bias and efficiency of meta-analytic variance estimators in the random-effects model. Journal of Educational and Behavioral Statistics, 30(3), 261293.CrossRefGoogle Scholar
Warton, D.I. (2008). Penalized normal likelihood and ridge regularization of correlation and covariance matrices. Journal of the American Statistical Association, 103(481), 340349.CrossRefGoogle Scholar
Weiss, R.E. (2005). Modeling longitudinal data. New York: Springer.Google Scholar
Whaley, S., Sigman, M., Neumann, C.G., Bwibo, N.O., Guthrie, D., Weiss, R.E., Alber, S., & Murphy, S.P. (2003). Animal source foods improve dietary quality, micronutrient status, growth and cognitive function in Kenyan school children: background, study design and baseline findings. The Journal of Nutrition, 133(11), 39653971.CrossRefGoogle Scholar
White, H. (1990). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48(4), 817838.CrossRefGoogle Scholar