Practical and Effective Approaches to Dealing With Clustered Data

Justin Esarey; Andrew Menger

doi:10.1017/psrm.2017.42

Published online by Cambridge University Press: 19 January 2018

Abstract

Cluster-robust standard errors (as implemented by the eponymous cluster option in Stata) can produce misleading inferences when the number of clusters G is small, even if the model is consistent and there are many observations in each cluster. Nevertheless, political scientists commonly employ this method in data sets with few clusters. The contributions of this paper are: (a) developing new and easy-to-use Stata and R packages that implement alternative uncertainty measures robust to small G, and (b) explaining and providing evidence for the advantages of these alternatives, especially cluster-adjusted t-statistics based on Ibragimov and Müller. To illustrate these advantages, we reanalyze recent work where results are based on cluster-robust standard errors.
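As a rough illustration of the cluster-adjusted t-statistic idea attributed above to Ibragimov and Müller, the sketch below estimates a bivariate regression slope separately within each cluster and then treats the G cluster-level estimates as the sample for an ordinary one-sample t-test with G - 1 degrees of freedom. It is not the authors' Stata or R package; the function names (`ols_slope`, `cluster_adjusted_t`) and the simulated data are assumptions made purely for illustration, and only the Python standard library is used.

```python
import math
import random
import statistics


def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with an intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def cluster_adjusted_t(clusters):
    """clusters: dict mapping a cluster id to a pair (x values, y values).

    Returns (mean of the per-cluster slopes, t-statistic, degrees of freedom).
    """
    # Step 1: estimate the model separately inside every cluster.
    slopes = [ols_slope(x, y) for x, y in clusters.values()]
    g = len(slopes)
    # Step 2: one-sample t-test on the G estimates (null hypothesis: slope = 0).
    mean_b = statistics.fmean(slopes)
    se = statistics.stdev(slopes) / math.sqrt(g)
    return mean_b, mean_b / se, g - 1


# Hypothetical simulated data: 6 clusters of 200 observations, true slope 0.5,
# plus a shock shared by every observation in the same cluster.
rng = random.Random(42)
data = {}
for c in range(6):
    shock = rng.gauss(0, 1)
    x = [rng.gauss(0, 1) for _ in range(200)]
    y = [0.5 * xi + shock + rng.gauss(0, 1) for xi in x]
    data[c] = (x, y)

b_bar, t_stat, df = cluster_adjusted_t(data)
print(f"mean slope = {b_bar:.3f}, t = {t_stat:.2f}, df = {df}")
```

The resulting statistic would be compared against a Student's t distribution with G - 1 degrees of freedom. Because the test uses only G numbers, its reference distribution reflects the small number of clusters rather than the large number of observations, which is the setting the abstract is concerned with.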

Type: Original Articles
Information: Political Science Research and Methods, Volume 7, Issue 3, July 2019, pp. 541–559

DOI: https://doi.org/10.1017/psrm.2017.42
Copyright: © The European Political Science Association 2018

Footnotes

Justin Esarey is an Assistant Professor of Political Science, Rice University, 6100 Main St, MS-24, Houston, TX 77005 ([email protected]). Andrew Menger is a Ph.D. Candidate, Department of Political Science, Rice University, 6100 Main St, MS-24, Houston, TX 77005 ([email protected]). The authors thank Ulrich Müller, Carlisle Rainey, Jonathan Kropko, Matthew Webb, Neal Beck, Jens Hainmueller, Shuai Jin, Jens Grosser, Ernesto Reuben, our anonymous reviewers, and participants at the 2015 Annual Meeting of the Midwest Political Science Association, the 2015 Annual Meeting of the Society for Political Methodology, and the 2016 Annual Meeting of the Southern Political Science Association for helpful comments and suggestions on earlier drafts of this paper. To view supplementary material for this article, please visit https://doi.org/10.1017/psrm.2017.42

References

Anderson, Theodore W. 2003. An Introduction to Multivariate Statistical Analysis, 3rd ed. New York, NY: Wiley.

Angrist, Joshua D., and Pischke, Jorn-Steffen. 2009. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton, NJ: Princeton University Press.

Arellano, Manuel. 1987. ‘Computing Robust Standard Errors for Within-Groups Estimators’. Oxford Bulletin of Economics and Statistics 49(4):431–434.

Bafumi, Joseph, and Gelman, Andrew. 2006. ‘Fitting Multilevel Models When Predictors and Group Effects Correlate’. Available at http://goo.gl/usvQsn, accessed 21 December 2017.

Bakirov, Nail K., and Szekely, Gabor J. 2006. ‘Student’s t-Test for Gaussian Scale Mixtures’. Journal of Mathematical Sciences 139(3):6497–6505.

Bates, Douglas, Maechler, Martin, Bolker, Ben, and Walker, Steven. 2014. ‘lme4: Linear Mixed Effects Models Using Eigen and S4’. R package version 1.1-7. Available at http://CRAN.R-project.org/package=lme4, accessed 21 December 2017.

Beck, Nathaniel, and Katz, Jonathan N. 1995. ‘What To Do (And Not To Do) With Time-Series Cross-Section Data’. American Political Science Review 89(3):634–647.

Beck, Nathaniel L., Katz, Jonathan N., and Mignozzetti, Umberto G. 2014. ‘Of Nickell Bias and its Cures: Comment on Gaibulloev, Sandler, and Sul’. Political Analysis 22(2):274–278.

Bertrand, Marianne, Duflo, Esther, and Mullainathan, Sendhil. 2004. ‘How Much Should We Trust Differences-In-Differences Estimates?’ The Quarterly Journal of Economics 119(1):249–275.

Brambor, Thomas, Clark, William Roberts, and Golder, Matthew. 2006. ‘Understanding Interaction Models: Improving Empirical Analyses’. Political Analysis 14(1):63–82.

Cameron, A. Colin, and Miller, Douglas L. 2015. ‘A Practitioner’s Guide to Cluster-Robust Inference’. Journal of Human Resources 50(2):317–372.

Cameron, A. Colin, Gelbach, Jonah B., and Miller, Douglas L. 2008. ‘Bootstrap-Based Improvements for Inference With Clustered Errors’. Review of Economics and Statistics 90(3):414–427.

Cameron, A. Colin, and Trivedi, Pravin K. 2005. Microeconometrics: Methods and Applications. Cambridge, UK: Cambridge University Press.

Canay, Ivan A., Romano, Joseph P., and Shaikh, Azeem M. 2014. ‘Randomization Tests Under an Approximate Symmetry Assumption’. Working Paper (version: December 19, 2014). Available at https://goo.gl/TUEQee, accessed 29 January 2017.

Clark, Tom S., and Linzer, Drew A. 2015. ‘Should I Use Fixed or Random Effects?’ Political Science Research and Methods 3(2):399–408.

Croissant, Yves. 2015. ‘Package “mlogit”’. CRAN. Available at http://cran.r-project.org/web/packages/mlogit/mlogit.pdf, accessed 21 December 2017.

Croissant, Yves, and Millo, Giovanni. 2008. ‘Panel Data Econometrics in R: The plm Package’. Journal of Statistical Software 27(2):1–43.

Donald, Stephen G., and Lang, Kevin. 2007. ‘Inference With Difference-in-Differences and Other Panel Data’. The Review of Economics and Statistics 89(2):221–233.

Donner, Allan. 1998. ‘Some Aspects of the Design and Analysis of Cluster Randomization Trials’. Journal of the Royal Statistical Society: Series C (Applied Statistics) 47(1):95–113.

Efron, Bradley. 1979. ‘Bootstrap Methods: Another Look at the Jackknife’. Annals of Statistics 7(1):1–26.

Field, Chris A., and Welsh, Alan H. 2007. ‘Bootstrapping Clustered Data’. Journal of the Royal Statistical Society: Series B 69(3):369–390.

Gaibulloev, Khusrav, Sandler, Todd, and Sul, Donggyu. 2014. ‘Dynamic Panel Analysis Under Cross-Sectional Dependence’. Political Analysis 22:258–273.

Green, Donald P., and Vavreck, Lynn. 2008. ‘Analysis of Cluster-Randomized Experiments: A Comparison of Alternative Estimation Approaches’. Political Analysis 16(2):138–152.

Grosser, Jens, Reuben, Ernesto, and Tymula, Agnieszka. 2013. ‘Political Quid Pro Quo Agreements: An Experimental Study’. American Journal of Political Science 57:582–597.

Hainmueller, Jens, Hiscox, Michael, and Sequeira, Sandra. 2015. ‘Consumer Demand for the Fair Trade Label: Evidence from a Multistore Field Experiment’. Review of Economics and Statistics 97(2):242–256.

Hansen, Christian B. 2007. ‘Asymptotic Properties of a Robust Variance Matrix Estimator for Panel Data When T is Large’. Journal of Econometrics 141(2):597–620.

Harden, Jeffrey J. 2011. ‘A Bootstrap Method for Conducting Statistical Inference With Clustered Data’. State Politics & Policy Quarterly 11(2):223–246.

Hardin, James W., and Hilbe, Joseph M. 2003. Generalized Estimating Equations. Boca Raton, FL: Chapman & Hall/CRC.

Horowitz, Joel L. 1997. ‘Bootstrap Methods in Econometrics: Theory and Numerical Performance’. In David M. Kreps and Kenneth F. Wallis (eds), Advances in Economics and Econometrics: Theory and Applications: Seventh World Congress, 189–222. Cambridge, UK: Cambridge University Press.

Hu, Feifang, and Kalbfleisch, John D. 2000. ‘The Estimating Function Bootstrap’. Canadian Journal of Statistics 28(3):449–481.

Ibragimov, Rustam, and Müller, Ulrich K. 2010. ‘t-Statistic Based Correlation and Heterogeneity Robust Inference’. Journal of Business & Economic Statistics 28(4):453–468.

Imbens, Guido W., and Kolesar, Michal. 2012. ‘Robust Standard Errors in Small Samples: Some Practical Advice’ 98(4):701–12.

Judge, George G., Hill, R. Carter, Griffiths, William E., Lutkepohl, Helmut, and Lee, Tsoung-Chao. 1988. Introduction to the Theory and Practice of Econometrics. New York, NY: Wiley.

Kezdi, Gabor. 2004. ‘Robust Standard Error Estimation in Fixed-Effects Panel Models’. Hungarian Statistical Review 9:95–116.

King, Gary, and Roberts, Margaret E. 2014. ‘How Robust Standard Errors Expose Methodological Problems They Do Not Fix, and What to Do About it’. Political Analysis 23:159–179.

Klar, Neil, and Donner, Allan. 2001. ‘Current and Future Challenges in the Design and Analysis of Cluster Randomization Trials’. Statistics in Medicine 20(24):3729–3740.

Lacina, Bethany. 2014. ‘How Governments Shape the Risk of Civil Violence: India’s Federal Reorganization, 1950–56’. American Journal of Political Science 58(3):720–738.

Liang, Kung-Yee, and Zeger, Scott L. 1986. ‘Longitudinal Data Analysis Using Generalized Linear Models’. Biometrika 73(1):13–22.

Liang, Kung-Yee, and Zeger, Scott L. 1993. ‘Regression Analysis for Correlated Data’. Annual Review of Public Health 14(1):43–68.

Liu, Regina Y. 1988. ‘Bootstrap Procedures Under Some Non-I.I.D. Models’. The Annals of Statistics 16(4):1696–1708.

Liu, Regina Y., and Singh, Kesar. 1987. ‘On a Partial Correction by the Bootstrap’. The Annals of Statistics 15(4):1713–1718.

MacKinnon, James G. 2015. ‘Wild Cluster Bootstrap Confidence Intervals’. L’Actualité économique 91(1-2):11–33.

MacKinnon, James G., and Webb, Matthew D. 2017. ‘Wild Bootstrap Inference for Wildly Different Cluster Sizes’. Journal of Applied Econometrics 32(2):233–254.

Mancl, Lloyd A., and DeRouen, Timothy A. 2001. ‘A Covariance Estimator for GEE With Improved Small-Sample Properties’. Biometrics 57(1):126–134.

Moulton, Brent R. 1986. ‘Random Group Effects and the Precision of Regression Estimates’. Journal of Econometrics 32(3):385–397.

Moulton, Brent R. 1990. ‘An Illustration of a Pitfall in Estimating the Effects of Aggregate Variables on Micro Units’. The Review of Economics and Statistics 72(2):334–338.

Nickell, Stephen. 1981. ‘Biases in Dynamic Models With Fixed Effects’. Econometrica 49:1417–1426.

Rogers, William. 1993. ‘Regression Standard Errors in Clustered Samples’. Stata Technical Bulletin 13:19–23.

van der Vaart, Aad W. 1998. Asymptotic Statistics. Cambridge, UK: Cambridge University Press.

White, Halbert. 1980. ‘A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity’. Econometrica 48(4):817–838.

Williams, Rick L. 2000. ‘A Note on Robust Variance Estimation for Cluster-Correlated Data’. Biometrics 56(2):645–646.

Wooldridge, Jeffrey M. 2002. Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.

Wu, C. F. Jeff. 1986. ‘Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis’. The Annals of Statistics 14(4):1261–1295.

Esarey and Menger Dataset

http://dx.doi.org/10.7910/DVN/OGXF4X
