
Generalized Full Matching

Published online by Cambridge University Press:  23 November 2020

Fredrik Sävje*
Affiliation:
Department of Political Science, and Department of Statistics & Data Science, Yale University, New Haven, CT, USA.
Michael J. Higgins
Affiliation:
Department of Statistics, Kansas State University, Manhattan, KS, USA.
Jasjeet S. Sekhon
Affiliation:
Travers Department of Political Science, and Department of Statistics, UC Berkeley, Berkeley, CA, USA.
* Corresponding author: Fredrik Sävje

Abstract

Matching is a conceptually straightforward method to make groups of units comparable on observed characteristics. The method is, however, limited to settings where the study design is simple and the sample is moderately sized. We illustrate these limitations by asking what the causal effects would have been if a large-scale voter mobilization experiment that took place in Michigan for the 2006 election were scaled up to the full population of registered voters. Matching could help us answer this question, but no existing matching method can accommodate the six treatment arms and the 6,762,701 observations involved in the study. To offer a solution for this and similar empirical problems, we introduce a generalization of the full matching method that can be used with any number of treatment conditions and complex compositional constraints. The associated algorithm produces near-optimal matchings; the worst-case maximum within-group dissimilarity is guaranteed to be no more than four times greater than the optimal solution, and simulation results indicate that it comes considerably closer to the optimal solution on average. The algorithm’s ability to balance the treatment groups does not sacrifice speed, and it uses little memory, terminating in linearithmic time using linear space. This enables investigators to construct well-performing matchings within minutes even in complex studies with samples of several million units.
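To make the quantities in the abstract concrete, the following minimal sketch evaluates a candidate matching on toy data. It is not the authors' algorithm or software. It only illustrates two ideas mentioned above: a compositional constraint (here assumed to be "each matched group contains at least a given number of units from every treatment condition") and the maximum within-group dissimilarity (here assumed to be the largest Euclidean distance between two units in the same group). The function names, the toy data, and the choice of distance are all hypothetical.

```python
from itertools import combinations
from math import dist


def max_within_group_dissimilarity(covariates, groups):
    """Largest pairwise Euclidean distance between units sharing a group."""
    worst = 0.0
    for members in groups:
        for i, j in combinations(members, 2):
            worst = max(worst, dist(covariates[i], covariates[j]))
    return worst


def satisfies_constraints(treatments, groups, min_per_condition):
    """Check that every unit is in exactly one group and that each group
    holds at least the required number of units from each condition."""
    assigned = sorted(i for members in groups for i in members)
    if assigned != list(range(len(treatments))):
        return False
    for members in groups:
        for condition, required in min_per_condition.items():
            count = sum(1 for i in members if treatments[i] == condition)
            if count < required:
                return False
    return True


# Hypothetical toy data: six units, one covariate, three treatment conditions.
covariates = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,)]
treatments = ["a", "b", "c", "a", "b", "c"]
constraints = {"a": 1, "b": 1, "c": 1}

tight_matching = [[0, 1, 2], [3, 4, 5]]   # groups similar units together
loose_matching = [[0, 4, 2], [3, 1, 5]]   # admissible, but groups are spread out
invalid_matching = [[0, 3], [1, 2, 4, 5]]  # first group lacks conditions b and c

tight_score = max_within_group_dissimilarity(covariates, tight_matching)
loose_score = max_within_group_dissimilarity(covariates, loose_matching)
```

Both `tight_matching` and `loose_matching` satisfy the assumed constraint, but the first has a much smaller maximum within-group distance. The abstract's guarantee concerns this kind of objective: the matching returned by the authors' algorithm is stated to be within a factor of four of the best achievable value.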

Type
Article
Copyright
© The Author(s) 2020. Published by Cambridge University Press on behalf of the Society for Political Methodology

Footnotes

Edited by Daniel Hopkins

Supplementary material: PDF

Sävje et al. supplementary material
