Voter Registration Databases and MRP: Toward the Use of Large-Scale Databases in Public Opinion Research

Yair Ghitza; Andrew Gelman

doi:10.1017/pan.2020.3

Published online by Cambridge University Press: 20 March 2020

Yair Ghitza*: Catalist, 1310 L St. NW, Suite 500, Washington, DC 20005, USA
Andrew Gelman: Columbia University, Department of Political Science, 1255 Amsterdam Avenue, Room 1003, New York, NY 10027, USA
*Email: [email protected]

Abstract

Declining telephone response rates have forced several transformations in survey methodology, including cell phone supplements, nonprobability sampling, and increased reliance on model-based inferences. At the same time, advances in statistical methods and vast amounts of new data sources suggest that new methods can combat some of these problems. We focus on one type of data source—voter registration databases—and show how they can improve inferences from political surveys. These databases allow survey methodologists to leverage political variables, such as party registration and past voting behavior, at a large scale and free of overreporting bias or endogeneity between survey responses. We develop a general process to take advantage of this data, which is illustrated through an example where we use multilevel regression and poststratification to produce vote choice estimates for the 2012 presidential election, projecting those estimates to 195 million registered voters in a postelection context. Our inferences are stable and reasonable down to demographic subgroups within small geographies and even down to the county or congressional district level. They can be used to supplement exit polls, which have become increasingly problematic and are not available in all geographies. We discuss problems, limitations, and open areas of research.
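
The poststratification step named in the abstract can be illustrated with a minimal sketch. This is not the authors' model or data: the cell definitions, voter counts, and support probabilities below are hypothetical, and the function name `poststratify` is introduced only for illustration. It assumes a multilevel model has already produced a predicted support probability for each cell, and that a voter registration database supplies the number of registered voters in each cell.

```python
from collections import defaultdict

# Hypothetical poststratification cells. In practice the counts would come
# from a voter registration database and the probabilities from a fitted
# multilevel regression; every number here is made up for illustration.
cells = [
    {"county": "A", "party": "DEM", "n_voters": 6000, "p_support": 0.90},
    {"county": "A", "party": "REP", "n_voters": 4000, "p_support": 0.10},
    {"county": "B", "party": "DEM", "n_voters": 2000, "p_support": 0.80},
    {"county": "B", "party": "REP", "n_voters": 8000, "p_support": 0.20},
]


def poststratify(cells, by):
    """Return the population-weighted mean of p_support for each value of `by`."""
    totals = defaultdict(float)    # registered voters per group
    weighted = defaultdict(float)  # expected supporters per group
    for cell in cells:
        key = cell[by]
        totals[key] += cell["n_voters"]
        weighted[key] += cell["n_voters"] * cell["p_support"]
    return {key: weighted[key] / totals[key] for key in totals}


county_estimates = poststratify(cells, by="county")
party_estimates = poststratify(cells, by="party")
print(county_estimates)
print(party_estimates)
```

Because the weights are counts of registered voters in each cell, the same cell-level predictions can be aggregated to any grouping that the database records, such as county or party registration.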

Keywords

Bayesian methods; data augmentation; hierarchical modeling; poststratification; response bias; survey design

Type: Articles
Information: Political Analysis, Volume 28, Issue 4, October 2020, pp. 507-531

DOI: https://doi.org/10.1017/pan.2020.3
Copyright: Copyright © The Author(s) 2020. Published by Cambridge University Press on behalf of the Society for Political Methodology.

Footnotes

Contributing Editor: Jonathan Nagler

