Misclassification and Bias in Predictions of Individual Ethnicity from Administrative Records

LISA P. ARGYLE; MICHAEL BARBER

doi:10.1017/S0003055423000229

Misclassification and Bias in Predictions of Individual Ethnicity from Administrative Records

Published online by Cambridge University Press: 15 May 2023

LISA P. ARGYLE

and

MICHAEL BARBER

Show author details

LISA P. ARGYLE*: Affiliation:
Brigham Young University, United States
MICHAEL BARBER*: Affiliation:
Brigham Young University, United States
*: Lisa P. Argyle, Assistant Professor, Department of Political Science, Brigham Young University, United States, [email protected].
Michael Barber, Associate Professor, Department of Political Science, Brigham Young University, [email protected].

Article contents

Abstract
References

Get access

Rights & Permissions

Abstract

We show that a common method of predicting individuals’ race in administrative records, Bayesian Improved Surname Geocoding (BISG), produces misclassification errors that are strongly correlated with demographic and socioeconomic factors. In addition to the high error rates for some racial subgroups, the misclassification rates are correlated with the political and economic characteristics of a voter’s neighborhood. Racial and ethnic minorities who live in wealthy, highly educated, and politically active areas are most likely to be misclassified as white by BISG. Inferences about the relationship between sociodemographic factors and political outcomes, like voting, are likely to be biased in models using BISG to infer race. We develop an improved method in which the BISG estimates are incorporated into a machine learning model that accounts for class imbalance and incorporates individual and neighborhood characteristics. Our model decreases the misclassification rates among non-white individuals, in some cases by as much as 50%.

Type: Letter
Information: American Political Science Review , Volume 118 , Issue 2 , May 2024 , pp. 1058 - 1066

DOI: https://doi.org/10.1017/S0003055423000229 [Opens in a new window]
Copyright: © The Author(s), 2023. Published by Cambridge University Press on behalf of the American Political Science Association

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Article purchase

Temporarily unavailable

References

Adjaye-Gbewonyo, Dzifa, Bednarczyk, Robert A., Davis, Robert L., and Omer, Saad B.. 2014. “Using the Bayesian Improved Surname Geocoding Method (BISG) to Create a Working Classification of Race and Ethnicity in a Diverse Managed Care Population: A Validation Study.” Health Services Research 49 (1): 268–83.CrossRef Google Scholar

Ansolabehere, Stephen, Fraga, Bernard L., and Schaffner, Brian F.. 2020. “The CPS Voting and Registration Supplement Overstates Minority Turnout.” Journal of Politics 84 (3): 1850–5.Google Scholar

Argyle, Lisa P., and Barber, Michael. 2023. “Replication Data for: Misclassification and Bias in Predictions of Individual Ethnicity from Administrative Records.” Harvard Dataverse. Dataset. https://doi.org/10.7910/DVN/FEOKT6CrossRef Google Scholar

Baines, Arthur P., and Courchane, Marsha J.. 2014. “Fair Lending: Implications for the Indirect Auto Finance Market.” Study Prepared for the American Financial Services Association.Google Scholar

Craig, Maureen A., and Richeson, Jennifer A.. 2018. “Majority No More? The Influence of Neighborhood Racial Diversity and Salient National Population Changes on Whites’ Perceptions of Racial Discrimination.” RSF: The Russell Sage Foundation Journal of the Social Sciences 4 (5): 141–57.CrossRef Google Scholar

Curiel, John, and Dagonel, Angelo. 2020. “Wisconsin Election Analysis.” Stanford-MIT Healthy Elections Project 6: 2020–08.Google Scholar

Edwards, Frank, Esposito, Michael H., and Lee, Hedwig. 2018. “Risk of Police-Involved Death by Race/Ethnicity and Place, United States, 2012–2018.” American Journal of Public Health 108 (9): 1241–8.CrossRef Google Scholar

Edwards, Frank, Lee, Hedwig, and Esposito, Michael. 2019. “Risk of Being Killed by Police Use of Force in the United States by Age, Race–Ethnicity, and Sex.” Proceedings of the National Academy of Sciences 116 (34): 16793–98.CrossRef Google Scholar PubMed

Einstein, Katherine Levine, Glick, David M., and Palmer, Maxwell. 2019. Neighborhood Defenders: Participatory Politics and America’s Housing Crisis. Cambridge: Cambridge University Press.CrossRef Google Scholar

Elliott, Marc N., Morrison, Peter A., Fremont, Allen, McCaffrey, Daniel F., Pantoja, Philip, and Lurie, Nicole. 2009. “Using the Census Bureau’s Surname List to Improve Estimates of Race/Ethnicity and Associated Disparities.” Health Services and Outcomes Research Methodology 9 (2): 69–83.CrossRef Google Scholar

Enos, Ryan D. 2016. “What the Demolition of Public Housing Teaches Us about the Impact of Racial Threat on Political Behavior.” American Journal of Political Science 60 (1): 123–42.CrossRef Google Scholar

Enos, Ryan D., Kaufman, Aaron R., and Sands, Melissa L.. 2019. “Can Violent Protest Change Local Policy Support? Evidence from the Aftermath of the 1992 Los Angeles Riot.” American Political Science Review 113 (4): 1012–28.CrossRef Google Scholar

Fiscella, Kevin, and Fremont, Allen M.. 2006. “Use of Geocoding and Surname Analysis to Estimate Race and Ethnicity.” Health Services Research 41 (4p1): 1482–500.CrossRef Google Scholar PubMed

Fraga, Bernard, and Holbein, John. 2020. “Measuring Youth and College Student Voter Turnout.” Electoral Studies 65: 102086. https://doi.org/10.1016/j.electstud.2019.102086CrossRef Google Scholar

Fraga, Bernard L. 2018. The Turnout Gap: Race, Ethnicity, and Political Inequality in a Diversifying America. Cambridge: Cambridge University Press.CrossRef Google Scholar

Grinberg, Nir, Joseph, Kenneth, Friedland, Lisa, Swire-Thompson, Briony, and Lazer, David. 2019. “Fake News on Twitter during the 2016 US Presidential Election.” Science 363 (6425): 374–78.CrossRef Google Scholar PubMed

Grumbach, Jacob M., and Sahn, Alexander. 2020. “Race and Representation in Campaign Finance.” American Political Science Review 114 (1): 206–21.CrossRef Google Scholar

Henninger, Phoebe, Meredith, Marc, and Morse, Michael. 2018. “Who Votes without Identification? Using Affidavits from Michigan to Learn about the Potential Impact of Strict Photo Voter Identification Laws.” Working Paper.CrossRef Google Scholar

Herring, Cedric, and Henderson, Loren. 2016. “Wealth Inequality in Black and White: Cultural and Structural Sources of the Racial Wealth Gap.” Race and Social Problems 8 (1): 4–17.CrossRef Google Scholar

Hofstra, Bas, and de Schipper, Niek C.. 2018. “Predicting Ethnicity with First Names in Online Social Media Networks.” Big Data & Society 5 (1). https://doi.org/10.1177/2053951718761141CrossRef Google Scholar

Imai, Kosuke, and Khanna, Kabir. 2016. “Improving Ecological Inference by Predicting Individual Ethnicity from Voter Registration Records.” Political Analysis 24 (2): 263–72.CrossRef Google Scholar

Martino, Steven C., Weinick, Robin M., Kanouse, David E., Brown, Julie A., Haviland, Amelia M., Goldstein, Elizabeth, Adams, John L., et al. 2013. “Reporting CAHPS and HEDIS Data by Race/Ethnicity for Medicare Beneficiaries.” Health Services Research 48 (2pt1): 417–34.CrossRef Google Scholar

Nguyen, Vy T., Zafonte, Ross D., Chen, Jarvis T., Kponee-Shovein, Kalé Z., Paganoni, Sabrina, Pascual-Leone, Alvaro, Speizer, Frank E., et al. 2019. “Mortality among Professional American-Style Football Players and Professional American Baseball Players.” JAMA Network Open 2 (5): e194223. https://doi.org/10.1001/jamanetworkopen.2019.4223CrossRef Google Scholar PubMed

Thomas, Timothy Andrew. 2017. “Forced Out: Race, Market, and Neighborhood Dynamics of Evictions.” PhD diss. Department of Sociology, University of Washington.Google Scholar

Voicu, Ioan. 2018. “Using First Name Information to Improve Race and Ethnicity Classification.” Statistics and Public Policy 5 (1): 1–13.CrossRef Google Scholar

Wolfinger, Raymond E., and Rosenstone, Steven J.. 1980. Who Votes? New Haven, CT: Yale University Press.Google Scholar

Argyle and Barber Dataset

Dataset

https://doi.org/10.7910/DVN/FEOKT6

Link

Argyle and Barber supplementary material

PDF 1.7 MB

Submit a response

Comments

No Comments have been published for this article.

Article contents

Misclassification and Bias in Predictions of Individual Ethnicity from Administrative Records

Abstract

Access options

Article purchase

Temporarily unavailable

References

Argyle and Barber Dataset

Argyle and Barber supplementary material

Comments

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests