Recognizing Sample-Selection Bias in Historical Data

Ariell Zimran

doi:10.1017/ssh.2020.11

Published online by Cambridge University Press: 06 July 2020

Affiliation: Vanderbilt University and the National Bureau of Economic Research

Abstract

Recent research has ignited a debate in social science history over whether and how to draw conclusions for whole populations from sources that describe only select subsets of these populations. The idiosyncratic availability and survival of historical sources create a threat of sample-selection bias—an error that arises when there are systematic differences between the observed sample and the population of interest. This danger is common in studying trends in health as measured by average stature—scholars can often observe these trends only for soldiers and other similar groups; but whether these patterns are representative of those of the broader population is unclear. This article illustrates what simple patterns in a potentially selected sample can be used to recognize the presence of sample-selection bias in a source, and to understand how such bias might affect conclusions drawn from this source. Applying this intuition to the use of military data to describe stature in the antebellum United States, I present several simple empirical exercises based on these patterns. Finally, I use the results of these exercises to describe how sample-selection bias might affect the use of these data in testing for differences in average stature between the Northeast and the Midwest.
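The mechanism described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the article's method or data: the function `simulate_region`, the logistic selection rule, and every number in it (mean stature, standard deviation, selection strength) are hypothetical. Two regions share the same true average stature, but in one of them shorter men are more likely to enter the observed sample, so a comparison of observed means suggests a regional difference that does not exist in the population.

```python
import math
import random
import statistics


def simulate_region(selection_strength, n=100_000, mean_cm=172.0, sd_cm=6.5, seed=0):
    """Return (population mean, observed-sample mean) of stature in cm.

    All numbers are hypothetical. A man with standardized height z enters the
    observed sample with probability 1 / (1 + exp(selection_strength * z)),
    so a positive selection_strength over-represents shorter men.
    """
    rng = random.Random(seed)
    population = [rng.gauss(mean_cm, sd_cm) for _ in range(n)]
    observed = []
    for height in population:
        z = (height - mean_cm) / sd_cm
        p_observed = 1.0 / (1.0 + math.exp(selection_strength * z))
        if rng.random() < p_observed:
            observed.append(height)
    return statistics.fmean(population), statistics.fmean(observed)


if __name__ == "__main__":
    # Two regions with the same true mean but different selection into the sample.
    for label, strength, seed in [("Region A", 0.0, 1), ("Region B", 1.0, 2)]:
        pop_mean, obs_mean = simulate_region(strength, seed=seed)
        print(f"{label}: population {pop_mean:.2f} cm, observed {obs_mean:.2f} cm")
```

With `selection_strength=0.0` the observed sample is a simple random draw and its mean tracks the population mean. With a positive value the observed mean falls below the population mean even though nothing about the underlying population has changed.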

Type: Special Issue Article
Information: Social Science History, Volume 44, Special Issue 3: Selection Bias and Social Science History, Fall 2020, pp. 525–554

DOI: https://doi.org/10.1017/ssh.2020.11
Copyright: © The Author(s), 2020. Published by Cambridge University Press on behalf of the Social Science History Association


