INTRODUCTION
Salmonellosis, shigellosis, and Escherichia coli O157:H7 infection are the three most commonly reported nationally notifiable enteric bacterial diseases in the United States [Reference Gupta1, Reference Olsen2]. Between 1993 and 2002, the annual incidence rates (cases per 100 000 persons) for salmonellosis, shigellosis, and E. coli O157:H7 infection in the US ranged from 14·5 to 17·7, from 6·4 to 12·5, and from 1·0 to 1·8, respectively [3]. These pathogens are typically transmitted via food or directly from an infected animal or human. Investigations of outbreaks have found that infections caused by these pathogens may be associated with poor personal hygiene, improper infection control practices within nursing homes or day-care centres, and inappropriate production or preparation of food (e.g. inadequate cooking or keeping food at the wrong holding temperature) [Reference Mead4–Reference Warren, Parish and Schneider7].
In recent years, several societal and behavioural factors that contribute to the epidemiology of enteric diseases have changed [Reference Hedberg, MacDonald and Osterholm8, Reference Kaferstein and Abdussalam9]. For example, increasing numbers of people are eating raw or undercooked foods as they pursue healthier lifestyles. In addition, new methods of food production have been implemented and networks for food distribution have expanded. More day-care facilities and nursing homes have been established for the increasing number of children and older people requiring care in institutionalized settings. These changes in society and industry have increased the potential risk of exposure to the pathogens described, often by increasing the opportunity for contaminated food or for person-to-person exposure.
For the organisms of interest, only limited data on demographic or other risk factors have been collected and reported at the national level. Previous analyses of surveillance data on enteric diseases have demonstrated considerable variability in incidence by demographic and socioeconomic risk factors, such as age [Reference Wang10–Reference Koutsotoli12], race/ethnicity [Reference Gupta1, Reference Shiferaw13–Reference Robins-Browne15], sex [Reference Gupta1, Reference Ethelberg11, Reference Hasin16], educational attainment [Reference Younus17], poverty status [Reference Simonsen, Frisch and Ethelberg18, Reference Kelly-Hope19], household composition and size [Reference Simonsen, Frisch and Ethelberg18, Reference Ethelberg20–Reference Johnson22], and geographic distribution [Reference Gupta1, Reference Olsen23, Reference Bender24]. Many of these analyses have used individual-level data to identify risk factors for infection, but ecological analysis, which focuses on groups, can also be useful [Reference Koopman and Longini25–Reference Morgenstern28], as it may identify community-level factors that are associated with the risk of enteric illness. Identifying sociodemographic and economic factors associated with the incidence of enteric disease may lead to new hypotheses concerning vehicles and routes of disease transmission in the community and interventions that may prevent transmission. We conducted an ecological study to identify community-level sociodemographic factors associated with county-specific incidence rates for salmonellosis, shigellosis, and E. coli O157:H7 infection in the United States.
METHODS
Data and sources
We analysed data from the National Notifiable Diseases Surveillance System (NNDSS) that were voluntarily reported to the Centers for Disease Control and Prevention (CDC) by state health departments and the health departments of New York City and the District of Columbia (DC) from 1993 to 2002 for salmonellosis and shigellosis and from 1995 to 2002 for E. coli O157:H7 [29]. Salmonellosis and shigellosis were designated as nationally notifiable throughout the study period; E. coli O157:H7 became nationally notifiable in 1994. Although surveillance data were available for E. coli O157:H7 for 1993 and 1994, we excluded them to allow national reporting practices for this infection to stabilize. Our analysis included cases reported from states and other jurisdictions (New York City, DC) in which the disease was reportable by law or statute. For E. coli O157:H7 infection, data from states in which this disease was not reportable by law or statute (12 states in 1995 and six states in 1996) were excluded from the analysis. Salmonellosis and shigellosis were reportable by law or statute in all states during 1993–2002, so data from all states were included in the analysis. Surveillance data reported from the US territories were excluded.
County-specific sociodemographic, economic, and occupational data collected by the U.S. Bureau of the Census for 2000 and data on the health-care workforce and indexes of capacity collected by the Health Resources and Services Administration were used as the independent variables in the analysis (Table 1) [30–34]. To examine the incidence of disease by geographic distribution, counties were categorized into four US regions (Northeast, Midwest, South, and West) [35].
1. United States Bureau of the Census. County and City Data Book: 2000 (http://www.census.gov/prod/www/ccdb.html). Accessed 15 January 2004.
2. United States Department of Commerce, Economics and Statistics Administration, United States Bureau of the Census. 1997 Census of Governments, Volume 4, No. 5, Compendium of Government Finances (http://www.census.gov/prod/gc97/gc974-5.pdf). Accessed 25 June 2004.
3. United States Bureau of the Census. Census 2000 summary file 1 100-percent data, Detail tables (http://factfinder.census.gov/servlet/DTGeoSearchByListServlet?ds_name=DEC_2000_SF1_U&state=dt&mt_name=DEC_2000_SF1_U_P002&_lang=en&_ts=97683722680). Obtained using CDC WONDER.
4. United States Bureau of the Census. Census 2000 special equal employment opportunity (EEO) file (http://www.eeoc.gov/stats/census/index.html). Obtained using CDC WONDER.
5. Area Resource File Access System 2003. Health Resources and Services Administration, Bureau of Health Professions, National Center for Health Workforce Analysis. Rockville, Maryland. Prepared by: Quality Resource Systems, Inc, Fairfax, Virginia (http://www.arfsys.com).
6. United States Bureau of the Census. Geographic terms and definitions (http://www.census.gov/popest/geographic/estimates_geography.html). Accessed 18 August, 2006.
Data analysis
County- and disease-specific mean incidence rates for salmonellosis, shigellosis, and E. coli O157:H7 infection were calculated as the sum of annual disease-specific case counts reported to NNDSS divided by the sum of the annual county-specific population estimates over the period evaluated in this analysis (10 years for salmonellosis and shigellosis; 8 years for E. coli O157:H7 infection). The county-specific mean incidence rates served as dependent variables in the disease-specific models. County-level bridged-race population estimates from the U.S. Bureau of the Census for 1993–2002 were used as denominators to calculate county incidence rates [36]. To avoid using rates that might be unstable for counties with small populations and extreme rates, we excluded data for counties with a population below 1000 or with incidence rates above the 99th percentile of ranked county incidence rates by disease and year (resulting in the exclusion of 1·3%, 1·2%, and 1·1% of US counties for salmonellosis, shigellosis, and E. coli O157:H7 infection, respectively).
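As a minimal sketch of the rate calculation described above (not the authors' code), the pooled mean incidence is the total number of reported cases divided by the total person-years, scaled to 100 000 persons. The function name and the example county values are hypothetical and serve only to illustrate the arithmetic.

```python
def mean_incidence_rate(annual_cases, annual_population, per=100_000):
    """Pooled mean annual incidence: total cases / total person-years, scaled."""
    if len(annual_cases) != len(annual_population):
        raise ValueError("need one population estimate per year of case counts")
    person_years = sum(annual_population)
    if person_years == 0:
        raise ValueError("population denominators sum to zero")
    return per * sum(annual_cases) / person_years


# Hypothetical county: 10 years of case counts and population estimates.
cases = [12, 15, 9, 14, 11, 13, 10, 16, 12, 8]
population = [100_000] * 10
rate = mean_incidence_rate(cases, population)  # cases per 100 000 persons
```

Dividing the summed cases by the summed population weights each year by its population, so the result can differ from a simple average of annual rates when a county's population changes over the period.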
We calculated Spearman's rank correlation coefficients to examine the associations between the county-level sociodemographic variables and the county-specific mean annual incidence rates for each study condition (10-year means for salmonellosis and shigellosis; 8-year means for E. coli O157:H7 infection). Simple linear regression analyses were performed between mean annual incidence rates for each study condition and each of the 26 independent sociodemographic variables. To stabilize the variance of the independent variables and to normalize their distribution, independent variables were transformed by taking either the square root or the natural log of the value. If the value of the independent variable equalled 0, 1 was added to the value before taking the natural log. Extreme values were excluded to reduce the variance in the model. Multivariate linear regression analysis was also conducted for each selected condition by using a forward stepwise regression procedure with P values of 0·25 and 0·05 as thresholds for a variable to enter the model and stay in the model, respectively [Reference Kleinbaum, Kupper and Muller37].
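The transformation and rank correlation steps can be illustrated with a short sketch, written here with the Python standard library only. It is an assumption-laden illustration rather than the software used in the study: the helper names and the example values are hypothetical, the log rule is a literal reading of the text (1 is added only when the value equals 0), and ties are given average ranks, which is the usual convention for Spearman's coefficient.

```python
import math


def log_transform(values):
    """Natural log; 1 is added first only to values equal to 0 (literal reading)."""
    return [math.log(v + 1) if v == 0 else math.log(v) for v in values]


def sqrt_transform(values):
    """Square-root transform for non-negative values."""
    return [math.sqrt(v) for v in values]


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values for five counties (not study data).
farm_pct = [0.0, 2.0, 5.0, 12.0, 30.0]
ecoli_rate = [0.4, 1.1, 0.9, 2.5, 4.0]
rho = spearman(farm_pct, ecoli_rate)
```

The transformations matter for the linear regression models rather than for the rank correlation itself, which depends only on the ordering of the values.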
Because no adequate county-specific data on rates of food service employees or of day-care workers were available for counties with a population <50 000, these two variables were excluded from the multivariate analysis. Independent variables for which county-specific data were missing for >5% of the counties were also excluded from the multivariate analysis: these included reported violent crime rate, local per capita expenditures for social services, local per capita expenditures for education services, and rates of food service employees and day-care workers. To avoid collinearity between independent variables in fitting reliable regression models, several variables were dropped from the models, including percentage of the population aged 18–44 years, percentage of the households with one or more persons aged <18 years, and percentage of the households with one or more persons aged ⩾65 years.
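The missing-data rule can be expressed as a simple filter. The following is a hedged sketch, assuming that each variable is held as a list of county values with `None` marking a missing county; the function name, variable names, and values are hypothetical.

```python
def drop_sparse_variables(table, max_missing=0.05):
    """Keep variables whose share of missing (None) county values is <= max_missing."""
    kept = {}
    for name, values in table.items():
        missing_share = sum(v is None for v in values) / len(values)
        if missing_share <= max_missing:
            kept[name] = values
    return kept


# Hypothetical data for 40 counties (not study data).
table = {
    "pct_below_poverty": [12.5] * 39 + [None],        # 2.5% of counties missing
    "violent_crime_rate": [300.0] * 37 + [None] * 3,  # 7.5% of counties missing
}
kept = drop_sparse_variables(table)
```

In this example only the first variable survives the 5% threshold; collinearity screening, which the text also describes, is a separate step not shown here.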
RESULTS
From 1993 to 2002, a total of 403 464 cases of salmonellosis were reported from 3070 US counties, 234 148 cases of shigellosis were reported from 2717 US counties, and 26 411 cases of E. coli O157:H7 infection were reported from 2068 US counties (1995–2002) (see Table 2 for the distribution of US counties by annual number of reported cases). Twenty percent of US counties reported a mean of <1 salmonellosis case per year; for shigellosis and E. coli O157:H7 infection the comparable figures were 40% and 44%, respectively (Table 2). One-third (1075) of US counties had no reports of E. coli O157:H7 infection during 1995–2002, far more than the 73 (2%) and 426 (14%) counties that did not report salmonellosis and shigellosis cases, respectively, during 1993–2002. For those counties reporting at least one case during the study period, the average county-specific annual incidence rates were 13·5 (median 12·0, range 0·8–49·0), 6·6 (median 4·0, range 0·2–57·5), and 2·1 (median 1·4, range 0·0–15·9) per 100 000 persons for salmonellosis, shigellosis, and E. coli O157:H7 infections, respectively.
* Includes data reported from 1995 to 2002.
The highest incidence rates for salmonellosis, shigellosis, and E. coli O157:H7 infection were observed in counties in the Northeast, South, and West, respectively (Fig. 1). Across all four regions, higher incidence rates for salmonellosis and shigellosis were seen in counties where >50% of the population lived in urban areas. In contrast, incidence of E. coli O157:H7 was highest in counties where <50% of the population lived in urban communities.
In the Spearman's rank correlation analysis, the incidence of salmonellosis was moderately correlated (0·2 ⩽ |r| < 0·3) with the percentage of the population that was black or African American (r=0·2), the physician rate per 100 000 persons (r=0·2), and the percentage of the population aged 45–64 years (r=−0·2). The three leading correlated factors for incidence of shigellosis were percentage of the population Hispanic or Latino (r=0·3), percentage of the population aged <5 years (r=0·3), and the percentage aged 45–64 years (r=−0·3). For E. coli O157:H7 infection, these factors were the percentage of the population that was black (r=−0·5), residence in the South (r=−0·5), and percentage of the population living on a farm (r=0·4).
In the simple linear regression analysis, salmonellosis and shigellosis were generally similar to each other in their patterns of positive and negative associations with selected sociodemographic factors (Table 3). Many sociodemographic and economic factors (e.g. population distribution by selected age groups, race, ethnicity, urbanization, poverty level, crime rate, and physician rate) were positively associated with the incidence of salmonellosis and shigellosis but negatively associated with the incidence of E. coli O157:H7 infection. In contrast, population distribution by education level, population living on a farm, local per capita expenditures for education, and Medicare enrolment rates showed inconsistent associations with these three diseases.
+, Positive association; −, negative association; ×, no association.
* From univariate regression models. Significance was assessed using a P value of ⩽0·05.
In the multivariate regression analysis, the sociodemographic and economic variables included in Table 4 accounted for 12%, 17%, and 33% of the variation in incidence of salmonellosis, shigellosis, and E. coli O157:H7 infection in US counties, respectively. Much of the attributed variation was due to the three leading factors for each condition. For salmonellosis, the percentage of the population that was black, the percentage unemployed (negative association), and percentage of the population that was Hispanic or Latino accounted for 7% of the total variation. For shigellosis, the three leading factors were percentage of the population aged <5 years, percentage of population below poverty level, and percentage unemployed (negative association), and these three factors accounted for 12% of the variation. For E. coli O157:H7 infection, the leading factors were percentage of population living on a farm, percentage of adults with less than a ninth-grade education (negative association), and residence in the South (negative association), which accounted for 33% of the variation.
* Models were built using a forward stepwise regression procedure with a P value of ⩽0·05 as the significance threshold to retain a variable in the model. Variables in Table 1 were excluded from the analysis (1) if county-specific data were not available for more than 5% of counties (reported violent crime rate, local per capita expenditures for social services, local per capita expenditures for education services, and rates of food service employees and day-care workers), or (2) to avoid collinearity between variables (percentage of the population aged 18–44 years, percentage of the households with one or more persons aged <18 years, and percentage of the households with one or more persons aged ⩾65 years).
DISCUSSION
In this analysis we found that variation in the incidence of salmonellosis, shigellosis, and E. coli O157:H7 infection in US counties was due in part to a diverse set of sociodemographic and economic factors, illustrating the complex relationship between community characteristics and the dynamics of disease transmission. During the study period, salmonellosis had a higher incidence and was more widely dispersed geographically than shigellosis or E. coli O157:H7 infection. The county-level characteristics most closely associated with incidence of these enteric diseases included measures of race, ethnicity, place of residence, age group, poverty, unemployment, and urbanization. The variation in incidence rates attributed to these county-level variables ranged from only 12% for salmonellosis to 33% for E. coli O157:H7.
In general, salmonellosis and shigellosis had similar group-level associations with the county sociodemographic, economic, and workforce characteristics evaluated. Geographically, the incidence of salmonellosis was higher in the Northeast and South, and the incidence of shigellosis was higher in the South. In contrast, the incidence of E. coli O157:H7 was higher in the West and Midwest regions and lowest in the South. These findings suggest that the incidence of both salmonellosis and shigellosis was higher in counties with higher urban populations in the Eastern coast region, in which communities have more health-care facilities, more physicians available, and better access to medical care. In contrast, the incidence of E. coli O157:H7 was consistently higher in counties with a higher percentage of the population living on a farm or in non-urban settings in the US Mountain region. This last association (at least in terms of farms) may be due to more direct or indirect contact with cattle or other ruminant animals, the primary reservoir for E. coli O157:H7 [Reference Crump38]. The physician rate per 100 000 population, a surrogate measure of access to health care, accounted for <2% of the variation in the incidence of salmonellosis and <0·5% of the variation in the incidence of shigellosis or E. coli O157:H7 (Table 4). Factors associated with greater health-care resources (such as rates of physicians and community hospital beds) may result in higher rates of diagnosis and case reporting, but in our study the explanatory value of the physician rate was quite small, as noted.
We found that the incidence of salmonellosis was higher in communities with a higher percentage of children aged <5 years or a greater percentage of persons aged ⩾65 years. The higher incidence of salmonellosis at the extremes of age may be due to these groups tending to get more severe Salmonella infections. Patients with more severe infections may be more likely to seek medical care and be diagnosed and reported. In addition, parents may be more likely to seek medical care for their young child with a diarrhoeal disease than they would for themselves [39]. Furthermore, the elderly and children may have better access to care than other age groups due to higher insurance coverage rates [Reference Heyman, Schiller and Barnes40]. Last, the relatively higher incidence of salmonellosis at the extremes of age may also be due to factors not evaluated in this analysis, such as greater susceptibility of the host or certain environmental exposures.
Slightly higher incidence of salmonellosis was reported from communities with more black or Hispanic residents, a finding that may be due to socioeconomic and cultural differences, knowledge and practices of food safety, and personal hygiene in population subgroups. Higher incidence of salmonellosis was reported in blacks than in whites in a state registry-based study [Reference Arshad14]; however, this association was not significant at the geographic block group level [Reference Younus17]. For shigellosis, incidence was higher in communities with more children aged <5 years, more residents living below the poverty level, and more Hispanic residents. Reasons for the association of the incidence of shigellosis and salmonellosis with the proportion of racial/ethnic subpopulations in counties are not known [Reference Gupta1, Reference Shiferaw13], but may, in part, relate to higher poverty rates and lower education rates in Hispanic populations compared with other racial/ethnic groups [Reference Hayes-Bautista, Baezconde-Garbanati and Hayes-Bautista41, Reference Hayes-Bautista42]. High shigellosis rates in young children may be attributable to difficulties in teaching and maintaining good hygiene practices (e.g. effective hand washing), lack of acquired immunity to Shigella infection, or exposure to congregate settings such as day-care facilities [Reference Gupta1, Reference Mohle-Boetani43, Reference Shane44]. Finally, because the associations identified were attributed to county populations and not to individuals of certain racial or ethnic groups, other characteristics common to racially or ethnically diverse counties may have been responsible for higher rates of illnesses, but were not included in this analysis.
Previous epidemiological studies have demonstrated a higher incidence of enteric disease in demographic groups with lower socioeconomic status [Reference Olowokure45–Reference Cifuentes48]. Our study identified a lower incidence of salmonellosis and shigellosis in communities with higher unemployment. Unemployment may limit access to health care and lead to under-diagnosis of these conditions in unemployed persons.
We found that the incidence of all three enteric diseases we investigated was higher in communities with more people educated at or above the ninth-grade level. These findings seem counterintuitive, but other studies examining risk factors for foodborne disease and the prevalence of practices for consuming or handling food have found that people with a college or university degree beyond a bachelor's were more likely to consume undercooked hamburger and to handle raw meat in an unsafe manner than were persons reporting less education [Reference Shiferaw49, Reference Roseman and Kurzynske50]. In addition, a meta-analysis of 20 studies assessing the association between consumers' knowledge and practices regarding food safety and their demographics noted that higher-income and more educated persons reported greater consumption of raw foods, less knowledge of hygiene, and poorer practices in terms of cross contamination of food [Reference Patil, Cates and Morales51]. It also seems possible that the association between lower socioeconomic status and lower incidence of salmonellosis and shigellosis may be due to less access to health-care services and to stool culture in this group (i.e. under-detection, a surveillance artifact) [Reference Scallan52]. Individuals with more education, who may also have more discretionary income, may eat outside the home more frequently and be more likely to own pets, both of which are previously identified risk factors for salmonellosis [Reference Younus17, Reference Marcus53].
There are several limitations to this study. First, while the epidemiology of salmonellosis and shigellosis may vary by serotype [Reference Jones54], the NNDSS does not differentiate Salmonella or Shigella species by serotype. Accordingly, we were unable to identify community-level determinants that may account for serotype-specific variation in the incidence of Salmonella or Shigella. In contrast, the NNDSS monitors a single serotype of E. coli, E. coli O157:H7, a significant cause of diarrhoea, bloody diarrhoea, and haemolytic uraemic syndrome, thus allowing for a more direct measurement of the variation related to specific exposure factors of this bacterial pathogen. Variation in the specificity of information on the pathogen may have led to less specific associations between the community-level determinants and reported enteric disease incidence for salmonellosis and shigellosis.
Another concern is that the burden of enteric bacterial disease is underreported [Reference Jones, Scallan and Angulo55] through the NNDSS [Reference Thomas56], a passive surveillance system that relies on physicians and laboratories to report to state and local health departments. Although enteric illnesses can be severe or even fatal, many infected persons may have mild clinical illness and thus not seek care; these persons would be neither diagnosed nor reported through routine surveillance. On the other hand, increased county-specific incidence rates may reflect a true disease outbreak which occurred during the study period or merely reflect surveillance artifact. For example, NNDSS-based disease incidence reported from the counties participating in FoodNet (http://www.cdc.gov/FoodNet/), the Foodborne Diseases Active Surveillance Network of CDC's Emerging Infections Program, may be higher than the incidence reported from other counties due to greater completeness of reporting as a result of FoodNet's active surveillance methodology.
Finally, as is common with ecological analyses, we were not able to adequately assess confounding or bias due to misclassification [Reference Guthrie and Sheppard57–Reference Greenland60]. Theoretically, using the results of ecological studies to make inferences about individual health risks can be problematic. The group-level data approach cannot address many of the sources of bias due to misspecification of confounders, confounder measurement errors, and the lack of information about the within-group distribution of exposures and potential confounders. Not all community-level determinants were available for all county comparisons in this analysis, which to some degree weakened our assessment of the associations between these determinants and incidence of the diseases of interest. Additionally, the time periods during which the independent community-level variables and the county incidence data were collected were not concordant for several variables. For example, data were collected in 1990 for the percentage of the population living on a farm, in 1997 for local per capita expenditures for social services, and in 1999 for the reported violent crime rate per 100 000 persons. In each case, more recent data were not available for the study period.
The perspective of group-level analysis acknowledges the contribution of both individual (indirectly) and community (directly) factors in determining population health status, although not to the extent possible in individual-level analyses (e.g. case-control or cohort studies). Enteric disease risk factor effects are commonly manifest upon contact between infected and susceptible individuals or following exposure to contaminated food. However, some of the exposure–infection relationships missed at the individual level may be demonstrated in ecological analysis [Reference Koopman and Longini25]. As opposed to the individual level, group-level analysis accounts for community factors in addition to geographic location and population distribution. This study identified several county-level sociodemographic and economic factors associated with the risk of enteric illness which may help identify effective interventions. For example, the strong positive association between the percentage of a county's population living on a farm and E. coli O157:H7 incidence could lead to county-level recommendations, such as that health departments in counties with a higher percentage of the population living on a farm provide education about preventing E. coli O157:H7 transmission in farm settings. Interventions aimed at controlling the contamination of foods by various pathogens and improving hygienic conditions in certain subpopulations or specific occupations may reduce the risk of enteric diseases. The variable associations noted between county sociodemographic factors and the incidence of salmonellosis and shigellosis may be due to the lack of specificity with respect to information on serotype for salmonellae and shigellae. Future ecological analyses should use serotype-specific incidence data, which may be available from laboratory-based surveillance systems.
Further investigation of the significance of these factors and the mechanisms by which they account for variation in the incidence of these diseases in community-based studies is needed. Counties with high incidence of enteric disease are also likely to be overburdened by other infectious and chronic diseases. Over the long term, addressing larger community issues related to social, legal, economic, and political factors may be necessary to reduce enteric bacterial disease incidence.
ACKNOWLEDGEMENTS
The authors are grateful to the staff of the US state and territorial health departments who support surveillance activities for notifiable diseases. Additionally, we thank the staff of the Division of Integrated Surveillance Systems and Service, National Center for Public Health Informatics, CDC, for maintaining and disseminating state-based data reported to the National Notifiable Diseases Surveillance System. The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.
DECLARATION OF INTEREST
None.