
Predicting food insecurity in a pediatric population using the electronic health record

Published online by Cambridge University Press:  28 October 2024

Joseph Rigdon
Affiliation:
Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, USA
Kimberly Montez
Affiliation:
Department of Pediatrics, Wake Forest School of Medicine, Winston-Salem, USA; Maya Angelou Center for Health Equity, Wake Forest School of Medicine, Winston-Salem, USA
Deepak Palakshappa
Affiliation:
Department of Pediatrics, Wake Forest School of Medicine, Winston-Salem, USA; Maya Angelou Center for Health Equity, Wake Forest School of Medicine, Winston-Salem, USA; Department of Epidemiology and Prevention, Wake Forest School of Medicine, Winston-Salem, USA; Center for Healthcare Innovation, Wake Forest School of Medicine, Winston-Salem, USA; Department of Internal Medicine, Wake Forest School of Medicine, Winston-Salem, USA
Callie Brown
Affiliation:
Department of Pediatrics, Wake Forest School of Medicine, Winston-Salem, USA; Department of Epidemiology and Prevention, Wake Forest School of Medicine, Winston-Salem, USA
Stephen M. Downs
Affiliation:
Department of Pediatrics, Wake Forest School of Medicine, Winston-Salem, USA; Center for Biomedical Informatics, Wake Forest School of Medicine, Winston-Salem, USA
Laurie W. Albertini
Affiliation:
Department of Pediatrics, Wake Forest School of Medicine, Winston-Salem, USA
Alysha Taxter*
Affiliation:
Department of Pediatric Rheumatology, Nationwide Children’s Hospital, Columbus, USA; Department of Clinical Informatics, Nationwide Children’s Hospital, Columbus, USA
*
Corresponding author: A. Taxter; Email: [email protected]

Abstract

Introduction:

More than 5 million children in the United States experience food insecurity (FI), yet little guidance exists regarding screening for FI. A prediction model for FI could be useful for healthcare systems and practices working to identify children with FI and address their needs. Our objective was to predict FI using demographic, geographic, medical, and historic unmet health-related social needs data available within most electronic health records.

Methods:

This was a retrospective longitudinal cohort study of children evaluated in an academic pediatric primary care clinic and screened at least once for FI between January 2017 and August 2021. American Community Survey data provided neighborhood-level information such as home ownership and poverty level. Household FI was screened using two validated questions. Various combinations of predictor variables and modeling approaches, including logistic regression, random forest, and gradient-boosted machine, were used to build and validate prediction models.

Results:

A total of 25,214 encounters from 8521 unique patients were included, with FI present in 3820 (15%) encounters. Logistic regression with a 12-month look-back using census block group neighborhood variables showed the best performance in the test set (C-statistic 0.70, positive predictive value 0.92), had a superior C-statistic to both random forest (0.65, p < 0.01) and gradient-boosted machine (0.68, p = 0.01), and showed the best calibration. Results were nearly unchanged when coding missing data as a category.

Conclusions:

Although our models could predict FI, further work is needed to develop a more robust prediction model for pediatric FI.

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of the Association for Clinical and Translational Science

Introduction

More than 13.5 million households and 33 million people in the United States experience food insecurity (FI), including more than 5 million children [1]. Notably, FI, defined as reduced quality, variety, or desirability of diet, or reduced food intake, disproportionately affects households with children under the age of 6 years, households headed by Black and Hispanic persons, and households headed by a single woman [1]. FI is associated with prematurity; chronic illnesses including asthma, depression, and obesity; missed routine medical care, including immunizations; and higher emergency department utilization and hospitalizations [2–4]. Additionally, health care expenditures are 20% greater among families with FI than among families without FI [5].

Given this prevalence and association with poor outcomes, an increasing number of health systems and health insurers are investing in strategies to identify and address FI to improve patient care and population health. Although many healthcare systems screen for FI, screening is not yet universal, and little guidance exists regarding the optimal frequency or clinical department, so predicting FI could be useful for healthcare systems and practices that are not yet screening. Additionally, if practices screen only at well visits, FI may be missed at acute visits, given its transient nature [6], especially among families that do not attend well visits. Furthermore, families may choose not to disclose FI for many reasons, including screening fatigue, shame or stigma, fear of consequences for reporting FI, or perception of need [6–9], so predicting who is at risk of FI and offering resources, rather than screening, may be useful.

Although machine learning techniques applied to the electronic health record (EHR) are increasingly able to identify diseases earlier and to prevent adverse outcomes [10,11], it is unknown how best to use such models to predict which patients are at highest risk of FI so that resources and interventions can be offered. Studies show it is feasible to screen for FI during routine clinical care in emergency department, primary care, and subspecialty settings [12,13]. Given the rich data contained within the EHR, including demographics, medical histories, and historic patient-reported information such as unmet health-related social needs, these data could be a valuable resource for predicting FI. However, such prediction models are not yet validated or integrated into the EHR to guide clinical decision support and patient care.

This study aims to develop a machine learning algorithm to predict which families will report FI using demographic, geographic, medical, and unmet health-related social needs data that are typically available within most EHRs.

Materials and methods

Setting

We conducted a retrospective longitudinal cohort study at Atrium Health Wake Forest Baptist between January 2017 and August 2021. This study was approved by the Wake Forest University Health Sciences Institutional Review Board (IRB00071608).

Study population

We included all patients in one academic pediatric primary care clinic who were screened at least once for FI between January 2017 and August 2021. Encounters were included if FI screening was documented in the EHR (N = 25,214) and excluded if it was not (N = 3,296). The clinic is in an urban neighborhood and serves a predominantly racial and ethnic minority population, of which more than 90% are covered by Medicaid insurance. The medical system is part of a tertiary care hospital, offers multiple board-certified pediatric subspecialties and a pediatric emergency department, and serves the majority of children across western North Carolina.

Measurement of FI

The clinic screens all patients for FI during routine clinical care, including well-child and acute visits. Screening occurs on paper in English or Spanish and is administered verbally through a certified interpreter if the parent/caregiver speaks another language or is unable to read. Household FI is screened using two validated questions, “In the past year, did you worry that your food would run out before you got money or Food Stamps to buy more?” and “In the past year, did the food you bought just not last and you didn’t have money to get more?”, with dichotomous response options (yes or no) [14]. An affirmative response to either question is considered a positive screen. These questions are recommended by the American Academy of Pediatrics [15]. At the time of a clinic visit, the provider discusses the screening results with the family and enters the results into discrete data fields within the EHR. When a patient has a positive screen, the provider asks the family if they would like a bag of food from the clinic’s food pharmacy, a referral to a food navigator, connection with federal nutrition programs, and/or a list of community resources.
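
As a minimal illustration, this scoring rule reduces to a logical OR of the two responses. The R sketch below assumes hypothetical column names (worry_food_ran_out, food_did_not_last) rather than the study’s actual EHR fields:

    # Positive screen = affirmative answer to either validated question.
    # Column names are hypothetical, not the study's EHR field names.
    screens$fi_positive <- screens$worry_food_ran_out == "yes" |
      screens$food_did_not_last == "yes"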

Demographics

Patients’ age, sex, race, and ethnicity at the time of the visit were extracted from the EHR. Race and ethnicity, which were self-reported by parents/caregivers and recorded in the EHR, were combined into one variable and categorized as non-Hispanic White, non-Hispanic Black, Hispanic, or other (including American Indian, Alaska Native, Asian, Native Hawaiian, Other Pacific Islander, Other, Patient Refused, and Unknown).

Other unmet health-related social needs screening

As part of routine clinical care, the clinic also screens for other unmet health-related social needs over the prior 12 months including intimate partner violence, risk of homelessness, transportation difficulties, legal needs, caregiver coping concerns, and caregiver substance use. These needs are screened alongside FI. All screening results and resources provided are documented within the EHR (Supplementary Table 1).

Chronic conditions and healthcare utilization

All problem lists and encounter diagnoses from visits occurring at the primary care clinic and any department across the medical center since 2012, when our EHR was adopted, were included. Diagnoses recorded in the problem list or as encounter diagnoses were extracted based on their International Classification of Diseases, 10th Revision (ICD-10) codes, including attention-deficit hyperactivity disorder (ADHD), failure to gain weight, hypertension, asthma, eczema, prematurity, depression, and anxiety. Emergency department visits and hospitalizations were also extracted.
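
A minimal R sketch of this extraction step is below. The ICD-10 prefixes shown are common illustrative choices (e.g., J45 for asthma), not necessarily the exact code sets used in this study, and the dx data frame with an icd10 column is an assumption:

    # Flag conditions by ICD-10 code prefix; prefixes are illustrative only.
    condition_prefixes <- c(
      adhd        = "F90",
      asthma      = "J45",
      eczema      = "L20",
      depression  = "F32|F33",
      anxiety     = "F41",
      prematurity = "P07"
    )
    flags <- sapply(condition_prefixes,
                    function(p) grepl(paste0("^(", p, ")"), dx$icd10))
    dx <- cbind(dx, flags)  # adds one TRUE/FALSE column per condition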

Anthropometrics

Children’s height and weight are recorded as standard of care at each clinic visit, and body mass index (BMI) percentiles were calculated from the encounter at which FI was measured. Underweight was defined as BMI <5th percentile. Healthy weight was defined as BMI ≥5th to <85th percentile. Overweight and obesity were defined as BMI ≥85th to <95th percentile and ≥95th percentile, respectively.
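
These cutpoints map directly to a simple categorization; a minimal R sketch:

    # Categorize BMI percentile using half-open intervals [lower, upper).
    bmi_category <- function(pct) {
      cut(pct,
          breaks = c(-Inf, 5, 85, 95, Inf),
          labels = c("underweight", "healthy weight", "overweight", "obesity"),
          right = FALSE)  # e.g., 5 <= pct < 85 is healthy weight
    }
    bmi_category(c(3, 50, 90, 97))
    #> underweight  healthy weight  overweight  obesity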

Neighborhood information

We used the US Census Bureau’s 2010 American Community Survey (ACS) 5-year estimates to identify neighborhood information. The ACS includes neighborhood-level information at both the zip code and census block group levels, including income, poverty, employment, and home ownership estimates [16]. Zip code predictors included total population count, number of housing units, median age, percentage of persons 25 and older with a high school education or less, median income, percentage of households with incomes under 100 percent of the federal poverty level in the last 12 months, percentage of owner-occupied households, percentage of workers 16 years of age and older who commute 60 minutes or more to work, percentage of persons 16 years of age and older who are unemployed, percent minority, percentage of the area considered urban, and percentage of the area considered rural. Census block group predictors included the same variables measured at the smaller geographic level of the census block group. The home address of every patient in the health system is geocoded by an automated system through the Wake Forest Clinical and Translational Science Institute’s Translational Data Warehouse. We merged geocoded data from the EHR with zip code and census block group data from the ACS.
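
A minimal sketch of the merge step, assuming hypothetical data frames: encounters with geocoded block_group_fips and zip columns, and ACS extracts acs_block_group and acs_zip keyed on those same identifiers:

    # Join neighborhood-level ACS predictors onto geocoded encounters.
    library(dplyr)
    analysis <- encounters %>%
      left_join(acs_block_group, by = "block_group_fips") %>%
      left_join(acs_zip, by = "zip", suffix = c("_bg", "_zip"))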

Statistical analysis

The analysis cohort was created by merging the above data sources (demographics, anthropometrics, social needs, chronic conditions, and neighborhood information) by encounter number. The analysis cohort was randomly split at the patient level into an 80% training set and a 20% test/validation set. Each patient could contribute more than one record to the analysis cohort. Various combinations of predictor variables and modeling approaches were used to train and validate prediction models. Demographic predictor variables included age in years, female sex, Black race, Hispanic ethnicity, other race and ethnicity (defined as non-Black, non-Hispanic, and/or non-White), and BMI percentile. Twelve-month “look-back” indicator variables included previous FI, intimate partner violence, risk of homelessness, transportation difficulties, legal needs, caregiver coping concerns, caregiver substance use, overweight/obesity, ADHD, failure to gain weight, hypertension, asthma, eczema, prematurity, depression, anxiety, hospitalizations, and emergency department visits. The look-back variables were initially coded as absent and were recoded as present if they appeared within the look-back window. The look-back window was expanded for the same set of variables to 18 and 24 months. Six iterations of each modeling approach (logistic regression, random forest, and gradient-boosted machine) were developed, defined by ACS census block group variables with a 12-, 18-, or 24-month clinical data look-back period, or ACS zip code variables with the same three look-back periods. All models included demographics.
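
A minimal sketch of the patient-level split and a 12-month look-back indicator. The data frames and columns (cohort, events, patient_id, encounter_id, encounter_date, event_date) are assumptions for illustration:

    library(dplyr)

    # Patient-level 80/20 split: all of a patient's encounters stay together.
    set.seed(2024)  # arbitrary seed for reproducibility
    ids <- unique(cohort$patient_id)
    train_ids <- sample(ids, size = floor(0.8 * length(ids)))
    train <- filter(cohort, patient_id %in% train_ids)
    test  <- filter(cohort, !patient_id %in% train_ids)

    # 12-month look-back flag for one event type (e.g., a prior positive
    # FI screen): present if the event occurred in the prior 365 days.
    lookback_flag <- function(enc, events, days = 365) {
      enc %>%
        left_join(events, by = "patient_id", relationship = "many-to-many") %>%
        group_by(encounter_id) %>%
        summarise(flag = any(!is.na(event_date) &
                             event_date <  encounter_date &
                             event_date >= encounter_date - days))
    }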

Modeling approaches included logistic regression, random forest, and gradient-boosted machine. Default values of hyperparameters were used for random forest (500 trees, number of variables to split at each node equal to the rounded-down square root of the number of predictors, minimum node size of 1) and gradient-boosted machine (learning rate of 0.3, maximum tree depth of 6) [17–20]. No variable selection was performed, as all variables considered are readily available in the EHR. All models were built on the training set using 10-fold cross-validation to estimate out-of-sample performance for the metrics of accuracy, area under the curve (C-statistic), precision-recall area under the curve, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). In the 10-fold cross-validation, each patient’s observations were mapped to only one of the folds to prevent overly optimistic estimates of performance. Final models from the training set were then evaluated on the test set. Test set evaluations included measures of discrimination and calibration. Model discrimination was measured using the C-statistic and compared between models using DeLong’s test [21,22]. Model calibration was measured using the Hosmer–Lemeshow slope statistic [23]. The Hosmer–Lemeshow statistic breaks the data into deciles of predicted FI and examines the relationship, by decile, between average predicted FI and the observed rate of FI. A slope of 1 for a line connecting the deciles indicates a well-calibrated model.
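
The study fit these models through the mlr3 framework [27]; the sketch below instead calls the underlying packages directly to make the stated defaults explicit. The train data frame, its fi outcome (0/1), and the choice of nrounds are assumptions:

    library(ranger)
    library(xgboost)

    # Patient-grouped 10-fold assignment: every encounter for a given
    # patient lands in the same fold, preventing leakage across folds.
    set.seed(2024)  # arbitrary seed
    ids <- unique(train$patient_id)
    fold_of <- setNames(sample(rep(1:10, length.out = length(ids))), ids)
    train$fold <- fold_of[as.character(train$patient_id)]

    x <- model.matrix(fi ~ . - patient_id - fold, data = train)[, -1]

    # Logistic regression.
    fit_lr <- glm(fi ~ . - patient_id - fold, data = train, family = binomial)

    # Random forest with the defaults noted above (500 trees,
    # mtry = floor(sqrt(p)), minimum node size 1).
    fit_rf <- ranger(y = factor(train$fi), x = as.data.frame(x),
                     num.trees = 500, min.node.size = 1, probability = TRUE)

    # Gradient-boosted machine with the defaults noted above (eta 0.3,
    # max_depth 6); nrounds is required and 100 here is an assumption.
    fit_gbm <- xgboost(data = x, label = train$fi,
                       objective = "binary:logistic",
                       eta = 0.3, max_depth = 6, nrounds = 100, verbose = 0)

    # Decile-based calibration slope in the spirit of Hosmer-Lemeshow:
    # a slope near 1 for observed vs. predicted decile means indicates
    # a well-calibrated model.
    calibration_slope <- function(pred, obs) {
      dec <- cut(pred, quantile(pred, probs = seq(0, 1, 0.1)),
                 include.lowest = TRUE)
      unname(coef(lm(tapply(obs, dec, mean) ~ tapply(pred, dec, mean)))[2])
    }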

Missing data were addressed in two ways. First, a single imputation was employed to fill in all missing data using multiple imputation by chained equations [24]. As a sensitivity analysis, missing data were coded as a category [25]. Continuous variables were summarized using tertiles, with a fourth level added to reflect missing data. Categorical variables included an extra level to reflect missing data, e.g., sex was coded as male, female, or missing. Additional sensitivity analyses included using only data from before the onset of the COVID-19 pandemic (visits occurring before 3/11/2020) and removing previous FI as a predictor from the model.
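
A minimal sketch of both strategies using the mice package [24]; the seed value and data frame name are assumptions:

    library(mice)

    # (1) Single imputation: one completed data set (m = 1) via chained
    # equations, then extract it for modeling.
    imp <- mice(train, m = 1, seed = 2024, printFlag = FALSE)
    train_imputed <- complete(imp, 1)

    # (2) Missing-as-category sensitivity analysis: tertiles plus an
    # explicit "missing" level for continuous variables.
    to_tertile_with_missing <- function(x) {
      br <- quantile(x, probs = c(0, 1/3, 2/3, 1), na.rm = TRUE)
      out <- cut(x, breaks = br, include.lowest = TRUE,
                 labels = c("T1", "T2", "T3"))
      factor(ifelse(is.na(out), "missing", as.character(out)),
             levels = c("T1", "T2", "T3", "missing"))
    }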

All analyses were conducted using R version 4.3.2 [26]. Machine learning prediction models were trained and validated using the R package “mlr3” [27]. All models were built and described using the principles outlined in the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) statement [28] (Supplementary Table 2).

Results

A total of 25,214 encounters from 8521 unique patients collected from 1/3/2017 to 8/2/2021 met inclusion criteria for this study. The number of encounters per patient ranged from 1 to 12 with a median of 2 (IQR: 1, 4). Patients were partitioned into an 80% training set (N = 6816 patients with 20,171 encounters) and a 20% test/validation set (N = 1705 patients with 5043 encounters). The median age was 3.17 years (IQR: 0.75, 9.17), and the sample was 52% male, 22.4% Black, and 62.0% Hispanic. The median BMI percentile was 76 (IQR: 47.15, 93.77), and 26.2% of the sample was under 100% of the federal poverty level in the last 12 months. Training and test sets showed similar demographics (Tables 1–3). Overall, 3820 (15.2%) encounters reported FI, with similar proportions of FI in the training (N = 3054, 15.1%) and test (N = 766, 15.4%) sets.

Table 1. Demographic predictor variables in model training and test data sets

Legend: N (%) or mean (SD) are reported. BMI = body mass index. HS = high school. Mins = minutes. SD = standard deviation. ZIP = zip code. yo = year old.

Table 2. Social determinants of health predictor variables in model training and test data sets

Legend: FI = food insecurity.

Table 3. ICD-10 predictor variables in model training and test data sets

Legend: ADHD = attention-deficit hyperactivity disorder.

Logistic regression with a 12-month look-back using census block group neighborhood variables showed the best performance in the training set (Table 4). This approach had a C-statistic of 0.68 and an accuracy of 0.85. The most important variables in this model (Supplementary Table 3) included previous FI (odds ratio 4.09; p < 0.0001), domestic violence (odds ratio 0.24; p < 0.0001), failure to gain weight (odds ratio 1.43; p = 0.0007), and prematurity (odds ratio 1.24; p = 0.0264). BMI percentile, age, and income had the highest variable importance in the random forest model, whereas domestic violence, previous FI, and age had the highest variable importance values in the gradient-boosted model.

Table 4. Results from 10-fold cross-validation on the training set, by geographic level (census block group vs. zip code) and look-back time (12, 18, and 24 months)

Legend: AUC = area under the curve. PR-AUC = precision-recall area under the curve. PPV = positive predictive value. NPV = negative predictive value. LR = logistic regression. RF = random forest. GBM = gradient-boosted model.

Logistic regression with a 12-month look-back using census block group neighborhood variables performed similarly in the test set (C-statistic 0.70 and accuracy 0.84) and had a superior C-statistic to both random forest (0.65, p = 0.01) and gradient-boosted machine (0.68, p < 0.01) (Table 5, Figure 1). Furthermore, this approach showed the best calibration (Figure 2), with a predicted-versus-observed intercept of 0 (95% CI: −0.02, 0.03, including 0) and a slope of 1.03 (95% CI: 0.88, 1.18, including 1), both reflecting good calibration. Sensitivity and specificity were maximized when predicting FI for all encounters with a probability of 0.13 or higher (Supplementary Tables 4 and 5). Notably, at the optimal cutpoint of 0.13, PPV was equal to 0.92.
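
For reference, a minimal sketch of evaluating such a cutpoint, assuming vectors p_hat (test-set predicted probabilities) and fi (observed 0/1 outcomes):

    # Confusion-matrix metrics at a probability threshold of 0.13.
    pred_pos <- p_hat >= 0.13
    sens <- mean(pred_pos[fi == 1])    # P(predicted FI | true FI)
    spec <- mean(!pred_pos[fi == 0])   # P(predicted no FI | no FI)
    ppv  <- mean(fi[pred_pos] == 1)    # P(true FI | predicted FI)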

Figure 1. Discrimination statistics (C-statistics) for logistic regression, random forest, and gradient boosted models in test set to predict food insecurity. Legend: C-stat = C-statistic.

Figure 2. Calibration statistics for logistic regression, random forest, and gradient boosted model in test set to predict food insecurity. Legend: FI = food insecurity.

Table 5. Model performance in the test set. Predictors are those at the census block group level with a 12-month look-back, the best-performing set in Table 4

Legend: AUC = area under the curve. PR-AUC = precision-recall area under the curve. PPV = positive predictive value. NPV = negative predictive value. LR = logistic regression. RF = random forest. GBM = gradient-boosted model.

Model performance was similar when treating missing data as a category rather than imputing (Supplementary Tables 6 and 7). With missing data coded as a category, logistic regression with a 24-month look-back using census block group variables had a C-statistic of 0.70 and an accuracy of 0.85, nearly identical performance to handling missing data with a single imputation.

Two additional sensitivity analyses were performed. First, only data from before the onset of the COVID-19 pandemic (before 3/11/2020) were included in model training and testing. In this analysis, there were 14,712 encounters from 6180 patients in the training set and 3573 encounters from 1538 patients in the test set. Again using 12-month look-back data with census block group neighborhood variables, logistic regression had a C-statistic of 0.69 in the test set, indicating performance similar to the model built on the full data set. A second sensitivity analysis removed previous FI as a predictor from the model. Doing so had a large impact on the predictive ability of the model, reducing the C-statistic in the test set to 0.63.

Discussion

This is one of the first studies to utilize machine learning to predict pediatric FI. In our study, logistic regression with a 12-month look-back using census block group neighborhood variables, demographics, unmet health-related social needs, and clinical data from the EHR showed the best performance in both the training and validation sets, predicting FI with a C-statistic of 0.70. Interestingly, when missing data were encoded as a categorical variable rather than imputed, the model exhibited similar performance, with a C-statistic of 0.70.

Notably, despite evaluating more than 80 variables, leveraging historical longitudinal data and routine health-related social needs screening at well and acute visits, and including a 12-month look-back period, our most robust model did not perform as well as expected, with a C-statistic of only 0.70. This is particularly noteworthy in a pediatric population, in which we first evaluate patients at birth and provide routine well-child visits every 2–3 months for the first two years. It is possible that prediction modeling of FI is less accurate in families with infants and toddlers. Similarly, given that our clinic serves a racial and ethnic minority population, it is also possible that families in our clinic under-report FI due to fear of consequences or perception of need [6–9]. Comparable clinics report a wide range of positive FI screening rates, from 10% to more than 70% [29–31]. This study also highlights that, despite advanced modeling, there are unmeasured variables, and screening for FI remains important given the lack of effective predictive models. This is in accordance with recent guidance from regulatory agencies for screening health-related social needs, including FI, to reduce disparities [32–34].

Given our model’s performance, it is also possible that expanding FI screening to departments outside of our clinic may enable a more robust model. Patients and families with unmet health-related social needs seek acute medical care more frequently than those without unmet needs [35,36], so it is possible families are not seeing their primary care provider, which was necessary for inclusion in our study. Likewise, because our clinic has access to federal and community-based nutrition supports, such as the Special Supplemental Nutrition Program for Women, Infants, and Children, the Supplemental Nutrition Assistance Program, food pantries, and other hunger relief programs embedded within our clinic and clinical workflows, it is possible that some FI needs were already being partially addressed. Similarly, although there are brief, validated surveys for FI that could be used universally, adding one simply grows the large amount of screening recommended but not being done, including screening for depression, anxiety, homelessness, and domestic violence. Rather than add one more screen that may supplant another, we propose targeting the highest-risk patients for FI using a statistical model that runs in the background and does not require doctor-patient interaction unless patients meet a threshold of likely FI.

Strengths of this study include that it leverages the EHR and community data, which are rich data sources containing demographic, unmet health-related social needs, and health information. This study is an example of the real-world applicability of integrating social determinants of health with the electronic health record to enable predictive models for social needs screening and intervention. We also had a large sample size in an urban pediatric health center, which is an ideal location to screen for FI and assist with addressing unmet health-related social needs.

A limitation of the study is that we did not address potential correlation between multiple encounters within the same person. Potential approaches could include longitudinal modeling [37,38] or creating a number-of-encounters variable and including it in the model [39]. Similarly, other models to predict FI include variables on education and school performance, patient substance use, and family relationships; we attempted to address these by including parental substance use screening but did not have access to school or family data outside of the ACS. It is also possible that the ACS has nonresponse bias; however, the US Census Bureau takes steps in sampling and weighting to account for nonresponse bias and to improve the quality of the data [40]. Our data are retrospective, so there could be selection bias in who was screened for FI, but we standardized clinic procedures to collect these data. Lastly, FI is commonly measured at the family level; however, we were unable to accurately identify sibling and family linkages, or whether children were split between households at different encounters.

Conclusion

This is one of the first pediatric studies leveraging machine learning to predict FI. Although such models can be integrated into the EHR to predict FI, guide clinical decision support, and streamline workflows to address FI in a pediatric population, further work is needed to develop a more robust prediction model. Future considerations include more widespread integration of FI screening to assist with model development.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/cts.2024.645.

Author contributions

JR conducted the statistical analysis and drafted the manuscript. KM, DP, CB, SMD, and LWA conceptualized and designed the study. AT conceptualized and designed the study, drafted the manuscript, and takes responsibility for the manuscript as a whole.

Funding statement

The authors gratefully acknowledge use of the services and facilities of the Center for Biomedical Informatics, funded by the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant Award Number UL1TR001420.

Dr Brown is supported by a grant from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (grant K23HD099249). Dr Palakshappa is supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health under Award Number K23HL146902. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The funding organizations had no role in the design and conduct of the study; the collection, management, analysis, and interpretation of the data; the preparation, review, or approval of the manuscript; or the decision to submit the manuscript for publication.

Competing interests

The authors do not have any conflicts of interest to disclose.

References

1. United States Department of Agriculture. Food Insecurity in the U.S. October 17, 2022. Accessed February 8, 2023. https://www.ers.usda.gov/topics/food-nutrition-assistance/food-security-in-the-u-s/key-statistics-graphics/.
2. Thomas MMC, Miller DP, Morrissey TW. Food insecurity and child health. Pediatrics. 2019;144(4):e20190397.
3. Sandoval VS, Jackson A, Saleeby E, Smith L, Schickedanz A. Associations between prenatal food insecurity and prematurity, pediatric health care utilization, and postnatal social needs. Acad Pediatr. 2021;21(3):455–461.
4. Ortiz-Marron H, Ortiz-Pinto MA, Urtasun Lanza M, et al. Household food insecurity and its association with overweight and obesity in children aged 2 to 14 years. BMC Public Health. 2022;22(1):1930.
5. Palakshappa D, Garg A, Peltz A, Wong CA, Cholera R, Berkowitz SA. Food insecurity was associated with greater family health care expenditures in the US, 2016–17. Health Aff (Millwood). 2023;42(1):44–52.
6. Insolera N. Chronic food insecurity in US families with children. JAMA Pediatr. 2023.
7. Bernal J, Frongillo EA, Jaffe K. Food insecurity of children and shame of others knowing they are without food. J Hunger Environ Nutr. 2016;11(2):180–194.
8. Palakshappa D, Doupnik S, Vasan A, et al. Suburban families’ experience with food insecurity screening in primary care practices. Pediatrics. 2017;140(1):e20170320.
9. Cullen D, Attridge M, Fein JA. Food for thought: a qualitative evaluation of caregiver preferences for food insecurity screening and resource referral. Acad Pediatr. 2020;20(8):1157–1162.
10. Kogan E, Didden EM, Lee E, et al. A machine learning approach to identifying patients with pulmonary hypertension using real-world electronic health records. Int J Cardiol. 2023;374:95–99.
11. Hobensack M, Song J, Scharp D, Bowles KH, Topaz M. Machine learning applied to electronic health record data in home healthcare: a scoping review. Int J Med Inform. 2023;170:104978.
12. Gonzalez JV, Hartford EA, Moore J, Brown JC. Food insecurity in a pediatric emergency department and the feasibility of universal screening. West J Emerg Med. 2021;22(6):1295–1300.
13. Oates GR, Lock L, Tarn V, et al. Electronic screening for unmet social needs in a pediatric pulmonary clinic: acceptability and associations with health outcomes. Pediatr Pulmonol. 2023;58(5):1444–1453.
14. Hager ER, Quigg AM, Black MM, et al. Development and validity of a 2-item screen to identify families at risk for food insecurity. Pediatrics. 2010;126(1):e26–32.
15. Council on Community Pediatrics, Committee on Nutrition. Promoting food security for all children. Pediatrics. 2015;136(5):e1431–1438.
16. United States Census Bureau. American Community Survey (ACS). March 16, 2023. Accessed March 19, 2023. https://www.census.gov/programs-surveys/acs.
17. Breiman L. Random forests. Machine Learning. 2001;45(1):5–32.
18. Wright MN, Ziegler A. ranger: a fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw. 2017;77(1):1–17.
19. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016.
20. Chen T, Benesty M, Khotilovich V, et al. xgboost: extreme gradient boosting. March 30, 2023. Accessed February 8, 2023. https://CRAN.R-project.org/package=xgboost.
21. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837–845.
22. Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12(1):77.
23. Hosmer DW, Lemesbow S. Goodness of fit tests for the multiple logistic regression model. Commun Stat - Theory Methods. 1980;9(10):1043–1069.
24. van Buuren S, Groothuis-Oudshoorn K. mice: multivariate imputation by chained equations in R. J Stat Softw. 2011;45(3):1–67.
25. Josse J, Prost N, Scornet E, Varoquaux G. On the consistency of supervised learning with missing values. July 3, 2020. Accessed April 19, 2023. https://arxiv.org/abs/1902.06931.
26. The R Foundation. R project for statistical computing. Published 2021. Accessed February 8, 2023. https://www.R-project.org/.
27. Lang M, Binder M, Richter J, et al. mlr3: a modern object-oriented machine learning framework in R. J Open Source Softw. 2019;4(44):1903.
28. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMC Med. 2015;13(1):1.
29. Smith S, Malinak D, Chang J, et al. Implementation of a food insecurity screening and referral program in student-run free clinics in San Diego, California. Prev Med Rep. 2017;5:134–139.
30. Burkhardt MC, Beck AF, Conway PH, Kahn RS, Klein MD. Enhancing accurate identification of food insecurity using quality-improvement techniques. Pediatrics. 2012;129(2):e504–510.
31. Hendrickson MA, O’Riordan MA, Arpilleda JC, Heneghan AM. Effects of food insecurity on asthma outcomes in the pediatric emergency department. Pediatr Emerg Care. 2010;26(11):823–829.
32. Department of Health and Human Services. SMD #23-001: Additional Guidance on Use of In Lieu of Services and Settings in Medicaid Managed Care. January 4, 2023. Accessed April 19, 2023. https://www.hhs.gov/guidance/document/additional-guidance-use-lieu-services-and-settings-medicaid-managed-care-smd-23-001.
33. National Committee for Quality Assurance. Social Need: New HEDIS Measure Uses Electronic Data to Look at Screening, Intervention. November 2, 2022. Accessed April 19, 2023. https://www.ncqa.org/blog/social-need-new-hedis-measure-uses-electronic-data-to-look-at-screening-intervention/.
34. The Joint Commission. R3 Report: New Requirements to Reduce Health Care Disparities. June 20, 2022. Accessed April 19, 2023. https://www.jointcommission.org/-/media/tjc/documents/standards/r3-reports/r3_disparities_july2022-6-20-2022.pdf.
35. Rigdon J, Montez K, Palakshappa D, et al. Social risk factors influence pediatric emergency department utilization and hospitalizations. J Pediatr. 2022;249:35–42.e34.
36. Li Z, Gogia S, Tatem KS, et al. Developing a model to predict high health care utilization among patients in a New York City safety net system. Med Care. 2023;61(2):102–108.
37. Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73(1):13–22.
38. Fitzmaurice G, Laird N, Ware J. Applied Longitudinal Analysis. John Wiley & Sons; 2012.
39. McGee G, Haneuse S, Coull BA, Weisskopf MG, Rotem RS. On the nature of informative presence bias in analyses of electronic health records. Epidemiology. 2022;33(1):105–113.
40. United States Census Bureau. American Community Survey and Puerto Rico Community Survey Design and Methodology. November 2022. Accessed March 23, 2024. https://www2.census.gov/programs-surveys/acs/methodology/design_and_methodology/2022/acs_design_methodology_report_2022.pdf.