
Prospective evaluation of data-driven models to predict daily risk of Clostridioides difficile infection at 2 large academic health centers

Published online by Cambridge University Press:  19 September 2022

Meghana Kamineni*
Affiliation:
Electrical Engineering and Computer Science Department, Massachusetts Institute of Technology, Cambridge, Massachusetts
Erkin Ötleş
Affiliation:
Medical Scientist Training Program, University of Michigan Medical School, Ann Arbor, Michigan; Department of Industrial and Operations Engineering, University of Michigan College of Engineering, Ann Arbor, Michigan
Jeeheh Oh
Affiliation:
Division of Computer Science and Engineering, University of Michigan College of Engineering, Ann Arbor, Michigan
Krishna Rao
Affiliation:
Department of Internal Medicine, Division of Infectious Diseases, University of Michigan Medical School, Ann Arbor, Michigan
Vincent B. Young
Affiliation:
Department of Internal Medicine, Division of Infectious Diseases, University of Michigan Medical School, Ann Arbor, Michigan
Benjamin Y. Li
Affiliation:
Medical Scientist Training Program, University of Michigan Medical School, Ann Arbor, Michigan; Division of Computer Science and Engineering, University of Michigan College of Engineering, Ann Arbor, Michigan
Lauren R. West
Affiliation:
Infection Control Unit, Massachusetts General Hospital, Boston, Massachusetts
David C. Hooper
Affiliation:
Infection Control Unit, Massachusetts General Hospital, Boston, Massachusetts; Division of Infectious Diseases, Massachusetts General Hospital, Boston, Massachusetts; Harvard Medical School, Boston, Massachusetts
Erica S. Shenoy
Affiliation:
Infection Control Unit, Massachusetts General Hospital, Boston, Massachusetts; Division of Infectious Diseases, Massachusetts General Hospital, Boston, Massachusetts; Harvard Medical School, Boston, Massachusetts
John G. Guttag
Affiliation:
Electrical Engineering and Computer Science Department, Massachusetts Institute of Technology, Cambridge, Massachusetts
Jenna Wiens
Affiliation:
Division of Computer Science and Engineering, University of Michigan College of Engineering, Ann Arbor, Michigan
Maggie Makar
Affiliation:
Electrical Engineering and Computer Science Department, Massachusetts Institute of Technology, Cambridge, Massachusetts; Division of Computer Science and Engineering, University of Michigan College of Engineering, Ann Arbor, Michigan
* Author for correspondence: Meghana Kamineni, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02142. E-mail: [email protected]

Abstract

Many data-driven patient risk stratification models have not been evaluated prospectively. We performed and compared the prospective and retrospective evaluations of 2 Clostridioides difficile infection (CDI) risk-prediction models at 2 large academic health centers, and we discuss the models’ robustness to data-set shifts.

Type
Concise Communication
Copyright
© The Author(s), 2022. Published by Cambridge University Press on behalf of The Society for Healthcare Epidemiology of America

Many data-driven risk prediction models offering the promise of improved patient outcomes have been evaluated retrospectively, but few have been evaluated prospectively [1–4]. Models that are not evaluated prospectively are susceptible to degraded performance because of data-set shifts [5]. Shifts in data can arise from changes in patient populations, hospital procedures, care delivery approaches, epidemiology, and information technology (IT) infrastructure [2,6].

In this work, we prospectively evaluated a data-driven approach for Clostridioides difficile infection (CDI) risk prediction that had previously achieved high performance in retrospective evaluations at 2 large academic health centers [9]. This approach models the likelihood of acquiring CDI as a function of patient characteristics. Because that evaluation used retrospective data, and because models that lack prospective evaluation often perform worse once deployed [7], prospective validation is necessary. Risk predictions can guide clinical interventions, including antibiotic de-escalation and duration, β-lactam allergy evaluation, and isolation [8].

Using this approach, we trained models for both institutions on initial retrospective cohorts and evaluated them on retrospective and prospective cohorts. We compared each model’s prospective performance with its retrospective performance to assess robustness to data-set shifts. By demonstrating this robustness, we provide support for integrating the approach into clinical workflows.

Methods

This study included retrospective and prospective periods for adult inpatient admissions to Massachusetts General Hospital (MGH) and Michigan Medicine (MM). As previously described [9], patient demographics, admission details, patient history, daily hospitalization information, and exposure and susceptibility to the pathogen (eg, antibiotic therapy) were extracted from the electronic health record (EHR) of each institution and were preprocessed. To consider hospital-onset CDI, we excluded patients who tested positive in the first 2 calendar days of their admission, stayed <3 days, or tested positive in the 14 days before admission. Testing protocols are described in the supplement.

A data-driven model to predict risk of hospital-onset CDI was developed for each institution. Each model was based on regularized logistic regression and included 799 variables at MGH and 8,070 variables at MM; more aggressive feature selection was applied at MGH to prioritize computational efficiency [9].

For the retrospective evaluation, data were extracted from May 5, 2019, to October 31, 2019, at MGH and from July 1, 2019, to June 30, 2020, at MM. For the prospective evaluation, we generated daily extracts of information for all adult inpatients from May 5, 2021, to October 31, 2021, at MGH and from July 1, 2020, to June 30, 2021, at MM, keeping the months consistent across validation periods. We used different periods at the 2 institutions because of differences in data availability.
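To make the modeling step concrete, the following minimal Python sketch fits a regularized logistic regression on patient-day features and produces daily risk scores. It is an illustration under stated assumptions, not the study code: the paper specifies regularized logistic regression but not the penalty type or strength used here, and the feature matrix, labels, and function names are hypothetical.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical inputs: X_train is a (patient-day x feature) matrix of
# EHR-derived variables (demographics, admission details, patient history,
# daily hospitalization data, antibiotic exposure); y_train marks patient-days
# from encounters meeting the hospital-onset CDI definition.
def train_risk_model(X_train, y_train, C=1.0):
    # Assumption: an L2 penalty with illustrative strength C, not the
    # institutions' published hyperparameters.
    model = make_pipeline(
        StandardScaler(with_mean=False),  # with_mean=False tolerates sparse EHR features
        LogisticRegression(penalty="l2", C=C, max_iter=1000),
    )
    model.fit(X_train, y_train)
    return model

def daily_risk_scores(model, X_day):
    # One predicted probability per patient-day serves as the daily risk score.
    return model.predict_proba(X_day)[:, 1]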

When applied to retrospective and prospective data at each institution, the models generated a daily risk score for each patient. We evaluated the discriminative performance of each model at the encounter level using the area under the receiver operating characteristic curve (AUROC). Using thresholds set at the 95th percentile of scores in the retrospective training cohort, we measured the sensitivity, specificity, and positive predictive value (PPV) of each model. We computed 95% confidence intervals using 1,000 Monte Carlo case-resampled bootstrap samples. We compared the models’ retrospective and prospective performances to understand the impact of any data-set shifts.
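The evaluation procedure can be sketched as follows. This Python sketch is illustrative rather than the study code: how daily scores were aggregated to the encounter level (eg, taking each encounter’s maximum daily score) is an assumption here, and the function names are our own.

import numpy as np
from sklearn.metrics import roc_auc_score

def threshold_metrics(y_true, scores, threshold):
    # Sensitivity, specificity, and PPV at a fixed score threshold;
    # y_true and scores are NumPy arrays of encounter-level labels and scores.
    pred = scores >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    tn = np.sum(~pred & (y_true == 0))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if (tp + fp) > 0 else float("nan"),
    }

def bootstrap_auroc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    # 1,000 Monte Carlo case-resampled bootstraps, matching the paper.
    rng = np.random.default_rng(seed)
    n = len(y_true)
    aurocs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        if len(np.unique(y_true[idx])) < 2:  # resample lacks a class; skip it
            continue
        aurocs.append(roc_auc_score(y_true[idx], scores[idx]))
    lo, hi = np.quantile(aurocs, [alpha / 2, 1 - alpha / 2])
    return roc_auc_score(y_true, scores), (lo, hi)

# Threshold from the retrospective training cohort, as described above:
# threshold = np.quantile(train_scores, 0.95)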

This study was approved by the institutional review boards of both participating sites (University of Michigan, Michigan Medicine nos. HUM00147185 and HUM00100254 and Mass General Brigham no. 2012P002359) with waivers of informed consent.

Results

After applying exclusion criteria, the final retrospective cohort included 18,030 admissions (138 CDI cases) at MGH and 25,341 admissions (158 CDI cases) at MM. The prospective cohort included 13,712 admissions (119 CDI cases) at MGH and 26,864 admissions (190 CDI cases) at MM. The demographic characteristics of the study populations are provided (Supplementary Table 1 online).

At MGH, the model achieved AUROCs of 0.744 (95% confidence interval [CI], 0.707–0.781) in the retrospective cohort and 0.748 (95% CI, 0.707–0.791) in the prospective cohort. At MM, the model achieved AUROCs of 0.778 (95% CI, 0.744–0.814) in the retrospective cohort and 0.767 (95% CI, 0.737–0.801) in the prospective cohort. Monthly AUROCs were similar across the retrospective and prospective cohorts and did not vary significantly over either evaluation period (Fig. 1). At MGH, the classifier’s sensitivity, specificity, and PPV were 0.138, 0.951, and 0.021 on the retrospective data and 0.210, 0.949, and 0.035 on the prospective data. At MM, the classifier’s sensitivity, specificity, and PPV were 0.215, 0.964, and 0.036 on the retrospective data and 0.189, 0.950, and 0.026 on the prospective data (Fig. 2).

Fig. 1. Area under the receiver operating characteristic curve (AUROC) at Massachusetts General Hospital (MGH) and Michigan Medicine (MM) in retrospective and prospective evaluations. The panels on the left compare AUROC in retrospective and prospective evaluations at MGH (upper) and MM (lower); the 95% confidence intervals (CI) for the AUROC are shaded. The panels on the right show monthly AUROCs at MGH (upper) and MM (lower); the 95% CIs are represented by error bars.

Fig. 2. Confusion matrices at Massachusetts General Hospital (MGH) and Michigan Medicine (MM) in retrospective and prospective evaluations. The figures on the left display confusion matrices, sensitivity, specificity, and positive predictive values for retrospective evaluations at MGH (upper) and MM (lower). The figures on the right display the same metrics for prospective evaluations at MGH (upper) and MM (lower).

Discussion

We evaluated 2 data-driven, institution-specific CDI risk prediction models on prospective cohorts, demonstrating how the models would perform if applied in real time, that is, generating daily risk predictions for adult inpatients from daily data extracts. The models at both MGH and MM were robust to data-set shifts. Notably, the prospective cohorts included patients admitted during the coronavirus disease 2019 (COVID-19) pandemic, whereas the retrospective cohorts did not. Surges in hospital admissions and staff shortages throughout the pandemic affected patient populations and hospital procedures related to infection control. The consistent performance of the models during the COVID-19 pandemic increases confidence that they are likely to perform well when integrated into clinical workflows. Clinicians can use risk predictions to guide interventions, such as isolation and modification of antibiotic administration, and to allocate limited resources to the patients at highest risk [8]. These models should be applied to patients meeting the inclusion criteria; application to a broader cohort may affect the results.

Because implementing this methodology requires significant IT support, initial deployment is likely to occur through larger hospitals or EHR vendors, a common approach for risk-prediction models [7]. Although the methodology is complex, that complexity is handled by the software developers; the interface with clinicians can be quite simple, with the end user receiving only a prediction for each patient.
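To illustrate how thin that clinician-facing interface can be, the hypothetical Python wrapper below scores one day’s extract and exposes only a per-patient risk score and flag. All names here are illustrative assumptions, not an actual deployment API; it reuses daily_risk_scores from the earlier sketch.

from dataclasses import dataclass
from datetime import date

@dataclass
class DailyRiskPrediction:
    patient_id: str
    prediction_date: date
    risk_score: float  # model probability for the day
    high_risk: bool    # score at or above the institution-specific threshold

def publish_predictions(model, patient_ids, features, extract_date, threshold):
    # Score a daily EHR extract; feature engineering and model internals stay
    # hidden, and clinicians see only the resulting predictions.
    scores = daily_risk_scores(model, features)
    return [
        DailyRiskPrediction(pid, extract_date, float(s), bool(s >= threshold))
        for pid, s in zip(patient_ids, scores)
    ]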

The PPV was calculated using a threshold set at the 95th percentile of scores in the retrospective cohorts. The resulting PPV is between approximately 2.6 and 6 times the pre-test probability, an appropriate level for some interventions, such as β-lactam allergy evaluations. For interventions requiring higher PPVs, higher thresholds should be used.
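These lifts can be reconstructed from the reported counts, assuming pre-test probabilities rounded to one significant figure; without rounding, the ratios are approximately 2.7 (MGH) and 5.8 (MM):

\[
p_{\mathrm{pre}}^{\mathrm{MGH}} = \tfrac{138}{18{,}030} \approx 0.008, \quad \frac{0.021}{0.008} = 2.625;
\qquad
p_{\mathrm{pre}}^{\mathrm{MM}} = \tfrac{158}{25{,}341} \approx 0.006, \quad \frac{0.036}{0.006} = 6.
\]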

Despite the importance of evaluating models prior to deployment, models are rarely validated prospectively or externally [1–4]. Prior retrospective external validations of models for incident CDI did not replicate the original performance [10]. When performed, prospective and external validation can highlight model shortcomings before integration into clinical workflows. For instance, an external retrospective validation of a widely utilized sepsis prediction model showed that the scores computed at a new institution differed significantly from the validation performance reported by the model developer [7]. That model was not tailored to specific institutions, but such discrepancies may still arise with institution-specific models: especially when there are many covariates, models can overfit the training data and are therefore susceptible to data-set shifts. In our case, the differences between the retrospective and prospective performances of both models in terms of AUROC were small, with large overlapping confidence intervals.

Although the successful prospective performance of 2 institution-specific CDI risk prediction models is encouraging, it does not guarantee that the models will perform well in the face of future data-set shifts. Epidemiology, hospital populations, workflows, and IT infrastructure are constantly changing; thus, deployed models should be carefully monitored for performance over time [11].
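A lightweight form of such monitoring, in the spirit of the monthly analysis in Figure 1, is sketched below in Python; the monthly grouping and the alerting heuristic are our assumptions, not part of the study.

import numpy as np
from sklearn.metrics import roc_auc_score

def monthly_auroc(dates, y_true, scores):
    # Compute AUROC per calendar month of deployment; a sustained drop below
    # the retrospective confidence band would prompt investigation of
    # data-set shift (eg, a new patient mix or a changed testing protocol).
    months = np.array([d.strftime("%Y-%m") for d in dates])
    results = {}
    for m in np.unique(months):
        mask = months == m
        if len(np.unique(y_true[mask])) == 2:  # AUROC needs both classes
            results[m] = roc_auc_score(y_true[mask], scores[mask])
    return results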

Supplementary material

To view supplementary material for this article, please visit https://doi.org/10.1017/ice.2022.218

Acknowledgments

The authors thank Noah Feder, BA, for assistance with manuscript preparation and administrative support.

Financial support

This study was funded by Quanta as well as by grants from the National Institutes of Health (grant nos. T32GM007863 to E.Ö. and AI124255 to V.B.Y., K.R., and J.W.).

Conflicts of interest

E.Ö. reports a patent pending with the University of Michigan for an artificial intelligence-based approach to the dynamic prediction of health states for patients with occupational injuries. Dr Rao is supported in part by an investigator-initiated grant from Merck; he has consulted for Bio-K+ International, Roche Molecular Systems, Seres Therapeutics, and Summit Therapeutics.

References

1. Kelly CJ, Karthikesalingam A, Suleyman M, et al. Key challenges for delivering clinical impact with artificial intelligence. BMC Med 2019;17:195.
2. Brajer N, Cozzi B, Gao M, et al. Prospective and external evaluation of a machine learning model to predict in-hospital mortality of adults at time of admission. JAMA Netw Open 2020;3:e1920733.
3. Fleuren LM, Klausch TLT, Zwager CL, et al. Machine learning for the prediction of sepsis: a systematic review and meta-analysis of diagnostic test accuracy. Intensive Care Med 2020;46:383–400.
4. Nagendran M, Chen Y, Lovejoy CA, et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ 2020;368:m689.
5. Finlayson SG, Subbaswamy A, Singh K, et al. The clinician and dataset shift in artificial intelligence. N Engl J Med 2021;385:283–286.
6. Menon A, Perry DA, Motyka J, et al. Changes in the association between diagnostic testing method, polymerase chain reaction ribotype, and clinical outcomes from Clostridioides difficile infection: one institution’s experience. Clin Infect Dis 2021;73:e2883–e2889.
7. Wong A, Ötleş E, Donnelly JP, et al. External validation of a widely implemented proprietary sepsis prediction model in hospitalized patients. JAMA Intern Med 2021;181:1065–1070.
8. Dubberke ER, Carling P, Carrico R, et al. Strategies to prevent Clostridium difficile infections in acute-care hospitals: 2014 update. Infect Control Hosp Epidemiol 2014;35:628–645.
9. Oh J, Makar M, Fusco C, et al. A generalizable, data-driven approach to predict daily risk of Clostridium difficile infection at two large academic health centers. Infect Control Hosp Epidemiol 2018;39:425–433.
10. Perry DA, Shirley D, Micic D, et al. External validation and comparison of Clostridioides difficile severity scoring systems. Clin Infect Dis 2022;74:2028–2035.
11. Ötleş E, Oh J, Li B, et al. Mind the performance gap: examining dataset shift during prospective validation. Proceedings of the 6th Machine Learning for Healthcare Conference. PMLR 2021;149:506–534.