Background
Healthcare-associated Clostridioides difficile infection (HA-CDI) represents about two-thirds of CDI cases in the United States. 1,Reference Lessa, Mu and Bamberg2 Retrospective epidemiologic studies have focused on identifying risk factors, evaluating diagnosis and treatment appropriateness, and measuring attributable outcomes for HA-CDI. Reference Kang, Abeles and El-Kareh3,Reference Kelly, Yarrington and Zembower4 However, the validity of this research relies on accurate identification of HA-CDI cases, and previous studies of other infections have demonstrated that reliance on administrative or laboratory data may lead to misclassification. Reference Longtin, Trottier and Brochu5,Reference Marra, Edmond, Ford, Herwaldt, Algwizani and Diekema6
We developed and validated a CDI case definition to accurately detect CDI cases using antibiotic treatment, laboratory test, and diagnosis code data in the electronic health record (EHR) to specifically be used for retrospective research. We hypothesized that a multi-component case definition would more accurately detect patients with CDI compared to any single-component case definition.
Method
Study design and data source
This validation study was conducted including all Oregon Health & Science University (OHSU) inpatient hospital encounters between January 2018 and March 2020. OHSU is a 576-bed academic, quaternary-care hospital in Portland, Oregon. We excluded patients under age 18, those with known recurrent or community-acquired CDI, and those with hospital stays of less than four calendar days. Eligible subjects were sampled for chart review as described below and in Supplemental Figure 1. This project was approved by OHSU’s institutional review board.
Case definition algorithm to identify incident hospital-associated CDI
EHR data were collected from our institution’s previously validated research data repository. Reference Furuno, Tallman and Noble7 To identify putative cases of incident, non-recurrent HA-CDI, we combined medication, diagnosis code, and laboratory testing data (Box 1). We defined hospital-associated CDI as incident CDI when the onset date, defined as the date of first anti-C. difficile antibiotic administration or C. difficile positive stool specimen, whichever occurred first, fell on hospital day 4 or later. We defined CDI as non-recurrent if no prior CDI events were identified at our institution in the 8 weeks before the index CDI diagnosis applying the same diagnostic criteria.
Collection of gold-standard incident HA-CDI data
We randomly selected 80 algorithm-identified HA-CDI cases and 80 non-CDI cases for chart review to identify the gold-standard “true” case status. We determined a priori that this sample size would be sufficient to achieve 94% power to discern cases from non-cases. Reference Pepe8 We (MJR, KLL, MRL, KR) manually reviewed each encounter medical record (Epic EHR system). To be ruled a true case of HA-CDI and establish our gold standard, there must have been documentation of, on hospital day 4 or later, at least three loose/liquid/unformed stools with no alternative explanation documented for diarrhea symptoms, initiation of CDI-specific antibiotic treatment, and any positive laboratory test (C. difficile toxin or PCR if toxin test indeterminant) for C. difficile or C. difficile-specific diagnosis code. We flagged the record for further review by an infectious disease physician (LCS) or pharmacist (KR) if the initial reviewer was unable to reach a CDI ruling. We utilized REDCap to collect study data.
Data analysis
We calculated our case algorithm’s sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and overall percent accuracy (ie, percent of cases/non-cases correctly identified) with 95% confidence intervals (CI) for each. Reference Pepe8 Chart review assessment was considered the gold standard and case identification based on electronic data were considered “test” data. We also examined the diagnostic performance of individual algorithm components (eg, laboratory test only, diagnosis code only, oral vancomycin only) and various modifications to the algorithm (Box 1).Our power calculation was performed using Stata (version 16, StataCorp., College Station, TX) and all other analysis using SAS (v9.4, SAS Corporation, Cary, NC).
Results
Of the 103,275 inpatient encounters evaluated, 50,394 (49%) were eligible for inclusion. Overall, 5,039 (10%) of included encounters involved CDI treatment (metronidazole, oral vancomycin, or fidaxomicin), with 710 (14.1%) receiving oral vancomycin or fidaxomicin. A positive test for C. difficile was identified in 396 encounters (0.8%), and 487 (1%) had an ICD-10 code for non-recurrent CDI. Per our case definition (Box 1), we identified 190 putative cases of incident, HA-CDI. Among these, 157 (83%) encounters had all three components of our case definition (anti-CDI therapy, positive laboratory test, and ICD-10 code). Of the 80 algorithm-identified HA-CDI cases that we sampled for review, 66 (83%) had all three criteria, while 9 (11%) had a positive laboratory test and no ICD-10 code, and 5 (6%) had an ICD-10 code and no positive laboratory test.
Among our chart review sample, our algorithm identified HA-CDI cases with 94% accuracy (95% CI: 88%–97%). We achieved 100% sensitivity (94%–100%), 89% specificity (81%–95%), 88% PPV (78%–94%), and 100% NPV (95%–100%). Performance of the individual algorithm components is summarized in Table 1. Adapting the initial algorithm to require a positive laboratory test (as opposed to an optional positive test if an ICD-10 code for CDI was included) improved diagnostic performance across all measures by avoiding 5 false positives, compared to the original algorithm, improving specificity to 94% (87%–98%), PPV to 93% (84% –98%), and overall accuracy to 97% (93%–99%).
Abbreviations: CI, confidence interval; PPV, positive predictive value; NPV, negative predictive value.
Discussion
Our study suggests that the best strategy to identify inpatient HA-CDI cases relies on a combination of drug administration and laboratory testing test; use of ICD-10 code data does not improve case identification. Requiring a positive C. difficile laboratory test further improved the diagnostic accuracy by avoiding 5 false positives, and use of multiple components performed better than any individual component.
Our study advances the methodological foundation for future retrospective epidemiologic studies of HA-CDI by providing a validated, accurate method for case identification. Much of the literature to date examines the utility of using a single component to detect cases. For example, Litvin et al. observed a “pseudo-outbreak” of CDI using a laboratory-test-only-based definition, which was, in reality, due to a faulty assay lot leading to a perceived 32% facilitywide increase in CDI incidence. Reference Litvin, Reske and Mayfield9 Pfister et al. reported that the ICD-10-CM code for non-recurrent CDI had 85% sensitivity and 80% PPV when applied to a provincewide (Alberta, CA) discharge database. Reference Pfister, Rennert-May, Ellison, Bush and Leal10 These studies identify important pitfalls of single-component case detection, thus motivating our study.
The primary limitation to this study is the assessment of the gold-standard HA-CDI diagnosis, which relies on EHR documentation and may not align with a prospective case evaluation, had that been feasible. Further assessment at additional facilities is necessary to determine if these results are generalizable, given differences in patient acuity, CDI incidence, testing, and antibiotic utilization. Additionally, while we calculated power/sample size a priori, it is possible that we underestimated our denominator for sensitivity and NPV calculations, though this would not affect our specificity and PPV calculations. Finally, we are unable to elucidate if an individual had CDI at another institution. Thus, we could be misclassifying recurrent CDI as initial episodes.
Our study has important implications. Our CDI case definition algorithm can be applied as a gold standard to readily available EHR information to accurately detect HA-CDI cases. Accurate retrospective identification of CDI cases is crucial for research as misclassification could lead to biased estimates of risk. Our algorithm detected HA-CDI cases with perfect sensitivity and high overall accuracy. Requiring a positive laboratory test further improved our algorithm’s diagnostic accuracy. We recommend considering both a CDI-specific medication and a positive laboratory test as the new standard research definition when classifying HA-CDI cases from EHR data.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/ice.2024.140
Acknowledgements
The authors acknowledge Caitlin M. McCracken for assisting with data abstraction and preparation.
Financial support
This project received support from the National Institutes of Health grants UL1TR002369 and RL5GM118963.
Competing interests
All authors report no conflicts of interest relevant to this article.