Background: Candidemia is a leading cause of bloodstream infections (BSIs), and community-onset candidemia is increasingly recognized as a public health problem. Electronic health records (EHRs) make it possible to use machine learning to detect patterns in patient data that may predict infections.

Objective: We aimed to predict community-onset candidemia in patients admitted to the University of Iowa Hospitals & Clinics (UIHC) using machine-learning algorithms.

Methods: We retrospectively reviewed data for patients admitted to UIHC during 2015–2018. All adult inpatients for whom a blood culture was requested were included. Candidemia was defined as a blood culture positive for Candida within 48 hours of admission. The following variables were extracted from the EHR: age, sex, body mass index (BMI), and month of admission. We also included comorbidities present on admission, defined by International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) codes: cardiovascular diseases, neurological disorders, chronic pulmonary disease, dementia, rheumatoid disease, peptic ulcer disease, liver disease, diabetes mellitus, hypothyroidism, renal failure, coagulopathy, obesity, weight loss, fluid and electrolyte disorders, anemia, alcohol abuse, drug abuse, psychiatric diseases, malignancy, and HIV/AIDS. Charlson and Elixhauser comorbidity scores were calculated from ICD-10-CM codes. We also included conditions in the 90 days before admission: Candida-positive cultures from sites other than blood, antibiotic or antifungal use, hemodialysis, central lines, corticosteroid use, surgeries, and intensive care unit (ICU) admissions. Missing values were imputed with the mode or median. Random forests trained on resampled training sets were used for prediction, and performance was evaluated using 10-fold cross-validation.

Results: In total, 30,528 adult admissions were extracted; 73 (0.24%) had an episode of candidemia.
Median age at admission was 61 years, and 44.7% of admissions were female patients. Mean BMI was 27.67 kg/m². Admissions were most frequent in March, August, and November. The 3 most common ICD-10-CM diagnoses were diabetes mellitus, hypertension, and cancer. Median Charlson and Elixhauser scores were 1 and 2, respectively. The model used 103 variables. The most predictive variables were the Elixhauser score on admission and 3 characteristics from the 90 days before admission: Candida from sites other than blood, central line use, and recent antibiotic or antifungal use. The model's area under the receiver operating characteristic curve (AUC) was 0.72.

Conclusions: Preadmission patient characteristics predicted community-onset candidemia. Machine-learning models may help identify patients who should be screened for candidemia and prompt empiric antifungal therapy for high-risk patients within the first 48 hours of admission.
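The evaluation workflow described under Methods (imputation of missing values, models trained on resampled data, 10-fold cross-validation, AUC) can be sketched in plain Python. This is a minimal, hypothetical sketch, not the authors' code. It assumes that "mode and median imputation" means the mode for categorical columns and the median for continuous ones, that "resampled training sets" means randomly undersampling non-cases to a 1:1 ratio within each training fold, and that folds are stratified by outcome. To stay dependency-free, a toy nearest-centroid scorer (`nearest_centroid`) stands in for the random forest; all function names are illustrative. In practice a random forest, such as scikit-learn's `RandomForestClassifier`, would take the place of the `fit` step.

```python
import random
import statistics
from typing import Callable, List, Optional, Sequence, Set

Row = List[Optional[float]]              # one admission; None marks a missing value
Model = Callable[[List[float]], float]   # maps a complete feature vector to a risk score


def fit_imputer(rows: Sequence[Row], categorical: Set[int]) -> List[float]:
    """Learn fill values: mode for categorical columns, median for continuous ones (assumed split)."""
    fills = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        fills.append(statistics.mode(observed) if j in categorical else statistics.median(observed))
    return fills


def apply_imputer(rows: Sequence[Row], fills: List[float]) -> List[List[float]]:
    """Replace every missing value (None) with its column's learned fill value."""
    return [[fills[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def stratified_folds(labels: Sequence[int], k: int, rng: random.Random) -> List[List[int]]:
    """Deal each class out round-robin so every fold keeps roughly the same case fraction."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for n, i in enumerate(idx):
            folds[n % k].append(i)
    return folds


def undersample(idx: List[int], labels: Sequence[int], rng: random.Random) -> List[int]:
    """Assumed resampling scheme: randomly drop non-cases until cases and non-cases are 1:1."""
    pos = [i for i in idx if labels[i] == 1]
    neg = [i for i in idx if labels[i] == 0]
    return pos + rng.sample(neg, min(len(neg), len(pos)))


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC = P(random case scores above random non-case); ties count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def nearest_centroid(X: List[List[float]], y: List[int]) -> Model:
    """Toy stand-in for a random forest: higher score = closer to the case centroid than the non-case one."""
    def centroid(cls: int) -> List[float]:
        members = [x for x, t in zip(X, y) if t == cls]
        return [statistics.fmean(col) for col in zip(*members)]

    mu_case, mu_non = centroid(1), centroid(0)

    def score(x: List[float]) -> float:
        d_non = sum((a - b) ** 2 for a, b in zip(x, mu_non))
        d_case = sum((a - b) ** 2 for a, b in zip(x, mu_case))
        return d_non - d_case

    return score


def cross_validated_auc(rows: Sequence[Row], labels: Sequence[int], categorical: Set[int],
                        fit: Callable[[List[List[float]], List[int]], Model] = nearest_centroid,
                        k: int = 10, seed: int = 0) -> float:
    """Mean held-out AUC over k stratified folds; imputation and resampling see training folds only."""
    rng = random.Random(seed)
    fold_aucs = []
    for test_idx in stratified_folds(labels, k, rng):
        held_out = set(test_idx)
        train_all = [i for i in range(len(rows)) if i not in held_out]
        fills = fit_imputer([rows[i] for i in train_all], categorical)  # learned before resampling
        train_idx = undersample(train_all, labels, rng)
        model = fit(apply_imputer([rows[i] for i in train_idx], fills), [labels[i] for i in train_idx])
        X_test = apply_imputer([rows[i] for i in test_idx], fills)      # test fold is never resampled
        fold_aucs.append(auc([labels[i] for i in test_idx], [model(x) for x in X_test]))
    return statistics.fmean(fold_aucs)


if __name__ == "__main__":
    # Made-up synthetic cohort (~5% cases); column 2 is a binary flag treated as categorical.
    gen = random.Random(42)
    rows, labels = [], []
    for _ in range(2000):
        y = 1 if gen.random() < 0.05 else 0
        row = [gen.gauss(1.0 * y, 1.0), gen.gauss(0.8 * y, 1.0), float(gen.random() < 0.2 + 0.5 * y)]
        rows.append([None if gen.random() < 0.1 else v for v in row])  # ~10% missing
        labels.append(y)
    print(f"10-fold cross-validated AUC: {cross_validated_auc(rows, labels, categorical={2}):.2f}")
```

Learning the fill values on the training folds and scoring a test fold that was never resampled avoids leaking held-out information into training and keeps the real case mix in evaluation. The synthetic cohort is illustrative only and implies nothing about the UIHC data.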
Funding: None
Disclosures: None