Predicting the clinical progress of hospitalized coronavirus disease 2019 (COVID-19) patients in the face of emerging variants is a crucial challenge for the medical community. This undertaking has far-reaching consequences for not only the determination of high-risk groups for the disease in a dynamic situation but also the efficient allocation of medical resources (especially when they are scarce). We sought to estimate the duration of hospital stay and in-hospital mortality of COVID-19, analyses that are vulnerable to severe competing risks and time-dependent biases Reference Wolkewitz and Schumacher1 that can invalidate traditional methods of estimation. Furthermore, we sought to avert the selection bias that occurs when current cases are removed from the analysis; a bias that impaired some reports at the outset of the pandemic. Reference Zhou, Yu and Du2
Methods
Data from patients with a hospital admission 7 days before or 14 days after a positive severe acute respiratory coronavirus virus 2 (SARS-CoV-2) test were extracted from electronic health records (EHRs). Cases were categorized by sex, age, and phases of the pandemic based on the predominant variants (sequenced samples >50%) as published by the German health authorities. 3 The following periods were used as surrogate variables for the variants and treatment options in the specified calendar period. January 27, 2020, to February 14, 2021, was characterized by SARS-CoV-2 wild-type dominance, with possible treatments with dexamethasone and remdesivir as well as minimal vaccination. February 15, 2021, to June 20, 2021, was characterized by SARS-CoV-2 α (alpha) variant dominance, with initial antibody therapy availability and vaccination of high-risk groups. June 21, 2021, to December 26, 2021, was characterized by SARS-CoV-2 δ (delta) variant dominance, with the availability of inpatient intravenous antiviral therapy and vaccination of the general population as well as booster vaccinations for high-risk groups. Finally, December 27, 2021, to March 1, 2022, was characterized by SARS-CoV-2 ο (omicron) variant dominance, with additional oral antiviral therapy availability and booster vaccination of the general population.
We employed a multistate model to estimate the durations of hospital stays and mortality as described previously Reference Hazard, Kaier and von Cube4 with 2 transitory states (ie, nonsevere, related to regular ward admission, and severe, related to intensive care unit admission) Reference Klann, Estiri and Weber5 and 2 terminal states (in-hospital death and discharge alive). In addition, we performed multivariable Cox regression on the competing terminal states.
Results
For the entire cohort at day 30 after admission, the predicted mortality was 9.3%; 5.1% were predicted to have severe illness, and 1.4% were predicted to have nonsevere illness. Also, 84.1% were predicted to be discharged alive at day 30 (Supplementary Material 1). The total expected length of hospital stay at day 30 for the full cohort was 8.62 days (95% CI, 7.95–9.28), of which 5.09 days were estimated for nonsevere illness plus 3.52 days for severe illness.
The calendar comparison revealed a decreasing expected length of stay with each subsequent phase. This trend is illustrated in the stacked probability plots shown in Figure 1 (see Supplementary Material 3 and 4 for sex and age category). The first 2 phases displayed the highest predicted death probability on day 30. Multivariable Cox results are provided in Table 1. After adjusting for sex and age category, the SARS-CoV-2 α (alpha) phase had the highest mortality hazard ratio (1.40; 95% CI, 1.03–1.90).
Note. ELOS, expected length of stay; aHR, adjusted hazard ratio; CI, confidence interval calculated with robust standard errors; SARS-CoV-2 wild-type period, January 27, 2020–February 14, 2021; SARS-CoV-2 α (alpha) period, February 15, 2021–June 20, 2021; SARS-CoV-2 δ (delta) period, June 21, 2021–December 26, 2021; SARS-CoV-2 ο (omicron) period, December 27, 2021–March 1, 2022. Periods are a surrogate for the variants and treatment options in the specified calendar period.
Discussion
We calculated duration of stay, etiological, and risk estimates from a large data set of hospitalized COVID-19 patients based on multistate methodology avoiding many pitfalls. The strength of multistate models is highlighted in their divergence from their traditional, yet biased alternatives. A naive (ie, not accounting for competing risks) Kaplan-Meier estimate would have shown a 30-day survival probability of 72% based on the full cohort. This finding contrasts with a survival probability of 91% from the multistate model: a massive 19% difference.
Although median length-of-stay estimates have been provided in many publications, Reference Veneti, Bøås and Kristoffersen6 these estimates are invalid when there is a high degree of censoring. Notably, a median length of hospital stay for severely ill patients of 13 days could be determined by ignoring the time dependency of patients transitioning into severe illness. This naive estimate is based on knowing at admission which patients will move into the severe state (an impossible proposition) and does not give information on when or for how long they will remain in this state. The multistate result of an expected 3.5 days of severe illness (95% CI, 3.1–3.9) conditioned on patients being admitted as nonsevere on day 0 is not grounded in knowing the future and is far more informative. ‘Severe’ and ‘nonsevere’ illness have starkly different consequences in terms of costs and resources. In further contrast to traditional methods, the multistate analysis can be extended to condition on different days and states (Supplementary Material 4).
Improper treatment of current cases (by removing them from the data set) would have resulted in an adjusted odds ratio of 0.81 (95% CI, 0.59–1.14) to be discharged alive for patients in the SARS-CoV-2 ο (omicron) phase using logistic regression. This biased point estimate implies that patients in the SARS-CoV-2 ο (omicron) phase had a lower odds of being discharged alive, contrary to the properly calculated hazard ratio 1.56 (95% CI, 1.39–1.76) using Cox regression. These patients, in fact, had a higher discharge rate.
The interplay among the duration estimates, plots, and regression results is illustrative of why all are needed to attain a full account. The expected lengths of stay in the initial pandemic phase were longer than for the 3 subsequent phases. Analyzing a stacked probability plot, we observed a higher predicted death risk during the SARS-CoV-2 α (alpha) period. The influence of death would imply a shorter length of stay. However, we did not observe the higher death risk in the SARS-CoV-2 δ (delta) and SARS-CoV-2 ο (omicron) periods. In the last 2 phases, the opposing force of discharge alive (ie, quantified in the significantly higher hazard ratios) resulted in shorter hospital stays. Moreover, determining whether discharge alive (clinical improvement) or death (clinical failure) causes shorter stays is crucial for treatment comparisons. Reference Martinuka, von Cube and Wolkewitz7
It is a strength that the proposed methods can be conducted in real time, thus giving an immediate impression of changing circumstances. The use of EHRs is also advantageous in that extraction could be standardized to allow data conglomeration and meta-analyses. Although the positive-result time window facilitates this standardization, it is likely that a good proportion of patients were admitted for reasons other than COVID-19. Reference Klann, Strasser and Hutch8 Further bias may have been introduced through EHR coding limitations at the outset and varying testing guidelines throughout the studied periods. The analysis was also limited by the absence of adjustment for other potential confounders such as issues of delayed care, specific treatments, or prior infection. Moreover, caution must be taken in extrapolating to other geographic regions where “severe” may have different interpretations. Lastly, calendar periods served as surrogate variables for the predominant variants, and claims have not been made regarding their intrinsic disease severity.
Alternative multistate methods to those presented might also be explored. Parametric models Reference Jackson, Tom and Kirwan9 can extrapolate outside the time frame in which the data were collected, overcoming a shortcoming of nonparametric estimators. Despite not predicting the progression of patients as demonstrated here, length-of-stay estimates conditioned on the final pathway of patients can be used for resource planning. Reference Keogh, Diaz-Ordaz and Jewell10
Without ignoring the seriousness of the COVID-19 pandemic, we should learn as much as we can from this crisis. As demonstrated here, multistate methodology should be used in the analyses of future patients, whether for COVID-19 or subsequent pandemics.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/ash.2023.153
Acknowledgments
We thank Prof. Martin Schumacher for his invaluable insights and recommendations.
Financial support
This research was supported by the Innovative Medicines Initiative Joint Undertaking resources, comprising financial contribution from the European Union’s Seventh Framework Programme (grant no. FP7/2007-2013) and EFPIA companies (grant nos. 115737-2–COMBACTE-MAGNET and 115523–COMBACTE-NET).
Conflicts of interest
All authors report no conflicts of interest relevant to this article.