Mortality risk prediction is an essential part of healthcare decision making. Acute respiratory distress syndrome (ARDS) is caused by respiratory failure, which has a 55% mortality rate among general intensive care unit (ICU) admissions [1]. During the coronavirus disease (COVID-19) pandemic, one third of COVID-19 patients suffered respiratory failure [2]. Respiratory failure is one of the most severe causes of death, with a 39% average mortality rate among COVID-19 patients with ARDS [3]. In this situation, overwhelmed ICU systems urgently need accurate mortality risk prediction for respiratory failure patients in order to allocate clinical resources during ICU stays and reduce morbidity accordingly.
On the other hand, the existing literature on ICU mortality prediction for respiratory failure patients is limited. Previous studies have used quantitative tools to estimate mortality risk from data collected from respiratory failure patients in the ICU, in two major directions. First, the traditional methods of mortality prediction are scoring systems, including the Simplified Acute Physiology Score (SAPS) [4], Acute Physiology and Chronic Health Evaluation (APACHE) [5], and Sequential Organ Failure Assessment (SOFA) [6]. In [7, 8], these risk scoring systems were compared among ICU patients with ARDS. Their overall performance was limited, given that the scoring systems were developed from empirical models for the general ICU patient population and were not specifically designed for respiratory failure patients. Second, some studies have developed data-driven models using machine learning techniques [9, 2], which can be flexibly applied to various types of critical illness, including respiratory failure. However, conventional machine learning-based predictive methods give little consideration to time-varying information, even though a patient's risk probability is highly time-dependent.
To tackle the above-mentioned challenges, we propose a cumulative hazard function based autoregressive hidden Markov model (CHF-AR-HMM) to handle time-varying mortality risks. Additionally, this model has the advantage of estimating short-term risk despite data imbalance and sparsity. Different from our previous work [10], the current model learns the survival model parameter, the cumulative hazard function, in each time window instead of directly estimating the parameter from the distribution of the length of survival. Thus the cumulative hazard function has an increased capability to reflect time variation.
II-A Feature Engineering
Our model uses the physiological data from the first 24 hours in the ICU to predict the mortality risk by a given target time, as shown in Figure 1. First, we segment the first 24 hours of data into equal-sized time windows. To avoid losing information from data with a low sampling rate, the size of the time window is predetermined in hours. In order to rank the risk for each variable, SAPS II is employed to discretize the original physiological values into integer scores. Because a higher SAPS II score reflects a greater risk, the highest score in each time window is taken as the worst-case representative of each variable, for every ICU variable and every time window. After data discretization, the missing data in each time window are treated as a new feature set that reflects the sparsity of the original dataset. A binary occurrence indicator marks the missing values for each variable; it is defined as 0 if the value is missing and 1 otherwise. Together, the scores and the occurrence indicators form the feature matrix for each individual patient.
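The windowing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names `heart_rate_score` and `window_features` are hypothetical, the heart-rate bands are included only as an example of a SAPS II-style discretization, and the placeholder score of 0 for an empty window is our own choice.

```python
from typing import Callable, List, Optional, Tuple


def heart_rate_score(bpm: float) -> int:
    """Illustrative SAPS II-style bands for heart rate (beats/min)."""
    if bpm < 40:
        return 11
    if bpm < 70:
        return 2
    if bpm < 120:
        return 0
    if bpm < 160:
        return 4
    return 7


def window_features(
    samples: List[Tuple[float, float]],  # (hours since admission, raw value)
    score_fn: Callable[[float], int],
    window_hours: float = 12.0,
    horizon_hours: float = 24.0,
) -> List[Tuple[int, int]]:
    """Return one (worst_score, occurrence) pair per time window."""
    n_windows = int(horizon_hours // window_hours)
    worst: List[Optional[int]] = [None] * n_windows
    for hours, value in samples:
        if not 0 <= hours < horizon_hours:
            continue  # ignore measurements outside the first 24 hours
        w = int(hours // window_hours)
        score = score_fn(value)
        if worst[w] is None or score > worst[w]:
            worst[w] = score  # keep the highest (worst-case) score
    # occurrence = 0 if the window has no measurement, 1 otherwise.
    # Assumption: an empty window gets the placeholder score 0.
    return [(0, 0) if s is None else (s, 1) for s in worst]


features = window_features(
    [(1.0, 85.0), (5.5, 130.0), (9.0, 38.0)], heart_rate_score
)
print(features)
```

With three heart-rate readings in the first 12 hours and none afterwards, the first window keeps the worst score and the second window is flagged as missing.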
Afterwards, in order to measure the variation of the observation sequence, the feature sequences from the feature matrix are labeled. We apply partitioning around medoids (PAM) [11] to cluster the feature sequences across the time windows, grouping those with the highest similarity as measured by Gower's distance [12]. The resulting cluster labels, integers ranging up to the number of clusters, are the values of the observation sequence along the time windows.
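A small sketch of this clustering step is given below. It is an assumption-laden illustration rather than the code used in the study: all features are treated as numeric and range-normalized (for a 0/1 indicator this coincides with a simple mismatch), the medoids are initialized naively with the first rows, and the swap search is a plain first-improvement loop. The names `gower_matrix`, `total_cost`, and `pam` are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def gower_matrix(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Pairwise Gower distance with range-normalized numeric features."""
    n_feat = len(rows[0])
    ranges = [max(r[j] for r in rows) - min(r[j] for r in rows)
              for j in range(n_feat)]

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        total = 0.0
        for j in range(n_feat):
            if ranges[j] > 0:  # a constant feature contributes nothing
                total += abs(a[j] - b[j]) / ranges[j]
        return total / n_feat

    return [[dist(a, b) for b in rows] for a in rows]


def total_cost(d: List[List[float]], medoids: List[int]) -> float:
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(d[i][m] for m in medoids) for i in range(len(d)))


def pam(d: List[List[float]], k: int) -> Tuple[List[int], List[int]]:
    """Greedy swap search for k medoids on a distance matrix."""
    n = len(d)
    medoids = list(range(k))  # assumption: naive initialization
    best = total_cost(d, medoids)
    improved = True
    while improved:
        improved = False
        for pos, cand in product(range(k), range(n)):
            if cand in medoids:
                continue
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            cost = total_cost(d, trial)
            if cost < best - 1e-12:  # accept strictly better swaps only
                medoids, best, improved = trial, cost, True
    labels = [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]
    return medoids, labels


# Each row is (worst score, occurrence indicator) for one time window.
rows = [[0, 1], [1, 1], [0, 1], [11, 0], [7, 0], [11, 0]]
medoids, labels = pam(gower_matrix(rows), k=2)
print(medoids, labels)
```

The cluster index assigned to each row plays the role of the integer observation value for that time window.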
II-B Transition Probability
In our study, the hidden states are defined as the future survival status by the target time, which is the end point of the mortality measurement. We relax the assumption of the AR-HMM so that the current state is not directly correlated with the previous state, while the observation still follows the autoregressive function determined by the corresponding hidden state at each time window. The death state is not considered an absorbing state. In this case, the transition probability is a prior probability that depends on two quantities: the total duration toward the target time, expressed through the time window index and the window size, and the hazard function parameter at each time window. The hazard function parameter is learned by exponential parametric survival regression with a vector of coefficients for the feature sequence at that time window. This model is learned with two response variables: the length of survival, and the outcome variable that labels death by the target time and randomly censored patients (those discharged from the ICU before the target time or censored at the target time).
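For orientation, the textbook relationship between an exponential survival regression and the probability of death within a given duration can be sketched as below. This is the standard form under a proportional-hazards parameterization, offered as an assumption about how such a prior could be computed and not as the exact expression used in this paper; the names `hazard_rate`, `cumulative_hazard`, and `death_probability` are hypothetical, and the coefficient values are invented for the example.

```python
import math
from typing import Sequence


def hazard_rate(beta: Sequence[float], x: Sequence[float]) -> float:
    """Constant hazard of an exponential model: exp(beta . x)."""
    return math.exp(sum(b * v for b, v in zip(beta, x)))


def cumulative_hazard(rate: float, duration: float) -> float:
    """For a constant hazard, the cumulative hazard is rate * duration."""
    return rate * duration


def death_probability(rate: float, duration: float) -> float:
    """P(death within duration) = 1 - S(duration) = 1 - exp(-H(duration))."""
    return 1.0 - math.exp(-cumulative_hazard(rate, duration))


# Hypothetical coefficients and features for one time window.
beta = [-4.0, 0.1]   # intercept and one score coefficient (made up)
x = [1.0, 11.0]      # constant term and a worst-case score
rate = hazard_rate(beta, x)
print(death_probability(rate, duration=24.0))  # duration in hours
```

Some software fits the exponential model in an accelerated failure time form, in which case the sign convention of the coefficients is reversed.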
For the training data, the last hidden state in the sequence is labeled the same as the outcome, where data censored at the target time are considered survival. The remaining hidden states are determined by the normalized probability of death. In the density-based supervised normalization process, we first fit a Gaussian density to the probability of death from the training data for each outcome label in each time window. Thus, for both the death and survival labels, every possible value of the probability of death is mapped to a density. Then, at each value of the original probability of death, we measure the proportion of the death-class density in the total density. This proportion is the normalized probability of death. We map the original probability of death to the corresponding density proportion, which serves as the new probability of death, for both training and testing data, and classify the hidden states with a cut-off of 0.5.
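The normalization described above can be illustrated with a short sketch. It assumes one reading of "Gaussian density estimation", namely a single Gaussian fitted per outcome class; a Gaussian kernel density estimate would be another plausible reading. The function names and the toy data are hypothetical.

```python
from statistics import NormalDist
from typing import Sequence, Tuple


def fit_class_densities(
    probs: Sequence[float], labels: Sequence[int]
) -> Tuple[NormalDist, NormalDist]:
    """Fit one Gaussian per outcome class (1 = death, 0 = survival)."""
    death = [p for p, y in zip(probs, labels) if y == 1]
    survival = [p for p, y in zip(probs, labels) if y == 0]
    return NormalDist.from_samples(death), NormalDist.from_samples(survival)


def normalized_death_probability(
    p: float, death_density: NormalDist, survival_density: NormalDist
) -> float:
    """Share of the death-class density in the total density at p."""
    d = death_density.pdf(p)
    s = survival_density.pdf(p)
    total = d + s
    return 0.5 if total == 0.0 else d / total


# Toy training data: raw probabilities of death and outcome labels.
train_probs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
train_labels = [0, 0, 0, 1, 1, 1]
death_d, surv_d = fit_class_densities(train_probs, train_labels)

for raw in (0.2, 0.5, 0.8):
    p_norm = normalized_death_probability(raw, death_d, surv_d)
    hidden_state = 1 if p_norm >= 0.5 else 0  # cut-off of 0.5
    print(raw, round(p_norm, 3), hidden_state)
```

The densities are fitted on training data only and then reused unchanged to map the probabilities of death in the test data.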
The structure of CHF-AR-HMM is shown in Figure 3. CHF-AR-HMM generates the joint probability of all the hidden states and observations, computed from the transition probabilities and the emission probabilities. Recalling the autoregressive function, we generalize the concept to the prior probability describing the correlation between observations and hidden states: the emission probability is the prior probability of the current observation conditioned on the current state and the previous observation. We identify two joint probabilities: that of the high mortality risk group, in which the hidden state sequence contains the death state, and that of the low risk group, in which the sequence contains only the survival state. Based on these two quantities, we define the mortality risk measurement by the target time, where we sum all the joint probabilities within either the high or the low risk group.
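The grouping of joint probabilities can be made concrete with a brute-force sketch over all hidden state sequences. Several parts of it are assumptions, not statements about the model in this paper: the state prior at each window is taken to be independent of the previous state, the emission probability is passed in as an arbitrary function, and the final risk is taken as the high-risk share of the total joint probability. The names `joint_probability` and `mortality_risk` are hypothetical.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

# emission(state, obs, prev_obs) -> P(obs | state, prev_obs)
Emission = Callable[[int, int, Optional[int]], float]


def joint_probability(
    states: Sequence[int],
    obs: Sequence[int],
    state_prob: Sequence[Tuple[float, float]],  # (P(survival), P(death)) per window
    emission: Emission,
) -> float:
    """Joint probability of one hidden state sequence and the observations."""
    p = 1.0
    prev: Optional[int] = None
    for t, (s, o) in enumerate(zip(states, obs)):
        p *= state_prob[t][s] * emission(s, o, prev)
        prev = o
    return p


def mortality_risk(
    obs: Sequence[int],
    state_prob: Sequence[Tuple[float, float]],
    emission: Emission,
) -> float:
    """Assumed risk measure: high-risk mass divided by total mass."""
    high = low = 0.0
    for states in product((0, 1), repeat=len(obs)):
        p = joint_probability(states, obs, state_prob, emission)
        if 1 in states:   # sequence contains the death state
            high += p
        else:             # survival state only
            low += p
    total = high + low
    return 0.0 if total == 0.0 else high / total


risk = mortality_risk(
    obs=[2, 3],
    state_prob=[(0.9, 0.1), (0.8, 0.2)],
    emission=lambda s, o, prev: 1.0,  # uninformative emission for the demo
)
print(risk)
```

Enumeration is exponential in the number of time windows, which is acceptable here because the first 24 hours are split into only a few windows.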
III Experimental Results
Our model is validated on the eICU Collaborative Research Database [13], using the subset of patients who were diagnosed with respiratory failure and stayed in the ICU for at least 24 hours. Samples with incomplete data for heart rate, blood pressure, or Glasgow Coma Scale across the time windows are discarded, leaving a total of 4391 samples. Each time window is fixed to 12 hours in this study. The remaining missing values of each variable are imputed with the variable's median. Because the lower bound of the median ICU stay for COVID-19 patients is 5 days [14], the target time is set from Day 2 to Day 5 after ICU admission to cover the early risks.
In this study we evaluate the model with three metrics. The primary metric is the area under the precision-recall curve (AUCPR), which is informative when the positive class is more important. The baseline for AUCPR is equal to the proportion of the death class. Second, since the risk measurement of our model is equivalent to a time-to-event joint probability in survival analysis, the concordance statistic (C-statistic) is used to measure the goodness of fit for the binary classification. Concordance means that a sample with a shorter length of survival should have a higher joint probability of risk than other samples. Finally, we also evaluate the model by the area under the receiver operating characteristic curve (AUROC), which gives equal importance to both classes. We include exponential parametric survival regression, logistic regression, and the SAPS II scoring system as baseline methods. The exponential parametric survival regression generates the hazard function to obtain the probability of death by the target time. The maximum SAPS II scores of the variables in the first 24 hours are used for the baseline predictions. Our model is trained and evaluated by 3-fold cross-validation repeated 30 times, separately for each target time from Day 2 to Day 5 after ICU admission.
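Two of these quantities are simple enough to sketch directly. The code below is an illustration under our own assumptions, not the evaluation code of the study: the C-statistic follows the usual pairwise definition for right-censored data with risk ties counted as one half, and the AUCPR baseline is computed as the prevalence of the death class. The function names are hypothetical.

```python
from typing import Sequence


def aucpr_baseline(labels: Sequence[int]) -> float:
    """AUCPR of an uninformative classifier: the share of positive labels."""
    return sum(labels) / len(labels)


def c_statistic(
    times: Sequence[float],   # length of survival or follow-up
    events: Sequence[int],    # 1 = death observed, 0 = censored
    risks: Sequence[float],   # predicted risk, higher means earlier death
) -> float:
    """Pairwise concordance between predicted risk and survival time."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when the shorter time ends in an event.
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


print(aucpr_baseline([1, 0, 0, 0]))
print(c_statistic([1.0, 2.0, 3.0], [1, 1, 0], [0.9, 0.5, 0.1]))
```

A C-statistic of 0.5 corresponds to random ordering, which is the baseline value reported for both the C-statistic and AUROC.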
| Model | Target day | AUCPR | AUCPR 95% CI | C-statistic | C-statistic 95% CI | AUROC | AUROC 95% CI |
|---|---|---|---|---|---|---|---|
| CHF-AR-HMM | Day 2 | 0.27 | (0.26, 0.28) | 0.82 | (0.81, 0.83) | 0.83 | (0.82, 0.83) |
| CHF-AR-HMM | Day 3 | 0.35 | (0.33, 0.35) | 0.80 | (0.80, 0.81) | 0.81 | (0.81, 0.82) |
| CHF-AR-HMM | Day 4 | 0.42 | (0.41, 0.43) | 0.79 | (0.79, 0.80) | 0.81 | (0.80, 0.82) |
| CHF-AR-HMM | Day 5 | 0.44 | (0.43, 0.45) | 0.78 | (0.78, 0.79) | 0.80 | (0.79, 0.80) |
| SAPS II | Day 2 | 0.22 | (0.21, 0.24) | 0.79 | (0.78, 0.80) | 0.80 | (0.79, 0.81) |
| SAPS II | Day 3 | 0.29 | (0.27, 0.30) | 0.76 | (0.76, 0.77) | 0.78 | (0.77, 0.79) |
| SAPS II | Day 4 | 0.35 | (0.34, 0.36) | 0.75 | (0.74, 0.76) | 0.77 | (0.77, 0.78) |
| SAPS II | Day 5 | 0.37 | (0.36, 0.38) | 0.74 | (0.73, 0.74) | 0.76 | (0.75, 0.76) |
| Parametric Survival Regression | Day 2 | 0.25 | (0.23, 0.26) | 0.81 | (0.80, 0.81) | 0.81 | (0.81, 0.82) |
| Parametric Survival Regression | Day 3 | 0.32 | (0.30, 0.33) | 0.78 | (0.78, 0.79) | 0.80 | (0.79, 0.81) |
| Parametric Survival Regression | Day 4 | 0.38 | (0.37, 0.40) | 0.78 | (0.77, 0.78) | 0.79 | (0.78, 0.80) |
| Parametric Survival Regression | Day 5 | 0.40 | (0.38, 0.41) | 0.76 | (0.75, 0.77) | 0.78 | (0.77, 0.78) |
| Logistic Regression | Day 2 | 0.23 | (0.22, 0.25) | 0.80 | (0.80, 0.81) | 0.81 | (0.80, 0.82) |
| Logistic Regression | Day 3 | 0.31 | (0.30, 0.33) | 0.78 | (0.78, 0.79) | 0.80 | (0.79, 0.80) |
| Logistic Regression | Day 4 | 0.38 | (0.36, 0.39) | 0.77 | (0.77, 0.78) | 0.79 | (0.78, 0.80) |
| Logistic Regression | Day 5 | 0.39 | (0.38, 0.41) | 0.76 | (0.75, 0.76) | 0.77 | (0.77, 0.78) |
The AUCPR baseline values are 0.05, 0.09, 0.12, and 0.15 for Day 2 to Day 5, respectively. The baseline value for the C-statistic and AUROC is 0.5.
III-B Performance Comparison
The performance of early mortality prediction for respiratory failure patients from Day 2 to Day 5 after ICU admission is shown in Table I. All performance outcomes from cross-validation were tested with a paired one-tailed t-test. For all target days, CHF-AR-HMM shows a statistically significant improvement, with all p-values less than 0.02. On Day 2, CHF-AR-HMM improves AUCPR by 0.02 (p-value 0.02) compared with exponential parametric survival regression, which is the best-performing baseline method. On Day 5, the improvement in AUCPR over exponential parametric survival regression is 0.04 (p-value 0.001). For the C-statistic, CHF-AR-HMM is 0.01 (p-value 0.01) higher than exponential parametric survival regression on Day 2 and 0.02 (p-value 0.001) higher on Day 5. For AUROC, CHF-AR-HMM is 0.02 higher than exponential parametric survival regression on both Day 2 (p-value 0.02) and Day 5 (p-value 0.001). For survival analysis, the survival probability estimated by CHF-AR-HMM is displayed in Figure 4 along the target days. The estimated survival probability curve for patients who actually died is lower than that for patients who actually survived.
In this study, CHF-AR-HMM is used to predict early mortality for ICU patients with respiratory failure. For all target days, our model shows a significantly better capability of classifying the death class than the baseline methods. Its prediction capability is also better when both classes are given equal importance. Meanwhile, CHF-AR-HMM has significantly higher concordance than the baseline methods. The survival curves are distinguishable between patients who actually died and those who survived, as shown in Figure 4. In future work, we seek to enhance the positive-class prediction capability by reducing data overlap through further feature engineering.
- [1] G. Bellani, J. G. Laffey, T. Pham, E. Fan, L. Brochard, A. Esteban, L. Gattinoni, F. Van Haren, A. Larsson, D. F. McAuley et al., “Epidemiology, patterns of care, and mortality for patients with acute respiratory distress syndrome in intensive care units in 50 countries,” JAMA, vol. 315, no. 8, pp. 788–800, 2016.
- [2] S. Bolourani, M. Brenner, P. Wang, T. McGinn, J. S. Hirsch, D. Barnaby, T. P. Zanos, Northwell COVID-19 Research Consortium et al., “A machine learning prediction model of respiratory failure within 48 hours of patient admission for COVID-19: Model development and validation,” Journal of Medical Internet Research, vol. 23, no. 2, p. e24246, 2021.
- [3] S. S. Hasan, T. Capstick, R. Ahmed, C. S. Kow, F. Mazhar, H. A. Merchant, and S. T. R. Zaidi, “Mortality in COVID-19 patients with acute respiratory distress syndrome and corticosteroids use: a systematic review and meta-analysis,” Expert Review of Respiratory Medicine, vol. 14, no. 11, pp. 1149–1163, 2020.
- [4] J. R. Le Gall, P. Loirat, A. Alperovitch, P. Glaser, C. Granthil, D. Mathieu, P. Mercier, R. Thomas, and D. Villers, “A simplified acute physiology score for ICU patients,” Crit. Care Med., vol. 12, no. 11, pp. 975–977, Nov 1984.
- [5] W. A. Knaus, D. P. Wagner, E. A. Draper, J. E. Zimmerman, M. Bergner, P. G. Bastos, C. A. Sirio, D. J. Murphy, T. Lotring, and A. Damiano, “The APACHE III prognostic system. Risk prediction of hospital mortality for critically ill hospitalized adults,” Chest, vol. 100, no. 6, pp. 1619–1636, 1991.
- [6] U. Janssens, R. Dujardin, J. Graf, W. Lepper, J. Ortlepp, M. Merx, M. Zarse, T. Reffelmann, and P. Hanrath, “Value of SOFA (sequential organ failure assessment) score and total maximum SOFA score in 812 patients with acute cardiovascular disorders,” Critical Care, vol. 5, no. Suppl 1, 2001.
- [7] M. Aydogdu, E. Ozyilmaz, H. Aksoy, G. Gursel, and N. Ekim, “Mortality prediction in community-acquired pneumonia requiring mechanical ventilation; values of pneumonia and intensive care unit severity scores,” Tuberk Toraks, vol. 58, no. 1, pp. 25–34, 2010.
- [8] A. Saleh, M. Ahmed, I. Sultan, and A. Abdel-Lateif, “Comparison of the mortality prediction of different ICU scoring systems (APACHE II and III, SAPS II, and SOFA) in a single-center ICU subpopulation with acute respiratory distress syndrome,” Egyptian Journal of Chest Diseases and Tuberculosis, vol. 64, no. 4, pp. 843–848, 2015.
- [9] W. D. Gannon, D. J. Lederer, M. Biscotti, A. Javaid, N. M. Patel, D. Brodie, M. Bacchetta, and M. R. Baldwin, “Outcomes and mortality prediction model of critically ill adults with acute respiratory failure and interstitial lung disease,” Chest, vol. 153, no. 6, pp. 1387–1395, 2018.
- [10] Y. Yin and C.-A. Chou, “A novel switching state space model for post-ICU mortality prediction and survival analysis,” IEEE Journal of Biomedical and Health Informatics, 2021.
- [11] A. P. Reynolds, G. Richards, B. de la Iglesia, and V. J. Rayward-Smith, “Clustering rules: a comparison of partitioning and hierarchical clustering algorithms,” Journal of Mathematical Modelling and Algorithms, vol. 5, no. 4, pp. 475–504, 2006.
- [12] J. C. Gower, “A general coefficient of similarity and some of its properties,” Biometrics, pp. 857–871, 1971.
- [13] T. J. Pollard, A. E. Johnson, J. D. Raffa, L. A. Celi, R. G. Mark, and O. Badawi, “The eICU Collaborative Research Database, a freely available multi-center database for critical care research,” Scientific Data, vol. 5, no. 1, pp. 1–13, 2018.
- [14] E. M. Rees, E. S. Nightingale, Y. Jafari, N. R. Waterlow, S. Clifford, C. A. Pearson, T. Jombart, S. R. Procter, G. M. Knight, CMMID Working Group et al., “COVID-19 length of hospital stay: a systematic review and data synthesis,” BMC Medicine, vol. 18, no. 1, pp. 1–22, 2020.