The acute respiratory distress syndrome (ARDS) is a significant cause of morbidity and mortality in the USA and worldwide [1, 2, 3, 4]. While the mainstay of treatment is to treat the inciting cause of ARDS effectively, it is clear that early, evidence-based management of ARDS can limit the propagation of lung injury and significantly improve patient outcomes. Early management requires early identification. In 2012, an expert panel published a new definition of ARDS as the acute (within 7 days of a precipitating cause such as sepsis or trauma) development of abnormal oxygenation and bilateral opacities on chest imaging consistent with pulmonary edema. However, the reported ability to predict mortality using this definition was modest at best, with an AUC of 0.577.
To date, there exists no reliable way to anticipate which patients are likely to develop ARDS. Numerous prediction scores have been developed to assess ARDS prognosis and risk of death, such as the Lung Injury Score (LIS), the Lung Injury Prediction Score (LIPS), the APPS (age, plateau, PaO2/FiO2) score, the Early Acute Lung Injury (EALI) score, and the Modified ARDS Prediction Score (MAPS). However, the predictive validity of these ARDS scoring tools has been shown to be moderate. For example, LIPS discriminated patients who developed acute lung injury (ALI) from those who did not with an AUC of 0.80 (95% CI, 0.77–0.84) [7]. MAPS was shown to have a similar AUC of 0.79 (95% CI, 0.72–0.87) in predicting ARDS development [10], and the reported EALI AUC was 0.85 (95% CI, 0.80–0.91; on the training set) for identifying patients who progressed to acute lung injury.
In identifying mortality, the predictive validity of LIS was found to be limited, with an AUC of 0.60 (95% CI, 0.55–0.65), in the era of the Berlin definition. Similarly, APPS was found to have an AUC of 0.8 for predicting ARDS mortality. General illness severity scores, such as SAPS, SAPS II, APACHE II/III, and MPM, have also been examined for their ability to help recognize ARDS early, but only similarly moderate performance has been reported [8, 9, 11].
Improved predictive validity is needed to enable reliable early identification and management of patients at risk for ARDS. We hypothesize that a barrier to improved predictive performance of existing scoring tools may be the heterogeneity of the ARDS populations used to derive these models. Studies have indicated that ARDS is a highly heterogeneous syndrome that may be composed of several distinct sub-phenotypes [12, 13, 14]. Such heterogeneity in the population implies heterogeneity in the relationships between explanatory and response variables within partitions, which poses a serious challenge for predictive model building that seeks a single explanatory data pattern associated with an outcome.
In this study, we integrate prior knowledge of the heterogeneity of the ARDS population into predictive model building by identifying ARDS subtypes that share common underlying pathophysiology, as statistically expressed in observed clinical data (e.g. labs, vitals). Specifically, we use latent class analysis (LCA) to identify homogeneous sub-groups of ARDS subjects, and then build predictive models on the partitioned data to test whether predictive validity can be improved compared with treating all patients as a single homogeneous group. Prior work has demonstrated the value of problem domain semantics in enhancing mortality risk predictive models.
2 Material and Methods
Clinical encounter data of adult patients (age ≥ 18 years) were extracted from the MIMIC-III version 1.4 ICU database. MIMIC-III consists of retrospective ICU encounter data of patients admitted to Beth Israel Deaconess Medical Center from 2001 to 2012. Included ICUs are medical, surgical, trauma-surgical, coronary, cardiac surgery recovery, and medical/surgical care units. MIMIC-III includes both time series data recorded in the EMR during encounters (e.g. vital signs, diagnostic laboratory results, free-text nursing notes, radiology reports, medications, discharge summaries, and treatments) and high-resolution physiological data (time series and waveforms) recorded by bedside monitors. Only the time series data recorded in the EMR were used in this study.
2.1.1 Data for Latent Class Analysis
ICD-9 diagnosis codes and procedure codes identifying mechanically ventilated patients are the basis for identifying the class of ARDS patients. PaO2, FiO2, and PEEP information were extracted from charted data. Time points of ARDS onset are defined based on the Berlin criteria, i.e. a PaO2/FiO2 ratio ≤ 300 with PEEP of at least 5 cmH2O. The vital sign and laboratory measurements observed after the identified diagnosis time are extracted, and features are constructed from them as class-defining variables in the LCA modelling. These include BMI; means of bicarbonate, plateau pressure, mean airway pressure (MAP), PaCO2, tidal volume, platelet count, and total bilirubin; minimums of sodium, glucose, albumin, hematocrit, and systolic blood pressure (SBP); maximums of temperature, heart rate, white blood cell (WBC) count, and creatinine; and the first available PaO2/FiO2 ratio and PEEP (Table 1). Four predisposing conditions (sepsis, shock, aspiration, and pneumonia) are also included in the LCA.
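The onset rule described above can be sketched as a scan over charted observations. This is a minimal illustration, assuming the thresholds PaO2/FiO2 ≤ 300 and PEEP ≥ 5 cmH2O; the `Observation` record and the `first_onset` helper are hypothetical names introduced here and do not reflect the MIMIC-III schema or the authors' extraction code.

```python
from typing import Iterable, NamedTuple, Optional


class Observation(NamedTuple):
    """One charted time point (hypothetical layout, not the MIMIC-III schema)."""
    hours: float  # hours since ICU admission
    pao2: float   # PaO2 in mmHg
    fio2: float   # FiO2 as a fraction in (0, 1]
    peep: float   # PEEP in cmH2O


def first_onset(observations: Iterable[Observation],
                pf_threshold: float = 300.0,
                min_peep: float = 5.0) -> Optional[float]:
    """Return the earliest time with PaO2/FiO2 <= pf_threshold and PEEP >= min_peep."""
    for obs in sorted(observations, key=lambda o: o.hours):
        if obs.fio2 <= 0:
            continue  # skip unusable FiO2 values rather than divide by zero
        if obs.pao2 / obs.fio2 <= pf_threshold and obs.peep >= min_peep:
            return obs.hours
    return None  # criteria never met


# Illustrative values only, not patient data.
chart = [
    Observation(hours=1.0, pao2=95.0, fio2=0.30, peep=5.0),  # ratio about 317: not met
    Observation(hours=4.0, pao2=80.0, fio2=0.50, peep=3.0),  # ratio 160, PEEP too low
    Observation(hours=6.0, pao2=75.0, fio2=0.50, peep=8.0),  # ratio 150: criteria met
]
onset_hours = first_onset(chart)
```

Sorting by time before scanning makes the returned value the first qualifying time point regardless of the order in which observations were charted.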
2.1.2 Data for Predictive Modeling
Features considered in the predictive model building include: 1) vital signs: heart rate, respiratory rate, body temperature, systolic blood pressure, diastolic blood pressure, mean arterial pressure, oxygen saturation, and tidal volume; 2) laboratory tests: white blood cell count, bands, hemoglobin, hematocrit, lactate, creatinine, bicarbonate, pH, PT, INR, BUN, and blood gas measurements (partial pressure of arterial oxygen, fraction of inspired oxygen, and partial pressure of arterial carbon dioxide); 3) motor, verbal, and eye sub-scores of the Glasgow Coma Scale; and 4) indicators of predisposing factors and potential modifiers: gender, sepsis, shock, trauma, pneumonia, high-risk surgery, near drowning, fracture, and diabetes. See the Appendix for a complete list of features with the time ranges considered and the properties of each feature.
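As an illustration of how a charted time series might be reduced to the summary features listed in the Appendix (minimum, mean, and maximum over the first 24 hours), consider the following sketch. The function name `summarize_window`, the `(hours, value)` input layout, and the half-open window `[0, 24)` are assumptions made for this example.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple


def summarize_window(series: List[Tuple[float, float]],
                     window_hours: float = 24.0) -> Optional[Dict[str, float]]:
    """Return min/mean/max of values charted in [0, window_hours), or None if empty."""
    values = [v for t, v in series if 0.0 <= t < window_hours]
    if not values:
        return None  # left missing here; imputation would be a separate downstream step
    return {"min": min(values), "mean": mean(values), "max": max(values)}


# Illustrative heart-rate readings as (hours since admission, beats per minute).
heart_rate = [(0.5, 88.0), (6.0, 112.0), (23.0, 100.0), (30.0, 140.0)]
hr_features = summarize_window(heart_rate)  # the reading at 30 h falls outside the window
```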
2.2.1 Latent Class Analysis
Latent class model estimation is based on Gaussian finite mixture modelling methods. It assumes that the population is composed of a finite number of components. Mixture model parameters, i.e. the components' means, covariance structures, and mixing weights, are obtained via the expectation-maximization (EM) algorithm. Before LCA modeling, the Yeo-Johnson power transformation is applied to ensure approximate normality of continuous variables, and mean imputation is used to replace missing values. To select the model that best fits the data, a series of latent class models with different numbers of components is fitted, and the Bayesian Information Criterion (BIC) is used for model selection.
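The Yeo-Johnson transformation mentioned above can be written for a single value as follows. This sketch takes the power parameter `lam` as given; in practice it is usually estimated per variable (for example by maximum likelihood), and how it was chosen in this study is not stated. The function name is introduced here for illustration.

```python
import math


def yeo_johnson(y: float, lam: float) -> float:
    """Yeo-Johnson power transformation of one value for a given lambda."""
    if y >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(y)                      # lambda = 0 branch
        return ((y + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-y)                        # lambda = 2 branch
    return -(((-y + 1.0) ** (2.0 - lam)) - 1.0) / (2.0 - lam)


# Illustrative right-skewed values; lam = 0 compresses the long right tail.
skewed = [0.2, 0.9, 1.5, 4.0, 25.0]
transformed = [yeo_johnson(v, 0.0) for v in skewed]
```

With `lam = 1` the transformation is the identity for both positive and negative inputs, which is a convenient sanity check.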
2.2.2 Predictive Modeling
Predictive models, including gradient boosted machine (GBM) and random forest (RF) models, were built for all cases and for each phenotype separately, with all non-ARDS subjects as the contrast group. Missing values were replaced by the median of each variable. The Synthetic Minority Over-sampling Technique (SMOTE) was applied to resample the data. The data were split into a training set (70%) and a test set (30%). Cross-validation was used to tune the hyperparameters: number of trees, interaction depth, learning rate, and minimum number of observations in nodes for the GBM model, and number of trees for the RF model. The tuned models were then used to evaluate the performance of predicting ARDS in the test set. Confidence intervals of the performance metrics were obtained by bootstrapping. To compare performance (e.g. AUC) between models, the method proposed by DeLong et al. was applied, with the null hypothesis that the true difference in performance metrics was equal to 0.
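A bootstrap confidence interval for the AUC can be sketched as below. The text does not say which bootstrap interval was used, so the percentile interval here is an assumption, as are the function names, the number of resamples, and the toy labels and scores.

```python
import random
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels: Sequence[int], scores: Sequence[float],
                     n_boot: int = 1000, alpha: float = 0.05,
                     seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC (resampling cases with replacement)."""
    if len(set(labels)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacked one class; draw again
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy data for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
point_estimate = auc(y_true, y_score)
ci_low, ci_high = bootstrap_auc_ci(y_true, y_score)
```

The pairwise definition of the AUC used here is quadratic in the sample size; a rank-based computation would be preferable for cohorts of realistic size.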
3.1 Latent Class Analysis to Identify ARDS Sub-phenotypes
Data of 4181 ARDS patients (4714 ICU stays) were used in the LCA. Classification was conducted without consideration of clinical outcomes. Details on clinical variable selection and data cleaning, together with a complete list of the clinical variables included in the LCA models, are given in Table 1. Unless otherwise specified, medians of measurements are extracted. Missing values are replaced by median imputation for each variable.
|Maximum Heart Rate||85||124.34||23.89||55||123||239|
The BIC suggests that a 3-component VEV model (i.e. ellipsoidal distribution with variable volume, equal shape, and variable orientation) fits the data best (Figure 1). The numbers of ICU stays assigned to the sub-phenotypes are 753 in phenotype 1, 1471 in phenotype 2, and 2490 in phenotype 3. Figure 2 shows differences in variables by phenotype assignment. Figure 3 shows the mortality of each phenotype.
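For reference, the Gaussian finite mixture, the VEV covariance structure, and the selection criterion can be written as follows. The notation is introduced here for illustration and follows the usual model-based clustering convention of [19]; it is a sketch of the standard formulation rather than a statement of the exact settings used in this study.

```latex
\begin{align}
f(\mathbf{x}) &= \sum_{k=1}^{G} \pi_k \, \phi(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
  \qquad \pi_k \ge 0, \quad \sum_{k=1}^{G} \pi_k = 1, \\
\boldsymbol{\Sigma}_k &= \lambda_k \, \mathbf{D}_k \, \mathbf{A} \, \mathbf{D}_k^{\top}
  \qquad \text{(VEV)}, \\
\mathrm{BIC} &= 2 \log \hat{L} - \nu \log n .
\end{align}
```

Here $G$ is the number of components ($G = 3$ for the selected model), $\phi$ is the multivariate Gaussian density, $\lambda_k$ is a component-specific volume, $\mathbf{A}$ is a diagonal shape matrix with unit determinant shared by all components, and $\mathbf{D}_k$ is a component-specific orthogonal orientation matrix. $\hat{L}$ is the maximized likelihood, $\nu$ the number of free parameters, and $n$ the number of observations. Under this sign convention a larger BIC indicates a better model; other references define the BIC with the opposite sign, in which case smaller is better.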
3.2 Predictive Models
Predictive models were built for all cases and for each phenotype separately. The performance of the predictive models, evaluated on the validation set, is listed in Table 2.
Comparisons between the models for individual phenotypes and the model for all cases indicate that significant improvements in AUC over existing predictive models are obtained for phenotypes 1 and 2. Phenotype 3 was found to be the most difficult to predict, with an AUC of 90.9% using GBM. However, even in this case, the result is still a substantial improvement over the LIPS score, which has an AUC of 81%.
|Phenotype||Metrics||GBM||Random Forest||P value|
|All cases||AUC||0.904(0.897 - 0.911)||0.904(0.897 - 0.911)||Reference|
|All cases||Sensitivity||0.905(0.837 - 0.931)||0.9(0.88 - 0.916)|
|All cases||Specificity||0.765(0.745 - 0.825)||0.782(0.768 - 0.799)|
|All cases||Accuracy||0.779(0.761 - 0.827)||0.793(0.781 - 0.807)|
|All cases||PPV||0.289(0.275 - 0.338)||0.303(0.291 - 0.317)|
|All cases||NPV||0.987(0.98 - 0.99)||0.987(0.984 - 0.989)|
|1||AUC||0.983(0.981 - 0.986)||0.985(0.983 - 0.987)||0.0001*|
|1||Sensitivity||1(0.997 - 1)||1(0.995 - 1)|
|1||Specificity||0.952(0.948 - 0.957)||0.956(0.95 - 0.961)|
|1||Accuracy||0.953(0.949 - 0.957)||0.956(0.951 - 0.961)|
|1||PPV||0.231(0.218 - 0.25)||0.246(0.225 - 0.269)|
|1||NPV||1(1 - 1)||1(1 - 1)|
|2||AUC||0.986(0.984 - 0.988)||0.985(0.982 - 0.987)||0.0001*|
|2||Sensitivity||0.982(0.97 - 0.994)||0.979(0.961 - 0.99)|
|2||Specificity||0.957(0.953 - 0.961)||0.955(0.938 - 0.96)|
|2||Accuracy||0.957(0.953 - 0.961)||0.956(0.939 - 0.961)|
|2||PPV||0.391(0.368 - 0.414)||0.382(0.31 - 0.411)|
|2||NPV||0.999(0.999 - 1)||0.999(0.999 - 1)|
|3||AUC||0.909(0.901 - 0.918)||0.902(0.892 - 0.911)||0.6538*|
|3||Sensitivity||0.902(0.829 - 0.93)||0.879(0.826 - 0.919)|
|3||Specificity||0.769(0.749 - 0.843)||0.773(0.733 - 0.82)|
|3||Accuracy||0.776(0.757 - 0.843)||0.778(0.742 - 0.821)|
|3||PPV||0.167(0.155 - 0.215)||0.165(0.148 - 0.191)|
|3||NPV||0.994(0.99 - 0.995)||0.992(0.989 - 0.994)|
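The metrics reported in the table are all functions of the four confusion-matrix counts. The sketch below shows the definitions; the counts are hypothetical and chosen only to illustrate how a low proportion of positive cases can yield a low PPV even when sensitivity and specificity are high. They are not taken from the study data.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute standard binary classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate
        "specificity": tn / (tn + fp),            # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),                    # positive predictive value
        "npv": tn / (tn + fn),                    # negative predictive value
    }


# Hypothetical counts: 100 positive and 1900 negative cases (5% positives).
metrics = classification_metrics(tp=90, fp=380, tn=1520, fn=10)
```

With these made-up counts, sensitivity is 0.90 and specificity 0.80, yet the PPV is only about 0.19 because false positives from the large negative group outnumber the true positives.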
This study applied LCA to group ARDS patients into 3 sub-phenotypes and built separate predictive models for each phenotype. Using routinely available clinical variables, our LCA identified three classes of ARDS with different mortality rates, with sub-phenotype 2 having significantly higher mortality (48%) than the other 2 types. Key characteristics of phenotype 2 include high bilirubin (liver failure), high creatinine/high pulse pressure (heart/renal failure), thrombocytopenia, lower minimum systolic blood pressure (hemodynamic instability), and lower PaO2/FiO2, indicating the most severe respiratory failure. The identification of these subtypes may help triage ARDS patients who respond differently to treatment (e.g. fluids). With an AUC of 0.98 for this high-mortality subgroup, our results indicate that significantly improved predictive performance can be obtained for key ARDS sub-phenotypes.
It is known that heterogeneity in a population poses a great challenge to predictive modeling. First, the training data may comprise instances from not just one distribution, but several distributions juxtaposed together. In the presence of multimodality within the classes, there may be imbalance among the different modes in the training set. Hence, some of the modes may be underrepresented during training, resulting in poor performance on those modes during testing. Second, while some of the modes of a particular class may be easy to distinguish from modes of the other class, there may be modes that participate in class confusion, i.e., reside in regions of feature space that overlap with instances from other classes. The presence of such overlapping modes can degrade the learning of any classification model trained across all modes of every class. Third, even if we are able to learn a predictive model that shows reasonable performance on the training set, the test set may have a completely different distribution of data instances, as the populations of the training and test sets can differ. Hence, training performance can be quite misleading, as it may not reflect performance on test instances. For these reasons, identifying homogeneous subgroups within a heterogeneous population mitigates the impact of population heterogeneity on predictive model building.
A limitation of MIMIC-III is that inflammatory and genetic biomarker data are not available. Studies indicate that such biomarkers may contribute to ARDS phenotyping. Combining clinical and biological data in LCA-based phenotyping may improve the homogeneity of ARDS sub-phenotypes and further enhance the predictive performance of machine learning within subgroups.
Research reported in this publication was supported by an SBIR award to CTA from the National Heart, Lung, and Blood Institute of the National Institutes of Health under award number 1R43HL135909-01A1.
- 1. Bellani G, Laffey JG, Pham T, Fan E, Brochard L, Esteban A, et al. Epidemiology, patterns of care, and mortality for patients with acute respiratory distress syndrome in intensive care units in 50 countries. Jama. 2016;315(8):788–800.
- 2. Pham T, Rubenfeld GD. Fifty years of research in ARDS. The epidemiology of acute respiratory distress syndrome. A 50th birthday review. American journal of respiratory and critical care medicine. 2017;195(7):860–870.
- 3. Bice T, Cox CE, Carson SS. Cost and health care utilization in ARDS—different from other critical illness? In: Seminars in respiratory and critical care medicine. vol. 34. Thieme Medical Publishers; 2013. p. 529–536.
- 4. Webster N, Cohen A, Nunn J. Adult respiratory distress syndrome—how many cases in the UK? Anaesthesia. 1988;43(11):923–926.
- 5. Force ADT, Ranieri V, Rubenfeld G, et al. Acute respiratory distress syndrome. Jama. 2012;307(23):2526–2533.
- 6. Murray JF, Matthay MA, Luce JM, Flick MR, et al. An expanded definition of the adult respiratory distress syndrome. Am Rev Respir Dis. 1988;138(3):720–723.
- 7. Gajic O, Dabbagh O, Park PK, Adesanya A, Chang SY, Hou P, et al. Early identification of patients at risk of acute lung injury: evaluation of lung injury prediction score in a multicenter cohort study. American journal of respiratory and critical care medicine. 2011;183(4):462–470.
- 8. Villar J, Ambrós A, Soler JA, Martínez D, Ferrando C, Solano R, et al. Age, PaO2/FIO2, and Plateau pressure score: a proposal for a Simple Outcome score in patients with the Acute Respiratory distress syndrome. Critical care medicine. 2016;44(7):1361–1369.
- 9. Levitt JE, Calfee CS, Goldstein BA, Vojnik R, Matthay MA. Early acute lung injury: criteria for identifying lung injury prior to the need for positive pressure ventilation. Critical care medicine. 2013;41(8):1929.
- 10. Xie J, Liu L, Yang Y, Yu W, Li M, Yu K, et al. A modified acute respiratory distress syndrome prediction score: a multicenter cohort study in China. Journal of thoracic disease. 2018;10(10):5764.
- 11. Kangelaris KN, Calfee CS, May AK, Zhuo H, Matthay MA, Ware LB. Is there still a role for the lung injury score in the era of the Berlin definition ARDS? Annals of intensive care. 2014;4(1):4.
- 12. Calfee CS, Delucchi K, Parsons PE, Thompson BT, Ware LB, Matthay MA, et al. Subphenotypes in acute respiratory distress syndrome: latent class analysis of data from two randomised controlled trials. The Lancet Respiratory Medicine. 2014;2(8):611–620.
- 13. Sinha P, Delucchi KL, Thompson BT, McAuley DF, Matthay MA, Calfee CS, et al. Latent class analysis of ARDS subphenotypes: a secondary analysis of the statins for acutely injured lungs from sepsis (SAILS) study. Intensive care medicine. 2018;44(11):1859–1869.
- 14. Zhang Z. Identification of three classes of acute respiratory distress syndrome using latent class analysis. PeerJ. 2018;6:e4592.
- 15. Karpatne A, Khandelwal A, Boriah S, Kumar V. Predictive learning in the presence of heterogeneity and limited training data. In: Proceedings of the 2014 SIAM International Conference on Data Mining. SIAM; 2014. p. 253–261.
- 16. McCutcheon AL. Latent class analysis. 64. Sage; 1987.
- 17. Wang T, Velez T, Apostolova E, Tschampel T, Ngo TL, Hardison J. Semantically enhanced dynamic Bayesian network for detecting sepsis mortality risk in ICU patients with infection. arXiv preprint arXiv:1806.10174. 2018.
- 18. Johnson AE, Pollard TJ, Shen L, Li-wei HL, Feng M, Ghassemi M, et al. MIMIC-III, a freely accessible critical care database. Scientific data. 2016;3:160035.
- 19. Fraley C, Raftery AE. Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association. 2002;97(458):611–631.
- 20. Yeo IK, Johnson RA. A new family of power transformations to improve normality or symmetry. Biometrika. 2000;87(4):954–959.
- 21. Friedman JH. Greedy function approximation: a gradient boosting machine. Annals of statistics. 2001;p. 1189–1232.
- 22. Breiman L. Random forests. Machine learning. 2001;45(1):5–32.
- 23. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research. 2002;16:321–357.
- 24. Carpenter J, Bithell J. Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Statistics in medicine. 2000;19(9):1141–1164.
- 25. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837–845.
Appendix A
|Anion gap||First 24 hours||Minimum, Mean, Maximum||Continuous|
|Albumin||First 24 hours||Minimum, Mean, Maximum||Continuous|
|Albumin||Across hospital stay||Minimum||Continuous|
|Bilirubin||First 24 hours||Minimum, Mean, Maximum||Continuous|
|Creatinine||First 24 hours||Minimum, Mean, Maximum||Continuous|
|Chloride||First 24 hours||Minimum, Mean, Maximum||Continuous|
|Glucose||First 24 hours||Minimum, Mean, Maximum||Continuous|
|Lactate||First 24 hours||Minimum, Mean, Maximum||Continuous|
|Potassium||First 24 hours||Minimum, Mean, Maximum||Continuous|
|Sodium||First 24 hours||Minimum, Mean, Maximum||Continuous|
|BUN||First 24 hours||Minimum, Mean, Maximum||Continuous|
|pH||Across hospital stay||Minimum||Continuous|
|Blood Gas||SpO2||First 24 hours||Minimum, Mean, Maximum||Continuous|
|PCO2||First 24 hours||Minimum, Mean, Maximum||Continuous|
|PO2||First 24 hours||Minimum, Mean, Maximum||Continuous|
|Bicarbonate||First 24 hours||Minimum, Mean, Maximum||Continuous|
|Tidal volume||First 24 hours||Minimum, Mean, Maximum||Continuous|
|Oxygen saturation||First 24 hours||Minimum, Mean, Maximum||Continuous|
|FIO2||Across hospital stay||Maximum||Continuous|
|Hematology||PTT||First 24 hours||Minimum, Mean, Maximum||Continuous|
|INR||First 24 hours||Minimum, Mean, Maximum||Continuous|
|PT||First 24 hours||Minimum, Mean, Maximum||Continuous|
|WBC||First 24 hours||Minimum, Mean, Maximum||Continuous|
|Hematocrit||First 24 hours||Minimum, Mean, Maximum||Continuous|
|Bands||First 24 hours||Minimum, Mean, Maximum||Continuous|
|Platelet||First 24 hours||Minimum, Mean, Maximum||Continuous|
|Vital||Heart Rate||First 24 hours||Minimum, Mean, Maximum||Continuous|
|Systolic Blood Pressure||First 24 hours||Minimum, Mean, Maximum||Continuous|
|Diastolic Blood Pressure||First 24 hours||Minimum, Mean, Maximum||Continuous|
|Mean Airway Pressure||First 24 hours||Minimum, Mean, Maximum||Continuous|
|Respiratory Rate||First 24 hours||Minimum, Mean, Maximum||Continuous|
|Respiratory Rate||Across hospital stay||Maximum||Continuous|
|Temperature||First 24 hours||Minimum, Mean, Maximum||Continuous|
|GCS||Total||First 24 hours||Minimum||Continuous|
|GCS motor||First 24 hours||Minimum||Continuous|
|GCS verbal||First 24 hours||Minimum||Continuous|
|GCS eye||First 24 hours||Minimum||Continuous|
|Output||Urine||Day 1, 2, 3||Continuous|
|High risk surgery||Binary(Yes/No)|