Inference of a Multi-Domain Machine Learning Model to Predict Mortality in Hospital Stays for Patients with Cancer upon Febrile Neutropenia Onset

02/21/2019, by Xinsong Du, et al., University of Florida

Febrile neutropenia (FN) has been associated with high mortality, especially among adults with cancer. Understanding patient- and provider-level heterogeneity in FN hospital admissions has the potential to inform personalized interventions focused on increasing the survival of individuals with FN. We leverage machine learning techniques to disentangle the complex interactions among multi-domain risk factors in a population with FN. Data from the Healthcare Cost and Utilization Project (HCUP) National Inpatient Sample and Nationwide Inpatient Sample (NIS) were used to build machine learning-based models of mortality for adult cancer patients who were diagnosed with FN during a hospital admission. In particular, the importance of risk factors from different domains (including demographic, clinical, and hospital-associated information) was studied. A set of more interpretable (decision tree, logistic regression) as well as more black-box (random forest, gradient boosting, neural network) models were analyzed and compared via multiple cross-validation. Our results demonstrate that a linear prediction score of FN mortality among adults with cancer, based on admission information, is effective in classifying high-risk patients, and that the clinical diagnoses domain has the highest predictive power. A number of the risk variables identified in this study (e.g., sepsis, kidney failure) are clinically actionable, and further studies looking at patients' prior medical history are warranted.


1 Introduction

Febrile neutropenia (FN) is a life-threatening condition affecting more than sixty thousand people (1) in the US each year, in which individuals develop a high fever with a concomitantly very low count of neutrophil granulocytes (white blood cells). FN appears as a complication of chemotherapy due to myelosuppression, which may lead to reduced chemotherapy dose density and lower cancer cure rates (2). FN has been associated with high mortality, morbidity, and healthcare cost (3). Several scores have been developed with traditional inferential methods to identify low-risk patients for FN but have yielded varying prediction performances (4; 5; 6; 7). However, little work on FN mortality prediction has been done with machine learning methods, despite their wide use in the medical field (8). A limitation of existing approaches is that many require proprietary data. Moreover, previous scoring systems (e.g., MASCC) only used a limited number of patient-level factors and no interactions (4). Therefore, we applied machine learning algorithms and employed variables across different domains to build prediction models, as well as to analyze the reasons behind FN deaths. According to previous works, patient-level variables associated with FN mortality include age, number of comorbidities, length of stay in the intensive care unit, lung disease, hepatic disease, pulmonary embolism, renal disease, cerebrovascular infections, sepsis/bacteremia, cancer type, severity of neutropenia, vital-sign instability, severity of dehydration, creatinine level, platelet count, protein level, respiratory rate, pulmonary infiltration, C-reactive protein, and estimated glomerular filtration rate (9; 10; 11; 12; 13; 14; 15; 16; 17).

Publicly available electronic health record data can provide valuable and reproducible insight (18). Public sources have been used previously with success for large-scale machine learning modeling (19). The Healthcare Cost and Utilization Project (HCUP) databases, used in this study, have collated hospital administrative data nationwide in the United States since 1988 (20). To date, HCUP includes the largest collection of longitudinal hospital care data in the US, covering 48 states. A number of studies related to FN have been conducted within HCUP (21; 22; 23). In this paper, we aim to develop a machine-learning-based prediction model of mortality for cancer patients who are admitted to the hospital and develop FN. Unlike prior studies, our work investigates the contribution of different information domains (clinical, demographic, hospital-related), both independently and together, under a rigorous complexity-based model selection framework (24). The code is available at https://github.com/GalaxyDream/FN_Mortality.

2 Materials and Methods

This study is compliant with the NIS checklist (25; 26).

2.1 Data source, study population, and outcome

For our analysis, we used HCUP’s National Inpatient Sample (27) and Nationwide Inpatient Sample (28) data between January 2007 and September 2015. These data sets use a single coding system, the International Classification of Diseases, Ninth Revision (ICD-9), and HCUP's in-house medical ontology, the Clinical Classifications Software (CCS) (29). Our target population was adults with neoplasia presenting with or developing FN during a single hospital stay. Patients with a diagnosis of cancer, defined by CCS codes 11-45 and their corresponding ICD-9 codes (listed in Appendix A), were included. Patients younger than 18 years of age or missing age information were excluded. Patients with FN were identified by the following ICD-9 codes: 780.6 for fever and 112.5, 284.1, and 288.0 for neutropenia (18; 23; 22). We included ICD-9 codes 288.01 (congenital neutropenia) and 288.02 (cyclic neutropenia), since our aim is to analyze cancer patients with FN regardless of underlying etiology; binary indicator variables were added to keep track of congenital and cyclic neutropenia. The outcome variable was death. All visits with missing information on the mortality status of the patient were removed. We also excluded the fourth quarter of 2015, since it was undergoing an update at the time of our analyses.
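
The cohort-definition logic above can be summarized in a short, hedged sketch. The snippet below assumes the NIS discharge records are loaded into a pandas DataFrame with hypothetical column names (AGE, DIED, and diagnosis fields DX1 to DX25, coded without decimal points); the real HCUP layout and the full Appendix A cancer code list would be substituted.

```python
# Minimal cohort-selection sketch; column names (AGE, DIED, DX1..DX25) and the
# truncated CANCER_CODES set are illustrative assumptions, not the actual NIS schema.
import pandas as pd

FEVER_CODES = {"7806"}                                   # ICD-9 780.6, stored without a dot
NEUTROPENIA_CODES = {"1125", "2841", "2880", "28801", "28802"}
CANCER_CODES = {"1970", "1971"}                          # placeholder; use the full Appendix A list

def select_cohort(df: pd.DataFrame, n_dx: int = 25) -> pd.DataFrame:
    dx = df[[f"DX{i}" for i in range(1, n_dx + 1)]].astype(str)
    has_cancer = dx.isin(CANCER_CODES).any(axis=1)
    has_fever = dx.isin(FEVER_CODES).any(axis=1)
    has_neutropenia = dx.isin(NEUTROPENIA_CODES).any(axis=1)
    adult = df["AGE"].notna() & (df["AGE"] >= 18)        # exclude minors and missing age
    known_outcome = df["DIED"].isin([0, 1])              # drop records with unknown mortality status
    return df[has_cancer & has_fever & has_neutropenia & adult & known_outcome]
```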

2.2 Study covariates and data domains

All variables included in all 2007-2015 NIS databases were considered, then filtered and divided into domains. The three variable domains were: 1) demographic, including age, gender, race, payer (e.g., self-pay, private insurance), and median household income for the patient's ZIP code; 2) clinical, including diagnoses, clinical procedures, Charlson's Comorbidity Index (30), number of diagnoses/procedures, and number of chronic conditions (31); and 3) hospital-related information, such as admission time, admission type (urgent or elective), hospital bed size, hospital region, hospital ownership (e.g., government, private), indicator of emergency department, and hospital location/teaching status. All nominal/categorical variables were recoded as binary. Since some statistical learning models require that inputs are on a homogeneous scale, we standardized all numerical variables (32). Missing or invalid values of continuous or ordinal variables were replaced by the mean value, while those of categorical variables were imputed with the most common value (33).
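
As an illustration of this preprocessing step, the sketch below wires mean/mode imputation, one-hot recoding of nominal variables, and standardization of numeric inputs into a single scikit-learn pipeline; the column lists are assumptions rather than the actual NIS variable names.

```python
# Illustrative preprocessing pipeline: mean imputation + standardization for
# numeric columns, mode imputation + one-hot encoding for categorical ones.
# Column names below are hypothetical stand-ins for the NIS variables.
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["AGE", "NDX", "NPR", "NCHRONIC"]          # e.g. age, number of diagnoses/procedures
categorical_cols = ["FEMALE", "RACE", "PAY1", "HOSP_REGION"]

preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="mean")),
                      ("scale", StandardScaler())]), numeric_cols),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
])
# X = preprocess.fit_transform(cohort_df)  # cohort_df: output of the cohort selection step
```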

2.3 Statistical analyses and model selection framework

We performed descriptive statistics stratified by deceased/alive status and estimated time trends of FN mortality rates over FN discharges, FN rates over cancer diagnoses, and cancer rates over all discharges. All hospitalizations included in the analysis were weighted with the discharge weights provided by NIS. We performed univariate tests between each variable and the outcome, correcting p-values (34) with Bonferroni's method (35).
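
A sketch of the univariate screening is given below; it assumes already-binarized predictors and a binary DIED outcome, and uses a chi-squared test as a stand-in for whichever univariate test was actually applied, with the Bonferroni adjustment implemented as a simple multiplication of p-values by the number of tests.

```python
# Univariate screening with Bonferroni correction (illustrative only).
import pandas as pd
from scipy.stats import chi2_contingency

def univariate_bonferroni(X: pd.DataFrame, y: pd.Series) -> pd.Series:
    pvals = {}
    for col in X.columns:
        table = pd.crosstab(X[col], y)          # 2x2 table of predictor vs. death
        _, p, _, _ = chi2_contingency(table)
        pvals[col] = p
    raw = pd.Series(pvals)
    return (raw * len(raw)).clip(upper=1.0)     # Bonferroni: multiply by number of tests, cap at 1
```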

This study used the following statistical learning models: logistic regression with either LASSO or ridge regularization (36; 37); decision trees (shallow, with and without the help of a neural network, with depth no more than 5; and deep, with depth tuned between 6 and 20) (38); support vector machine (SVM) recursive feature elimination (RFE) with a linear kernel (39; 40); random forest (41); gradient boosting tree (42); naïve Bayes (43); and a multi-layer feed-forward neural network (44). Model parameters (regularization penalty, tree pruning, number of features selected on a random tree, number of trees in a forest, hidden-layer size, dropout rate, and learning rate of the neural network) were tuned by three-fold cross-validation and grid search. Since the shallow decision tree (DT) has good interpretability but poor predictive power, we employed mimic learning to improve its predictive performance: we used a multilayer neural network as the teacher model, taking its predicted logits as data labels and feeding them to the shallow DT (45).
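
The mimic-learning step can be sketched as follows, assuming scikit-learn components: a neural-network teacher is fit on the training data, its predicted probabilities (used here as a stand-in for the logits mentioned above) become soft targets, and a depth-limited regression tree is trained to reproduce them. The hyperparameters shown are placeholders for the values tuned by the 3-fold grid search.

```python
# Mimic learning sketch: shallow decision tree distilled from a neural-network teacher.
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeRegressor

def mimic_shallow_tree(X_train, y_train, max_depth: int = 5):
    teacher = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
    teacher.fit(X_train, y_train)
    soft_targets = teacher.predict_proba(X_train)[:, 1]   # teacher's soft labels for the positive class
    student = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
    student.fit(X_train, soft_targets)                     # the shallow tree mimics the teacher
    return teacher, student
```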

In addition, we tested several ontology systems, ICD-9, CCS, major diagnosis categories (MDC), diagnosis-related groups (DRG), and their clinical/procedure sub-variations, using a random forest model. Then, we performed the domain selection using the best-performing model on all the variables together. Due to the large number of input variables in some ontology systems (more than 6,000 variables for the clinical diagnoses domain using ICD-9), we only included variables with a high Gini importance score (46).
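
A minimal sketch of the Gini-importance-based dimensionality reduction is shown below: a random forest ranks the variables of a given ontology and only the top-k are retained (k itself was chosen by grid search in our analyses).

```python
# Keep only the top-k variables ranked by random-forest Gini importance (illustrative).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def top_k_by_gini(X, y, k: int = 10):
    forest = RandomForestClassifier(n_estimators=200, random_state=0, n_jobs=-1)
    forest.fit(X, y)
    ranked = np.argsort(forest.feature_importances_)[::-1]   # descending Gini importance
    return ranked[:k]                                         # column indices of the retained features
```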

The loss function used for model/domain selection was the area under the receiver operating characteristic curve (AUROC); we also calculated sensitivity, specificity, and the optimal calibration point using Youden's J index (47). Model validation was executed through multiple cross-validation (MCV), specifically 10 runs of 10-fold cross-validation. Finally, variable importance was evaluated by the bootstrapped average decrease in impurity from gradient boosting and by means of odds ratios from logistic regression. We also drew the optimized shallow DT.
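
The evaluation loop can be sketched as below, assuming NumPy feature/outcome arrays: 10 repeats of stratified 10-fold cross-validation, with the AUROC and the Youden's-J-optimal cutoff computed on each held-out fold. The gradient boosting classifier stands in for any of the tuned models.

```python
# 10x10-fold cross-validation with AUROC and Youden's J optimal cutoff (illustrative).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import RepeatedStratifiedKFold

def evaluate(X, y, model=None, n_splits=10, n_repeats=10, seed=0):
    model = model or GradientBoostingClassifier(random_state=seed)
    cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    aurocs, cutoffs = [], []
    for train_idx, test_idx in cv.split(X, y):
        model.fit(X[train_idx], y[train_idx])
        scores = model.predict_proba(X[test_idx])[:, 1]
        fpr, tpr, thresholds = roc_curve(y[test_idx], scores)
        cutoffs.append(thresholds[np.argmax(tpr - fpr)])   # Youden's J = sensitivity + specificity - 1
        aurocs.append(roc_auc_score(y[test_idx], scores))
    return float(np.mean(aurocs)), float(np.mean(cutoffs))
```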

The R language for statistical computing (48), Python (49), TensorFlow (50), and Scikit-Learn (51) were used to write scripts and perform all analyses.

Figure 1: (a) Yearly cancer rates over all NIS admissions (adult population), (b) FN diagnosis rate over total adult cancer diagnoses, and (c) mortality rate of adults with cancer and FN.
Characteristic All admissions Alive Deceased P-value a
Total N (%) 138,932 (100%) 132,423 (95.2%) 6,509 (4.8%)
Age median [IQR] 61 [50 – 70] 61 [49 - 70] 66 [56 – 75] <0.0001
Gender N (%)
    Female 68,635 (49.4%) 66,659 (49.7%) 3,634 (44.2%) <0.0001
    Missing, invalid, or inconsistent 4 (0.0%) 4 (0.0%) 0 (0.0%) -
Race/ethnicity N (%)
    White 91,735 (65.8%) 87,481 (66.1%) 4,254 (65.4%) 1.0000
    Black 12,014 (8.6%) 11,374 (8.6%) 640 (9.8%) 1.0000
    Hispanic 12,184 (8.8%) 11,590 (8.8%) 594 (9.1%) 1.0000
    Asian or Pacific Islander 4,068 (2.9%) 3,845 (2.9%) 223 (3.4%) 1.0000
    Native American 503 (0.4%) 472 (0.4%) 31 (0.5%) 1.0000
    Other 3,837 (2.8%) 3,635 (2.7%) 202 (3.1%) 1.0000
    Missing, invalid, or inconsistent 14,591 (10.5%) 14,026 (10.6%) 565 (8.5%) 1.0000
Charlson’s comorbidity index median [IQR] 2 [2 – 5] 2 [2 – 5] 3 [2 – 7] <0.0001
    Missing, invalid, or inconsistent 0 (0%) 0 (0%) 0 (0%) -
Number of diagnoses median [IQR] 13 [9 - 17] 12 [9 – 16] 18 [15 – 24] <0.0001
    Missing, invalid, or inconsistent 0 (0%) 0 (0%) 0 (0%) -
Number of procedures median [IQR] 1 [0 - 3] 1 [0 – 3] 4 [2 – 6] <0.0001
    Missing, invalid, or inconsistent 0 (0%) 0 (0%) 0 (0%) -
Number of chronic conditions median [IQR] 6 [4-7] 6 [4-7] 7 [5-8] <0.0001
    Missing, invalid, or inconsistent 0 (0%) 0 (0%) 0 (0%) -
Insurance type N (%)
    Medicare 57,395 (41.3%) 53,897 (40.7%) 3,498 (53.7%) <0.0001
    Medicaid 16,080 (11.6%) 15,395 (11.6%) 685 (10.5%) 1.0000
    Private 57,935 (41.7%) 55,947 (42.2%) 1,988 (30.5%) <0.0001
    Self-pay 2,919 (2.1%) 2,777 (2.1%) 142 (2.2%) 1.0000
    No charge 380 (0.3%) 366 (0.3%) 14 (0.2%) 1.0000
    Other 3,932 (2.8%) 3,763 (2.8%) 169 (2.6%) 1.0000
    Missing, invalid, or inconsistent 291 (0.2%) 278 (0.2%) 13 (0.2%) 1.0000
Median household income per ZIP code N (%) b
    1st quartile ($1 - $41,999) 31,266 (22.5%) 29,687 (22.4%) 1,539 (23.6%) 1.0000
    2nd quartile ($42,000 - $51,999) 33,285 (24.0%) 31,762 (24.0%) 1,523 (23.4%) 1.0000
    3rd quartile ($52,000 - $67,999) 34,977 (25.2%) 33,325 (25.2%) 1,672 (25.6%) 1.0000
    4th quartile ($68,000+) 36,214 (26.1%) 34,585 (26.1%) 1,629 (25.2%) 1.0000
    Missing, invalid, or inconsistent 3,190 (2.3%) 3,064 (2.3%) 146 (2.2%) 1.0000
Hospital location/teaching status N (%)
    Rural 9,584 (6.9%) 9,195 (6.9%) 389 (6.0%) 1.0000
    Urban nonteaching 35,804 (25.9%) 34,013 (25.7%) 1,791 (27.5%) 1.0000
    Urban teaching 92,781 (66.7%) 88,488 (66.8%) 4,293 (66.0%) 1.0000
    Missing, invalid, or inconsistent 763 (0.6%) 727 (0.5%) 47 (0.7%) 1.0000
Hospital region N (%)
    Northeast 25,445 (18.3%) 24,275 (18.3%) 1,170 (18.0%) 1.0000
    Midwest 33,384 (24.0%) 31,904 (24.1%) 1,480 (22.7%) 1.0000
    South 53,240 (38.3%) 50,754 (38.3%) 2,486 (38.2%) 1.0000
    West 26,863 (19.3%) 25,490 (19.2%) 1,373 (21.1%) 1.0000
    Missing, invalid, or inconsistent 0 (0%) 0 (0%) 0 (0%) -
Total Cost [$] median (IQR) $38,350 [19,062 – 97,023.5] $37,283 [18,736.5 – 92,472] $80,097 [32,812.5 – 189,682] <0.0001
    Missing, invalid, or inconsistent 3,176 (2.2%) 2,999 (2.2%) 177 (2.7%) 1.0000
a Bonferroni-adjusted
b Adjusted by year
Table 1: Demographic characteristics of the study sample
Model AUROC average SE Sensitivity average SE Specificity average SE Cutoff average SE a P-value b
Gradient Boosting Tree 0.92 0.00 0.82 0.01 0.86 0.00 0.23 0.01 Ref.
Random Forest 0.92 0.00 0.82 0.01 0.86 0.01 0.22 0.01 0.4869
Deep Neural Network 0.91 0.00 0.82 0.01 0.86 0.00 0.22 0.01 0.1853
Linear SVM-RFE 0.91 0.00 0.82 0.01 0.85 0.01 0.22 0.01 0.0475
Logistic Regression (ridge) 0.91 0.00 0.81 0.01 0.85 0.01 0.22 0.01 0.0001
Logistic Regression (lasso) 0.89 0.01 0.80 0.01 0.84 0.01 0.20 0.01 <0.0001
Gaussian Naïve Bayes 0.88 0.00 0.78 0.01 0.83 0.01 0.19 0.01 <0.0001
Deep Decision Tree 0.87 0.00 0.76 0.01 0.88 0.01 0.24 0.00 <0.0001
Shallow Decision Tree with Mimic Learning 0.85 0.00 0.68 0.00 0.89 0.00 0.23 0.01 <0.0001
Shallow Decision Tree 0.85 0.00 0.67 0.00 0.93 0.00 0.32 0.01 <0.0001
One-Rule c 0.74 0.02 0.56 0.07 0.93 0.10 0.36 0.09 <0.0001
a Calculated via Youden’s Index.
b Bengio’s corrected t-test between the best model and others.

c Respiratory failure/insufficiency/arrest was selected 9 times, while the total number of diagnoses was selected once by the one-rule model.
Table 2: Model selection on the full feature set (10x10-fold cross-validation) with standard errors (SE)

Domain(s) No. of variables AUROC average SE Sensitivity average SE Specificity average SE Cutoff average SE a P-value b
All 85 0.92 0.00 0.82 0.01 0.86 0.00 0.23 0.01 Ref.
Clinical diagnoses (CCS) 10/251 c 0.89 0.00 0.78 0.00 0.86 0.01 0.22 0.00 <0.0001
Clinical procedures (CCS) 20/216 d 0.80 0.00 0.60 0.00 0.90 0.00 0.23 0.01 <0.0001
Total number of diagnoses 1 0.76 0.00 0.77 0.02 0.62 0.02 0.09 0.00 <0.0001
Total number of procedures 1 0.71 0.00 0.64 0.00 0.70 0.01 0.10 0.00 <0.0001
Demographics 19 0.62 0.00 0.66 0.02 0.51 0.02 0.06 0.00 <0.0001
Number of chronic conditions (unweighted) 1 0.61 0.00 0.66 0.02 0.51 0.02 0.06 0.00 <0.0001
Charlson’s Comorbidity Index 1 0.60 0.00 0.56 0.01 0.61 0.02 0.07 0.00 <0.0001
Hospital information 17 0.55 0.00 0.56 0.03 0.52 0.03 0.06 0.00 <0.0001
Admission information 15 0.54 0.00 0.63 0.02 0.44 0.03 0.05 0.00 <0.0001
a Calculated via Youden’s Index.
b Bengio’s corrected t-test between the best model and others.
c Number of features was reduced from 251 to 10 by random forest (grid search showed that using the top 10 most important variables leads to the best AUROC).
d Number of features was reduced from 216 to 20 by random forest (grid search showed that using the top 20 most important variables leads to the best AUROC).
Table 3: Comparison of prediction performance achieved by each single domain and all domains together, using gradient boosting (best model) and 10x10-fold CV
Figure 2: ROC curves (averaged from 10x10 fold CV) for model comparison (a) and domain comparison (b)

3 Results

A total of 66,702,602 discharges (unweighted) were included in HCUP NIS between January 2007 and September 2015. Of these, 138,932 discharges met the inclusion criteria (adults with cancer and FN with known mortality status after admission). Of these discharges, there were 6,509 (4.68%) deaths. Figure 1 shows the yearly prevalence of adult cancer discharges (A), FN rate among those with cancer (B), and FN mortality rate (C) in the study population, weighted by the NIS discharge sampling. All prevalence rates showed an increasing trend across years; the strongest was cancer diagnoses at about 0.3% increase/year (P<0.01).

Table 1 shows the demographic characteristics of the study sample and information related to the hospitals and charges. Patients who died were older (median age 66), more often male, and had a higher Charlson's Comorbidity Index and a higher number of diagnoses, procedures, and chronic conditions, while people with private insurance coverage were more prevalent in the survivors' group. We did not find relevant differences in hospital rurality, race, or macro-geographic region. Higher discharge costs were more frequent in admissions leading to death.

For prediction model inference, we first selected the best clinical ontology, using random forest to compare ICD-9, CCS, MDC, DRG, and All Patients Refined DRG (APRDRG) (Table S in Appendix B). For clinical diagnoses, CCS yielded the best AUROC of 88%, followed by APRDRG and ICD-9 (AUROC 86% and 85%), which were statistically significantly worse than CCS (p<0.01). For clinical procedures, CCS also had the best performance (78% AUROC), and ICD-9 had a similar performance (77%); both significantly outperformed the procedure class grouping (p<0.01).

Once the best ontology was determined, we selected the best machine learning model using the integrated domain sets. Table 2 reports AUROC, sensitivity, and specificity distributions obtained from the MCV. Gradient boosting tree led to the highest AUROC of 92%, with a sensitivity of 82% and specificity of 86%. Random forest, deep neural network (optimized to 6 hidden layers), linear SVM-RFE, and ridge logistic regression also achieved high AUROCs (>90%) comparable to that of gradient boosting tree and were significantly better than other methods (p<0.01). Besides AUROC, random forest, gradient boosting tree, neural network, and linear SVM-RFE had similarly high sensitivity of 82%, while the shallow DT and one-rule model achieved the highest specificity of 93%.

The best-performing gradient boosting tree model was then used to evaluate the predictive ability of each domain. Table 3 shows the AUROC, sensitivity, specificity, and optimized cutoff for each domain. Clinical diagnoses yielded the best AUROC of 89%. Clinical procedures, number of diagnoses, and number of procedures resulted in AUROCs between 70% and 80%, followed by demographics (62%), number of chronic conditions (61%), Charlson's Comorbidity Index (60%), hospital information (55%), and admission information (54%). Clinical diagnoses had the highest sensitivity of 78%, while clinical procedures showed the best specificity of 90%, outperforming the combination of all domains. However, when all domains were combined, the AUROC and sensitivity were higher than those of any individual domain. ROC curves for the main model comparison and domain comparison are shown in Figure 2.

Figure 3: Evaluation of variable importance (top-15) via Gradient Boosting Tree (measured as average decrease in accuracy by variable permutation) over 10 bootstrap runs
Variable Crude OR [95% CI] Crude P-value Adjusted OR [95% CI] Adjusted P-value
Cardiac arrest and ventricular fibrillation 110.99 [90.46, 136.18] <0.001 14.84 [11.31, 19.47] <0.001
Conversion of cardiac rhythm 46.68 [40.26, 54.31] <0.001 4.67 [3.74, 5.85] <0.001
Respiratory failure; insufficiency; arrest (adult) 26.83 [25.38, 28.37] <0.001 4.54 [4.17, 4.95] <0.001
Respiratory intubation and mechanical ventilation 42.15 [39.58, 44.90] <0.001 3.82 [3.45, 4.23] <0.001
Other aftercare 2.70 [2.57, 2.84] <0.001 2.77 [2.59, 2.96] <0.001
Acute and unspecified renal failure 6.96 [6.61, 7.33] <0.001 2.34 [2.18, 2.51] <0.001
Shock 18.34 [17.26, 19.50] <0.001 2.11 [1.92, 2.31] <0.001
Septicemia (except in labor) 6.10 [5.79, 6.26] <0.001 1.71 [1.59, 1.85] <0.001
Other injuries and conditions due to external cause 4.64 [4.41, 4.90] <0.001 1.64 [1.52, 1.76] <0.001
Pneumonia (except that caused by tuberculosis or sexually transmitted disease) 4.28 [4.07, 4.51] <0.001 1.4 [1.33, 1.52] <0.001
Number of diagnoses – low (1-9) Ref.
    Number of diagnoses – medium (10-19) 4.58 [4.19, 5.02] <0.001 1.37 [1.24, 1.52] <0.001
    Number of diagnoses – high (20 or more) 14.93 [13.59, 16.40] <0.001 1.08 [0.96, 1.23] 0.187
Age (per 10 years older) 1.29 [1.27, 1.32] <0.001 1.29 [1.26, 1.32] <0.001
Number of procedures 1.23 [1.22, 1.24] <0.001 1.05 [1.04, 1.06] <0.001
Table 4: Odds ratios (OR) from the ridge logistic regression (coefficient threshold: 0.1)
Figure 4: Shallow decision tree with mimic learning. Each box indicates the number of patients (percentage within the sample) and the probability of death

We then performed single-variable importance analysis. Respiratory failure was chosen by the one-rule model as the most informative variable 9 times, while the number of diagnoses was selected once. For the multivariable analysis, we summarize importance according to the gradient boosting tree and ridge logistic regression in Figure 3 and Table 4. In detail, age, respiratory intubation and mechanical ventilation, respiratory failure, cardiac arrest and ventricular fibrillation, shock, other aftercare (e.g., follow-up exam after treatment for malignant neoplasm), acute and unspecified renal failure, septicemia, conversion of cardiac rhythm, and number of diagnoses/procedures were identified by both models as important variables associated with FN mortality. Finally, to summarize the importance of single variables in different population sub-strata, we show the optimized shallow DT in Figure 4: respiratory failure, number of diagnoses, other aftercare, shock, renal failure, ventricular fibrillation, bone marrow transplant receipt, and respiratory intubation and mechanical ventilation were selected as split-decision rules that stratified the study population into subgroups with different mortality rates.
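
As an illustration of how the adjusted odds ratios in Table 4 can be derived, the sketch below fits a ridge-penalized logistic regression and exponentiates each coefficient, keeping only variables whose absolute coefficient exceeds the 0.1 threshold mentioned in the table caption; confidence intervals would additionally require bootstrapping or an unpenalized statistical fit, which is not shown.

```python
# Adjusted odds ratios from a ridge-penalized logistic regression (illustrative sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression

def ridge_odds_ratios(X, y, feature_names, threshold: float = 0.1):
    model = LogisticRegression(penalty="l2", C=1.0, max_iter=1000)
    model.fit(X, y)
    coefs = model.coef_.ravel()
    # OR = exp(beta); report only coefficients above the magnitude threshold
    return {name: float(np.exp(b))
            for name, b in zip(feature_names, coefs) if abs(b) > threshold}
```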

4 Discussion

In this study, we used nine years of electronic medical records from a large national sample together with machine learning to develop interpretable models for predicting FN mortality. The dataset used in our study, HCUP's NIS, represents 95% of the US population, making it the largest publicly available all-payer inpatient healthcare database. The large sample size of the NIS is accompanied by fine-grained, structured recording of clinical diagnosis and procedure codes for all visits. The strengths of our study lie not only in the large sample and variable size, but also in the evaluation of predictors' importance across different domains. The results show that many machine learning models are capable of exploiting high-dimensional hospital admission data for FN mortality prediction, and that the clinical diagnoses domain has the highest predictive power.

In our study population, we found that the mortality rate ranged from 3.36% in 2007 to 5.28% in 2014, with an increasing trend. Abbasi et al. (52) also used the NIS to study FN incidence and death rates, not restricted to adults or to those diagnosed with cancer. They found that the mortality rate was rather stable from 2007 to 2012, ranging from 1.06% to 1.28%. Adult FN patients with cancer may thus have a higher mortality rate, despite the use of granulocyte colony-stimulating factor (G-CSF) as prophylaxis for chemotherapy-induced FN (3).

Although patients with Medicare had higher rates of mortality, this is most likely related to the age group of this payer population (65 and over). The household income in the area of residence and hospital-related variables were not significantly different between the deceased and alive patient populations.

For the development of a prediction model for mortality, we compared a number of linear and non-linear machine learning approaches, both complex “black-boxes” and more interpretable ones. One previous work, by Hui et al. (53), also used machine learning to predict adverse health outcomes (development of severe complication or death) of FN patients, but only a single method was analysed (a shallow artificial neural network) and the sample size was relatively small (n=227).

In our all-domains analysis, although the best-performing method was the gradient boosting tree (a nonlinear ensemble), we found that a linear score derived via linear SVM-RFE was not significantly inferior at the 0.01 alpha level. Furthermore, we found that mimic learning (of a neural network) can help improve the performance of simple, interpretable methods like decision trees, which in many cases do not provide satisfactory performance and therefore cannot be used in clinical practice. Similar work has been done by Che et al. (54). We believe that mimic learning should be considered in the development of risk prediction models because it can reduce model complexity and increase interpretability, easing the understanding of the mechanisms behind biological or disease processes and lowering the hurdles to use in clinical practice (24).

The domain-specific analysis indicated that clinical diagnoses have the highest discriminative ability for predicting FN death, followed by clinical procedures (which also showed the highest specificity), number of diagnoses/procedures, demographics, number of chronic conditions, Charlson's Comorbidity Index, and hospital/admission information. The total number of clinical diagnoses yielded the second-best sensitivity of 0.77, only 0.01 lower than the best, and an AUROC of 0.76; of note, Charlson's comorbidity index did not perform as well as the raw number of diagnoses. The number of diagnoses is positively related to the physician's workload (55), and in our analyses it was identified as an important variable by most of the linear (including one-rule) and non-linear models; also, the optimized decision tree showed that patients with no more than 12 diagnoses and without acute respiratory failure have a probability of death near zero.

Our inter-domain variable analysis validates the findings of previous studies: old age, sepsis, respiratory failure, heart disease, renal disease, pneumonia, liver disease, and higher Charlson’s comorbidity index were associated with the death of FN patients (3; 9; 12; 14; 16). We also found that external injuries, total number of discharges in a hospital, and aftercare were associated with death. Furthermore, cyclic and congenital neutropenia were not selected as important factors, which might indicate that the etiology of neutropenia has little relationship with the probability of death.

One limitation of our work is that we used discharge-level data, which might lead to multiple observations (admissions) per patient: even though we corrected the yearly mortality estimates, this may affect the odds ratios as well as the cross-validation estimates. The unavailability of patient identifiers does not permit us to test mixed models. Moreover, our data are historical, and the FN mortality rate might not reflect current values due to the increasing availability and use of G-CSF. Also, our data come from an administrative database, which may lead to an under-estimation of clinical event frequency, since some FN events may have occurred but were not recorded in the patient's billing. Another limitation is the lack of laboratory test data and of clinical notes. Indeed, a number of risk factors based on laboratory testing were identified in previous works (11; 12; 15). Furthermore, we could not comparatively evaluate the predictive power of some baseline scoring systems, like MASCC, that are based on physicians' notes (10). From a modeling point of view, mimic learning did not improve the specificity of the shallow decision tree when using Youden's index to select the cut-off. Thus, mimic learning seems not to add much to the student model, and further analysis of how to maximize the performance of interpretable machine learning models is warranted.

5 Conclusions

Hospital admission information contains a strong signal that can be exploited through machine learning to classify adults with cancer at high risk of FN mortality. The use of interpretable models can further help to identify, and act pre-emptively on, a number of prior and current risk variables such as sepsis, liver disease, and kidney failure; therefore, further studies looking at the patient's prior medical history are warranted.

Conflict of interest

None.

Acknowledgements

We thank all the HCUP partners listed on the following webpage: https://www.hcup-us.ahrq.gov/db/hcupdatapartners.jsp.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Authors’ Contributions

SC, JM, RB, XD designed the initial study protocol. SC, MP, JM provided critical suggestions on the clinical part of the study design. MP, JM, XD provided critical suggestions on the statistical and machine learning aspects of the study design. XD wrote the code for all experiments. MP, JM, XD double-checked the correctness of the code and experimental results. SC, MP, DJL, JM, RB, XD wrote and revised the manuscript.

References

  • (1) R. A, A. A, F. S, S. W, Febrile Neutropenia in Cancer Patient: Epidemiology, Microbiology, Pathophysiology and Management, Journal of Cancer Prevention & Current Research 5 (3) (2016) 1–0. doi:10.15406/jcpcr.2016.5.00165.
    URL http://medcraveonline.com/JCPCR/JCPCR-05-00165
  • (2) G. H. Lyman, S. L. Michels, M. W. Reynolds, R. Barron, K. S. Tomic, J. Yu, Risk of mortality in patients with cancer who experience febrile neutropenia, Cancer 116 (23) (2010) 5555–5563. doi:10.1002/cncr.25332.
    URL https://onlinelibrary.wiley.com/doi/abs/10.1002/cncr.25332
  • (3) G. H. Lyman, N. M. Kuderer, Epidemiology of Febrile Neutropenia, Supportive Cancer Therapy 1 (1) (2003) 23–35. doi:10.3816/SCT.2003.n.002.
    URL http://linkinghub.elsevier.com/retrieve/pii/S1543291213600764
  • (4) J. Klastersky, M. Paesmans, E. B. Rubenstein, M. Boyer, L. Elting, R. Feld, J. Gallagher, J. Herrstedt, B. Rapoport, K. Rolston, J. Talcott, The Multinational Association for Supportive Care in Cancer Risk Index: A Multinational Scoring System for Identifying Low-Risk Febrile Neutropenic Cancer Patients, Journal of Clinical Oncology 18 (16) (2000) 3038–3051. doi:10.1200/JCO.2000.18.16.3038.
    URL http://ascopubs.org/doi/10.1200/jco.2000.18.16.3038
  • (5) A. Carmona-Bayonas, P. Jiménez-Fonseca, J. Virizuela Echaburu, M. Antonio, C. Font, M. Biosca, A. Ramchandani, J. Martínez, J. Hernando Cubero, J. Espinosa, E. Martínez de Castro, I. Ghanem, C. Beato, A. Blasco, M. Garrido, Y. Bonilla, R. Mondéjar, M. Á. Arcusa Lanza, I. Aragón Manrique, A. Manzano, E. Sevillano, E. Castañón, M. Cardona, E. Gallardo Martín, Q. Pérez Armillas, F. Sánchez Lasheras, F. Ayala de la Peña, Prediction of Serious Complications in Patients With Seemingly Stable Febrile Neutropenia: Validation of the Clinical Index of Stable Febrile Neutropenia in a Prospective Cohort of Patients From the FINITE Study, Journal of Clinical Oncology 33 (5) (2015) 465–471. doi:10.1200/JCO.2014.57.2347.
    URL http://ascopubs.org/doi/abs/10.1200/jco.2014.57.2347
  • (6) L. de Souza Viana, J. C. Serufo, M. O. da Costa Rocha, R. N. Costa, R. C. Duarte, Performance of a modified MASCC index score for identifying low-risk febrile neutropenic cancer patients, Supportive Care in Cancer 16 (7) (2008) 841–846. doi:10.1007/s00520-007-0347-3.
    URL https://doi.org/10.1007/s00520-007-0347-3
  • (7) H. Moon, Y. J. Choi, S. H. Sim, Validation of the Clinical Index of Stable Febrile Neutropenia (CISNE) model in febrile neutropenia patients visiting the emergency department. Can it guide emergency physicians to a reasonable decision on outpatient vs. inpatient treatment?, PLOS ONE 13 (12) (2018) e0210019. doi:10.1371/journal.pone.0210019.
    URL https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0210019
  • (8) A. L. Beam, I. S. Kohane, Big Data and Machine Learning in Health Care, JAMA 319 (13) (2018) 1317–1318. doi:10.1001/jama.2017.18391.
    URL https://dx.doi.org/10.1001/jama.2017.18391
  • (9) J. Cupp, E. Culakova, M. S. Poniewierski, D. C. Dale, G. H. Lyman, J. Crawford, Analysis of Factors Associated With In-hospital Mortality in Lung Cancer Chemotherapy Patients With Neutropenia, Clinical Lung Cancer 19 (2) (2018) e163–e169. doi:10.1016/j.cllc.2017.10.013.
  • (10) C. J. Coyne, V. Le, J. J. Brennan, E. M. Castillo, R. A. Shatsky, K. Ferran, S. Brodine, G. M. Vilke, Application of the MASCC and CISNE Risk-Stratification Scores to Identify Low-Risk Febrile Neutropenic Patients in the Emergency Department, Annals of Emergency Medicine 69 (6) (2017) 755–764. doi:10.1016/j.annemergmed.2016.11.007.
  • (11) M. Günalp, M. Koyunoğlu, S. Gürler, A. Koca, I. Yeşilkaya, E. Öner, M. Akkaş, N. Metin Aksu, A. Demirkan, O. Polat, A. H. Elhan, Independent factors for prediction of poor outcomes in patients with febrile neutropenia, Medical Science Monitor: International Medical Journal of Experimental and Clinical Research 20 (2014) 1826–1832. doi:10.12659/MSM.892269.
  • (12) A. Lal, Y. Bhurgri, N. Rizvi, M. Virwani, R. U. Memon, W. Saeed, M. R. Sardar, P. Kumar, A. J. Shaikh, S. Adil, N. Masood, M. Khurshed, Factors influencing in-hospital length of stay and mortality in cancer patients suffering from febrile neutropenia, Asian Pacific journal of cancer prevention: APJCP 9 (2) (2008) 303–308.
  • (13) R. G. Rosa, L. Z. Goldani, Cohort study of the impact of time to antibiotic administration on mortality in patients with febrile neutropenia, Antimicrobial Agents and Chemotherapy 58 (7) (2014) 3799–3803. doi:10.1128/AAC.02561-14.
  • (14) J. Chindaprasirt, C. Wanitpongpun, P. Limpawattana, K. Thepsuthammarat, W. Sripakdee, K. Wirasorn, A. Sookprasert, Mortality, Length of Stay, and Cost Associated with Hospitalized Adult Cancer Patients with Febrile Neutropenia, Asian Pacific Journal of Cancer Prevention 14 (2) (2013) 1115–1119. doi:10.7314/APJCP.2013.14.2.1115.
    URL http://koreascience.or.kr/journal/view.jsp?kj=POCPA9&py=2013&vnc=v14n2&sp=1115
  • (15) A. H. Elhan, Independent Factors for Prediction of Poor Outcomes in Patients with Febrile Neutropenia, Medical Science Monitor 20 (2014) 1826–1832. doi:10.12659/MSM.892269.
    URL http://www.medscimonit.com/abstract/index/idArt/892269
  • (16) N. M. Kuderer, D. C. Dale, J. Crawford, L. E. Cosler, G. H. Lyman, Mortality, morbidity, and cost associated with febrile neutropenia in adult cancer patients, Cancer 106 (10) (2006) 2258–2266. doi:10.1002/cncr.21847.
  • (17) A. H. Osmani, A. A. Jabbar, M. K. Gangwani, B. Hassan, Outcomes of High Risk Patients with Febrile Neutropenia at a Tertiary Care Center, Asian Pacific journal of cancer prevention: APJCP 18 (10) (2017) 2741–2745. doi:10.22034/APJCP.2017.18.10.2741.
  • (18) C. Shah, X. Du, R. Bishnoi, J. Bian, Risk of mortality in adult cancer febrile neutropenia patients with a machine learning approach, Journal of Clinical Oncology. doi:10.1200/JCO.2018.36.15_suppl.e13562.
    URL http://ascopubs.org/doi/abs/10.1200/JCO.2018.36.15_suppl.e13562
  • (19) S. Rose, Machine Learning for Prediction in Electronic Health Data, JAMA Network Open 1 (4) (2018) e181404–e181404. doi:10.1001/jamanetworkopen.2018.1404.
    URL https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2695072
  • (20) Healthcare Cost and Utilization Project (HCUP) Databases, 2007-2015, Agency for Healthcare Research and Quality, Rockville, MD.
    URL www.hcup-us.ahrq.gov/databases.jsp
  • (21) E. Tai, G. P. Guy, A. Dunbar, L. C. Richardson, Cost of Cancer-Related Neutropenia or Fever Hospitalizations, United States, 2012, Journal of Oncology Practice 13 (6) (2017) e552–e561. doi:10.1200/JOP.2016.019588.
  • (22) E. L. Mueller, J. Croop, A. E. Carroll, Fever and neutropenia hospital discharges in children with cancer: A 2012 update, Pediatric Hematology and Oncology 33 (1) (2016) 39–48. doi:10.3109/08880018.2015.1102998.
  • (23) E. L. Mueller, K. J. Walkovich, R. Mody, A. Gebremariam, M. M. Davis, Hospital discharges for fever and neutropenia in pediatric cancer patients: United States, 2009, BMC Cancer 15 (1) (2015) 388. doi:10.1186/s12885-015-1413-8.
    URL https://doi.org/10.1186/s12885-015-1413-8
  • (24) M. Prosperi, J. S. Min, J. Bian, F. Modave, Big data hurdles in precision medicine and precision public health, BMC Medical Informatics and Decision Making 18 (1) (2018) 139. doi:10.1186/s12911-018-0719-2.
    URL https://doi.org/10.1186/s12911-018-0719-2
  • (25) R. Khera, H. M. Krumholz, With Great Power Comes Great Responsibility: Big Data Research From the National Inpatient Sample, Circulation. Cardiovascular Quality and Outcomes 10 (7). doi:10.1161/CIRCOUTCOMES.117.003846.
  • (26) R. Khera, S. Angraal, T. Couch, J. W. Welsh, B. K. Nallamothu, S. Girotra, P. S. Chan, H. M. Krumholz, Adherence to Methodological Standards in Research Using the National Inpatient Sample, JAMA 318 (20) (2017) 2011–2018. doi:10.1001/jama.2017.17653.
  • (27) HCUP National Inpatient Sample (NIS), Healthcare Cost and Utilization Project (HCUP), 2012-2015, Agency for Healthcare Research and Quality, Rockville, MD.
    URL www.hcup-us.ahrq.gov/nisoverview.jsp
  • (28) HCUP Nationwide Inpatient Sample (NIS), Healthcare Cost and Utilization Project (HCUP), 2007-2011, Agency for Healthcare Research and Quality, Rockville, MD.
    URL www.hcup-us.ahrq.gov/nisoverview.jsp
  • (29) HCUP Clinical Classifications Software (CCS) for ICD-9-CM, Healthcare Cost and Utilization Project (HCUP), 2007-2015, Agency for Healthcare Research and Quality, Rockville, MD.
    URL www.hcup-us.ahrq.gov/toolssoftware/ccs/ccs.jsp
  • (30) M. E. Charlson, P. Pompei, K. L. Ales, C. R. MacKenzie, A new method of classifying prognostic comorbidity in longitudinal studies: Development and validation, Journal of Chronic Diseases 40 (5) (1987) 373–383. doi:10.1016/0021-9681(87)90171-8.
    URL http://www.sciencedirect.com/science/article/pii/0021968187901718
  • (31) HCUP Chronic Condition Indicator (CCI), Healthcare Cost and Utilization Project (HCUP), 2007-2015, Agency for Healthcare Research and Quality, Rockville, MD.
    URL http://www.hcup-us.ahrq.gov/toolssoftware/chronic/chronic.jsp
  • (32) R. Tibshirani, The lasso method for variable selection in the Cox model, Statistics in Medicine 16 (4) (1997) 385–395.
  • (33) J. W. Grzymala-Busse, W. J. Grzymala-Busse, Handling Missing Attribute Values, in: The Data Mining and Knowledge Discovery Handbook, 2005. doi:10.1007/978-0-387-09823-4_3.
  • (34) C. Nadeau, Y. Bengio, Inference for the Generalization Error.
  • (35) R. A. Armstrong, When to use the Bonferroni correction, Ophthalmic & Physiological Optics: The Journal of the British College of Ophthalmic Opticians (Optometrists) 34 (5) (2014) 502–508. doi:10.1111/opo.12131.
  • (36) R. Tibshirani, Regression Shrinkage and Selection via the Lasso, Journal of the Royal Statistical Society. Series B (Methodological) 58 (1) (1996) 267–288.
    URL https://www.jstor.org/stable/2346178
  • (37) A. E. Hoerl, R. W. Kennard, Ridge Regression: Biased Estimation for Nonorthogonal Problems, Technometrics 12 (1) (1970) 55–67. doi:10.1080/00401706.1970.10488634.
    URL https://www.tandfonline.com/doi/abs/10.1080/00401706.1970.10488634
  • (38) J. R. Quinlan, Induction of decision trees, Machine Learning 1 (1) (1986) 81–106. doi:10.1007/BF00116251.
    URL http://link.springer.com/10.1007/BF00116251
  • (39) I. Guyon, J. Weston, S. Barnhill, V. Vapnik, Gene Selection for Cancer Classification using Support Vector Machines, Machine Learning 46 (1) (2002) 389–422. doi:10.1023/A:1012487302797.
    URL https://doi.org/10.1023/A:1012487302797
  • (40) C.-W. Hsu, C.-C. Chang, C.-J. Lin, A Practical Guide to Support Vector Classification.
  • (41) L. Breiman, Random Forests, Machine Learning 45 (1) (2001) 5–32. doi:10.1023/A:1010933404324.
    URL https://doi.org/10.1023/A:1010933404324
  • (42) T. Chen, C. Guestrin, XGBoost: A Scalable Tree Boosting System, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’16, ACM Press, San Francisco, California, USA, 2016, pp. 785–794. doi:10.1145/2939672.2939785.
    URL http://dl.acm.org/citation.cfm?doid=2939672.2939785
  • (43) I. Rish, An empirical study of the naive Bayes classifier.
  • (44) Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444. doi:10.1038/nature14539.
    URL http://www.nature.com/articles/nature14539
  • (45) J. Ba, R. Caruana, Do Deep Nets Really Need to be Deep?, in: Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 27, Curran Associates, Inc., 2014, pp. 2654–2662.
    URL http://papers.nips.cc/paper/5484-do-deep-nets-really-need-to-be-deep.pdf
  • (46) G. Louppe, L. Wehenkel, A. Sutera, P. Geurts, Understanding variable importances in forests of randomized trees.
  • (47) M. D. Ruopp, N. J. Perkins, B. W. Whitcomb, E. F. Schisterman, Youden Index and Optimal Cut-Point Estimated from Observations Affected by a Lower Limit of Detection, Biometrical journal. Biometrische Zeitschrift 50 (3) (2008) 419–430. doi:10.1002/bimj.200710415.
    URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2515362/
  • (48) R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria (2013).
    URL https://www.R-project.org/
  • (49) G. Van Rossum, F. L. Drake, Python 3 Reference Manual, CreateSpace, Paramount, CA, 2009.
  • (50) M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mane, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viegas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Zheng, TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems.
  • (51) F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, É. Duchesnay, Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research 12 (Oct) (2011) 2825–2830.
    URL http://www.jmlr.org/papers/v12/pedregosa11a.html
  • (52) S. Abbasi, B. Nazha, E. Moussaly, M. Manchanda, J. P. Atallah, Febrile neutropenia in the nationwide inpatient sample: In-hospital outcomes and impact of comorbidities in 2007-2012., Journal of Clinical Oncology 35 (15_suppl) (2017) e18103–e18103. doi:10.1200/JCO.2017.35.15_suppl.e18103.
    URL http://ascopubs.org/doi/abs/10.1200/JCO.2017.35.15_suppl.e18103
  • (53) E. P. Hui, L. K. S. Leung, T. C. W. Poon, F. Mo, V. T. C. Chan, A. T. W. Ma, A. Poon, E. K. Hui, S.-S. Mak, M. Lai, K. I. K. Lei, B. B. Y. Ma, T. S. K. Mok, W. Yeo, B. C. Y. Zee, A. T. C. Chan, Prediction of outcome in cancer patients with febrile neutropenia: a prospective validation of the Multinational Association for Supportive Care in Cancer risk index in a Chinese population and comparison with the Talcott model and artificial neural network, Supportive Care in Cancer: Official Journal of the Multinational Association of Supportive Care in Cancer 19 (10) (2011) 1625–1635. doi:10.1007/s00520-010-0993-8.
  • (54) Z. Che, S. Purushotham, R. Khemani, Y. Liu, Interpretable Deep Models for ICU Outcome Prediction, AMIA Annual Symposium Proceedings 2016 (2017) 371–380.
    URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5333206/
  • (55) N. V. Groningen, P. A. Prasad, N. Najafi, A. Rajkomar, R. R. Khanna, M. C. Fang, Electronic Order Volume as a Meaningful Component in Estimating Patient Complexity and Resident Physician Workload, Journal of Hospital Medicine 13 (12) (2018) 829–835. doi:10.12788/jhm.3069.

Appendix A ICD-9-CM codes for cancer discharge identification

1400, 1401, 1403, 1404, 1405, 1406, 1408, 1409, 1410, 1411, 1412, 1413, 1414, 1415, 1416, 1418, 1419, 1420, 1421, 1422, 1428, 1429, 1430, 1431, 1438, 1439, 1440, 1441, 1448, 1449, 1450, 1451, 1452, 1453, 1454, 1455, 1456, 1458, 1459, 1460, 1461, 1462, 1463, 1464, 1465, 1466, 1467, 1468, 1469, 1470, 1471, 1472, 1473, 1478, 1479, 1480, 1481, 1482, 1483, 1488, 1489, 1490, 1491, 1498, 1499, 1500, 1501, 1502, 1503, 1504, 1505, 1508, 1509, 1510, 1511, 1512, 1513, 1514, 1515, 1516, 1518, 1519, 1520, 1521, 1522, 1523, 1528, 1529, 1530, 1531, 1532, 1533, 1534, 1535, 1536, 1537, 1538, 1539, 1540, 1541, 1542, 1543, 1548, 1550, 1551, 1552, 1560, 1561, 1562, 1568, 1569, 1570, 1571, 1572, 1573, 1574, 1578, 1579, 1580, 1588, 1589, 1590, 1591, 1598, 1599, 1600, 1601, 1602, 1603, 1604, 1605, 1608, 1609, 1610, 1611, 1612, 1613, 1618, 1619, 1620, 1622, 1623, 1624, 1625, 1628, 1629, 1630, 1631, 1638, 1639, 1640, 1641, 1642, 1643, 1648, 1649, 1650, 1658, 1659, 1700, 1701, 1702, 1703, 1704, 1705, 1706, 1707, 1708, 1709, 1710, 1712, 1713, 1714, 1715, 1716, 1717, 1718, 1719, 1720, 1721, 1722, 1723, 1724, 1725, 1726, 1727, 1728, 1729, 17300, 17301, 17302, 17309, 17310, 17311, 17312, 17319, 17320, 17321, 17322, 17329, 17330, 17331, 17332, 17339, 17340, 17341, 17342, 17349, 17350, 17351, 17352, 17359, 17360, 17361, 17362, 17369, 17370, 17371, 17372, 17379, 17380, 17381, 17382, 17389, 17390, 17391, 17392, 17399, 1740, 1741, 1742, 1743, 1744, 1745, 1746, 1748, 1749, 1750, 1759, 1760, 1761, 1762, 1763, 1764, 1765, 1768, 1769, 179, 1800, 1801, 1808, 1809, 181, 1820, 1821, 1828, 1830, 1832, 1833, 1834, 1835, 1838, 1839, 1840, 1841, 1842, 1843, 1844, 1848, 1849, 185, 1860, 1869, 1871, 1872, 1873, 1874, 1875, 1876, 1877, 1878, 1879, 1880, 1881, 1882, 1883, 1884, 1885, 1886, 1887, 1888, 1889, 1890, 1891, 1892, 1893, 1894, 1898, 1899, 1900, 1901, 1902, 1903, 1904, 1905, 1906, 1907, 1908, 1909, 1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921, 1922, 1923, 1928, 1929, 193, 1940, 1941, 1943, 1944, 1945, 1946, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1958, 1960, 1961, 1962, 1963, 1965, 1966, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 19881, 19882, 19889, 1990, 1991, 1992, 20000, 20001, 20002, 20003, 20004, 20005, 20006, 20007, 20008, 20010, 20011, 20012, 20013, 20014, 20015, 20016, 20017, 20018, 20020, 20021, 20022, 20023, 20024, 20025, 20026, 20027, 20028, 20030, 20031, 20032, 20033, 20034, 20035, 20036, 20037, 20038, 20040, 20041, 20042, 20043, 20044, 20045, 20046, 20047, 20048, 20050, 20051, 20052, 20053, 20054, 20055, 20056, 20057, 20058, 20060, 20061, 20062, 20063, 20064, 20065, 20066, 20067, 20068, 20070, 20071, 20072, 20073, 20074, 20075, 20076, 20077, 20078, 20080, 20081, 20082, 20083, 20084, 20085, 20086, 20087, 20088, 20100, 20101, 20102, 20103, 20104, 20105, 20106, 20107, 20108, 20110, 20111, 20112, 20113, 20114, 20115, 20116, 20117, 20118, 20120, 20121, 20122, 20123, 20124, 20125, 20126, 20127, 20128, 20140, 20141, 20142, 20143, 20144, 20145, 20146, 20147, 20148, 20150, 20151, 20152, 20153, 20154, 20155, 20156, 20157, 20158, 20160, 20161, 20162, 20163, 20164, 20165, 20166, 20167, 20168, 20170, 20171, 20172, 20173, 20174, 20175, 20176, 20177, 20178, 20190, 20191, 20192, 20193, 20194, 20195, 20196, 20197, 20198, 20200, 20201, 20202, 20203, 20204, 20205, 20206, 20207, 20208, 20210, 20211, 20212, 20213, 20214, 20215, 20216, 20217, 20218, 20220, 20221, 20222, 20223, 20224, 20225, 20226, 20227, 20228, 20230, 20231, 20232, 
20233, 20234, 20235, 20236, 20237, 20238, 20240, 20241, 20242, 20243, 20244, 20245, 20246, 20247, 20248, 20250, 20251, 20252, 20253, 20254, 20255, 20256, 20257, 20258, 20260, 20261, 20262, 20263, 20264, 20265, 20266, 20267, 20268, 20270, 20271, 20272, 20273, 20274, 20275, 20276, 20277, 20278, 20280, 20281, 20282, 20283, 20284, 20285, 20286, 20287, 20288, 20290, 20291, 20292, 20293, 20294, 20295, 20296, 20297, 20298, 20300, 20301, 20302, 20310, 20311, 20312, 20380, 20381, 20382, 20400, 20401, 20402, 20410, 20411, 20412, 20420, 20421, 20422, 20480, 20481, 20482, 20490, 20491, 20492, 20500, 20501, 20502, 20510, 20511, 20512, 20520, 20521, 20522, 20530, 20531, 20532, 20580, 20581, 20582, 20590, 20591, 20592, 20600, 20601, 20602, 20610, 20611, 20612, 20620, 20621, 20622, 20680, 20681, 20682, 20690, 20691, 20692, 20700, 20701, 20702, 20710, 20711, 20712, 20720, 20721, 20722, 20780, 20781, 20782, 20800, 20801, 20802, 20810, 20811, 20812, 20820, 20821, 20822, 20880, 20881, 20882, 20890, 20891, 20892, 20900, 20901, 20902, 20903, 20910, 20911, 20912, 20913, 20914, 20915, 20916, 20917, 20920, 20921, 20922, 20923, 20924, 20925, 20926, 20927, 20929, 20930, 20931, 20932, 20933, 20934, 20935, 20936, 20970, 20971, 20972, 20973, 20974, 20975, 20979, 2300, 2301, 2302, 2303, 2304, 2305, 2306, 2307, 2308, 2309, 2310, 2311, 2312, 2318, 2319, 2320, 2321, 2322, 2323, 2324, 2325, 2326, 2327, 2328, 2329, 2330, 2331, 2332, 23330, 23331, 23332, 23339, 2334, 2335, 2336, 2337, 2339, 2340, 2348, 2349, 2350, 2351, 2352, 2353, 2354, 2355, 2356, 2357, 2358, 2359, 2360, 2361, 2362, 2363, 2364, 2365, 2366, 2367, 23690, 23691, 23699, 2370, 2371, 2372, 2373, 2374, 2375, 2376, 2379, 2380, 2381, 2382, 2383, 2384, 2385, 2386, 23871, 23872, 23873, 23874, 23875, 23876, 23877, 23879, 2388, 2389, 2390, 2391, 2392, 2393, 2394, 2395, 2396, 2397, 23981, 23989, 2399, 25802, 25803, 2731, 2733, 27789, 28730, 28989, 51181, 72702, 78951, 79501, 79502, 79503, 79504, 79506, 79511, 79512, 79513, 79514, 79516, 79671, 79672, 79673, 79674, 79676, V1000, V1001, V1002, V1003, V1004, V1005, V1006, V1007, V1009, V1011, V1012, V1020, V1021, V1022, V1029, V103, V1040, V1041, V1042, V1043, V1044, V1045, V1046, V1047, V1048, V1049, V1050, V1051, V1052, V1053, V1059, V1060, V1061, V1062, V1063, V1069, V1071, V1072, V1079, V1081, V1082, V1083, V1084, V1085, V1086, V1087, V1088, V1089, V1090, V1091, V580, V5811, V5812

Appendix B Diagnosis/Procedure Coding System Comparison

Domain Variable Name Reduced Dimension/Original Dimension a AUROC SE P-value b
Diagnoses Diagnosis CCS c 20/251 0.884 0.000 Ref.
APRDRG d 80/1602 0.861 0.000 <0.001
ICD-9-CM 10/6272 0.853 0.001 <0.001
DRG e 160/642 0.805 0.000 <0.001
MDC f 10/6272 0.853 0.001 <0.001
Procedures Procedure CCS 20/216 0.778 0.001 Ref.
ICD-9-PCS 10/1697 0.772 0.001 0.134
Procedure Class g 4/4 0.657 0.000 <0.001
a Grid search with 3-fold cross-validation over the top [10, 20, 40, 80, 160, 320, 640, 1000, 1500, 2000, …, 6000] variables
b Bengio’s corrected t-test between the best model and others
c CCS: Clinical Classifications Software;
d APRDRG: All Patients Refined DRG; expands the basic DRG structure by adding four subclasses for disease severity and four subclasses for risk of mortality;
e DRG: Diagnostic-Related Groups;
f MDC: Major Diagnosis Categories;
g Procedure Class: Categorizes all ICD-9 procedure codes into four groups: "minor diagnostic", "minor therapeutic", "major diagnostic", "major therapeutic"
Table S: Diagnosis/procedure coding system selection using random forest, with area under the ROC curve (standard error) and P-values