New-onset diabetes mellitus after transplant (NODAT) has been reported in up to 25% of liver transplant recipients (Hadj Ali et al., 2011; Kesiraju et al., 2014; First et al., 2013; Lv et al., 2015). One risk factor is the use of immunosuppressant drugs, such as corticosteroids, calcineurin inhibitors, and mammalian target of rapamycin (mTOR) inhibitors, which are commonly prescribed during and/or after the transplant. These drugs are known to significantly affect metabolic balance and are associated with diabetes, hypertension, obesity, and dyslipidemia among transplant patients (Charco et al., 1999). Previous studies have shown that some drugs, such as the mTOR inhibitor sirolimus, are more strongly associated with diabetes than others, and higher drug levels are suspected to be associated with the condition, but there is no firm evidence for this.
Many diabetes risk calculators are already available for the general public (Kaczorowski et al., 2009; Heikes et al., 2008). However, a recent review of type-2 diabetes risk-prediction models described deficiencies in the majority of them (Collins et al., 2011). A tool is needed that is suitable for liver transplant recipients, who are more vulnerable to diabetes, and that accounts for medication information and clinical diagnoses.
Liver transplant recipients have an annual checkup with their clinicians. We would like to build a risk calculator that clinicians can use at each patient’s yearly checkup. At each visit, the clinician will collect the patient’s current data, together with historical information from previous follow-ups when available, and use our calculator to assess the patient’s risk of diabetes.
Difference between Classification and Survival Analysis. We can formulate risk estimation in two ways. First, we can formulate it as a classification task, where $y_{it} = 0$ indicates that patient $i$ at follow-up visit $t$ will not develop diabetes in the near future (within the following year), and $y_{it} = 1$ otherwise. Alternatively, since diabetes is a chronic disease that is persistent and long-lasting (WHO, 2017), we can approach the problem with a time-to-event model, i.e. survival analysis. Survival analysis assumes that the event will eventually happen for all individuals (Schober & Vetter, 2018). Although this assumption does not hold in our case, we compare both approaches and assess their performance.
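To make the two formulations concrete, the sketch below derives both targets from a single follow-up visit. It assumes, hypothetically, that visit times and diabetes onset are recorded in whole years since transplant, that only visits before onset are labelled, and that "within the following year" means onset in the interval (t, t + 1]. The function names are illustrative and not taken from the authors' code.

```python
from typing import Optional, Tuple


def classification_label(visit_year: int, onset_year: Optional[int]) -> int:
    """y_it for a visit at `visit_year`: 1 if onset falls in (visit_year, visit_year + 1], else 0."""
    if onset_year is None:  # patient never developed diabetes during follow-up
        return 0
    if onset_year <= visit_year:
        raise ValueError("only visits before diabetes onset should be labelled")
    return int(onset_year <= visit_year + 1)


def survival_target(visit_year: int, onset_year: Optional[int],
                    last_followup_year: int) -> Tuple[int, int]:
    """(time, event) measured from this visit, treating the visit as its own observation."""
    if onset_year is not None:
        if onset_year <= visit_year:
            raise ValueError("only visits before diabetes onset should be used")
        return onset_year - visit_year, 1          # event observed
    return last_followup_year - visit_year, 0      # right-censored at last follow-up


# A patient seen at years 1 and 2 who develops diabetes at year 3:
print([classification_label(y, 3) for y in (1, 2)])   # [0, 1]
print([survival_target(y, 3, 3) for y in (1, 2)])      # [(2, 1), (1, 1)]
```

The classification target discards how far away onset is, while the survival target keeps it and also uses censored patients.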
Dealing with patient’s historic data. Historical information can be useful for predicting diabetes onset in the following year. In a later section, we validate this by comparing performance with and without historic data. Because the length of available history varies across patients (e.g. some patients have data for more than 20 years, while others have much less), we would like to represent the historic data as a fixed-dimensional latent representation. One way to obtain such a representation is to model the data with a Deep Markov Model (DMM) (Krishnan et al., 2017) and use its hidden variable as the representation.
2 Related Works
Deep Survival Analysis
Recently, several approaches have incorporated deep learning into survival analysis (Ranganath et al., 2016; Christ et al., 2017; Katzman et al., 2017). Both Christ et al. and Katzman et al. use neural networks to predict either a risk function or a patient’s survival. The difference between the two algorithms is that Christ et al. additionally use Bayesian optimization for hyperparameter tuning. The non-linearity introduced by neural networks increases their capacity to fit the data. However, unlike the classical Cox proportional-hazards model (Therneau, 2015), these algorithms cannot handle time-varying covariates, which are crucial since we are dealing with temporal data.
Survival Analysis with RNN (LSTM) architecture. Several recent works have combined recurrent neural networks (RNNs), specifically LSTMs, with survival analysis (Martinsson, 2016; Giunchiglia et al., 2018; Grob et al., 2018). Martinsson’s Weibull Time To Event Recurrent Neural Network (WTTE-RNN) predicts time to event in real time: the RNN learns to output, at each time step, the parameters of a Weibull distribution that describes the time remaining until the event.
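As a rough illustration of what WTTE-RNN optimizes, the function below computes the censored, continuous-time Weibull log-likelihood for a predicted scale α and shape β. An observed event contributes log f(t) = log h(t) − H(t), and a censored observation contributes log S(t) = −H(t), where H(t) = (t/α)^β. In WTTE-RNN the RNN would emit α and β at each step and be trained on the negative of this quantity; this is a sketch of the likelihood only, not Martinsson's implementation, and the name `weibull_loglik` is hypothetical.

```python
import numpy as np


def weibull_loglik(t, u, alpha, beta):
    """Elementwise censored Weibull log-likelihood.

    t     : observed time (to the event, or to censoring), t > 0
    u     : 1 if the event was observed, 0 if the observation is censored
    alpha : Weibull scale (> 0), e.g. predicted by the RNN at this step
    beta  : Weibull shape (> 0), e.g. predicted by the RNN at this step
    """
    t, u, alpha, beta = (np.asarray(v, dtype=float) for v in (t, u, alpha, beta))
    # log hazard: log h(t) = log(beta / alpha) + (beta - 1) * log(t / alpha)
    log_hazard = np.log(beta / alpha) + (beta - 1.0) * np.log(t / alpha)
    # cumulative hazard: H(t) = (t / alpha) ** beta, so log S(t) = -H(t)
    cum_hazard = (t / alpha) ** beta
    # observed: log f(t) = log h(t) - H(t); censored: log S(t) = -H(t)
    return u * log_hazard - cum_hazard


# Training would minimise the mean negative log-likelihood over all time steps.
loss = -np.mean(weibull_loglik([2.0, 3.0], [1, 0], [2.5, 2.5], [1.5, 1.5]))
```

With β = 1 the Weibull reduces to an exponential distribution, which is a convenient sanity check: for α = 1 and t = 1 both the observed and censored terms equal −1.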
We used mixed effects logistic regression (MELR) (Bates et al., 2015) and a mixed effects random forest (MERF) (Hajjem et al., 2014) to predict the onset of diabetes within the following year. A one-year horizon should leave sufficient time to intervene and potentially prevent the onset of diabetes. We used mixed effects models to account for the different number of visits per individual, since visits from the same patient are not independent. Models trained on only one record per individual are discussed in Appendix B.
Survival Analysis for Onset of Diabetes. For survival analysis, we compared the Cox proportional-hazards model, both regularized (Friedman et al., 2010; Simon et al., 2011) and unregularized (Therneau, 2015), random survival forest (RSF) (Ishwaran & Kogalur, 2018), and DeepSurv (Katzman et al., 2017). As discussed earlier, apart from the unregularized Cox proportional-hazards model (CPH), these algorithms cannot handle time-varying covariates. Because of this constraint, we had to treat each follow-up visit as a separate observation (Kalbfleisch & Prentice, 2003). For comparison, we fit the unregularized CPH both with time-varying covariates and with each follow-up visit treated as a separate observation.
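The difference between treating visits as separate observations and handling time-varying covariates shows up in how the data are laid out. A minimal sketch, assuming visits recorded in years since transplant and covariates held constant between visits, converts one patient's history into the (start, stop] counting-process rows used by time-varying Cox models (for example `Surv(start, stop, event)` in the R survival package). The helper `to_counting_process` is hypothetical.

```python
from typing import Any, List, Optional, Sequence, Tuple

Row = Tuple[float, float, int, Any]  # (start, stop, event, covariates)


def to_counting_process(visit_years: Sequence[float], covariates: Sequence[Any],
                        onset_year: Optional[float],
                        last_followup_year: float) -> List[Row]:
    """Split one patient's follow-up into (start, stop] intervals.

    Covariates measured at a visit are carried forward until the next visit;
    the event flag is 1 only on the interval that ends at diabetes onset.
    """
    end = onset_year if onset_year is not None else last_followup_year
    rows: List[Row] = []
    for k, (start, x) in enumerate(zip(visit_years, covariates)):
        if start >= end:  # no follow-up time left after onset / censoring
            break
        next_visit = visit_years[k + 1] if k + 1 < len(visit_years) else end
        stop = min(next_visit, end)
        event = int(onset_year is not None and stop == onset_year)
        rows.append((start, stop, event, x))
    return rows


# Visits at years 0, 1, 2 with BMI 24, 26, 27; diabetes diagnosed at year 3.
print(to_counting_process([0, 1, 2], [24, 26, 27], 3, 3))
# [(0, 1, 0, 24), (1, 2, 0, 26), (2, 3, 1, 27)]
```

Each row contributes only to the risk sets within its own interval, so the same patient is never counted twice at one time point, unlike the separate-observation layout where every visit restarts the clock.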
Latent Representation of Historic Data. To incorporate a latent representation of a patient’s historical data, we trained a DMM and used its 2-dimensional latent representation as an additional feature when training and evaluating the compared algorithms (see Figure 1). We also trained models that directly incorporate measurements from the last clinical visit (1-lag), and from the last, penultimate, and third-to-last visits (3-lag). When information from past visits was not available, we imputed the missing values with the values of the current observation.
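The 1-lag and 3-lag inputs, including the imputation rule for missing past visits, can be sketched as follows. Feature vectors are plain lists of numbers here, and `add_lag_features` is a hypothetical helper rather than the authors' code.

```python
from typing import List, Sequence


def add_lag_features(visits: Sequence[Sequence[float]], n_lags: int) -> List[List[float]]:
    """Concatenate each visit's features with those of its n_lags previous visits.

    `visits` holds one patient's feature vectors in chronological order. When a
    previous visit does not exist, the current visit's values are used instead.
    """
    rows = []
    for t, current in enumerate(visits):
        row = list(current)
        for lag in range(1, n_lags + 1):
            past = visits[t - lag] if t - lag >= 0 else current  # impute with current
            row.extend(past)
        rows.append(row)
    return rows


history = [[24.0], [26.0], [27.0]]        # e.g. BMI at three annual visits
print(add_lag_features(history, 1))        # [[24.0, 24.0], [26.0, 24.0], [27.0, 26.0]]
```

Unlike the DMM embedding, this keeps the input dimension fixed by construction (features × (1 + lags)) without any learned compression.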
Survival Analysis with LSTM. Of the three recent models that use an LSTM architecture, we could only run WTTE-RNN (Martinsson, 2016) in reasonable time: code for the RNN-Survival Model (Grob et al., 2018) is available but not well documented, and code for RNN-SURV (Giunchiglia et al., 2018) is not yet available.
Dataset. The Scientific Registry of Transplant Recipients (SRTR) is a collection of clinical data submitted by members of the Organ Procurement and Transplantation Network (OPTN) (Scientific Registry of Transplant Recipients, 2018). SRTR contains data from all transplant recipients in the United States, both at the time of transplant and at follow-up visits. At the time of data curation, the SRTR liver transplant data consisted of distinct patients who received liver transplants between October 1st, 1987 and March 2nd, 2017. For this study, we included patients who were at least 18 years old and did not have diabetes at the time of transplant. We used 27 non-time-varying features and 5 time-varying features; see Appendix A for the complete list of predictors and time-varying features. We split the dataset by individual patient: 80% of patients for the training set, 10% for the validation set, and 10% for the held-out (test) set. All reported results are on the held-out (test) data.
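Splitting by patient rather than by visit keeps all of a patient's visits in the same subset, so no patient contributes to both training and evaluation. A minimal sketch of an 80/10/10 patient-level split follows; the seed, the rounding rule, and the function name are illustrative assumptions.

```python
import random
from typing import Dict, Hashable, Iterable, Set


def split_by_patient(patient_ids: Iterable[Hashable], fractions=(0.8, 0.1, 0.1),
                     seed: int = 0) -> Dict[str, Set[Hashable]]:
    """Assign whole patients (not individual visits) to train/validation/test."""
    unique = sorted(set(patient_ids))          # one entry per patient
    random.Random(seed).shuffle(unique)
    n_train = round(fractions[0] * len(unique))
    n_val = round(fractions[1] * len(unique))
    return {
        "train": set(unique[:n_train]),
        "val": set(unique[n_train:n_train + n_val]),
        "test": set(unique[n_train + n_val:]),  # remainder is the held-out set
    }


# Visit-level records: each patient id appears once per follow-up visit.
visit_patient_ids = [pid for pid in range(100) for _ in range(3)]
splits = split_by_patient(visit_patient_ids)
```

Visits are then routed to a subset by looking up their patient id, which prevents the optimistic bias that a visit-level split would introduce.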
We used the area under the receiver operating characteristic curve (AUROC) to compare MELR and MERF for predicting diabetes within the following year. An ROC curve plots the true positive rate (the proportion of patients with diabetes who are predicted to have diabetes) against the false positive rate (the proportion of patients without diabetes who are predicted to have diabetes) at different classification thresholds. AUROC aggregates performance across all possible classification thresholds (Google Developers, 2018). Each method was trained with 4 different feature sets: current data only (no historic data added), 1-lag, 3-lag, and DMM features (concatenation of the 2-dimensional latent representation of the patient’s historic data).
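AUROC also equals the probability that a randomly chosen patient who develops diabetes receives a higher score than a randomly chosen patient who does not, with ties counting one half. The quadratic-time sketch below computes it directly from that definition; it is meant for illustration, and a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:                 # O(n_pos * n_neg); fine for illustration only
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUROC of 0.5 corresponds to random ranking, which is why a model that "was not able to learn" sits near that value.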
Table 1 shows that MELR was not able to learn to predict diabetes within the following year, whereas MERF was. As Table 1 also shows, adding historical data (1-lag, 3-lag, or DMM feature) decreased performance rather than improving it, suggesting that these methods could not effectively take advantage of the additional historical data.
We also compared a variety of survival models: Cox proportional-hazards (CPH), CPH with L2 regularization (CPH-reg), CPH with appropriate handling of time-varying covariates (CPH-time), DeepSurv, Random Survival Forests (RSF), and the Weibull Time-to-Event RNN (WTTE-RNN). Since WTTE-RNN’s architecture itself incorporates historic data, we did not add the 1-lag, 3-lag, or DMM feature options for this model. We used the concordance index (C-index) to compare these methods. The C-index compares estimated risks within pairs of patients, in our case a patient who developed diabetes and a patient who remained diabetes-free for longer, and calls a pair concordant when the patient who developed diabetes was estimated to have the higher risk (Harrell et al., 1996). The C-index, the fraction of such pairs that are concordant, therefore indicates how accurately a model orders patients with respect to time to event.
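A minimal sketch of Harrell's C-index for right-censored data, under the usual definition: a pair (i, j) is comparable when patient i developed diabetes and time_i < time_j, and concordant when i also has the higher predicted risk. Ties in risk count one half, and pairs with tied event times are ignored here for simplicity. The function name is hypothetical.

```python
from typing import Sequence


def harrell_c_index(time: Sequence[float], event: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's C for right-censored data; higher `risk` means earlier expected onset."""
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:          # a censored patient cannot anchor a comparable pair
            continue
        for j in range(len(time)):
            if time[i] < time[j]:  # j was still diabetes-free when i developed diabetes
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Onset at years 1 and 2; third patient censored at year 3.
print(harrell_c_index([1, 2, 3], [1, 1, 0], [2.0, 3.0, 1.0]))  # about 0.667
```

Because censored patients still serve as the later member of a pair, the C-index uses information that a fixed-horizon AUROC discards.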
Table 2 shows an unexpected result: CPH with L2 regularization (CPH-reg) outperformed all of the other methods. It reached its reported concordance when historical data from the past 3 clinical visits were included as input. Note also that incorporating the DMM feature was not as predictive as including historical data from past visits directly, and it even decreased the performance of RSF, DeepSurv, and CPH-time. This suggests that the latent representation learned by the DMM in an unsupervised fashion is not as informative as one would have hoped.
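For reference, CPH-reg maximizes the Cox partial likelihood subject to an L2 (ridge) penalty on the coefficients. The sketch below evaluates one common form of that objective, the average negative partial log-likelihood plus (λ/2)‖β‖², using the Breslow convention for tied times. Implementations such as glmnet (Friedman et al., 2010; Simon et al., 2011) use their own scaling conventions, so this illustrates the objective rather than reproducing the fitted model; `ridge_cox_loss` is a hypothetical name.

```python
import numpy as np


def ridge_cox_loss(beta, X, time, event, lam):
    """Average negative Breslow partial log-likelihood plus (lam / 2) * ||beta||^2.

    X: (n, p) covariates, time: (n,) follow-up times, event: (n,) 1 = diabetes onset.
    """
    beta = np.asarray(beta, dtype=float)
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    eta = X @ beta                                  # linear predictor (log relative risk)
    loglik = 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]                   # risk set R(t_i); tied times share it (Breslow)
        loglik += eta[i] - np.logaddexp.reduce(eta[at_risk])  # stable log-sum-exp
    return -loglik / len(time) + 0.5 * lam * float(beta @ beta)


X = [[0.0], [1.0], [2.0]]
t, e = [1.0, 2.0, 3.0], [1, 1, 1]
print(ridge_cox_loss([0.0], X, t, e, lam=0.1))     # log(6) / 3, about 0.597
```

Minimizing this over β (e.g. with a generic optimizer) shrinks coefficients toward zero, which can help when many correlated lagged features are added, as in the 3-lag setting.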
Figure 2 provides a closer look at the performance of each algorithm on diabetes-onset prediction across the years after liver transplant. All of the models show variable performance over time, especially MERF and RSF. This may indicate that, in our setting, random forest-based methods overfit more easily while the other methods are more robust. The variability may also be partly caused by instability in the time-varying predictors (see Appendix A, Figure 3). The performance of DeepSurv and MELR also spiked unexpectedly near the end. CPH performed similarly across the different feature inputs. Overall, CPH and WTTE-RNN exhibited less variance over time and are therefore likely more reliable and, ultimately, more clinically relevant.
We have compared a variety of models for predicting the risk of new-onset diabetes in post-liver transplant patients. Overall, adding a deep latent embedding of historical data did not improve the performance of our models, whereas directly including data from the past 1 to 3 annual visits proved more informative. We recommend an L2-regularized CPH model with 1 to 3 years of historical visits as input, which achieved a C-index of 86.3% to 87.0%, for further validation and eventual use in clinical practice.
- Bates et al. (2015) Bates, Douglas, Mächler, Martin, Bolker, Ben, and Walker, Steve. Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1):1–48, 2015. doi: 10.18637/jss.v067.i01.
- Bays et al. (2007) Bays, H E, Chapman, R H, and Grandy, S. The relationship of body mass index to diabetes mellitus, hypertension and dyslipidaemia: comparison of data from two national surveys. International Journal of Clinical Practice, 61(5):737–747, May 2007. ISSN 1368-5031. doi: 10.1111/j.1742-1241.2007.01336.x. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1890993/.
- Charco et al. (1999) Charco, Ramón, Cantarell, Carme, Vargas, Victor, Capdevila, Luis, Lázaro, Jose Luis, Hidalgo, Ernest, Murio, Enrique, and Margarit, Carlos. Serum cholesterol changes in long-term survivors of liver transplantation: A comparison between cyclosporine and tacrolimus therapy. Liver Transplantation and Surgery, 5(3):204–208, May 1999. ISSN 1527-6473. doi: 10.1002/lt.500050303.
- Christ et al. (2017) Christ, P. Ferdinand, Ettlinger, F., Kaissis, G., Schlecht, S., Ahmaddy, F., Grün, F., Valentinitsch, A., Ahmadi, S., Braren, R., and Menze, B. SurvivalNet: Predicting patient survival from diffusion weighted magnetic resonance images using cascaded fully convolutional and 3D Convolutional Neural Networks. In 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), pp. 839–843, April 2017. doi: 10.1109/ISBI.2017.7950648.
- Collins et al. (2011) Collins, Gary S., Mallett, Susan, Omar, Omar, and Yu, Ly-Mee. Developing risk prediction models for type 2 diabetes: a systematic review of methodology and reporting. BMC Medicine, 9(1):103, September 2011. ISSN 1741-7015. doi: 10.1186/1741-7015-9-103. URL https://doi.org/10.1186/1741-7015-9-103.
- First et al. (2013) First, M. Roy, Dhadda, Shobha, Croy, Richard, Holman, John, and Fitzsimmons, William E. New-Onset Diabetes After Transplantation (NODAT): An Evaluation of Definitions in Clinical Trials. Transplantation, 96(1):58, July 2013. ISSN 0041-1337. doi: 10.1097/TP.0b013e318293fcf8.
- Friedman et al. (2010) Friedman, Jerome, Hastie, Trevor, and Tibshirani, Robert. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22, 2010. URL http://www.jstatsoft.org/v33/i01/.
- Giunchiglia et al. (2018) Giunchiglia, Eleonora, Nemchenko, Anton, and van der Schaar, Mihaela. RNN-SURV: A deep recurrent model for survival analysis. 27th International Conference on Artificial Neural Networks, October 2018.
- Google Developers (2018) Google Developers. Classification: ROC and AUC | Machine Learning Crash Course, 2018. URL https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc.
- Grob et al. (2018) Grob, Georg L., Cardoso, Ângelo, Liu, C. H. Bryan, Little, Duncan A., and Chamberlain, Benjamin Paul. A Recurrent Neural Network Survival Model: Predicting Web User Return Time. arXiv:1807.04098 [cs, stat], July 2018. URL http://arxiv.org/abs/1807.04098.
- Groll & Tutz (2014) Groll, Andreas and Tutz, Gerhard. Variable selection for generalized linear mixed models by L1-penalized estimation. Statistics and Computing, 2014.
- Hadj Ali et al. (2011) Hadj Ali, I., Adberrahim, E., Ben Abdelghani, K., Barbouch, S., Mchirgui, N., Khiari, K., Chérif, M., Ounissi, M., Ben Romhane, N., Ben Abdallah, N., Ben Abdallah, T., Ben Maiz, H., and Khedher, A. Incidence and Risk Factors for Post–Renal Transplant Diabetes Mellitus. Transplantation Proceedings, 43(2):568–571, March 2011. ISSN 0041-1345. doi: 10.1016/j.transproceed.2011.01.032.
- Hajjem et al. (2014) Hajjem, Ahlem, Bellavance, François, and Larocque, Denis. Mixed-effects random forest for clustered data. Journal of Statistical Computation and Simulation, 84(6):1313–1328, June 2014. ISSN 0094-9655. doi: 10.1080/00949655.2012.741599. URL https://doi.org/10.1080/00949655.2012.741599.
- Harrell et al. (1996) Harrell, Frank E., Lee, Kerry L., and Mark, Daniel B. Multivariable Prognostic Models: Issues in Developing Models, Evaluating Assumptions and Adequacy, and Measuring and Reducing Errors. Statistics in Medicine, 15(4):361–387, February 1996. ISSN 1097-0258. doi: 10.1002/(SICI)1097-0258(19960229)15:4<361::AID-SIM168>3.0.CO;2-4. URL http://onlinelibrary.wiley.com/doi/abs/10.1002/%28SICI%291097-0258%2819960229%2915%3A4%3C361%3A%3AAID-SIM168%3E3.0.CO%3B2-4.
- Heikes et al. (2008) Heikes, Kenneth E., Eddy, David M., Arondekar, Bhakti, and Schlessinger, Leonard. Diabetes Risk Calculator: A simple tool for detecting undiagnosed diabetes and pre-diabetes. Diabetes Care, 31(5):1040–1045, May 2008. ISSN 0149-5992, 1935-5548. doi: 10.2337/dc07-1150. URL http://care.diabetesjournals.org/content/31/5/1040.
- Ishwaran & Kogalur (2018) Ishwaran, H. and Kogalur, U.B. Random Forests for Survival, Regression, and Classification (RF-SRC), 2018. URL https://cran.r-project.org/package=randomForestSRC. R package version 2.7.0.
- Kaczorowski et al. (2009) Kaczorowski, Janusz, Robinson, Chris, and Nerenberg, Kara. Development of the CANRISK questionnaire to screen for prediabetes and undiagnosed type 2 diabetes. Canadian Journal of Diabetes, 33(4):381–385, January 2009. ISSN 1499-2671. doi: 10.1016/S1499-2671(09)34008-3. URL http://www.sciencedirect.com/science/article/pii/S1499267109340083.
- Kalbfleisch & Prentice (2003) Kalbfleisch, J. D. and Prentice, Ross L. The statistical analysis of failure time data. J. Wiley, 2003.
- Katzman et al. (2017) Katzman, Jared, Shaham, Uri, Cloninger, Alexander, Bates, Jonathan, Jiang, Tingting, and Kluger, Yuval. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Medical Research Methodology, 18(24), February 2017.
- Kesiraju et al. (2014) Kesiraju, Sailaja, Paritala, Purna, Rao Ch, Uma Maheswara, and Sahariah, S. New onset of diabetes after transplantation — An overview of epidemiology, mechanism of development and diagnosis. Transplant Immunology, 30(1):52–58, January 2014. ISSN 0966-3274. doi: 10.1016/j.trim.2013.10.006.
- Krishnan et al. (2017) Krishnan, Rahul G, Shalit, Uri, and Sontag, David. Structured inference networks for nonlinear state space models. In AAAI, 2017.
- Liaw & Wiener (2002) Liaw, Andy and Wiener, Matthew. Classification and regression by randomForest. R News, 2(3):18–22, 2002. URL https://CRAN.R-project.org/doc/Rnews/.
- Lv et al. (2015) Lv, Chaoyang, Zhang, Yao, Chen, Xianying, Huang, Xiaowu, Xue, Mengjuan, Sun, Qiman, Wang, Ting, Liang, Jing, He, Shunmei, Gao, Jian, Zhou, Jian, Yu, Mingxiang, Fan, Jia, and Gao, Xin. New-onset diabetes after liver transplantation and its impact on complications and patient survival. Journal of Diabetes, 7(6):881–890, November 2015. ISSN 1753-0407. doi: 10.1111/1753-0407.12275.
- Martinsson (2016) Martinsson, Egil. WTTE-RNN : Weibull Time To Event Recurrent Neural Network. Master’s thesis, Chalmers University Of Technology, 2016.
- Perktold et al. Perktold, Josef, Fulton, Chad, and Shedden, Kerby. Statsmodels. URL https://www.statsmodels.org/stable/index.html.
- Ranganath et al. (2016) Ranganath, Rajesh, Perotte, Adler, Elhadad, Noémie, and Blei, David. Deep Survival Analysis. Journal of Machine Learning Research, 2016.
- Schober & Vetter (2018) Schober, Patrick and Vetter, Thomas R. Survival Analysis and Interpretation of Time-to-Event Data: The Tortoise and the Hare. Anesthesia & Analgesia, 127(3):792, September 2018. ISSN 0003-2999. doi: 10.1213/ANE.0000000000003653. URL https://journals.lww.com/anesthesia-analgesia/Fulltext/2018/09000/Survival_Analysis_and_Interpretation_of.32.aspx.
- Scientific Registry of Transplant Recipients (2018) Scientific Registry of Transplant Recipients. Data that drives development: the SRTR database, 2018. URL https://www.srtr.org/about-the-data/the-srtr-database/.
- Sidi (2017) Sidi, Jonathan. Regularization and classification of linear mixed models via the elastic net penalty. 2017.
- Simon et al. (2011) Simon, Noah, Friedman, Jerome, Hastie, Trevor, and Tibshirani, Rob. Regularization paths for Cox’s proportional hazards model via coordinate descent. Journal of Statistical Software, 39(5):1–13, 2011. URL http://www.jstatsoft.org/v39/i05/.
- Therneau (2015) Therneau, Terry M. A Package for Survival Analysis in S, 2015. URL https://CRAN.R-project.org/package=survival. version 2.38.
- WHO (2017) WHO. Diabetes, 2017. URL http://www.who.int/news-room/fact-sheets/detail/diabetes.
Appendix A Feature choices
For non-time-varying features, we included gender, ethnicity, race, age at transplant, primary diagnosis (hepatitis-c, hepatitis-b, non-alcoholic fatty liver disease, alcohol-related diagnosis, primary biliary cholangitis, primary sclerosing cholangitis, and autoimmune hepatitis), donor characteristics (age, gender, ethnicity, race, diabetes status, creatinine level, history of smoking, hypertension, and inotropic support), whether the patient was prescribed medication for hypertension, whether the patient was hospitalized and in the ICU, total cold ischemic time, ascites, spontaneous bacterial peritonitis, and history of previous abdominal surgery.
For time-varying features, we included BMI, height, weight, immunosuppressants prescribed, and whether the patient had any acute rejection episodes during the follow-up period. Figure 3 shows the distribution of each time-varying feature across the years after liver transplant. Each cell’s colour represents the feature’s average scaled value among patients at a given number of years after transplant. Continuous features, such as BMI, HGT_CM (height in cm), and WGT_KG (weight in kg), are scaled to range from 0 (minimum) to 1 (maximum), and the cell colour shows their average scaled value. For binary features (values of 0 or 1), the cell colour represents the proportion of patients with a value of 1.
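The cell values described above can be computed as in the following sketch. It assumes, hypothetically, that min-max scaling is applied over all records of a feature pooled across years, and that each record is a (years since transplant, value) pair; `heatmap_cells` is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def heatmap_cells(records: List[Tuple[int, float]], binary: bool = False) -> Dict[int, float]:
    """Map years-since-transplant to one Figure-3-style cell value.

    Continuous features are min-max scaled to [0, 1] over all records first;
    binary (0/1) features are left as-is, so their mean is the share of 1s.
    """
    if not binary:
        values = [v for _, v in records]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0                      # guard against a constant feature
        records = [(year, (v - lo) / span) for year, v in records]
    by_year: Dict[int, List[float]] = defaultdict(list)
    for year, v in records:
        by_year[year].append(v)
    return {year: sum(vs) / len(vs) for year, vs in sorted(by_year.items())}


bmi = [(0, 20.0), (0, 30.0), (1, 40.0)]              # (years after transplant, BMI)
print(heatmap_cells(bmi))                            # {0: 0.25, 1: 1.0}
```

Scaling over the pooled records (rather than per year) is what makes shifts between years visible as colour changes.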
As can be seen from Figure 3, the distributions of some features change over the years. For example, medications from immuno_group_3 (tacrolimus) were more likely to be prescribed in the first 6 years after transplant, while medications from immuno_group_2 (cyclosporine) were more frequently prescribed 12 years after transplant. This reflects the change in practice over time: tacrolimus has been strongly favoured in recent years because of its consistent and dependable trough drug levels, which make it easier to manage as an immunosuppressant than cyclosporine. Mean BMI at the time of transplant and 10 years after transplant was also higher than at any other time point. This distribution shift is important because BMI is one of the biggest risk factors for diabetes (Bays et al., 2007).
Appendix B Looking at single visits
Aside from treating each patient’s multiple visits as separate observations, we explored isolating a single visit per patient (the first visit, the last recorded visit or last visit before onset of diabetes, and a random visit). These approaches allow for more interpretable models, but may introduce bias.
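The three single-visit selections can be sketched as below, assuming each patient's visits are ordered by time and already truncated at diabetes onset, so that "last" means the last recorded visit or the last visit before onset. The function name and seed are illustrative.

```python
import random
from typing import Dict, Hashable, List, TypeVar

V = TypeVar("V")


def select_single_visit(visits_by_patient: Dict[Hashable, List[V]], mode: str,
                        seed: int = 0) -> Dict[Hashable, V]:
    """Keep exactly one visit per patient: 'first', 'last', or 'random'."""
    rng = random.Random(seed)
    chosen = {}
    for pid, visits in visits_by_patient.items():
        if mode == "first":
            chosen[pid] = visits[0]
        elif mode == "last":                # last recorded / last before onset
            chosen[pid] = visits[-1]
        elif mode == "random":
            chosen[pid] = rng.choice(visits)
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return chosen


visits = {"p1": ["y1", "y2", "y3"], "p2": ["y1"]}
print(select_single_visit(visits, "last"))   # {'p1': 'y3', 'p2': 'y1'}
```

With one row per patient, observations become independent, which is why ordinary (non-mixed) models can be used in this appendix.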
Table 3 shows the performance of classifiers predicting diabetes within the following year. We fit a standard logistic regression (LR), an L2-regularized logistic regression (LR-reg) (Friedman et al., 2010), and a random forest (RF) (Liaw & Wiener, 2002). The "First" column shows the performance of models trained on only each patient’s first visit; all reported performance is evaluated on all visits of the held-out patients. Likewise, for the "Last" and "Random" columns, each model was trained on only each patient’s last visit or a random visit. "Last 3-lag" indicates that the model was trained on each patient’s last visit concatenated with data from the 3 preceding visits, and similarly for "Random 3-lag". Surprisingly, LR generalized and performed relatively well.
Table 4 shows the performance of the survival models; more details on each method can be found in Results. CPH-time and WTTE-RNN are not included, since there is only one visit per patient. As shown in Table 4, models trained on data from the patient’s last visit, a random visit, or their concatenation with the previous 3 visits generalized well and performed relatively well compared with models trained on all patient records (Table 2).
[Table columns: Name | First | Last | Last 3-lag | Random | Random 3-lag]
Appendix C DMM dimension
The DMM was used only to model the 5 time-varying features (14 dimensions in total). On a held-out dataset, DMMs with 2-dimensional and 10-dimensional latent representations reached the same loss after 90 epochs of training. As shown in Figure 4, the performance of the two models is almost indistinguishable. We therefore used the 2-dimensional representation in our models.