The ability of healthcare organizations to make accurate predictions about individuals’ future health is becomingly increasingly important with the adoption of accountable care and alternative payment models (CMS (2016)). In particular, management of complex, chronic diseases such as cardiovascular disease, diabetes, and chronic kidney disease is especially difficult, because selection of optimal therapy may require integration of multiple conditions and risk factors that are each considered in isolation under current approaches to care. Additionally, individuals with multiple chronic conditions are among both the most expensive and highest utilizers of healthcare services (Johnson and et al. (2015)).
Accountable care organizations (ACOs) are institutions that aim to link provider reimbursement with quality metrics and reductions in total cost of healthcare services to a specific population of patients. These organizations need personalized prediction tools capable of flagging specific patients in their populations at the highest risk of having poor outcomes (Parikh et al. (2016)). Much of the data required to develop such tools are already being routinely collected, due to the recent widespread adoption of Electronic Health Records (EHRs). However, in order to be clinically relevant the prediction methods will need to be flexible enough to accommodate the inherent limitations of operational EHR data (Hersh and et al (2013)), dynamically update predictions as more information is collected, and scale seamlessly to the massive size of modern health records.
Prediction of future disease trajectory is a challenging problem. One difficulty is the many underlying sources of variability that can drive the different potential manifestations of the disease. For instance, the underlying biological mechanisms of the disease can give rise to latent disease subtypes, or groups of individuals with shared characteristics (Saria and Goldenberg (2015)). For most complicated diseases, there are no clear definitions of subtypes, so this must be inferred from the data. In addition, there are individual-specific sources of variability that may not be directly observed, such as behavioral factors, environmental conditions, or temporary infections. Another challenge is the fact that observations are irregularly sampled, asynchronous, and episodic, precluding the use of many time series methods developed for data regularly sampled at discrete time intervals. The large degree of missing data, especially when modeling multivariate longitudinal data, also presents complications. The task is made even more difficult when the primary data source is the EHR rather than a curated registry, as even defining a relevant cohort of patients to model can take extensive review by a clinical expert.
We collaborated with Duke Connected Care, the ACO affiliated with the Duke University Health System, to create models to predict disease progression in chronic kidney disease (CKD). CKD is characterized by a slow and typically symptomless loss of kidney function over time, resulting in complications that can cause poor health, premature death, increased health service utilization, and excess economic costs. A person’s estimated glomerular filtration rate (eGFR) is an approximation of overall kidney function and is calculated using a common clinical laboratory test (serum creatinine or cystatin C) and demographic information (age, sex and race) (Levey and et al. (2009)). Impairment of a person’s eGRF is used to define and stage CKD. In addition, a number of other routinely measured laboratory values may be used to help detect abnormal or declining kidney function.
Providing optimal care for CKD patients is a challenging problem for healthcare providers. Even though CKD can be easily identified using simple, eGFR-based laboratory criteria, it is often not recognized (Szczech et al. (2014), Tuot et al. (2011)). Predicting which patients with recognized CKD will progress to kidney failure (requiring dialysis or kidney transplantation), or suffer from other complications such as early death from heart attack or stroke, is a difficult task (Mendelssohn et al. (2011)). Additionally, oftentimes providers fail to prescribe appropriate preventive treatment to slow disease progression or address complications (Smart et al. (2014)).
These traits make CKD an ideal condition to develop disease progression models that can be used in high-impact care management programs. The difficulties with CKD care are best explained with a representative clinical case, illustrated in Figure 1. A 47 year-old man makes first contact with the health system with normal kidney function, although he possesses several risk factors for developing CKD in the future. He receives sufficient medical care over the next few years to detect that his kidney function is rapidly deteriorating (normal annual rate of kidney function loss at his age is only about 1-2%). Due to other more pronounced medical conditions, his kidney disease goes unnoticed and he does not receive any treatment to slow progression to total kidney failure. He is finally referred to a kidney specialist at age 52, more than a year after his kidney function has fallen below the recommended referral threshold. There is now not enough time to make advanced preparations for kidney failure, such as pre-emptive kidney transplantation or at-home dialysis, and kidney failure is inevitable. Less than 90 days from the first specialist appointment, he requires hospitalization for emergency dialysis initiation, a traumatizing procedure that also makes him among the most expensive class of patient to treat (Johnson and et al. (2015)
). He suffers multiple cardiovascular complications from kidney failure while on dialysis over the next decade, ultimately dying at age 63. This patient story is riddled with missed opportunities that could have been acted upon by a care management program that uses accurate predictions from machine learning models for disease progression.
Our aim in this work is to develop broadly applicable statistical methods that flexibly model multivariate longitudinal data in order to jointly predict future disease trajectories with other related variables, such as patient symptoms or relevant laboratory values. To accomplish this, we utilize a hierarchical latent variable model that captures dependencies between multivariate longitudinal trajectories. Our model uses a Gaussian process (GP) with a highly structured mean function to model each longitudinal variable for each individual. The mean functions for the GPs are made dependent through shared latent variables. Using our model, we study a large cohort of patients with CKD from the Duke University Health System EHR and make predictions about the future trajectory of their disease severity, as measured by eGFR, along with five other commonly recorded laboratory values that are known to be affected by CKD. Our inference algorithm scales well to the large dataset, and makes accurate predictions that outperform a state-of-the-art baseline.
2 Related Work
Within the medical literature, the vast majority of predictive models are cross-sectional and only consider features at or up until the current time to predict outcomes at a fixed point in the future. Typically, these models only attempt to explain variability in the outcome of interest by conditioning on baseline covariates. This precludes the ability to generate dynamic individualized predictions, making them difficult to use for medical decision making in practice. For instance, Tangri and et al. (2011) is a commonly referenced Cox proportional hazards model for predicting time to kidney failure.
There are several related works that tackle the problem of dynamic predictions with medical applications in mind. (Rizopoulos (2011)) construct joint models with a focus on updating dynamic predictions about time to death as more values of a longitudinal biomarker are observed. They also account for individual heterogeneity using random effects. Related work on joint modeling (Proust-Lima and et al. (2014)
) use a mixture model to address heterogeneity. Although a few works from this joint modeling literature address multivariate data, typically the only dependency among variables is simplistic, coming from a shared vector of random effects. Most similar to our work is (Schulam and Saria (2015)), who present a model for a univariate marker of disease trajectory using a GP with highly structured mean function.
There has been much recent interest in machine learning in modeling EHRs and other types of healthcare data. For instance, (Lian and et al. (2015)) use hierarchical point processes to predict hospital admissions, and (Ranganath and et al. (2015)) develop a dynamic factor model to learn relationships between diseases and predict future diagnosis codes. Some related papers use multitask GPs to model multivariate, longitudinal clinical data (Durichen and et al. (2015), Ghassemi and et al. (2015)). However, these works use independently trained models for each patient and do not hierarchically share information across patients. This worked well in their examples where there were a large number of observations for each patient and little missing data, but would not work well with our much sparser EHR data. Closest to our work in the application is (Perotte and et al. (2015)
), who predict time of progression from CKD stage 3 to stage 4. Although they use a standard Kalman filter to model multivariate laboratory data, this is not the focus of their work and will not be as flexible as our GP-based models. Finally,Futoma et al. (2016) present a method for jointly modeling CKD disease trajectory with adverse events.
3 Proposed Multivariate Disease Trajectory Model
Our proposed hierarchical latent variable model jointly models each patient’s multivariate longitudinal data by using a GP for each individual variable, with shared latent variables inducing dependence between the mean functions. Note that our model reduces to the method presented in (Schulam and Saria (2015)) in the univariate setting.
Let denote the dimensional trajectory of measurements for individual , and let be the observations for variable at times for this individual. Let , , , and be latent variables specific to individual , to be defined subsequently. We will assume for each person that the longitudinal variables are conditionally independent given these latent variables: .
For each longitudinal variable, we propose the following generative model:
with common priors across all variables:
The first term in the mean function (2) is a population-level fixed effect that uses observed baseline covariates , e.g. gender and race, to determine a population-level intercept for each lab, with a coefficient vector. This can easily be extended to accommodate more flexible fixed effects, e.g. a fixed intercept and slope.
The second term in (2) is a subpopulation component, where it is assumed individual ’s trajectory for variable belongs to a latent subpopulation, denoted . Each subpopulation is associated with a unique trajectory; in particular, is a fixed B-spline basis expansion of time (for simplicity, assumed to be the same for each variable with eight interior knots evenly spaced in time) with the coefficient vector for subpopulation and variable .
The prior for each in (4) depends on the individual’s “global” cluster . Drawing on ideas from latent structure analysis for multivariate categorical data (Lazarsfeld and Henry (1968)), the induce dependence among the trajectory-specific clusters , as they have conditionally independent multinomial priors. Each of the columns in the matrix defines a distribution over the values that can take. The
then has a multinomial logistic regression prior in (5) that depends on the baseline covariates, where are regression coefficients with 0 for identifiability.
The third term in (3) is a random effects component, allowing for individual-specific long-term deviations in trajectory that are learned dynamically as more data becomes available. In practice is a linear expansion of time with , so that is a random slope and intercept vector for patient . The overall vector has a multivariate normal prior distribution in (6), making the random effects dependent across labs.
Finally, is the Ornstein-Uhlenbeck (OU) covariance function for the GP each variable, with parameters . This kernel is well-suited for modeling transient deviations from the mean function, as it is mean-reverting and has no long-range dependence between deviations. We can rewrite (2) as , where , so that we can explicitly represent the short-term deviations from the GP in order to learn them during inference.
The computational problem associated with fitting our model is estimation of the posterior distribution of latent variables and model parameters given the observed data. As is common with complex probabilistic generative models, exact computation of the posterior is intractable and requires approximation to compute. To this end, we develop a mean field variational inference [Jordan et al., 1999] algorithm to find an approximation to the posterior.
Variational inference converts the task of posterior inference into an optimization problem of finding a distribution in some approximating family of distributions that is close in KL divergence to the true posterior. The problem is equivalent to maximizing what is known as the evidence lower bound (ELBO) [Bishop, 2006]:
which forms a lower bound on the marginal likelihood of our model.
The model parameters to be learned are , and the local latent variables specific to each person are their global cluster assignment , subpopulation assignments , random effects and structured noise functions
. The joint distribution for our model can be expressed as:
We make the mean field assumption for the variational distribution, which assumes that in the approximate posterior , all the latent variables are independent. This implies that , where:
The assumed variational distributions for each latent variable are in the same family as their prior distribution, i.e. and the are multinomials, is multivariate normal. For the we use a full multivariate normal evaluated at all times at which variable is observed for person . In order to evaluate the at additional times e.g. during model evaluation we can use the conditional GP framework and treat the observed values as pseudo-inputs, from the sparse GP literature (Titsias (2009)). Finally, for all model parameters we learn a point estimate, so that their variational distributions are delta functions. We impose vague normal priors on and , a uniform prior on , and simply learn MLEs for other parameters. Thus, the goal of our variational algorithm is to learn optimal variational parameters for each individual , as well as a point estimate for the model parameters. In practice, we optimize the Cholesky decompositions of covariance matrices.
Solving the Optimization Problem
We use the automatic differentiation package autograd 111https://github.com/HIPS/autograd in Python to compute analytic gradients in order to optimize the lower bound, since the lower bound has an analytic expression. At each iteration of the algorithm, we optimize the local variational parameters using exact gradients. To optimize the global parameters, we turn to stochastic optimization, a commonly used tool in variational inference. Rather than using the entire large dataset to compute exact gradients of the ELBO with respect to , we can compute an unbiased noisy gradient based on a sampled batch of observations (Hoffman et al. (2013)
). To set the learning rate we use RMSProp, which adaptively allows for a different learning rate for each parameter (Tieleman and Hinton (2012)).
5 Empirical Study
In this section we describe our experimental setup and results on our dataset.
Our dataset contains laboratory values from 44,519 patients with stage 3 CKD or higher extracted from the Duke University Health System EHR. IRB approval (#Pro00066690) was obtained for this work. We first created an initial cohort of roughly 600,000 patients that had at least one encounter in the health system in the year prior to Feb. 1, 2015. This includes all types of encounters within the health system, including inpatient, outpatient, and emergency department visits, over a span of roughly 20 years. From this, we filtered to patients who had at least five recorded values for serum creatinine, the laboratory value we used to calculate eGFR. We next filtered to patients that had Stage 3 CKD or higher, indicative of moderate to severe kidney damage, defined as two eGFR measurements less than 60 mL/min separated by at least 90 days.
We also choose to model five related lab values that have important clinical significance for CKD. The first, Serum Albumin, is an overall marker of health and nutrition. The second, Serum Bicarbonate, can indicate acid accumulation from inadequate acid elimination by the kidney. The third, Serum Calcium, can indicate improperly functioning kidneys if levels are too high. The fourth, Serum Phosphorus, can indicate phosphorus accumulation due to inadequate elimination by the kidney, and is associated with cardiovascular death and bone disorders. Finally, Urine Albumin to Creatinine Ratio (ACR) is a risk factor and cause of kidney failure. While all 44,519 patients have at least five eGFR measurements, there are 884, 242, 78, 16159, and 4321 patients who have no recorded values for the other five labs, respectively. The median number of measurements for each lab (among patients with at least one) is 14, 8, 11, 12, 3, and 4, respectively.
As a final preprocessing step, since the recorded eGFR values are extremely noisy and eGFR is only a valid estimate of kidney function at steady state, we take the mean of eGFR readings in monthly time bins for each patient. Rapid fluctuations in acute illness are related to long term risk, but we have not yet explicitly incorporated this into our modeling. We also do this to the other lab values, to reduce the overall noise in short time spans when there may be many rapidly changing measurements. In order to align the patients on a common time axis, for each patient we fix to be their first recorded eGFR reading below 60 mL/min. The baseline covariates used for were baseline age, race and gender, and indicator variables for hypertension and diabetes, as well as an overall intercept.
After learning a point estimate for the global model parameters during training, they are held fixed. Then, an approximate posterior over the local latent variables is learned for each patient in the held-out test set. Predictions about future lab values are made by drawing samples from the approximate posterior predictive. We compare our method with the method of (Schulam and Saria (2015)
), trained independently to each of the 6 labs. We use 10 fold cross validation and use one-sided, paired t-tests to test for significant improvements in performance. For each test patient, we learn their parameters using their data up until time, , and , and record the mean absolute error of the predictions for each lab in future time bins. These values are then averaged over all patients in the test set, and finally averaged over the 10 cross validation folds to produce Table 1.
|Predictions with data up to…|
Table 1 displays the results of both methods. Our proposed method outperforms (Schulam and Saria (2015)) for labs that had higher amounts of missingness. In these settings, the methods sometimes have to predict held-out lab values for a patient without having yet observed any values for that lab. In this case (Schulam and Saria (2015)) can only predict using baseline covariates for that individual, while our multivariate approach is able to leverage information from the other related labs as well. Interestingly, our method performs slightly worse at predicting eGFR, perhaps because this is the most commonly measured variable in our cohort. Current clinical practice for managing care of CKD patients does not use any form of modeling to predict future disease status or lab values. Incorporation of a model such as ours to predict future eGFR trajectory and other labs would provide a useful tool for nephrologists to assess the risk of future decline in kidney function in CKD patients.
In this paper, we proposed a flexible model for multivariate longitudinal data, and applied it to disease trajectory modeling in CKD patients. We find our model yields good performance on the task of predicting future kidney function and related lab values. Although our work is a promising first step for developing machine learning models from EHR data and applying them to real clinical tasks, many inherent limitations to working with operational EHR data (Hersh and et al (2013)) must be overcome in order for the methods to be used in practice.
There are many interesting directions for extending the ideas presented here. We plan to consider models for other diseases, including diabetes and cardiovascular disease, which are frequently comorbid with CKD. Incorporation of a larger number of longitudinal variables will require care in order to ensure tractability, especially if we choose to model vitals that are recorded much more frequently. Another interesting direction to consider is joint models for adverse event data, such as in (Futoma et al. (2016)). Jointly modeling multiple event processes (e.g. emergency department visits, heart attacks, strokes) will allow us to learn correlations between different types of events. Given that much of the data recorded in the EHR is in the form of administrative billing codes, future work should incorporate these into the models as well. Finally, models incorporating additional outcomes such as medical costs, hospitalizations, and patient quality of life are of significant practical interest.
The screenshot in Figure 2 shows a working prototype of the application being developed in collaboration with Duke Connected Care, where we use our predictions to help flag high risk patients for chart review during weekly rounding sessions. By further refining and deploying a flexible, scalable predictive tool such as ours, ACOs around the country can intervene on high-risk patients and realize the potential benefits of precision medicine.
Joseph Futoma is supported by an NDSEG fellowship. Dr. Cameron is supported by a Duke Training Grant in Nephrology (5T32DK007731). This project was funded by both the Duke Translational Research Institute and the Duke Institute for Health Innovation. Research reported in this publication was supported by the National Center for Advancing Translational Sciences of the NIH under Award Number UL1TR001117. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.
- CMS (2016) CMS. Quality payment program: Delivery system reform, medicare payment reform, and macr, 2016. URL https://www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/Value-Based-Programs/MACRA-MIPS-and-APMs/MACRA-MIPS-and-APMs.html.
- Durichen and et al. (2015) R. Durichen and M. Pimentel et al. Multitask gaussian processes for multivariate physiological time-series analysis. In IEEE Transactions on Biomedical Engineering, pages 314–322, 2015.
- Futoma et al. (2016) J. Futoma, M. Sendak, C. Cameron, and K. Heller. Scalable joint modeling of longitudinal and point process data for disease trajectory prediction and improving management of chronic kidney disease. In UAI, to appear, 2016.
- Ghassemi and et al. (2015) M. Ghassemi and M. Pimentel et al. A multivariate timeseries modeling approach to severity of illness assessment and forecasting in icu with sparse, heterogeneous clinical data. In AAAI, 2015.
- Hersh and et al (2013) W. Hersh and M. Weiner et al. Caveats for the use of operational electronic health record data in comparative effectiveness research. In Medical Care, 2013.
- Hoffman et al. (2013) M. Hoffman, D. Blei, C. Wang, and J. Paisley. Stochastic variational inference. In JMLR, pages 1303–1347, 2013.
- Johnson and et al. (2015) T. Johnson and D.Rinehart et al. For many patients who use large amounts of healthcare services, the need is intense yet temporary. In Health Affairs, 2015.
- Lazarsfeld and Henry (1968) P. Lazarsfeld and N. Henry. Latent structure analysis. In Houghton Mifflin, 1968.
- Levey and et al. (2009) A. Levey and L. Stevens et al. A new equation to estimate glomerular filtration rate. In Annals of Internal Medicine, pages 604–612, 2009.
- Lian and et al. (2015) W. Lian and R. Henao et al. A multitask point process predictive model. In ICML, 2015.
- Mendelssohn et al. (2011) D. Mendelssohn, B. Curtis, and K. Yeates et al. Suboptimal initiation of dialysis with and without early referral to a nephrologist. In Nephrology Dialysis Transplantation, 2011.
- Murphy (2012) K. Murphy. Machine learning: A probabilistic perspective. In MIT Press, 2012.
- Parikh et al. (2016) R. Parikh, M. Kakad, and D. Bates. Integrating predictive analytics into high-value care: The dawn of precision delivery. In JAMA, pages 651–652, 2016.
- Perotte and et al. (2015) A. Perotte and R. Ranganath et al. Risk prediction for chronic kidney disease progression using heterogeneous electronic health record data and time series analysis. In JAMIA, 2015.
- Proust-Lima and et al. (2014) C. Proust-Lima and et al. Joint latent class models for longitudinal and time-to-event data: A review. In Statistical Methods in Medical Research, 2014.
- Ranganath and et al. (2015) R. Ranganath and A. Perotte et al. The survival filter: Joint survival analysis with a latent time series. In UAI, 2015.
- Rizopoulos (2011) D. Rizopoulos. Dynamic predictions and prospective accuracy in joint models for longitudinal and time-to-event data. In Biometrics, 2011.
- Roberts and et al. (2013) S. Roberts and M. Osborne et al. Gaussian processes for time series modeling. In Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2013.
- Saria and Goldenberg (2015) S. Saria and A. Goldenberg. Subtyping: What is it and its role in precision medicine. In IEEE Intelligent Systems, 2015.
- Schulam and Saria (2015) P. Schulam and S. Saria. A framework for individualizing predictions of disease trajectories by exploiting multi-resolution structure. In NIPS, 2015.
- Shi et al. (2005) J. Shi, R. Murray-Smith, and D. Titterington. Hierarchical gaussian process mixtures for regression. In Statistics and Computing, 2005.
- Smart et al. (2014) N. Smart, G. Dieberg, M. Ladhanni, and T. Titus. Early referral to specialist nephrology services for preventing the progression to end-stage kidney disease. In Cochrane Database of Systematic Reviews, 2014.
- Szczech et al. (2014) L. Szczech, R. Stewart, and H. Su et al. Primary care detection of chronic kidney disease in adults with type-2 diabetes: The add-ckd study. In PLoS ONe, 2014.
- Tangri and et al. (2011) N. Tangri and L. Stevens et al. A predictive model for progression of chronic kidney disease to kidney failure. In JAMA, pages 1553–1559, 2011.
- Tieleman and Hinton (2012) T. Tieleman and G. Hinton. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude, 2012.
- Titsias (2009) M. Titsias. Variational learning of inducing variables in sparse gaussian processes. In AISTATS, 2009.
- Tuot et al. (2011) D. Tuot, L. Plantinga, and C. Hsu et al. Chronic kidney disease awareness among individuals with clinical markers of kidney dysfunction. In Clinical Journal of the American Society of Nephrology, 2011.