In the era of digital medicine, modern medical devices enable clinicians to accurately and frequently measure the physiological state of their patients. Heart rate, blood pressure, arterial oxygen saturation, and white blood cell counts are just a few of the vital signs used to monitor patient state and to predict outcomes such as hospital discharge rate, patient survival, or readmission (Rajkomar et al., 2018; Ranganath et al., 2016). To model the complex relationships governing patient state dynamics, a large body of work takes advantage of the flexible properties of Gaussian processes (GPs). Both single-output (Stegle et al., 2008; Lasko et al., 2013) and multi-output methods (Nemati et al., 2012; Ghassemi et al., 2015; Cheng et al., 2017; Futoma et al., 2017)
use carefully designed kernels to improve prediction accuracy of patient states across sparse, noisy, and irregularly sampled time series horizons. These methods predict correlations between patient vital signs that give insight into imputing hard-to-sample state variables, detecting patient trends, and making decisions under uncertainty. Modeling patient state dynamics is made difficult by the administration of drugs and other therapeutic interventions, which directly impact state features for a limited time window and often return to baseline. Accurate prediction of patient state following an intervention allows a clinician to consider multiple options for intervention and make informed treatment decisions.
Recent papers have attempted to model the complex effects of interventions on patient state (Xu et al., 2016; Schulam and Saria, 2017). These models do not incorporate a control systems understanding of the complex mechanisms governing the tightly controlled responses of physiological traits to interventions. This leads to inaccurate and non-generalizeable predictions when insufficient training data is available, which is often the case when predictions include a patient-specific component.
One motivation for modeling intervention effects on patient state with dynamical systems is to learn optimal treatment policies. For instance, prior work adapted reinforcement learning (RL) to derive a closed-loop anesthesia controller to regulate mean arterial pressure based on a dynamical patient model(Padmanabhan et al., 2015). Other applications, such as finding multi-drug therapies for human immunodeficiency virus (HIV), have used dynamical systems models (Adams et al., 2004).
In this paper, we introduce a Bayesian nonparametric framework for estimating the dynamics of clinical traits in response to drug interventions through electronic health records (EHRs) from hospital patients. Specifically, we develop an approach to learn the patient-specific response of clinical traits to treatments from medical time series data. Our approach consists of a multi-output Gaussian process (GP) derived from latent force models (LFMs) (Alvarez et al., 2009), which we model using GPs with kernels derived from differential equations representing dynamical systems. To model the effect of multiple treatments on patient-specific clinical traits, we use latent force functions sampled from Gaussian processes with causal kernels. By estimating patient-specific parameters to capture the effects of treatments on specific clinical traits, our model offers novel mechanistic insights and achieves competitive predictive accuracy of physiological response to interventions when compared with state-of-the-art methods (Soleimani et al., 2017).
The contributions of this work are two-fold. First, we compose causal kernels in time-marked medical data to capture the latent dynamics of interventions effects using a GP. Second, we incorporate these treatment effects in a model of patient state by convolving this causal kernel latent force model with a multi-output GP that captures patient state absent interventions. Our approach is a necessary step towards achieving optimal control of patient health through targeted, personalized treatments (Särkkä et al., 2017). We show the predictive value of our approach on a large hospital data set.
2.1 Gaussian Processes for Time Series
In the medical time series setting, data are collected from patients, indexed by , across time points indexed by . Single output data typically correspond to time-varying covariates encoding physiological states such as blood pressure or heart rate. The measured pairs , where corresponds to the time at which the covariate was recorded, are then modeled through an underlying latent function such that , where is independent Gaussian noise. The goal is to estimate and to evaluate at future time points to enable prediction and early detection of physiologically abnormal states.
Gaussian processes (GPs) represent a versatile generative framework for modeling the distribution on a real-valued function . GPs are nonparametric stochastic processes specified by mean and covariance functions:
where is the mean function and is the covariance function or kernel: . The mean function is often assumed to be zero (Rasmussen and Williams, 2006). An immediate result is the multivariate Gaussian form of the joint , where is the kernel matrix with entries .
Properties of the function such as smoothness or periodicity are determined by the kernel function . One commonly used kernel is the squared exponential (SE) kernel
which is parameterized by a length scale and a scale factor . The functions generated by a GP with an SE kernel are smooth because the kernel function is infinitely differentiable. SE kernels capture stationary
processes, as the covariance between two vectors depends on the difference in time (or other covariate)but not on absolute time.
GPs have been studied extensively in the context of time series (Roberts et al., 2012), and are especially useful when the data are sparsely or irregularly sampled. In medical time series, the clinical measurements are recorded sporadically and sometimes sparsely across time. To better model multiple correlated measurements, multi-output GPs (MOGPs), which capture the covariance structure between multiple measurements, have been adapted for use on medical data (Ghassemi et al., 2015; Futoma et al., 2017; Cheng et al., 2017). However, kernels used to capture temporal dependencies between sequential clinical measurements are mostly stationary. This is a limitation since, during a patient’s stay in the hospital, clinical events and interventions occur, and physiological dynamics often vary across time.
2.2 Latent Force Models for Patient Data
Physiological dynamics have been long studied by physiologists using systems of differential equations. In the case of heart rate and blood pressure, for example, the cardiac conduction system is assumed to be a network of self-excitatory pacemakers leading to systems of nonlinear oscillators (Glass, 2001). Medical time series-based prediction of such covariates has relied on regression models, linear and nonlinear, including but not limited to GPs; yet an explicit connection with control theory has been lacking. Recently, several methods were proposed to bridge the gap between stochastic control methods and nonparametric time series models (Gao et al., 2008; Alvarez et al., 2009)
. In particular, multi-output GPs are one approach to representing latent force models, where the covariance functions (kernels) are derived using ordinary differential equations (ODEs). In the LFM setup, we would like to model the system dynamics between a set of observed processes,, and a set of unobserved latent forces, , assuming that they interact according to differential equations capturing those dynamics. For example, in a first-order LFM, the following equation holds:
with and representing the base level value and underlying decay parameter of each output , and the influence constants from each latent force to each output . Through this formulation, one can recover the latent force functions and the output functions from discrete observations. Solving the ODE yields
In LFMs, the latent forces are modeled independently as samples from their respective GPs. For certain classes of covariance functions, such as SE kernels, one can show that the outputs are also GPs with analytically closed-form covariance functions, as well as cross-covariances between latent forces and the outputs (Alvarez et al., 2009).
While sampling the latent forces from independent GPs is computationally convenient, important information can be lost. For example, when provided with historical patient data, we might want to know how latent physiological dynamics are shared across patients from related covariate groups (i.e., same age, sex, BMI) or across patients receiving similar treatments. Limited numbers of observations also motivates a hierarchical approach to this problem, in order to share strength across patients.
2.3 Treatment Effect Estimation
In addition to modeling physiological outputs, accurate modeling of medical data sets requires incorporating various treatment effects, with the purpose of controlling or stabilizing the physiological states of patients. Treatments are often drug interventions that can be characterized by drug name, administration type (e.g., oral, intravenous), and dosage. Estimating the effect of a treatment on a patient’s physiological state is paramount to improving prediction and evaluating treatments.
While GPs are commonly used to model medical time series data, several extensions have been proposed to estimate the effects of medical treatments. Counterfactual Gaussian processes (CGPs) (Schulam and Saria, 2017) use marked point processes (MPP) as an event model to account for dependencies between actions and observed physiological trajectories. In Xu et al. (2016), a class of parametric functions is introduced to model the effects of dialysis for patients with acute kidney injury. The functions were designed to explicitly model different types of effects, including delay and decay. To explain heterogeneity across patients, a Dirichlet process was used for clustering patients. In Futoma et al. (2017), the treatment effects are modeled in the prior mean function of multi-output Gaussian processes, formulated as the sum of multiple exponential decay functions. These approaches require the response dynamics to conform to a specific functional form such as exponential decay, where in practice they are much more heterogeneous.
More recently, Soleimani et al. (2017) introduces treatment effects as the output of a linear time-invariant (LTI) system. The inputs are the observed administration of medications (e.g., type and dosage), and the effects are estimated based on a chosen form of a second-order filter. Patient-specific filter parameters were estimated and regularized using a global prior across patients.
To allow the treatment response to take on arbitrary functional forms with scalable effects sizes and directions, we use a hierarchical GP to model the latent forces. In particular, we replace the independent GPs with a causal-kernel GP whose hyperparameters are shared across patients to allow arbitrary functional response and to share strength by representing correlations across patient groups.
3 Causal Convolutional Gaussian Processes
Here, we propose a flexible framework to model medical time series data. Our method brings together ideas from GP latent force models (Alvarez et al., 2009; Särkkä et al., 2017) and causal GPs (Cunningham et al., 2012) to address a challenge in modeling medical time series data—the systematic inclusion of multi-treatment effects on the dynamics of multiple physiological covariates. We first introduce the notation in the context of medical time series data, and then introduce the model. We also discuss the details of implementation and inference methods.
3.1 Medical Time Series with Treatments
We denote the observed medical time series data from patients characterized by physiological dynamic covariates across irregularly sampled time points , as noisy samples from a Gaussian process with a patient- and covariate-specific mean function and kernel :
Here, the kernel accounts for stationary temporal fluctuations of physiological signals, such as circadian rhythms. We choose this kernel as a sum of one SE kernel and one periodic kernel:
For each patient , we index treatments given as , and the time of treatments is denoted as . For each treatment, we use function to map the index of the treatment to a treatment set representing the type of treatment. Since the dosage-response curve of the same drug usually has a nonlinear curve that varies across dosages (Myers and Thiessen, 1980; Ghassemi et al., 2014), and the characteristics of absorption vary across different routes, we treat the same drug with different dosages or taken via different routes (e.g., oral or injection) as different treatments. Whenever clear from the context, we drop the patient index from these variables.
We assume there are different latent forces induced by each type of treatment. For a treatment given at time , we model the latent force as a function of time drawn from a Gaussian process. We also assume that each patient has a patient-specific latent force, and use a hierarchical model to share support for latent force models across patients.
3.2 Causal Treatment Dynamics
The dynamic behavior of a treatment’s response to the clinical covariates are represented in our setup as a latent force model. In particular, we model the mean function of the clinical traits through the first-order dynamical system LFM:
where the decay , the baseline covariate output , and treatment effect size are patient-specific parameters that control the dynamics of the treatment response. We assume these patient-specific parameters come from a population-wise empirical prior based on demographic data, such as age and weight. We assume the latent force function of the same treatment is shared across patients, and is sampled from a Gaussian process:
We introduce causality on the latent force functions associated with each treatment, which means that causal effects must only act forward in time. To do this, we designed the kernel as a causal kernel (Cunningham et al., 2012). That is,
where is the clipping function warping the input space and enforcing causality, while preserving the GP structure (Cunningham et al., 2012). Through , the function is constant before the current time . In our model of treatment effects, we introduce an additional condition for .
3.3 Kernel Convolution
The structure of the latent force model leads to a natural composition with the causal Gaussian process prior, leading to an analytic computation of output covariances and cross-covariances. This fact allows for simple gradient descent-based inference. Closed-form kernels were derived following the integral in Eqn. 4. For instance, the cross-covariance between and one latent force when is computed as
where . Details of the computation of the closed-form kernels corresponding to the cases , , and are in the Supplementary Material.
3.4 Hyperparameter Learning for Medication Effects Model
To learn the hyperparameters for each patient, we optimize the marginal likelihood of the GP model with respect to the vector of observations across covariates and .
where . For each patient, we learn a set of hyperparameters for the baseline kernel for each covariate , and the hyperparameters of the derived causal LFM kernel, with the mean function sampled from . We assume that the patient- and treatment-specific hyperparameters are shared across treatments of the same type, while the treatment effect size parameter differs for different dosages or modes of administration. Our implementation is based on GPy (GPy, 2012), and we optimize the hyperparameters using scale conjugate gradient methods. We derived the gradients using the SymPy package (Meurer et al., 2017).
In this section, we show the effectiveness of our method by modeling multiple treatment effects on EHR data from the Hospital of University of Pennsylvania (HUP). We briefly describe the data and preprocessing procedures, and then we discuss results from our method fitted to patient subsets motivated by two clinical applications. We show empirical results of our method using the metrics of prediction accuracy. We compare results with baseline methods with GPs using basic kernels and mean functions from related work (Futoma et al., 2017; Soleimani et al., 2017).
4.1 Inpatient Hospital Data
We evaluated our method using clinical data collected at HUP. The data set consists of 139k patients with access to demographic details (e.g., age, weight, gender), as well as 139 clinical measurements consisting of vital signs and lab tests, and administrated medications during the patients’ stay in the hospital. We normalized each clinical trait by subtracting the empirical mean for each patient from each measurement. We tested our method on two challenging applications—modeling the effects of antihypertensive agents and anticoagulants. We chose to focus on the patients with a primary diagnosis of myocardial infarction (MI; i.e., heart attack) in our data set, resulting in total 1,716 adult patients, as they usually received both types of treatments.
We first modeled the effects of the most frequently administered antihypertensive drug in our data set, metoprolol, on heart rate (HR) and systolic blood pressure (SBP). Metoprolols are beta-blockers that are mainly used to treat high blood pressure or angina due to heart disease. We filtered MI patients to include patients with at least five observations in both heart rate or blood pressure, reducing the number of patients to 594. We removed patients that were administrated metropolol jointly with other antihypertensive agents, resulting in 233 patients. Finally, we included treatments of metoprolol that were administered at least 20 times across all patients, resulting in 181 patients with six type of treatments, including four different dosages of metoprolol tartrate, and one dosage for metoprolol succinate ER and metoprolol injection, and a total of 310 treatment administrations.
Second, we modeled the effect of two different types of anticoagulants: heparin and warfarin. We filtered the MI patients to include those with at least five observations on two lab test results that reflect a patient’s ability to form blood clots: partial thromboplastin time (PTT) and international normalized ratio (INR), resulting in 581 patients. Among all treatments, we considered the top three most frequently administered, and we include patients that received at least one of them. The filtered data includes 404 patients with a total of 592 treatment administrations (Table 1).
4.2 Evaluation Metrics
We evaluated our model by comparing predictive performance with three state-of-the-art GP models: (i) univariate GPs with a constant mean function and the baseline (squared exponential and periodic) kernel (Eqn. 6; Se+Per), (ii) univariate GPs with an exponential decay mean function and the Ornstein-Uhlenbeck (OU) kernel (Futoma et al., 2017) (Ou+Exp), and (iii) Matrn- kernel with a second-order LTI filter for effect modeling (Soleimani et al., 2017) (Mat32+Lti). Note that for the comparative method (ii) and (iii), we added a constant component in the mean function in the proposed setup to account for patient-specific baseline values of each covariate. For all methods, we used the first 70% of observations for each patient for training, and the remaining 30% for testing. We computed the mean absolute error (MAE) of predictions on test data to evaluate model performance.
4.3 Clinical Impact
When comparing the MAE on test data for the two experiments (antihypertensive agents and anticoagulants; Table 2), our method performed competitively across experiments on the predictive tasks for the covariates responding to antihypertensives. Our model performs better than related methods on the task of predicting traits responding to anticoagulants, in particular the blood clot formation trait PTT.
While predictive performance is similar across related methods, our method shows important advantages in model flexibility. We demonstrate this through two case studies: First, we study predictive trajectories and the inferred effects on blood pressure and heart rate of a treatment on one patient (Fig. 1). The patient received 50 mg of metoprolol tartrate . We compare the predictive trajectory with uncertainty from a related method (Mat32+LTI) (Fig. 1 a–b) with results from our method (Fig. 1 c–d). Our method estimates an explicit treatment-induced latent force with strong effects on both heart rate and systolic blood pressure (Fig. 1 e) that matches the time frame prescribed by clinical medicine. Indeed, the direction of estimated effects from the related method on SBP (Fig. 1 a and b) is the opposite of the clinical usage; this error may be due to delayed effect of the drug on SBP. While for this patient the MAE of the related method is slightly lower than our method (4.68 vs. 5.37 in SBP; 7.90 vs. 8.09 in HR), results from our method are closer to clinical ground truth with substantially lower uncertainty. Furthermore, with the GP model of latent force, our estimated effect is more robust than a standard latent force model when presented with additional uncertainty and noise in the data, such as delayed effects from the time of treatment administration, which is important in the clinical setting.
Next, we find similar robust and clinically interpretable behavior are achieved for a patient receiving multiple heparin infusions (25000 units) and heparin injections (5000 units; Fig. 2). For both our method and the related method (Mat32+Lti), the longer-term effects on PTT is estimated (Fig. 2, a and c). Both methods estimated negative effects for the heparin injections (Fig. 2 f), while in general heparin is assumed not to affect INR (Katzung and Trevor, 2015).
We developed a framework using latent force models (LFMs) to capture treatment effects on patients’ physiological state estimated using medical time series data. By modeling treatment effects as latent forces convolved with a multi-output GP, we create a flexible framework that bridges the gap between the smooth, stationary dynamics of patient state and a mechanistic understanding of the forces that impact clinical traits. A GP model of latent forces provides a flexible probabilistic framework with convenient inference properties; we enforce appropriate effect dynamics using a causal kernel. Two key contributions, the latent force model convolution and the associated causal kernel, lead to a computationally tractable solution with low variance and clinically-relevant interpretation of personalized treatment effects. Further improvements in speed may be found by adapting a recent kernel approximation (Guarnizo and Lopez, 2018).
There are several directions to improve our method for clinical treatment effect estimation. Our framework assumes that the effect of each treatment is independent of any other, and interactions between treatments are not modeled. These interactions could be modeled by modifying the kernel for the latent force component. In addition, the method assumes the decay parameters () of the treatment effect for a single patient are treatment-independent and constant throughout the patient’s hospital stay. As the decay parameters reflect the physiology of drug absorption, which may change as a function of patient state, we might model this parameter as a stochastic process itself. For future research, the latent force model encourages an optimal control perspective: estimating treatment effect sizes of each patient for each clinical covariate, coupled with accurate patient state prediction, can provide a path toward treatment prioritization and decision-making in clinical interventions.
- Adams et al. (2004) Adams, B. M., Banks, H. T., Kwon, H.-D., and Tran, H. T. (2004). Dynamic multidrug therapies for HIV: Optimal and STI control approaches. Mathematical Biosciences and Engineering, 1(2):223–241.
- Alvarez et al. (2009) Alvarez, M., Luengo, D., and Lawrence, N. (2009). Latent force models. In Artificial Intelligence and Statistics, pages 9–16.
- Cheng et al. (2017) Cheng, L.-F., Darnell, G., Dumitrascu, B., Chivers, C., Draugelis, M. E., Li, K., and Engelhardt, B. E. (2017). Sparse multi-output Gaussian processes for medical time series prediction. ArXiv e-prints.
Cunningham et al. (2012)
Cunningham, J., Ghahramani, Z., and Rasmussen, C. (2012).
Gaussian processes for time-marked time-series data.
In Lawrence, N. D. and Girolami, M., editors, Proceedings of the
Fifteenth International Conference on Artificial Intelligence and
Statistics, volume 22 of
Proceedings of Machine Learning Research, pages 255–263, La Palma, Canary Islands. PMLR.
- Futoma et al. (2017) Futoma, J., Hariharan, S., Sendak, M., Brajer, N., Clement, M., Bedoya, A., O’Brien, C., and Heller, K. (2017). An improved multi-output Gaussian process RNN with real-time validation for early sepsis detection. arXiv preprint arXiv:1708.05894.
- Gao et al. (2008) Gao, P., Honkela, A., Rattray, M., and Lawrence, N. D. (2008). Gaussian process modelling of latent chemical species: applications to inferring transcription factor activities. Bioinformatics, 24(16):i70–i75.
- Ghassemi et al. (2015) Ghassemi, M., Pimentel, M. A. F., Naumann, T., Brennan, T., Clifton, D. A., Szolovits, P., and Feng, M. (2015). A multivariate timeseries modeling approach to severity of illness assessment and forecasting in ICU with sparse, heterogeneous clinical data. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pages 446–453.
- Ghassemi et al. (2014) Ghassemi, M. M., Richter, S. E., Eche, I. M., Chen, T. W., Danziger, J., and Celi, L. A. (2014). A data-driven approach to optimized medication dosing: a focus on heparin. Intensive care medicine, 40(9):1332–1339.
- Glass (2001) Glass, L. (2001). Synchronization and rhythmic processes in physiology. Nature, 410(6825):277.
- GPy (2012) GPy (since 2012). GPy: A Gaussian process framework in Python. http://github.com/SheffieldML/GPy.
- Guarnizo and Lopez (2018) Guarnizo, C. and Lopez, A. (2018). Fast kernel approximations for latent force models and convolved multiple-output Gaussian processes. In Proceedings of the Thirty-Fourth Conference in Uncertainty and Artificial Intelligence (2018). Sheffield.
- Katzung and Trevor (2015) Katzung, B. G. and Trevor, A. J. (2015). Basic & Clinical Pharmacology. McGraw-Hill New York, NY.
- Lasko et al. (2013) Lasko, T. A., Denny, J. C., and Levy, M. A. (2013). Computational phenotype discovery using unsupervised feature learning over noisy, sparse, and irregular clinical data. PloS One, 8(6):1–13.
- Meurer et al. (2017) Meurer, A., Smith, C. P., Paprocki, M., Čertík, O., Kirpichev, S. B., Rocklin, M., Kumar, A., Ivanov, S., Moore, J. K., Singh, S., Rathnayake, T., Vig, S., Granger, B. E., Muller, R. P., Bonazzi, F., Gupta, H., Vats, S., Johansson, F., Pedregosa, F., Curry, M. J., Terrel, A. R., Roučka, v., Saboo, A., Fernando, I., Kulal, S., Cimrman, R., and Scopatz, A. (2017). Sympy: symbolic computing in Python. PeerJ Computer Science, 3:e103.
- Myers and Thiessen (1980) Myers, M. G. and Thiessen, J. J. (1980). Metoprolol kinetics and dose response in hypertensive patients. Clinical Pharmacology & Therapeutics, 27(6):756–762.
- Nemati et al. (2012) Nemati, S., Lehman, L.-W. H., Adams, R. P., and Malhotra, A. (2012). Discovering shared cardiovascular dynamics within a patient cohort. In Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pages 6526–6529.
- Padmanabhan et al. (2015) Padmanabhan, R., Meskin, N., and Haddad, W. M. (2015). Closed-loop control of anesthesia and mean arterial pressure using reinforcement learning. Biomedical Signal Processing and Control, 22:54–64.
Rajkomar et al. (2018)
Rajkomar, A., Oren, E., Chen, K., Dai, A. M., Hajaj, N., Hardt, M., Liu, P. J.,
Liu, X., Marcus, J., Sun, M., et al. (2018).
Scalable and accurate deep learning with electronic health records.npj Digital Medicine, 1(1):18.
- Ranganath et al. (2016) Ranganath, R., Perotte, A., Elhadad, N., and Blei, D. (2016). Deep survival analysis. arXiv preprint arXiv:1608.02158.
- Rasmussen and Williams (2006) Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. The MIT Press.
- Roberts et al. (2012) Roberts, S., Osborne, M., Ebden, M., Reece, S., Gibson, N., and Aigrain, S. (2012). Gaussian processes for time-series modelling. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 371(1984).
- Särkkä et al. (2017) Särkkä, S., Álvarez, M. A., and Lawrence, N. D. (2017). Gaussian process latent force models for learning and stochastic control of physical systems. arXiv preprint arXiv:1709.05409.
- Schulam and Saria (2017) Schulam, P. and Saria, S. (2017). Reliable decision support using counterfactual models. In Advances in Neural Information Processing Systems, pages 1696–1706.
- Soleimani et al. (2017) Soleimani, H., Subbaswamy, A., and Saria, S. (2017). Treatment-response models for counterfactual reasoning with continuous-time, continuous-valued interventions. In 33rd Conference on Uncertainty in Artificial Intelligence, UAI 2017. AUAI Press Corvallis.
- Stegle et al. (2008) Stegle, O., Fallert, S. V., MacKay, D. J. C., and Brage, S. (2008). Gaussian process robust regression for noisy heart rate data. IEEE Transactions on Biomedical Engineering, 55(9):2143–2151.
- Xu et al. (2016) Xu, Y., Xu, Y., and Saria, S. (2016). A Bayesian nonparametric approach for estimating individualized treatment-response curves. In Machine Learning for Healthcare Conference, pages 282–300.