G-Net: A Deep Learning Approach to G-computation for Counterfactual Outcome Prediction Under Dynamic Treatment Regimes

03/23/2020 ∙ by Rui Li, et al. ∙ 0

Counterfactual prediction is a fundamental task in decision-making. G-computation is a method for estimating expected counterfactual outcomes under dynamic time-varying treatment strategies. Existing G-computation implementations have mostly employed classical regression models with limited capacity to capture complex temporal and nonlinear dependence structures. This paper introduces G-Net, a novel sequential deep learning framework for G-computation that can handle complex time series data while imposing minimal modeling assumptions and provide estimates of individual or population-level time varying treatment effects. We evaluate alternative G-Net implementations using realistically complex temporal simulated data obtained from CVSim, a mechanistic model of the cardiovascular system.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Counterfactual prediction is a fundamental task in decision-making. It entails the estimation of expected future trajectories of variables of interest under alternative courses of action (or treatment strategies) given observed history. Treatment strategies of interest are usually time varying (meaning they comprise decisions at multiple time points) and dynamic (meaning the treatment decision at each time point is a function of history up to that time point).

As an example, consider the problem of fluid administration in Intensive Care Units (ICUs) [4]

. It is frequently necessary for physicians to adopt strategies that administer large volumes of fluids to increase blood pressure and promote blood perfusion through organs in septic patients. However, such strategies can lead to fluid overload, which can have serious adverse downstream effects such as pulmonary edema. Fluid administration strategies are time varying and dynamic because at each time point physicians decide the volume of fluid to administer based on observed patient history (e.g. blood pressure and volume of fluid already administered) up to that time point. To aid in the choice between alternative dynamic fluid administration strategies, it would be desirable to obtain counterfactual predictions of a patient’s probability of developing fluid overload (and other outcomes of interest) were they to follow each alternative strategy going forward given their observed covariate history up to the current time.

Counterfactual prediction is an inherently causal task in that it must account for the causal effects of following different treatment strategies. When treatment strategies of interest are time-varying, so-called “g-methods” [8, 13] are required to estimate their effects. G-methods include g-computation [14, 16, 19], structural nested models [17, 23], and marginal structural models [12, 10]. Of these methods, g-computation is best suited for estimating effects of general dynamic treatment strategies conditional on high dimensional patient histories [15].

G-computation works by estimating the conditional distribution of relevant covariates given covariate and treatment history at each time point, then producing Monte Carlo estimates of counterfactual outcomes by simulating forward patient trajectories under treatment strategies of interest. Regression model(s) for the covariates and outcomes at each time point conditional on observed history are a critical component of this method. While any regression models could in theory be input to the G-computation algorithm, most existing G-computation implementations have employed simple regression models with limited capacity to capture complex temporal and nonlinear dependence structures. In recent years, sequential deep learning methods such as Recurrent Neural Networks (RNNs) have achieved state of the art performance in predictive modeling of complex time series data while imposing minimal modeling assumptions. In this paper, we propose G-Net, a sequential deep learning framework for G-computation. G-Net admits the use of recurrent networks such as LSTMs to model covariates in a manner suitable for G-computation. The G-Net framework is flexible and allows for various configurations depending on the problem at hand. To the best of our knowledge, this is the first work to investigate a RNN based approach to G-computation.

Unfortunately, it is impossible to reliably evaluate counterfactual predictions on real data, since only the outcomes corresponding to treatment strategies that were actually followed can be observed. Consequently, to explore and evaluate various implementations of G-Net, we used simulated data in which counterfactual ground truth can be known. We used CVSim [6], a well established mechanistic model of the cardiovascular system, to estimate counterfactual simulated patient trajectories under various fluid and vasopressor administration strategies. These experiments provide a template for causal model evaluation using complex and physiologically realistic simulated longitudinal data.

2 Related Work

Several recent works have proposed a deep learning framework for counterfactual prediction from observational data, including [2, 1, 25]. However, these works have mostly focused on learning point exposure as opposed to time-varying treatment effects, which are the focus of this paper.

G-computation for estimating time-varying treatment effects was first proposed by Robins [14]. Illustrative applications of the general approach are provided in [19, 26], and summaries of g-computation (and other “g-methods” for estimating time-varying treatment effects) can be found in [8, 13]. The g-computation algorithm takes arbitrary regression models as inputs. While most applications (e.g. [19, 26]

) have thus far employed classical generalized linear models, there is no conceptual barrier to using more complex machine learning regression models. RNNs, and in particular LSTMs, have achieved state of the art performance on a wide variety of time series regression tasks, including healthcare related tasks

[21, 24, 3]. However, despite the popularity and success of RNNs for time series regression, we have not seen in the literature any “deep” implementation of g-computation.

Recently, Lim et al. [9] plugged RNN regression models into history adjusted marginal structural models (MSM) [22] to make counterfactual predictions. However, these MSMs can only make counterfactual predictions under time-varying treatment strategies that do not depend on recent covariate history. For example, a history adjusted MSM [9] could estimate the probability of fluid overload given patient history under the (static) treatment strategy “give 1 liter fluid each hour for the next 3 hours”, but it could not estimate the probability of fluid overload given patient history under the (dynamic) treatment strategy “each hour for the next 3 hours, if blood pressure is less than 65 give 1 liter fluids, otherwise give 0 liters”. History adjusted MSMs cannot estimate effects of time-varying treatment strategies that respond to changes in the patient’s health history, but g-computation can. Further, g-computation is able to straightforwardly estimate the of a counterfactual outcome under a time-varying treatment strategy. This is not straightforward to do with history adjusted MSMs.

Schulam and Saria [18] propose Counterfactual Gaussian Processes, an implementation of continuous time g-computation, and like us apply their method to ICU data. They only consider static time-varying treatment strategies, though it appears that their method might straightforwardly be extended to handle dynamic strategies as well. An advantage of Gaussian processes is interpretable uncertainty quantification. However, GPs are intractable for large datasets since they have time complexity of , where N is the number of observations [20]. Sparse GPs, which introduce inducing points, have at least [20] time complexity. RNNs are more scalable. Recurrent dropout can also be employed in RNN based implementations to produce uncertainty estimates that approximate posterior distributions from a Gaussian process [5].

3 G-computation for counterfactual prediction

Our goal is to predict patient outcomes under various future treatment strategies given observed patient histories. Let:

  • denote time, assumed discrete, with being the end of followup;

  • denote the observed treatment action at time ;

  • denote the observed value of the outcome at time

  • denote a vector of covariates at time

    that may influence treatment decisions or be associated with the outcome;

  • denote the history and denote the future for arbitrary time varying variable .

At each time point, we assume the causal ordering . Let denote patient history preceding treatment at time . A dynamic treatment strategy is a collection of functions , one per time point, such that maps onto a treatment action at time . A simple dynamic strategy for fluid volume might be , i.e. give .5 liters of fluid if mean arterial blood pressure is less than 65 at time .

Let denote the counterfactual outcome that would be observed at time had, possibly contrary to fact, treatment strategy been followed from baseline [14]. Further, let with denote the counterfactual outcome that would be observed had the patient received their observed treatments through time then followed strategy from time onward.

In counterfactual point prediction, our goal is to estimate expected counterfactual patient outcome trajectories

(1)

given observed patient history through time for any and any specified treatment strategy . We might also be interested in estimating the counterfactual outcome distributions at future time points

(2)

If we do not condition on anything in , then (1) is an expectation (and (2) a distribution) over the full population. If we condition on a small subset of variables contained in patient history, then (1) is an expectation (and (2) a distribution) over a sub-population. If we condition on all elements of a patient history, then (1) is still technically only an expectation (and (2) a distribution) over a hypothetical sub-population with the exact patient history conditioned on, but in this case (1) and (2) practically amount to what is usually meant by personalized prediction.

Under the below standard assumptions, we can estimate (1) and (2) through g-computation [14].

  1. [labelsep=2pt, leftmargin=10pt, topsep=0pt,itemsep=0ex,partopsep=1ex,parsep=1ex]

  2. Consistency:

  3. Sequential Exchangeability:

  4. Positivity:

Assumption 1 states that the observed outcome is equal to the counterfactual outcome corresponding to the observed treatment. Assumption 2 states that there is no unobserved confounding of treatment at any time and any future outcome. Assumption 2 would hold, for example, under the conditions depicted in Figure 1. Positivity states that the counterfactual treatment strategy of interest has some non-zero probability of actually being followed. Under the assumption that we specify certain predictive models correctly such that their predictions extrapolate to parts of a joint distribution that they were not trained on, positivity is not strictly necessary.

Figure 1: A causal DAG representing a data generating process in which Assumption 2 (sequential exchangeability) holds. This DAG represents a simple two time step process where the outcome is measured only after the final time step. However, its salient property is that all variables influencing treatment (i.e. with arrows directly into treatment) and associated with future outcomes are measured.

Under assumptions 1-3, for we have simply that

(3)

i.e. the conditional distribution of the counterfactual is simply the conditional distribution of the observed outcome given patient history and given that treatment follows the strategy of interest. For , things are slightly more complex because we need to adjust for time-varying confounding. With

for any random variable X, under Assumptions 1-3 the g-formula yields

(4)

It is not generally possible to compute this integral in closed form, but it could be approximated through Monte-Carlo simulation. We repeat the recursive process shown in Algorithm 1 times. (Here the outcome is without loss of generality deemed to be a variable in the vector .)

Figure 2: The G-Net: A flexible sequential deep learning framework for g-computation.

At the end of this process, we have simulated draws of the counterfactual outcome for each time . For each , the empirical distribution of these draws constitutes a Monte-Carlo approximation of the counterfactual outcome distribution (2). The sample averages of the draws at each time are an estimate of the conditional expectations (1) and can serve as point predictions for in a patient with history .

Key to the g-computation algorithm is the ability to simulate from joint conditional distributions of the covariates given patient history at time

. Of course, in practice we do not have prior knowledge of these conditional distributions and need to estimate them from data. Most implementations use generalized linear regression models to estimate the conditional distributions of the covariates. Often, these models do not capture temporal dependencies present in the patient data. We propose the G-Net for this task.

Set Simulate and record from Simulate from Set Simulate and record from Simulate from Continue simulations through time
Algorithm 1 G-Computation (One simulation)

4 G-Net Design

The proposed G-Net framework depicted in Figure 2 enables the use of sequential deep learning models to estimate conditional distributions of covariates given history at each time point and perform the G-computation algorithm described in Algorithm 1 to simulate covariates under various treatment strategies. In this setting, without loss of generality, we set as one of the co-variates in for notational simplicity.

Let denote components of the vector , where each component could be multivariate. We impose an arbitrary ordering and estimate the conditional distributions of each given all variables preceding it in this ordering. At simulation time, we exploit the basic probability identity

to simulate from by sequentially simulating each from . There are at least two reasons to allow for subdivision of the covariates. First, if covariates are of different types (e.g. continuous, categorical, count, etc.), it is difficult to simultaneously simulate from their joint distribution. Second, customizing models for each covariate component can potentially lead to better performance. Figure 2 illustrates this decomposition where at each time point the components are depicted via ordered (grey shaded) boxes that are responsible for the estimation of the various terms needed to compute the conditional distributions. One could set and model all covariates simultaneously or at the other extreme set to be the total number of variables.

The sequential model used in G-Net provides us with estimates of the conditional expectations for all and . To simulate from , we proceed as follows. If is multinomial, its conditional expectation defines its conditional density. If has a continuous density, there are various approaches we might take to simulate from its conditional distribution. Without making parametric assumptions, we could simulate from

(5)

where is a draw from the empirical distribution of the residuals in a holdout set not used to fit the model parameters used to generate as an estimate of . This method makes the simplifying assumption that the covariate error distribution does not depend on patient history. This is the approach we take in the experiments in this paper, and is depicted in the simulation noise nodes at the top of Figure 2. Alternatively, we might specify a parametric distribution for , e.g. a Gaussian, and directly estimate its parameters by maximum likelihood.

To obtain the estimates for the conditional distribution of covariates, G-Net admits sequential modeling of patients’ data using the group decomposition and arbitrary ordering outlined before. As shown in the yellow boxes in Figure 2, at each time , a representation of the patient history can be computed as

(6)

where represents model parameters learned during training. In its simplest form, may just be an identity function passing the covariates. In other configurations (e.g. using the selector in Figure 2), can be used to provide abstractions of histories using sequential learning architectures such as RNN. This formulation of allows for a great deal of flexibility in how information is shared across variables and time.

Estimates from each of the covariate groups can then be obtained, as shown in Figure 2, by successive estimation of conditional expectations of covariates. Specifically, the conditional expectation of each given the representation of patient history and the other variables from time that precede it in the arbitrary predefined ordering is estimated by the functions . The complete sequence can be given as follows:

(7)

where, each of the represents the specialized estimation function for group and are the learnable parameters for the same. Depending on modeling choices, can be sequential models specialized for group or models focusing only on the time step (e.g. linear models).

Given G-Net parameters, the distribution of the Monte Carlo simulations produced by G-computation Algorithm 1 constitute an estimate of uncertainty about a counterfactual prediction. But this estimate ignores uncertainty about the G-Net parameter estimates themselves. One way to incorporate such model uncertainty would be to fit a Bayesian model and, before each Monte Carlo trajectory simulation in G-computation, draw new network parameters from their posterior distribution. These Monte Carlo draws would be from the posterior predictive distribution of the counterfactual outcome. Bayesian deep learning can be prohibitively computationally intensive, but can be approximated through dropout

[5]. First, we fit the RNN component of the G-Net using recurrent dropout, with dropout masks at each layer held constant across time as described in [5]. Let denote a dropout mask constant across time for each layer of the RNN, denote estimated RNN parameters with mask applied, and denote the distribution from which was sampled during training. Then, during g-computation, we add step 0 to the beginning of Algorithm 1 before each simulation:

and compute all simulations plugging into (4). Then draws of are from an approximation to the posterior distribution of under a particular Gaussian Process prior. Therefore, the Monte Carlo simulations obtained from g-computation incorporating dropout (i.e. adding step 0 as above) approximate draws from the posterior predictive distribution of the counterfactual outcome.

The parameters, and

, are learned by optimizing a loss function forcing the G-Net to accurately estimate covariates

at each time point using standard gradient descent techniques. It is to be noted that it is necessary to use teacher-forcing (using observed values of as arguments to

in equation ) to obtain unbiased estimates of

, as shown in Figure 2.

5 Simulation Experiments Using CVSim

To evaluate counterfactual predictions, it is necessary to use simulated data in which counterfactual ground truth for outcomes under alternative treatment strategies is known. To this end, we performed experiments on data generated by CVSim [7], a program that simulates the dynamics of the human cardiovascular system.

Data Generation: We generated an ‘observational’ dataset under treatment regime and two ‘counterfactual’ datasets and under treatment regimes and . The data generating processes producing and were the same except for the treatment assignment rules. For each , was identical to for the first simulation time steps before diverging to a different treatment rule for time steps to as illustrated in Figure 3.

Figure 3: Covariates trajectories for the same patient (i.e. the same random seed) under two different treatments (blue) and (red). After the treatment strategies diverge at middle of trajectory, (black dashed line), delivers a large fluid dose while does not, temporarily pushing AP and CVP higher under the regime than the regime.

A CVSim 6-compartment circulatory model takes as inputs 28 variables that together govern a hemodynamic system. It then deterministically simulates forward in time a set of 25 output variables according to a collection of differential equations (parameterized by the input variables) modeling hemodynamics. Important variables in CVSim include arterial pressure (AP), central venous pressure (CVP), total blood volume (TBV), and total peripheral resistance (TPR). In real patients, physicians observe AP and CVP and seek to keep them above a clinically safe threshold. They do this by intervening on TBV (through fluid administration) and TPR (through vasopressors).

We defined simulated treatment interventions that were designed to mimic the impact of fluids and vasopressors. These simulated interventions alter the natural course of the simulation by increasing either TBV (in the case of the simulated fluids intervention) or TPR (in the case of the simulated vasopressor intervention). We generated patients by randomly initiating baseline inputs (which we hid from our G-Nets to make this a stochastic modeling problem) within plausible physiologic ranges, then using CVSim to simulate covariates forward, intervening according to the relevant treatment strategy at each timestep. Full details of the simulation process can be found in the Appendix.

Under (stochastic) observational treatment strategy

, the probability of receiving a non-zero vasopressor or fluid dose at a given time increases as MAP and CVP decrease according to a logistic regression function. Given that a dose is non-zero, the exact amount is drawn from a normal distribution with mean inversely proportional to MAP and CVP. Since all drivers of treatment under

are observed in our data, the sequential exchangeability assumption holds and g-computation may be validly applied.

is similar to , except it is a deterministic treatment strategy and the functions linking treatment and dose to covariates have different coefficents. Under , treatment is always withheld. Again, details are in the Appendix.

Experiment Setup: We set ourselves the task of training a G-Net on and using it to predict the trajectories of patients in for time steps to for each . This setup is designed to evaluate the performance of a G-Net in a situation in which we observe data from past patients () who received usual care () for K timesteps and would like to predict how a new patient who has been observed for timesteps would fare were they to follow a different treatment strategy of interest () for timesteps to . This is a standard use case for counterfactual prediction. provides ground truth for a collection of patients whose trajectories follow just the path we are interested in predicting. Thus, by aggregating predictive performance metrics across simulated patients in we generate measures of the population level performance of our G-Net at the counterfactual prediction task for which it was intended.

With pass thru With sequential
:LR : identity : LSTM
: linear layers. : linear layers.
:RNN : identity : LSTM
: LSTMs. : LSTMs
Table 1: Experimental Model Setup Grid: Each cell summarizes the instantiations: , , , and ), of G-Net used in our experiments.

As shown in Table 1, we explore two specific criteria: (a) using sequential vs. Identity functions for and (b) using sequential models of entire patient history vs. linear models focusing only on current time point for . This provides us different implementations of G-Net.

The best parameters for each model found from grid search are as follows. For the hidden dimension for the representational LSTM is , the hidden dimension for the categorical LSTM is , the hidden dimension for the continuous LSTM is , and the learning rate is . For , the hidden dimension for the categorical LSTM is , the hidden dimension for the continuous LSTM is , and the learning rate is . For , the hidden dimension for the representation LSTM is , the hidden dimension for the categorical LSTM is , the hidden dimension for the continuous LSTM is , and the learning rate is . For all models, the batch size used was 64. These parameters were the ones used to achieve the results presented in the experiments and results section.

Figure 4: 100 G-Net simulated trajectories (blue) and ground truth (red) for one patient under .
(a) MSE
(b) MSE
(c) Calibration
(d) Calibration
Figure 5: Performance of various models: MSE and calibration over time for and

Evaluation: We evaluate the accuracy (Mean Squared Error - MSE) and calibration of the counterfactual simulations generated by our G-Nets as follows. Say comprises trajectories of random variable . Given observed history for patient from , a G-Net fit to produces (in our experiments, 100) simulations of the counterfactual covariate trajectory . These simulated trajectories are the light blue lines in Figure 4.

The G-Net’s point prediction of is its estimate of , i.e. the average of the simulations . This is the dark blue line in Figure 4.

If has dimension , we compute the MSE of counterfactual predictions by a G-Net G in the dataset as .

We assess the calibration of a G-Net

as follows. Given lower and upper quantiles

and , the calibration measures the frequency with which the actual counterfactual covariate is between the and quantiles of the simulations . If this frequency is approximately , then is well calibrated.

Experiments/Results: Using CVSim, we generated a total of 10,000 trajectories in (), of which 80% were used for training, and the remaining 20% for validation. For testing, we generated 500 observations in the datasets (). We included a total of 18 output variables (including all variables influencing treatment assignment under ) from CVSim to construct and ; each trajectory is of length 64 time steps (d=18, K=64). In each , the switching time point from to is fixed at 34 for all trajectories ().

Figure 6: Estimated (G-Net in Table 1) and actual population average trajectories under for select variables.
Figure 7: Treatment Effect for Selected Variables

We fit the four models described in Table 1 to the training portion of

(80%), and used the remaining portion as validation to tune our model hyperparameters. Next, given observed covariate history through 34 time steps and treatment history through 33 time steps of each trajectory in each

, we computed the MSE and calibration of the G-Nets’ counterfactual predictions for time steps 34 to 64.

Figure 5 (a) and (b) illustrates the performance of the various G-Net architectures in terms of MSE over time. Overall, for this experiment we found M3 (Identity function for and LSTM for the functions), performed best over both treatment strategies. Also, a comparative analysis of M1 vs M3 (both with Identity representation function for ) and M2 vs M4 (both with LSTM representation functions for ) showed that G-Net provide better estimates of conditional distributions by admitting sequential prediction models focusing on the entire patient history compared to prediction models focusing on a single time point.

Figure 5 (c) and (d) depicts calibration for the candidate models. All of the calibration coverage rates are below the nominal levels (they should be .5), though the RNN based G-Nets again perform better than the linear model implementation. This is in part because the counterfactual predictive density estimates in these experiments do not take into account uncertainty about model parameter estimates, which could be addressed with dropout as discussed in Section 4.

The G-Net can also be used to estimate population average counterfactual outcomes and treatment effects, quantities sometimes more relevant to policy decisions than individual level counterfactual predictions. Figure 6 displays G-Net (M3 from Table 1) estimates and true values of population average trajectories for select variables under . Figure 7 shows estimates and true values of the population average treatment effect of following as opposed to on select variables. We see that the G-Net does a good job of estimating these population level quantities of interest.

6 Discussion and Future Work

In the G-Net we have introduced a novel and flexible framework for counterfactual prediction under dynamic treatment strategies through G-computation using sequential deep learning models. We have illustrated several implementations of the G-Net in a realistically complex simulation setting where we had access to ground truth counterfactual outcomes. In particular, we considered alternative approaches to representation learning that either share representations of patient history across predictive tasks or keep them separate. In the particular implementation we considered, shared representations seemed to aid simple linear classifiers but harm LSTMs.

The G-Net framework’s flexibility means that there are many other alternative implementations to explore. For example, we might consider alternative architectures for representing patient history, such as attention mechanisms or memory networks.

Another direction of future work is incorporation of prior causal knowledge. For example, if it is known which variables are confounders and which merely predictive of the outcome, we might include weights in the loss function emphasizing faithful representation and prediction of confounding variables compared to non-confounders.

References

  • [1] M. A. Alaa, M. Weisz, and M. van der Schaar (2017) Deep counterfactual networks with propensity-dropout. In Proceedings of the 34th International Conference on Machine Learning (ICML), Cited by: §2.
  • [2] O. Atan, J. Jordan, and M. van der Schaar (2018) Deep-treat: learning optimal personalized treatments from observational data using neural networks. In Proceedings of AAAI, Cited by: §2.
  • [3] E. Choi, M. T. Bahadori, J. Sun, J. Kulas, A. Schuetz, and W. Stewart (2016) RETAIN: an interpretable predictive model for healthcare using reverse time attention mechanism. In Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.), pp. 3504–3512. Cited by: §2.
  • [4] S. Finfer, J. Myburgh, and R. Bellomo (2018) Intravenous fluid therapy in critically ill adults. Nature Reviews Nephrology 14 (9), pp. 541. Cited by: §1.
  • [5] Y. Gal and Z. Ghahramani (2016) A theoretically grounded application of dropout in recurrent neural networks. In Advances in Neural Information Processing Systems, Cited by: §2, §4.
  • [6] T. Heldt, R. Mukkamala, G. B. Moody, and R. G. Mark (2010)

    CVSim: an open-source cardiovascular simulator for teaching and research

    .
    The open pacing, electrophysiology & therapy journal 3, pp. 45. Cited by: §A.1, §1.
  • [7] Heldt,T, R. Mukkamala, G. Moody, and R. Mark (2010) CVSim: an open-source cardiovascular simulator for teaching and research. Open Pacing, Electrophysiol & Ther J 3, pp. 45–54. Cited by: §5.
  • [8] M. Hernan and J. Robins (Forthcoming) Causal inference. , Chapman and Hall. External Links: ISBN Cited by: §1, §2.
  • [9] B. Lim, A. Alaa, and M. van der Schaar (2018) Forecasting treatment responses over time using recurrent marginal structural networks. In Neural Information Processing Systems (NIPS)., Cited by: §2.
  • [10] L. Orellana, J. Robins, and A. Rotnitzky (2008) Dynamic regime marginal structural mean models for estimation of optimal dynamic treatment regimes. The international journal of biostatistics. Cited by: §1.
  • [11] A. Rhodes, W. Evans, M. M. Levy, M. Antonelli, R. Ferrer, A. Kumar, J. E. Sevransky, C. L. Sprung, M. E. Nunnally, et al. (2017) Surviving sepsis campaign: international guidelines for management of sepsis and septic shock: 2016. Critical Care Medicine. External Links: ISBN 9781417642595, Link Cited by: 1st item, §A.4.3.
  • [12] J. Robins, M. Hernan, and B. Brumback (2000) Marginal structural models and causal inference in epidemiology. Epidemiology. Cited by: §1.
  • [13] J. Robins and M. Hernan (2009) Estimation of the causal effects of time varying exposures. In Longitudinal Data Analysis, G. Fitzmaurice, M. Davidian, G. Verbeke, and G. Molenberghs (Eds.), pp. 553–599. External Links: Link Cited by: §1, §2.
  • [14] J. Robins (1986) A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Mathematical Modelling. Cited by: §1, §2, §3, §3.
  • [15] J. Robins (1986) A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Mathematical modelling 7 (9-12), pp. 1393–1512. Cited by: §1.
  • [16] J. Robins (1987) A graphical approach to the identification and estimation of causal parameters in mortality studies with sustained exposure periods. Journal of Chronic Diseases. Cited by: §1.
  • [17] J. Robins (1994) Correcting for non-compliance in randomized trials using structural nested mean models. Communications in Statistics-Theory and Methods. Cited by: §1.
  • [18] P. Schulam and S. Saria (2017) Reliable decision support using counterfactual models. In Neural Information Processing Systems (NIPS)., Cited by: §2.
  • [19] S. Taubman, J. Robins, M. Mittleman, and M. Hernan (2009) Intervening on risk factors for coronary heart disease: an application of the parametric g-formula. International journal of epidemiology. Cited by: §1, §2.
  • [20] M. Titsias (2009) Variational learning of inducing variables in sparse gaussian processes. In International Conference on Aritifical Intelligence and Statistics, Cited by: §2.
  • [21] N. Tomašev, X. Glorot, J. W. Rae, M. Zielinski, H. Askham, A. Saraiva, A. Mottram, C. Meyer, S. Ravuri, I. Protsyuk, et al. (2019) A clinically applicable approach to continuous prediction of future acute kidney injury. Nature 572 (7767), pp. 116. Cited by: §2.
  • [22] M. van der Laan, M. Petersen, and M. Joffe (2005) History-adjusted marginal structural models and statically-optimal dynamic treatment regimens. The international journal of biostatistics. Cited by: §2.
  • [23] S. Vansteelandt and M. Joffe (2014) Structural nested models and g-estimation: the partially realized promise. Statistical Science. Cited by: §1.
  • [24] C. Xiao, E. Choi, and J. Sun (2018-06) Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review. Journal of the American Medical Informatics Association 25 (10), pp. 1419–1428. Cited by: §2.
  • [25] J. Yoon, J. Jordan, and M. van der Schaar (2018) GANITE: estimation of individualized treatment effects using generative adversarial nets. In ICLR., Cited by: §2.
  • [26] J. Young, L. Cain, J. Robins, E. O’Reilly, and M. Hernan (2011) Comparative effectiveness of dynamic treatment regimes: an application of the parametric g-formula. Statistics in biosciences. Cited by: §2.

References

  • [1] M. A. Alaa, M. Weisz, and M. van der Schaar (2017) Deep counterfactual networks with propensity-dropout. In Proceedings of the 34th International Conference on Machine Learning (ICML), Cited by: §2.
  • [2] O. Atan, J. Jordan, and M. van der Schaar (2018) Deep-treat: learning optimal personalized treatments from observational data using neural networks. In Proceedings of AAAI, Cited by: §2.
  • [3] E. Choi, M. T. Bahadori, J. Sun, J. Kulas, A. Schuetz, and W. Stewart (2016) RETAIN: an interpretable predictive model for healthcare using reverse time attention mechanism. In Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.), pp. 3504–3512. Cited by: §2.
  • [4] S. Finfer, J. Myburgh, and R. Bellomo (2018) Intravenous fluid therapy in critically ill adults. Nature Reviews Nephrology 14 (9), pp. 541. Cited by: §1.
  • [5] Y. Gal and Z. Ghahramani (2016) A theoretically grounded application of dropout in recurrent neural networks. In Advances in Neural Information Processing Systems, Cited by: §2, §4.
  • [6] T. Heldt, R. Mukkamala, G. B. Moody, and R. G. Mark (2010)

    CVSim: an open-source cardiovascular simulator for teaching and research

    .
    The open pacing, electrophysiology & therapy journal 3, pp. 45. Cited by: §A.1, §1.
  • [7] Heldt,T, R. Mukkamala, G. Moody, and R. Mark (2010) CVSim: an open-source cardiovascular simulator for teaching and research. Open Pacing, Electrophysiol & Ther J 3, pp. 45–54. Cited by: §5.
  • [8] M. Hernan and J. Robins (Forthcoming) Causal inference. , Chapman and Hall. External Links: ISBN Cited by: §1, §2.
  • [9] B. Lim, A. Alaa, and M. van der Schaar (2018) Forecasting treatment responses over time using recurrent marginal structural networks. In Neural Information Processing Systems (NIPS)., Cited by: §2.
  • [10] L. Orellana, J. Robins, and A. Rotnitzky (2008) Dynamic regime marginal structural mean models for estimation of optimal dynamic treatment regimes. The international journal of biostatistics. Cited by: §1.
  • [11] A. Rhodes, W. Evans, M. M. Levy, M. Antonelli, R. Ferrer, A. Kumar, J. E. Sevransky, C. L. Sprung, M. E. Nunnally, et al. (2017) Surviving sepsis campaign: international guidelines for management of sepsis and septic shock: 2016. Critical Care Medicine. External Links: ISBN 9781417642595, Link Cited by: 1st item, §A.4.3.
  • [12] J. Robins, M. Hernan, and B. Brumback (2000) Marginal structural models and causal inference in epidemiology. Epidemiology. Cited by: §1.
  • [13] J. Robins and M. Hernan (2009) Estimation of the causal effects of time varying exposures. In Longitudinal Data Analysis, G. Fitzmaurice, M. Davidian, G. Verbeke, and G. Molenberghs (Eds.), pp. 553–599. External Links: Link Cited by: §1, §2.
  • [14] J. Robins (1986) A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Mathematical Modelling. Cited by: §1, §2, §3, §3.
  • [15] J. Robins (1986) A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Mathematical modelling 7 (9-12), pp. 1393–1512. Cited by: §1.
  • [16] J. Robins (1987) A graphical approach to the identification and estimation of causal parameters in mortality studies with sustained exposure periods. Journal of Chronic Diseases. Cited by: §1.
  • [17] J. Robins (1994) Correcting for non-compliance in randomized trials using structural nested mean models. Communications in Statistics-Theory and Methods. Cited by: §1.
  • [18] P. Schulam and S. Saria (2017) Reliable decision support using counterfactual models. In Neural Information Processing Systems (NIPS)., Cited by: §2.
  • [19] S. Taubman, J. Robins, M. Mittleman, and M. Hernan (2009) Intervening on risk factors for coronary heart disease: an application of the parametric g-formula. International journal of epidemiology. Cited by: §1, §2.
  • [20] M. Titsias (2009) Variational learning of inducing variables in sparse gaussian processes. In International Conference on Aritifical Intelligence and Statistics, Cited by: §2.
  • [21] N. Tomašev, X. Glorot, J. W. Rae, M. Zielinski, H. Askham, A. Saraiva, A. Mottram, C. Meyer, S. Ravuri, I. Protsyuk, et al. (2019) A clinically applicable approach to continuous prediction of future acute kidney injury. Nature 572 (7767), pp. 116. Cited by: §2.
  • [22] M. van der Laan, M. Petersen, and M. Joffe (2005) History-adjusted marginal structural models and statically-optimal dynamic treatment regimens. The international journal of biostatistics. Cited by: §2.
  • [23] S. Vansteelandt and M. Joffe (2014) Structural nested models and g-estimation: the partially realized promise. Statistical Science. Cited by: §1.
  • [24] C. Xiao, E. Choi, and J. Sun (2018-06) Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review. Journal of the American Medical Informatics Association 25 (10), pp. 1419–1428. Cited by: §2.
  • [25] J. Yoon, J. Jordan, and M. van der Schaar (2018) GANITE: estimation of individualized treatment effects using generative adversarial nets. In ICLR., Cited by: §2.
  • [26] J. Young, L. Cain, J. Robins, E. O’Reilly, and M. Hernan (2011) Comparative effectiveness of dynamic treatment regimes: an application of the parametric g-formula. Statistics in biosciences. Cited by: §2.

Appendix A Appendix

a.1 CVSim

CVSim is a open-source cardiovascular simulator, aiming for education and research purpose developed by Thomas et al[6]. In this work, we focus on CVSim-6C which consists of 6 components, functioning as pulmonary and systemic veins, arteries, and micro-circulations respectively. CVSim-6C is regulated by arterial baroreflex system to simulate Sah’s lumped hemodynamic model[6]. The aggregate model is capable of simulating pulsatile waveforms, cardiac output and venous return curves, and spontaneous beat-to-beat hemodynamic variability. In this work, we modified and built on CVSim by adding stochasticcomponents to it for the purposes of evaluating our coun-terfactual simulators. We call our stochastic simulation engine S-CVSim.

a.2 Inputs of S-CVSim

By varying hemodynamic parameters of CVSim, it can simulate cardiovascular system under various conditions. We, therefore, sample values of a subset of model parameters while initiating simulation at time 0 to obtain ideal distribution of trajectories. These model parameters are listed at Table 2. Note that this does not necessarily mean values of input covariates would not exceed or drop below such range at any time , where 0..

INPUT COVARIATES RANGE
Total Blood Volume 1,500 - 6,000
Nominal Heart Rate 40 - 160
Total Peripheral Resistance 0.1 - 1.4
Arterial Compliance 0.4 - 1.1
Pulmonary Arterial Compliance 0.1 - 19.9
Total Zero-pressure filling Volume 500 - 3,500
Pulmonary Arterial Compliance 2.0 - 3.4
Pulmonary Microcirculation Resistance 0.4 - 1.00
Table 2: Names and corresponding ranges of input parameters of S-CVSim.

a.3 Outputs of S-CVSim

At each time , CVSim-6C generates hemodynamic data including vascular resistance and varieties of type of flow, pressure, and volume. In addition to original 25 hemodynamic outputs, we introduce 3 new outputs, including systolic blood pressure, diastolic blood pressure, and mean arterial pressure, bringing the total number of output variables to 28. For this work, we only predict a subset of output covariates highlighted in Table 3. A complete list of output are listed in Table 3.

  • Systoblic Blood Pressure (SBP) is defined as the highest measured arterial blood pressure while heart contracting.

  • Diastolic Blood Pressure (DBP) is defined as the lowest measured arterial blood pressure while heart contracting.

  • Mean Arterial Pressure (MAP) is defined as the average pressure in a patient’s arteries during one cardiac cycle as follow formula.

OUTPUT COVARIATES
Left Ventricle Pressure* LVP
Left Ventricle Flow* LVQ
Left Ventricle Volume LVV
Left Ventricle Contractility* LVC
Right Ventricle Pressure* RVP
Right Ventricle Flow* RVQ
Right Ventricle Volume RVV
Right Ventricle Contractility* RVC
Central Venous Pressure* CVP
Central Venous Flow CVQ
Central Venous Volume CVV
Arterial Pressure* AP
Arterial Flow* AQ
Arterial Volume* AV
Pulmonary Arterial Pressure PAP
Pulmonary Arterial Flow PAQ
Pulmonary Arterial Volume PAV
Pulmonary Venous Pressure PVP
Pulmonary Venous Flow PVQ
Pulmonary Venous Volume* PVV
Heart Rate* HR
Arteriolar Resistance* AR
Venous Tone* VT
Total Blood Volume* TBV
Intra-thoracic Pressure* PTH
Mean Arterial Pressure* MAP
Systolic Blood Pressure* SBP
Diastolic Blood Pressure DBP
Table 3: Outputs of S-CVSim, denotes all the output of S-CVSim at time ; whereas, denotes a subset of output e.g. represents mean arterial pressure at time . Covariates highlighted with * are the selected output.

a.4 Simulation Process

To obtain observational data for our intended purpose, we implemented and built two extra events, Disease, and Treatment, . Note that at each time , both and could happen simultaneously but also happens before . With these stochastic events, and , and can be simulated and obtained on S-CVSim.

a.4.1 Data Generation,

To simulate a patient trajectory under treatment strategy , we obtain and with the following algorithm for each individual trajectory. , training set, consists of 10k trajectories based on . , test set, consists of 1k trajectories based on .

  • Initialize input variables

    by drawing from independent uniform distributions using a predefined plausible physiological ranges for each variable as

    Table 2 suggested.

  • For t in 0:K

    • Generate as , where is taken to be .

    • If = and t :

      • Generate as

    • Else

      • Generate as

a.4.2 Disease Simulation,

We introduce the concept to simulate hemodynamic instability of cardiovascular system such as sepsis and bleeding at each timestep. consists of two events, sepsis and blood loss. In module of , we denote that , and ; Note that blood loss and sepsis events are mutually exclusive in the simulation process at any time .

  • When sepsis happens, , meaning , total peripheral resistance, would decrease in at next time

  • When blood loss happens, , meaning that , total blood volume would decrease in at next time .

a.4.3 Treatment Simulation,

We developed two treatment strategy , policy for observational regime and , policy for counterfactual policy, to simulate clinical treatment and validate our model. Under any given , is defined as . Similar to disease simulation, is either , increasing quantity of total blood volume or increasing quantity of arterial resistance. The probability of choosing fluids is equal to vasopressor but will not be administered at the same time. The dosage of depends on a subset of which indicates hemodynamic balance. More specific, since adequate blood pressure is an important clinical goal[11], we denote mean arterial pressure, MAP, of 65 mmHg and central venous pressure, CVP, of 10 mmHg as target goals. Therefore, We define and as proxies of how much dosage should be delivered. The following section will discuss the difference of between and .

a.4.4 Observational Regime,

Under , probability and dosage of treatment are denoted as the followings

  • Probability of treatment, ,where

  • If we administer fluids, we generate the dose (in mL) .

  • If we administer vasopressors, we generate the dose if .

a.4.5 Counterfactual Regime,

Under , probability and dosage of treatment are denoted as the followings

  • Probability of treatment, if and only if and ShockIndex [11],

  • If we administer fluids, we generate the dose (in mL) .

  • If we administer vasopressors, we generate the dose if .

We experimented with multiple parameters and opted to use


Figure 8: Causal DAG for generated under observational regime . Green arrows denote directed paths from treatments to outcomes. Treatment only depends on current covariates. All covariates with arrows into treatment (i.e. MAP, CVP, and ) are observed, satisfying the sequential exchangeability assumption as in Figure 1.

The DAG in Figure 8 depicts the causal structure of . Note that only observed covariates have arrows pointing into treatment (treatment is actually only a function of CVP, MAP, and ), so that there is no unobserved confounding and g-computation may be applied.