1 Introduction
Counterfactual prediction is a fundamental task in decisionmaking. It entails the estimation of expected future trajectories of variables of interest under alternative courses of action (or treatment strategies) given observed history. Treatment strategies of interest are usually time varying (meaning they comprise decisions at multiple time points) and dynamic (meaning the treatment decision at each time point is a function of history up to that time point).
As an example, consider the problem of fluid administration in Intensive Care Units (ICUs) [4]
. It is frequently necessary for physicians to adopt strategies that administer large volumes of fluids to increase blood pressure and promote blood perfusion through organs in septic patients. However, such strategies can lead to fluid overload, which can have serious adverse downstream effects such as pulmonary edema. Fluid administration strategies are time varying and dynamic because at each time point physicians decide the volume of fluid to administer based on observed patient history (e.g. blood pressure and volume of fluid already administered) up to that time point. To aid in the choice between alternative dynamic fluid administration strategies, it would be desirable to obtain counterfactual predictions of a patient’s probability of developing fluid overload (and other outcomes of interest) were they to follow each alternative strategy going forward given their observed covariate history up to the current time.
Counterfactual prediction is an inherently causal task in that it must account for the causal effects of following different treatment strategies. When treatment strategies of interest are timevarying, socalled “gmethods” [8, 13] are required to estimate their effects. Gmethods include gcomputation [14, 16, 19], structural nested models [17, 23], and marginal structural models [12, 10]. Of these methods, gcomputation is best suited for estimating effects of general dynamic treatment strategies conditional on high dimensional patient histories [15].
Gcomputation works by estimating the conditional distribution of relevant covariates given covariate and treatment history at each time point, then producing Monte Carlo estimates of counterfactual outcomes by simulating forward patient trajectories under treatment strategies of interest. Regression model(s) for the covariates and outcomes at each time point conditional on observed history are a critical component of this method. While any regression models could in theory be input to the Gcomputation algorithm, most existing Gcomputation implementations have employed simple regression models with limited capacity to capture complex temporal and nonlinear dependence structures. In recent years, sequential deep learning methods such as Recurrent Neural Networks (RNNs) have achieved state of the art performance in predictive modeling of complex time series data while imposing minimal modeling assumptions. In this paper, we propose GNet, a sequential deep learning framework for Gcomputation. GNet admits the use of recurrent networks such as LSTMs to model covariates in a manner suitable for Gcomputation. The GNet framework is flexible and allows for various configurations depending on the problem at hand. To the best of our knowledge, this is the first work to investigate a RNN based approach to Gcomputation.
Unfortunately, it is impossible to reliably evaluate counterfactual predictions on real data, since only the outcomes corresponding to treatment strategies that were actually followed can be observed. Consequently, to explore and evaluate various implementations of GNet, we used simulated data in which counterfactual ground truth can be known. We used CVSim [6], a well established mechanistic model of the cardiovascular system, to estimate counterfactual simulated patient trajectories under various fluid and vasopressor administration strategies. These experiments provide a template for causal model evaluation using complex and physiologically realistic simulated longitudinal data.
2 Related Work
Several recent works have proposed a deep learning framework for counterfactual prediction from observational data, including [2, 1, 25]. However, these works have mostly focused on learning point exposure as opposed to timevarying treatment effects, which are the focus of this paper.
Gcomputation for estimating timevarying treatment effects was first proposed by Robins [14]. Illustrative applications of the general approach are provided in [19, 26], and summaries of gcomputation (and other “gmethods” for estimating timevarying treatment effects) can be found in [8, 13]. The gcomputation algorithm takes arbitrary regression models as inputs. While most applications (e.g. [19, 26]
) have thus far employed classical generalized linear models, there is no conceptual barrier to using more complex machine learning regression models. RNNs, and in particular LSTMs, have achieved state of the art performance on a wide variety of time series regression tasks, including healthcare related tasks
[21, 24, 3]. However, despite the popularity and success of RNNs for time series regression, we have not seen in the literature any “deep” implementation of gcomputation.Recently, Lim et al. [9] plugged RNN regression models into history adjusted marginal structural models (MSM) [22] to make counterfactual predictions. However, these MSMs can only make counterfactual predictions under timevarying treatment strategies that do not depend on recent covariate history. For example, a history adjusted MSM [9] could estimate the probability of fluid overload given patient history under the (static) treatment strategy “give 1 liter fluid each hour for the next 3 hours”, but it could not estimate the probability of fluid overload given patient history under the (dynamic) treatment strategy “each hour for the next 3 hours, if blood pressure is less than 65 give 1 liter fluids, otherwise give 0 liters”. History adjusted MSMs cannot estimate effects of timevarying treatment strategies that respond to changes in the patient’s health history, but gcomputation can. Further, gcomputation is able to straightforwardly estimate the of a counterfactual outcome under a timevarying treatment strategy. This is not straightforward to do with history adjusted MSMs.
Schulam and Saria [18] propose Counterfactual Gaussian Processes, an implementation of continuous time gcomputation, and like us apply their method to ICU data. They only consider static timevarying treatment strategies, though it appears that their method might straightforwardly be extended to handle dynamic strategies as well. An advantage of Gaussian processes is interpretable uncertainty quantification. However, GPs are intractable for large datasets since they have time complexity of , where N is the number of observations [20]. Sparse GPs, which introduce inducing points, have at least [20] time complexity. RNNs are more scalable. Recurrent dropout can also be employed in RNN based implementations to produce uncertainty estimates that approximate posterior distributions from a Gaussian process [5].
3 Gcomputation for counterfactual prediction
Our goal is to predict patient outcomes under various future treatment strategies given observed patient histories. Let:

denote time, assumed discrete, with being the end of followup;

denote the observed treatment action at time ;

denote the observed value of the outcome at time

denote a vector of covariates at time
that may influence treatment decisions or be associated with the outcome; 
denote the history and denote the future for arbitrary time varying variable .
At each time point, we assume the causal ordering . Let denote patient history preceding treatment at time . A dynamic treatment strategy is a collection of functions , one per time point, such that maps onto a treatment action at time . A simple dynamic strategy for fluid volume might be , i.e. give .5 liters of fluid if mean arterial blood pressure is less than 65 at time .
Let denote the counterfactual outcome that would be observed at time had, possibly contrary to fact, treatment strategy been followed from baseline [14]. Further, let with denote the counterfactual outcome that would be observed had the patient received their observed treatments through time then followed strategy from time onward.
In counterfactual point prediction, our goal is to estimate expected counterfactual patient outcome trajectories
(1) 
given observed patient history through time for any and any specified treatment strategy . We might also be interested in estimating the counterfactual outcome distributions at future time points
(2) 
If we do not condition on anything in , then (1) is an expectation (and (2) a distribution) over the full population. If we condition on a small subset of variables contained in patient history, then (1) is an expectation (and (2) a distribution) over a subpopulation. If we condition on all elements of a patient history, then (1) is still technically only an expectation (and (2) a distribution) over a hypothetical subpopulation with the exact patient history conditioned on, but in this case (1) and (2) practically amount to what is usually meant by personalized prediction.
Under the below standard assumptions, we can estimate (1) and (2) through gcomputation [14].

[labelsep=2pt, leftmargin=10pt, topsep=0pt,itemsep=0ex,partopsep=1ex,parsep=1ex]

Consistency:

Sequential Exchangeability:

Positivity:
Assumption 1 states that the observed outcome is equal to the counterfactual outcome corresponding to the observed treatment. Assumption 2 states that there is no unobserved confounding of treatment at any time and any future outcome. Assumption 2 would hold, for example, under the conditions depicted in Figure 1. Positivity states that the counterfactual treatment strategy of interest has some nonzero probability of actually being followed. Under the assumption that we specify certain predictive models correctly such that their predictions extrapolate to parts of a joint distribution that they were not trained on, positivity is not strictly necessary.
Under assumptions 13, for we have simply that
(3) 
i.e. the conditional distribution of the counterfactual is simply the conditional distribution of the observed outcome given patient history and given that treatment follows the strategy of interest. For , things are slightly more complex because we need to adjust for timevarying confounding. With
for any random variable X, under Assumptions 13 the gformula yields
(4) 
It is not generally possible to compute this integral in closed form, but it could be approximated through MonteCarlo simulation. We repeat the recursive process shown in Algorithm 1 times. (Here the outcome is without loss of generality deemed to be a variable in the vector .)
At the end of this process, we have simulated draws of the counterfactual outcome for each time . For each , the empirical distribution of these draws constitutes a MonteCarlo approximation of the counterfactual outcome distribution (2). The sample averages of the draws at each time are an estimate of the conditional expectations (1) and can serve as point predictions for in a patient with history .
Key to the gcomputation algorithm is the ability to simulate from joint conditional distributions of the covariates given patient history at time
. Of course, in practice we do not have prior knowledge of these conditional distributions and need to estimate them from data. Most implementations use generalized linear regression models to estimate the conditional distributions of the covariates. Often, these models do not capture temporal dependencies present in the patient data. We propose the GNet for this task.
4 GNet Design
The proposed GNet framework depicted in Figure 2 enables the use of sequential deep learning models to estimate conditional distributions of covariates given history at each time point and perform the Gcomputation algorithm described in Algorithm 1 to simulate covariates under various treatment strategies. In this setting, without loss of generality, we set as one of the covariates in for notational simplicity.
Let denote components of the vector , where each component could be multivariate. We impose an arbitrary ordering and estimate the conditional distributions of each given all variables preceding it in this ordering. At simulation time, we exploit the basic probability identity
to simulate from by sequentially simulating each from . There are at least two reasons to allow for subdivision of the covariates. First, if covariates are of different types (e.g. continuous, categorical, count, etc.), it is difficult to simultaneously simulate from their joint distribution. Second, customizing models for each covariate component can potentially lead to better performance. Figure 2 illustrates this decomposition where at each time point the components are depicted via ordered (grey shaded) boxes that are responsible for the estimation of the various terms needed to compute the conditional distributions. One could set and model all covariates simultaneously or at the other extreme set to be the total number of variables.
The sequential model used in GNet provides us with estimates of the conditional expectations for all and . To simulate from , we proceed as follows. If is multinomial, its conditional expectation defines its conditional density. If has a continuous density, there are various approaches we might take to simulate from its conditional distribution. Without making parametric assumptions, we could simulate from
(5) 
where is a draw from the empirical distribution of the residuals in a holdout set not used to fit the model parameters used to generate as an estimate of . This method makes the simplifying assumption that the covariate error distribution does not depend on patient history. This is the approach we take in the experiments in this paper, and is depicted in the simulation noise nodes at the top of Figure 2. Alternatively, we might specify a parametric distribution for , e.g. a Gaussian, and directly estimate its parameters by maximum likelihood.
To obtain the estimates for the conditional distribution of covariates, GNet admits sequential modeling of patients’ data using the group decomposition and arbitrary ordering outlined before. As shown in the yellow boxes in Figure 2, at each time , a representation of the patient history can be computed as
(6) 
where represents model parameters learned during training. In its simplest form, may just be an identity function passing the covariates. In other configurations (e.g. using the selector in Figure 2), can be used to provide abstractions of histories using sequential learning architectures such as RNN. This formulation of allows for a great deal of flexibility in how information is shared across variables and time.
Estimates from each of the covariate groups can then be obtained, as shown in Figure 2, by successive estimation of conditional expectations of covariates. Specifically, the conditional expectation of each given the representation of patient history and the other variables from time that precede it in the arbitrary predefined ordering is estimated by the functions . The complete sequence can be given as follows:
(7) 
where, each of the represents the specialized estimation function for group and are the learnable parameters for the same.
Depending on modeling choices, can be sequential models specialized for group or models focusing only on the time step (e.g. linear models).
Given GNet parameters, the distribution of the Monte Carlo simulations produced by Gcomputation Algorithm 1 constitute an estimate of uncertainty about a counterfactual prediction. But this estimate ignores uncertainty about the GNet parameter estimates themselves. One way to incorporate such model uncertainty would be to fit a Bayesian model and, before each Monte Carlo trajectory simulation in Gcomputation, draw new network parameters from their posterior distribution. These Monte Carlo draws would be from the posterior predictive distribution of the counterfactual outcome. Bayesian deep learning can be prohibitively computationally intensive, but can be approximated through dropout
[5]. First, we fit the RNN component of the GNet using recurrent dropout, with dropout masks at each layer held constant across time as described in [5]. Let denote a dropout mask constant across time for each layer of the RNN, denote estimated RNN parameters with mask applied, and denote the distribution from which was sampled during training. Then, during gcomputation, we add step 0 to the beginning of Algorithm 1 before each simulation:and compute all simulations plugging into (4). Then draws of are from an approximation to the posterior distribution of under a particular Gaussian Process prior. Therefore, the Monte Carlo simulations obtained from gcomputation incorporating dropout (i.e. adding step 0 as above) approximate draws from the posterior predictive distribution of the counterfactual outcome.
The parameters, and
, are learned by optimizing a loss function forcing the GNet to accurately estimate covariates
at each time point using standard gradient descent techniques. It is to be noted that it is necessary to use teacherforcing (using observed values of as arguments toin equation ) to obtain unbiased estimates of
, as shown in Figure 2.5 Simulation Experiments Using CVSim
To evaluate counterfactual predictions, it is necessary to use simulated data in which counterfactual ground truth for outcomes under alternative treatment strategies is known. To this end, we performed experiments on data generated by CVSim [7], a program that simulates the dynamics of the human cardiovascular system.
Data Generation: We generated an ‘observational’ dataset under treatment regime and two ‘counterfactual’ datasets and under treatment regimes and . The data generating processes producing and were the same except for the treatment assignment rules. For each , was identical to for the first simulation time steps before diverging to a different treatment rule for time steps to as illustrated in Figure 3.
A CVSim 6compartment circulatory model takes as inputs 28 variables that together govern a hemodynamic system. It then deterministically simulates forward in time a set of 25 output variables according to a collection of differential equations (parameterized by the input variables) modeling hemodynamics. Important variables in CVSim include arterial pressure (AP), central venous pressure (CVP), total blood volume (TBV), and total peripheral resistance (TPR). In real patients, physicians observe AP and CVP and seek to keep them above a clinically safe threshold. They do this by intervening on TBV (through fluid administration) and TPR (through vasopressors).
We defined simulated treatment interventions that were designed to mimic the impact of fluids and vasopressors. These simulated interventions alter the natural course of the simulation by increasing either TBV (in the case of the simulated fluids intervention) or TPR (in the case of the simulated vasopressor intervention). We generated patients by randomly initiating baseline inputs (which we hid from our GNets to make this a stochastic modeling problem) within plausible physiologic ranges, then using CVSim to simulate covariates forward, intervening according to the relevant treatment strategy at each timestep. Full details of the simulation process can be found in the Appendix.
Under (stochastic) observational treatment strategy
, the probability of receiving a nonzero vasopressor or fluid dose at a given time increases as MAP and CVP decrease according to a logistic regression function. Given that a dose is nonzero, the exact amount is drawn from a normal distribution with mean inversely proportional to MAP and CVP. Since all drivers of treatment under
are observed in our data, the sequential exchangeability assumption holds and gcomputation may be validly applied.is similar to , except it is a deterministic treatment strategy and the functions linking treatment and dose to covariates have different coefficents. Under , treatment is always withheld. Again, details are in the Appendix.
Experiment Setup: We set ourselves the task of training a GNet on and using it to predict the trajectories of patients in for time steps to for each . This setup is designed to evaluate the performance of a GNet in a situation in which we observe data from past patients () who received usual care () for K timesteps and would like to predict how a new patient who has been observed for timesteps would fare were they to follow a different treatment strategy of interest () for timesteps to . This is a standard use case for counterfactual prediction. provides ground truth for a collection of patients whose trajectories follow just the path we are interested in predicting. Thus, by aggregating predictive performance metrics across simulated patients in we generate measures of the population level performance of our GNet at the counterfactual prediction task for which it was intended.
With pass thru  With sequential  

:LR  : identity  : LSTM 
: linear layers.  : linear layers.  
:RNN  : identity  : LSTM 
: LSTMs.  : LSTMs 
As shown in Table 1, we explore two specific criteria: (a) using sequential vs. Identity functions for and (b) using sequential models of entire patient history vs. linear models focusing only on current time point for . This provides us different implementations of GNet.
The best parameters for each model found from grid search are as follows. For the hidden dimension for the representational LSTM is , the hidden dimension for the categorical LSTM is , the hidden dimension for the continuous LSTM is , and the learning rate is . For , the hidden dimension for the categorical LSTM is , the hidden dimension for the continuous LSTM is , and the learning rate is . For , the hidden dimension for the representation LSTM is , the hidden dimension for the categorical LSTM is , the hidden dimension for the continuous LSTM is , and the learning rate is . For all models, the batch size used was 64. These parameters were the ones used to achieve the results presented in the experiments and results section.
Evaluation: We evaluate the accuracy (Mean Squared Error  MSE) and calibration of the counterfactual simulations generated by our GNets as follows. Say comprises trajectories of random variable . Given observed history for patient from , a GNet fit to produces (in our experiments, 100) simulations of the counterfactual covariate trajectory . These simulated trajectories are the light blue lines in Figure 4.
The GNet’s point prediction of is its estimate of , i.e. the average of the simulations . This is the dark blue line in Figure 4.
If has dimension , we compute the MSE of counterfactual predictions by a GNet G in the dataset as .
We assess the calibration of a GNet
as follows. Given lower and upper quantiles
and , the calibration measures the frequency with which the actual counterfactual covariate is between the and quantiles of the simulations . If this frequency is approximately , then is well calibrated.Experiments/Results: Using CVSim, we generated a total of 10,000 trajectories in (), of which 80% were used for training, and the remaining 20% for validation. For testing, we generated 500 observations in the datasets (). We included a total of 18 output variables (including all variables influencing treatment assignment under ) from CVSim to construct and ; each trajectory is of length 64 time steps (d=18, K=64). In each , the switching time point from to is fixed at 34 for all trajectories ().
We fit the four models described in Table 1 to the training portion of
(80%), and used the remaining portion as validation to tune our model hyperparameters. Next, given observed covariate history through 34 time steps and treatment history through 33 time steps of each trajectory in each
, we computed the MSE and calibration of the GNets’ counterfactual predictions for time steps 34 to 64.Figure 5 (a) and (b) illustrates the performance of the various GNet architectures in terms of MSE over time. Overall, for this experiment we found M3 (Identity function for and LSTM for the functions), performed best over both treatment strategies. Also, a comparative analysis of M1 vs M3 (both with Identity representation function for ) and M2 vs M4 (both with LSTM representation functions for ) showed that GNet provide better estimates of conditional distributions by admitting sequential prediction models focusing on the entire patient history compared to prediction models focusing on a single time point.
Figure 5 (c) and (d) depicts calibration for the candidate models. All of the calibration coverage rates are below the nominal levels (they should be .5), though the RNN based GNets again perform better than the linear model implementation. This is in part because the counterfactual predictive density estimates in these experiments do not take into account uncertainty about model parameter estimates, which could be addressed with dropout as discussed in Section 4.
The GNet can also be used to estimate population average counterfactual outcomes and treatment effects, quantities sometimes more relevant to policy decisions than individual level counterfactual predictions. Figure 6 displays GNet (M3 from Table 1) estimates and true values of population average trajectories for select variables under . Figure 7 shows estimates and true values of the population average treatment effect of following as opposed to on select variables. We see that the GNet does a good job of estimating these population level quantities of interest.
6 Discussion and Future Work
In the GNet we have introduced a novel and flexible framework for counterfactual prediction under dynamic treatment strategies through Gcomputation using sequential deep learning models. We have illustrated several implementations of the GNet in a realistically complex simulation setting where we had access to ground truth counterfactual outcomes. In particular, we considered alternative approaches to representation learning that either share representations of patient history across predictive tasks or keep them separate. In the particular implementation we considered, shared representations seemed to aid simple linear classifiers but harm LSTMs.
The GNet framework’s flexibility means that there are many other alternative implementations to explore. For example, we might consider alternative architectures for representing patient history, such as attention mechanisms or memory networks.
Another direction of future work is incorporation of prior causal knowledge. For example, if it is known which variables are confounders and which merely predictive of the outcome, we might include weights in the loss function emphasizing faithful representation and prediction of confounding variables compared to nonconfounders.
References
 [1] (2017) Deep counterfactual networks with propensitydropout. In Proceedings of the 34th International Conference on Machine Learning (ICML), Cited by: §2.
 [2] (2018) Deeptreat: learning optimal personalized treatments from observational data using neural networks. In Proceedings of AAAI, Cited by: §2.
 [3] (2016) RETAIN: an interpretable predictive model for healthcare using reverse time attention mechanism. In Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.), pp. 3504–3512. Cited by: §2.
 [4] (2018) Intravenous fluid therapy in critically ill adults. Nature Reviews Nephrology 14 (9), pp. 541. Cited by: §1.
 [5] (2016) A theoretically grounded application of dropout in recurrent neural networks. In Advances in Neural Information Processing Systems, Cited by: §2, §4.

[6]
(2010)
CVSim: an opensource cardiovascular simulator for teaching and research
. The open pacing, electrophysiology & therapy journal 3, pp. 45. Cited by: §A.1, §1.  [7] (2010) CVSim: an opensource cardiovascular simulator for teaching and research. Open Pacing, Electrophysiol & Ther J 3, pp. 45–54. Cited by: §5.
 [8] (Forthcoming) Causal inference. , Chapman and Hall. External Links: ISBN Cited by: §1, §2.
 [9] (2018) Forecasting treatment responses over time using recurrent marginal structural networks. In Neural Information Processing Systems (NIPS)., Cited by: §2.
 [10] (2008) Dynamic regime marginal structural mean models for estimation of optimal dynamic treatment regimes. The international journal of biostatistics. Cited by: §1.
 [11] (2017) Surviving sepsis campaign: international guidelines for management of sepsis and septic shock: 2016. Critical Care Medicine. External Links: ISBN 9781417642595, Link Cited by: 1st item, §A.4.3.
 [12] (2000) Marginal structural models and causal inference in epidemiology. Epidemiology. Cited by: §1.
 [13] (2009) Estimation of the causal effects of time varying exposures. In Longitudinal Data Analysis, G. Fitzmaurice, M. Davidian, G. Verbeke, and G. Molenberghs (Eds.), pp. 553–599. External Links: Link Cited by: §1, §2.
 [14] (1986) A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Mathematical Modelling. Cited by: §1, §2, §3, §3.
 [15] (1986) A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Mathematical modelling 7 (912), pp. 1393–1512. Cited by: §1.
 [16] (1987) A graphical approach to the identification and estimation of causal parameters in mortality studies with sustained exposure periods. Journal of Chronic Diseases. Cited by: §1.
 [17] (1994) Correcting for noncompliance in randomized trials using structural nested mean models. Communications in StatisticsTheory and Methods. Cited by: §1.
 [18] (2017) Reliable decision support using counterfactual models. In Neural Information Processing Systems (NIPS)., Cited by: §2.
 [19] (2009) Intervening on risk factors for coronary heart disease: an application of the parametric gformula. International journal of epidemiology. Cited by: §1, §2.
 [20] (2009) Variational learning of inducing variables in sparse gaussian processes. In International Conference on Aritifical Intelligence and Statistics, Cited by: §2.
 [21] (2019) A clinically applicable approach to continuous prediction of future acute kidney injury. Nature 572 (7767), pp. 116. Cited by: §2.
 [22] (2005) Historyadjusted marginal structural models and staticallyoptimal dynamic treatment regimens. The international journal of biostatistics. Cited by: §2.
 [23] (2014) Structural nested models and gestimation: the partially realized promise. Statistical Science. Cited by: §1.
 [24] (201806) Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review. Journal of the American Medical Informatics Association 25 (10), pp. 1419–1428. Cited by: §2.
 [25] (2018) GANITE: estimation of individualized treatment effects using generative adversarial nets. In ICLR., Cited by: §2.
 [26] (2011) Comparative effectiveness of dynamic treatment regimes: an application of the parametric gformula. Statistics in biosciences. Cited by: §2.
References
 [1] (2017) Deep counterfactual networks with propensitydropout. In Proceedings of the 34th International Conference on Machine Learning (ICML), Cited by: §2.
 [2] (2018) Deeptreat: learning optimal personalized treatments from observational data using neural networks. In Proceedings of AAAI, Cited by: §2.
 [3] (2016) RETAIN: an interpretable predictive model for healthcare using reverse time attention mechanism. In Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.), pp. 3504–3512. Cited by: §2.
 [4] (2018) Intravenous fluid therapy in critically ill adults. Nature Reviews Nephrology 14 (9), pp. 541. Cited by: §1.
 [5] (2016) A theoretically grounded application of dropout in recurrent neural networks. In Advances in Neural Information Processing Systems, Cited by: §2, §4.

[6]
(2010)
CVSim: an opensource cardiovascular simulator for teaching and research
. The open pacing, electrophysiology & therapy journal 3, pp. 45. Cited by: §A.1, §1.  [7] (2010) CVSim: an opensource cardiovascular simulator for teaching and research. Open Pacing, Electrophysiol & Ther J 3, pp. 45–54. Cited by: §5.
 [8] (Forthcoming) Causal inference. , Chapman and Hall. External Links: ISBN Cited by: §1, §2.
 [9] (2018) Forecasting treatment responses over time using recurrent marginal structural networks. In Neural Information Processing Systems (NIPS)., Cited by: §2.
 [10] (2008) Dynamic regime marginal structural mean models for estimation of optimal dynamic treatment regimes. The international journal of biostatistics. Cited by: §1.
 [11] (2017) Surviving sepsis campaign: international guidelines for management of sepsis and septic shock: 2016. Critical Care Medicine. External Links: ISBN 9781417642595, Link Cited by: 1st item, §A.4.3.
 [12] (2000) Marginal structural models and causal inference in epidemiology. Epidemiology. Cited by: §1.
 [13] (2009) Estimation of the causal effects of time varying exposures. In Longitudinal Data Analysis, G. Fitzmaurice, M. Davidian, G. Verbeke, and G. Molenberghs (Eds.), pp. 553–599. External Links: Link Cited by: §1, §2.
 [14] (1986) A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Mathematical Modelling. Cited by: §1, §2, §3, §3.
 [15] (1986) A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Mathematical modelling 7 (912), pp. 1393–1512. Cited by: §1.
 [16] (1987) A graphical approach to the identification and estimation of causal parameters in mortality studies with sustained exposure periods. Journal of Chronic Diseases. Cited by: §1.
 [17] (1994) Correcting for noncompliance in randomized trials using structural nested mean models. Communications in StatisticsTheory and Methods. Cited by: §1.
 [18] (2017) Reliable decision support using counterfactual models. In Neural Information Processing Systems (NIPS)., Cited by: §2.
 [19] (2009) Intervening on risk factors for coronary heart disease: an application of the parametric gformula. International journal of epidemiology. Cited by: §1, §2.
 [20] (2009) Variational learning of inducing variables in sparse gaussian processes. In International Conference on Aritifical Intelligence and Statistics, Cited by: §2.
 [21] (2019) A clinically applicable approach to continuous prediction of future acute kidney injury. Nature 572 (7767), pp. 116. Cited by: §2.
 [22] (2005) Historyadjusted marginal structural models and staticallyoptimal dynamic treatment regimens. The international journal of biostatistics. Cited by: §2.
 [23] (2014) Structural nested models and gestimation: the partially realized promise. Statistical Science. Cited by: §1.
 [24] (201806) Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review. Journal of the American Medical Informatics Association 25 (10), pp. 1419–1428. Cited by: §2.
 [25] (2018) GANITE: estimation of individualized treatment effects using generative adversarial nets. In ICLR., Cited by: §2.
 [26] (2011) Comparative effectiveness of dynamic treatment regimes: an application of the parametric gformula. Statistics in biosciences. Cited by: §2.
Appendix A Appendix
a.1 CVSim
CVSim is a opensource cardiovascular simulator, aiming for education and research purpose developed by Thomas et al[6]. In this work, we focus on CVSim6C which consists of 6 components, functioning as pulmonary and systemic veins, arteries, and microcirculations respectively. CVSim6C is regulated by arterial baroreflex system to simulate Sah’s lumped hemodynamic model[6]. The aggregate model is capable of simulating pulsatile waveforms, cardiac output and venous return curves, and spontaneous beattobeat hemodynamic variability. In this work, we modified and built on CVSim by adding stochasticcomponents to it for the purposes of evaluating our counterfactual simulators. We call our stochastic simulation engine SCVSim.
a.2 Inputs of SCVSim
By varying hemodynamic parameters of CVSim, it can simulate cardiovascular system under various conditions. We, therefore, sample values of a subset of model parameters while initiating simulation at time 0 to obtain ideal distribution of trajectories. These model parameters are listed at Table 2. Note that this does not necessarily mean values of input covariates would not exceed or drop below such range at any time , where 0..
INPUT COVARIATES  RANGE 

Total Blood Volume  1,500  6,000 
Nominal Heart Rate  40  160 
Total Peripheral Resistance  0.1  1.4 
Arterial Compliance  0.4  1.1 
Pulmonary Arterial Compliance  0.1  19.9 
Total Zeropressure filling Volume  500  3,500 
Pulmonary Arterial Compliance  2.0  3.4 
Pulmonary Microcirculation Resistance  0.4  1.00 
a.3 Outputs of SCVSim
At each time , CVSim6C generates hemodynamic data including vascular resistance and varieties of type of flow, pressure, and volume. In addition to original 25 hemodynamic outputs, we introduce 3 new outputs, including systolic blood pressure, diastolic blood pressure, and mean arterial pressure, bringing the total number of output variables to 28. For this work, we only predict a subset of output covariates highlighted in Table 3. A complete list of output are listed in Table 3.

Systoblic Blood Pressure (SBP) is defined as the highest measured arterial blood pressure while heart contracting.

Diastolic Blood Pressure (DBP) is defined as the lowest measured arterial blood pressure while heart contracting.

Mean Arterial Pressure (MAP) is defined as the average pressure in a patient’s arteries during one cardiac cycle as follow formula.
OUTPUT COVARIATES  

Left Ventricle Pressure*  LVP 
Left Ventricle Flow*  LVQ 
Left Ventricle Volume  LVV 
Left Ventricle Contractility*  LVC 
Right Ventricle Pressure*  RVP 
Right Ventricle Flow*  RVQ 
Right Ventricle Volume  RVV 
Right Ventricle Contractility*  RVC 
Central Venous Pressure*  CVP 
Central Venous Flow  CVQ 
Central Venous Volume  CVV 
Arterial Pressure*  AP 
Arterial Flow*  AQ 
Arterial Volume*  AV 
Pulmonary Arterial Pressure  PAP 
Pulmonary Arterial Flow  PAQ 
Pulmonary Arterial Volume  PAV 
Pulmonary Venous Pressure  PVP 
Pulmonary Venous Flow  PVQ 
Pulmonary Venous Volume*  PVV 
Heart Rate*  HR 
Arteriolar Resistance*  AR 
Venous Tone*  VT 
Total Blood Volume*  TBV 
Intrathoracic Pressure*  PTH 
Mean Arterial Pressure*  MAP 
Systolic Blood Pressure*  SBP 
Diastolic Blood Pressure  DBP 
a.4 Simulation Process
To obtain observational data for our intended purpose, we implemented and built two extra events, Disease, and Treatment, . Note that at each time , both and could happen simultaneously but also happens before . With these stochastic events, and , and can be simulated and obtained on SCVSim.
a.4.1 Data Generation,
To simulate a patient trajectory under treatment strategy , we obtain and with the following algorithm for each individual trajectory. , training set, consists of 10k trajectories based on . , test set, consists of 1k trajectories based on .

Initialize input variables
by drawing from independent uniform distributions using a predefined plausible physiological ranges for each variable as
Table 2 suggested. 
For t in 0:K

Generate as , where is taken to be .

If = and t :

Generate as


Else

Generate as


a.4.2 Disease Simulation,
We introduce the concept to simulate hemodynamic instability of cardiovascular system such as sepsis and bleeding at each timestep. consists of two events, sepsis and blood loss. In module of , we denote that , and ; Note that blood loss and sepsis events are mutually exclusive in the simulation process at any time .

When sepsis happens, , meaning , total peripheral resistance, would decrease in at next time

When blood loss happens, , meaning that , total blood volume would decrease in at next time .
a.4.3 Treatment Simulation,
We developed two treatment strategy , policy for observational regime and , policy for counterfactual policy, to simulate clinical treatment and validate our model. Under any given , is defined as . Similar to disease simulation, is either , increasing quantity of total blood volume or increasing quantity of arterial resistance. The probability of choosing fluids is equal to vasopressor but will not be administered at the same time. The dosage of depends on a subset of which indicates hemodynamic balance. More specific, since adequate blood pressure is an important clinical goal[11], we denote mean arterial pressure, MAP, of 65 mmHg and central venous pressure, CVP, of 10 mmHg as target goals. Therefore, We define and as proxies of how much dosage should be delivered. The following section will discuss the difference of between and .
a.4.4 Observational Regime,
Under , probability and dosage of treatment are denoted as the followings

Probability of treatment, ,where

If we administer fluids, we generate the dose (in mL) .

If we administer vasopressors, we generate the dose if .
a.4.5 Counterfactual Regime,
Under , probability and dosage of treatment are denoted as the followings

Probability of treatment, if and only if and ShockIndex [11],

If we administer fluids, we generate the dose (in mL) .

If we administer vasopressors, we generate the dose if .
We experimented with multiple parameters and opted to use
The DAG in Figure 8 depicts the causal structure of . Note that only observed covariates have arrows pointing into treatment (treatment is actually only a function of CVP, MAP, and ), so that there is no unobserved confounding and gcomputation may be applied.
Comments
There are no comments yet.