Bayesian analysis of longitudinal studies with treatment by indication

by   Reagan Mozer, et al.
Bentley University

It is often of interest in observational studies to measure the causal effect of a treatment on time-to-event outcomes. In a medical setting, observational studies commonly involve patients who initiate medication therapy and others who do not, and the goal is to infer the effect of medication therapy on time until recovery, a pre-defined level of improvement, or some other time-to-event outcome. A difficulty with such studies is that the notion of a medication initiation time does not exist in the control group. We propose an approach to infer causal effects of an intervention in longitudinal observational studies when the time of treatment assignment is only observed for treated units and where treatment is given by indication. We present a framework for conceptualizing an underlying randomized experiment in this setting based on separating the process that governs the time of study arm assignment from the mechanism that determines the assignment. Our approach involves inferring the missing times of assignment followed by estimating treatment effects. This approach allows us to incorporate uncertainty about the missing times of study arm assignment, which induces uncertainty in both the selection of the control group and the measurement of time-to-event outcomes for these controls. We demonstrate our approach to study the effects on mortality of inappropriately prescribing phosphodiesterase type 5 inhibitors (PDE5Is), a medication contraindicated for groups 2 and 3 pulmonary hypertension, using administrative data from the Veterans Affairs (VA) health care system.



There are no comments yet.


page 1

page 2

page 3

page 4


Assessing causal effects in the presence of treatment switching through principal stratification

Clinical trials focusing on survival outcomes often allow patients in th...

Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data

Estimating personalized treatment effects from high-dimensional observat...

Natural Experiments

The term natural experiment is used inconsistently. In one interpretatio...

Factor-augmented Bayesian treatment effects models for panel outcomes

We propose a new, flexible model for inference of the effect of a binary...

Balance Regularized Neural Network Models for Causal Effect Estimation

Estimating individual and average treatment effects from observational d...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

In certain observational studies, particularly in a medical context, interest centers on estimating the causal effects of an action, treatment, or intervention on a time-to-event outcome (e.g., the effects of surgery on post-operative time to death). Insights about the effects of these non-randomized treatments have the potential to help answer important questions both in population health and in individualized medicine. In these settings, measuring time-to-event outcomes in a treated group is straightforward; time is measured from treatment initiation. However, a key obstacle for inference is the lack of observed initiation times in the control group, without which a time-to-event outcome among the controls is not defined.

The question of how to make valid causal inferences in this setting using longitudinal, observational data becomes further complicated with the possibility of confounding due to “treatment by indication” (Poses et al., 1995). This issue stems from the principle that a good clinician will prescribe a new medication or recommend a new treatment for a patient only when evidence exists that the candidate treatment is necessary (Poses et al., 1995). In studies evaluating the effects of a non-randomized treatment on a population with a common disease or condition, this means that treatment is initiated (and, therefore, assignment to treatment is observed) only when the potential benefits outweigh the anticipated risks or side effects associated with treatment. As a result, subjects who receive treatment during the study period may differ systematically from untreated subjects observed during the same time period in terms of their prognosis. This type of confounding can generally be avoided by comparing groups of subjects who presented with similar indications at similar times after their initial diagnosis; however, these variables are typically not recorded for subjects who do not receive treatment. Thus, for the group of untreated subjects, it is unclear whether treatment was indicated at an unknown time during the study and was purposefully withheld at that time or if treatment was simply never considered due to lack of indications. A primary challenge in this domain therefore concerns how to infer the unknown indication times from the observed data (Basse et al., 2016).

In this paper, we consider how to draw causal inferences about a binary treatment, which can either be initiated or withheld from a given subject at a single point in time. This time point, which we refer to as the “indication time”, is the time at which a subject presents with a particular set of symptoms or pre-specified indications that necessitate clinical intervention. The indication time for any individual might be a function of the subject’s behavior (e.g., when a subject requests a medication therapy), or might be solely determined by clinical factors. For instance, in our applied example, we consider a population of patients in the VA health system who are diagnosed with pulmonary hypertension (PH) World Health Organization (WHO) Groups 2 and 3, where individuals may receive medication therapy in the form of phosphodiesterase-5-inhibitors (PDE5Is) for treatment of PH-related symptoms only when it is determined that the medication may be beneficial given the severity of the patient’s condition. As a result, indication times may vary greatly across patients in a sample, which often induces a complicated time structure in the observed data. To accommodate this structure when attempting to draw causal inferences, researchers often focus on estimands that are defined relative to a time-varying treatment (e.g., the effect of initiating treatment now versus later). We propose an approach for treatment effect estimation that views indication times as a fixed but possibly unknown pre-treatment covariate (i.e., a variable that is unaffected by assignment to treatment or control) representing relevant characteristics related to the patient’s health. We then condition on these indication times to construct an estimator of the causal effect of treatment versus control that is independent from the time-structure of the underlying data.

The paper proceeds as follows. In Section 2, we first review existing methods in this domain and describe a strategy for making causal inferences using longitudinal observational data by approximating a sequence of randomized experiments. We then present an alternative conceptualization of the hypothetical randomized experiment that can be used to facilitate causal inferences, which is based on separating the process that governs the time of indication from the mechanism that determines the receipt of treatment versus control upon indication. In Section 3, we present a framework for designing comparative effectiveness studies to approximate this underlying randomized experiment and describe the formal assumptions required for inference. The core of this framework is a joint state-space model for predicting indication times as a function of longitudinal covariates, described in Section 4, which we use to infer the missing indication times for untreated patients. In this Section, we also propose an approach for estimating treatment effects that directly incorporates uncertainty about the inferred indication times. We then illustrate the proposed framework in Section 5 to study the effects of prescribing contraindicated PDE5Is for treatment of PH using administrative data from the VA health care system.

2 Background

When attempting to draw causal inferences in observational studies, it is desirable to “design” the study in a manner that approximates a randomized experiment (Rubin, 1984). While this design-based approach to covariate adjustment in observational studies has been widely applied to estimate causal effects using cross-sectional data, little work has focused on extending these methods to longitudinal settings. In fact, how to properly design and analyze longitudinal studies of non-randomized medical interventions (comparative effectiveness studies) remains a point of controversy (Rubin, 2010). A primary concern in this domain is related to the lack of a well-defined control intervention (Huitfeldt et al., 2015). For instance, in a study evaluating the efficacy of a particular medication, drug A, for treating depression, initiation of control might be marked by the receipt of an alternative medication, drug B. Alternatively, if the control intervention is defined as the decision to withhold drug A in favor of a non-pharmacological approach (e.g., self-care or psychotherapy), then receipt of assignment to control may not be available in the observed data. In these settings, unknown values of the assignment may be missing due to measurement errors (e.g., if receipt of the control intervention is not recorded) or may be (right-)censored due to follow-up (e.g., for patients whose treatment assignment occurs after the study has ended). When criteria for determining indications for treatment cannot be applied to the pool of potential controls, an indication time must be inferred for these units.

In many observational studies, controls are selected based on their similarity to treated units with respect to pre-treatment variables observed during a baseline period. However, control group selection becomes complicated in longitudinal settings where units may enter the study at different points in calendar time. Instead of having a single baseline period for all units, in this setting we observe a (potentially unique) baseline period for each treated unit, defined as the time from study entry to the time of treatment initiation. The primary challenge is then how to define a comparable pre-treatment period for potential controls who do not receive treatment at any point during the study window (Basse et al., 2016). To address this issue, some studies have attempted to identify alternative indication times (i.e., “phantom” treatment times) for untreated units who do not receive the treatment of interest and for whom there is no receipt of treatment. For instance, in an observational study evaluating the effects of selective cyclo-oxygenase-2 inhibitors for treatment of upper gastrointestinal hemorrhage in eldery patients, Mamdani et al. (2002) randomly assigned indication times for untreated units. Zhou et al. (2005) proposed a similar approach for prescription time-distribution matching, whereby indication times for each untreated unit are selected at random from the distribution of indication times observed among the treated units. This approach ensures that the treatment and control groups will be balanced with respect to time of treatment initiation, but requires the strong assumption that indication times are independent and identically distributed for all units in the study population.

Other studies have attempted to recover the unknown indication times for untreated subjects using dispensement dates of other prescription medications during the study period (McGettigan and Henry, 2006). This is the general approach of studies that utilize an “active-comparator design” (Yoshida et al., 2015), which restricts the control group to only the set of untreated units who received another active drug used in clinical practice during the study period. The core idea of this design is that any given sample of untreated units is likely to include subjects with no indications for any treatment (e.g., very mild disease) as well as those with contraindication for all treatment (e.g., very severe coexisting conditions). While this framework provides a useful template for designing observational studies in some settings, if the treatment of interest is the most commonly used therapy for a given condition (i.e., a first-line therapy) and the alternatives are used infrequently, selecting an active comparator can be challenging.

To estimate the causal effects of a candidate treatment using longitudinal observational data, a number of other approaches have been proposed that attempt avoid the issue of control group identification by framing the data-generating process in terms of a sequence of randomized experiments occurring over time, where at each time treatment can either be initiated or withheld (Hernán et al., 2008; Kennedy et al., 2010; Danaei et al., 2013; Li et al., 2001). These methods aim to assess the effects of different treatment sequences on an outcome observed at a final point. The implicit assumption within this general framework is that, unlike in a classical randomized experiment, all patients will receive treatment eventually and it is rather their times of treatment that are a result of randomization. Accordingly, causal effects are estimated by comparing the set of patients who received treatment at each time point with the set of patients who were not-yet-treated at that point. The resulting inferences may be useful to practitioners interested in the causal effects of delaying treatment with a single medical intervention (e.g., the effect of initiating treatment at time versus initiating at time ) and may help answer questions about the optimal time to apply a particular medical treatment given that this treatment must be applied at some point. However, these inferences may not be appropriate in settings with treatment by indication, where the time of initiation is not under the control of the treatment initiator, usually the clinician. In these settings, when a patient presents with symptoms that indicate the need for medical intervention, their clinician is faced with a decision about which among available treatments to initiate at that time. Here, it is not sensible to hypothesize about how a patient’s outcomes would have changed if their indications had presented at a different time (Angrist et al., 1996). Rather, the time of indication for treatment should be viewed as a fixed pre-treatment covariate and causal effects should be framed as contrasts of outcomes under different treatment strategies given the observed indication times.

To motivate our proposed framework, consider the following hypothetical randomized experiment with a binary treatment, where interest is in estimating the causal effects of treatment on a time-to-event outcome for patients with a particular medical condition. Here, treatment is defined as the decision to apply the intervention, and the control treatment is defined as the decision to withhold the intervention. Assume that patients become eligible for inclusion in the experiment only when their health reaches a point where clinical intervention may be beneficial to the patient despite any associated risks or side effects (the so-called “indication time”). Upon enrollment, suppose that patients are then randomized to receive either treatment or control with probability that depends only on their indication time. After randomization, suppose each patient is observed until the earliest occurrence of a pre-specified event (e.g., renal failure) or the end of follow-up, whichever comes first. The outcome is an event time calculated with respect to indication time. In this idealized experiment, contrasts of outcomes between treated and control units with similar indication times will be unbiased estimates of the causal effects of interest.

This conceptualization of the underlying randomized experiment explicitly defines the control group as the set of untreated patients for whom treatment was indicated during the study period, but treatment was deliberately withheld at that time (“true controls”). The remaining untreated patients are those for whom no indications for treatment were present during the study (“ineligible controls”). Because these units are not assigned to treatment or control during the study period, they are not relevant units for the purposes of causal inference. When indication times are known, as in the ideal randomized experiment described above, these ineligible controls can easily be identified and discarded prior to analysis. However, in practice when analyzing non-randomized data, indication times may be censored due to follow-up or death and are often only partially observed, typically for patients receiving treatment. In addition, the probabilities of assignment to treatment over time are generally unknown and may be difficult to infer without expert domain knowledge.

3 A causal effects framework

We assume a cohort of patients (i.e., observational study units), each of whom is observed in a specified study time window divided into discrete time periods. If a patient in the cohort receives the treatment of interest during their observation period, then he or she is included in the study as a treated unit. Other members of the study base are eligible to be controls during their respective observation periods.

For each unit

, suppose we observe a vector of

covariate measurements collected at the time periods , where is a -vector of baseline covariates observed at study entry. These covariates capture characteristics of each unit or the unit’s environment (such as age, gender, physiological factors, diet, medical treatments, and environmental exposures) over the course of the study. Let denote the indication time of unit , which may occur at a discrete time within the study or may be censored at the end of follow-up (i.e., ). By construction, we assume that units whose indication times are censored do not receive either treatment or control and we let be an indicator of eligibility for inclusion in the study as a control. Similarly, let be an indicator for the missingness of with for units whose indication time is unobserved. Finally, let be an indicator for assignment to treatment upon indication for unit , which equals 1 if unit is assigned treatment at their indication time and equals 0 if unit is assigned control upon indication. Unlike the classical setting of a randomized experiment, suppose that is only observed for units who are assigned to treatment at indication times within the study period (i.e., when and ).

1 1
2 1
3 ? ? ?
4 ? ?
Table 1: Structure of the observed and missing data. ‘’ denotes endogenous missingness and ‘?’ denotes exogenous missingness.

Table 1 shows the structure of the data in this setting. Here, it is important to distinguish between two distinct missing data mechanisms that give rise to the observed and missing values. The first type of missing data, which is commonly encountered in longitudinal studies, is the indication times that are naturally right-censored at the end of the study. Although, in principle, these values are observable over a sufficient follow-up period, the missing-data mechanism is determined by the specified observation period. Because missing data of this type are not “missing at random” (MAR; Rubin, 1976), a model for the missing data mechanism must be incorporated into the analysis in order to yield valid causal inferences (Little and Rubin, 2019). The second type of missingness in this setting is due to the fundamental problem of causal inference, which states that we can observe at most one potential outcome (i.e., the potential outcome corresponding to the treatment actually received) for each subject in the study (Rubin, 1976). These missing potential outcomes are therefore “endogenous” missing values in the sense that the missingness mechanism is completely determined by treatment assignment. As in a standard randomized experiment, the unit-level missing potential outcomes are impossible to observe under a given treatment assignment, and the goal of causal inference is therefore to recover these values under plausible modeling assumptions in order to make inferences about the causal effects of interest.

3.1 Causal estimands and assumptions for identification

Interest is in estimating the causal effects of assignment to treatment versus control on the time from indication to the first occurrence of an event of interest (e.g., death, hospital discharge, symptom remission). We consider time-to-event outcomes defined as the times from indication to the event of interest. Under the Rubin Causal Model (RCM; Holland, 1986), each participant has two potential outcomes, and , which represent the outcomes that would be observed for unit under assignment to control or treatment, respectively, when assigned at the indication time . Let and , and let denote the complete set of potential outcomes for all units relative to indication time . We make the Stable Unit Treatment Value Assumption (SUTVA, Rubin, 1980), which asserts that there is no interference between units and no hidden forms of treatment. Under this assumption, the average treatment effect (ATE) at a single indication time is defined as:

where is the average across all units. In longitudinal studies where indication may occur over a fixed study period , we can construct an aggregate average measure of these time-specific effects as

In settings where the outcome of interest is defined relative to a time of death or failure, captures the average change in survival time under treatment compared to control for units who present with indications for treatment over the study period. Alternatively, as we will see in Section 5, causal estimands can also be specified to quantify the average difference in survival rates at specified points during a follow-up period. To construct an unbiased estimator of this quantity using non-randomized data, we assume that treatment assignment is conditionally independent of the potential outcomes given all pre-treatment covariates including the indication times (i.e., ). In the clinical context we consider, this asserts that assignment to treatment versus control is unconfounded given indication times, , and observed pre-treatment covariates, . Under this assumption, we can obtain an unbiased estimate of the treatment effect as:


where and are the numbers of treated and control units who are eligible for analyses, respectively. Here, in contrast to the classical setting, the outcomes that are actually observed for each unit depend not only on their treatment assignment but also on their indication time. For units whose indication times are not observed, these missing times must be inferred in order to evaluate (1). In the section below, we develop a strategy to infer the missing times of indication that can be used to facilitate inference in this setting.

Note that with an untreated comparison group, estimating the average treatment effect for the treated (ATT) using standard techniques such as propensity score matching is often be preferred because we would rarely want to initiate treatment for all units in the population. However, in the setting of treatment by indication or when an active comparator is present (e.g., comparative effectiveness research), estimating the treatment effect for the entire population is desirable because we are focusing on the choice of treatment after indications for treatment have already occurred.

3.2 Overview of inferential approach

Our conceptual framework implies by design that the missingness of the times of treatment assignment, , and the indicators for assignment to control, , are completely dependent on the values of those missing measurements. That is, we assume that the data for unit will be missing if (regardless of the value of ) or if (regardless of the value of ). This means that the missing measurements (indicated by ) are “missing not at random” (MNAR; Imbens and Rubin, 2015). By inferring the missing times of assignment for the untreated units, we can minimize the information loss that arises from the missing data mechanism in order to make more precise inferences from the observed data.

The first goal is therefore to construct a model for predicting the observed indication times based on baseline and time-varying covariates

, which we will use to infer the missing indication times. This model will also induce a probability distribution on the potential outcomes in the control group, since the observed outcomes must be calculated relative to the indication times. By applying the assumptions described above in Section


, we can separate the joint distribution of the complete data, including all observed potential outcomes

as well as both the observed and unobserved indication times, given some global parameter partitioned as as


For Bayesian inference with prior density

, the posterior density of given the complete data is given by


Posterior inference on

can then proceed by straightforward application of Markov Chain Monte Carlo (MCMC) techniques, such as the Gibbs sampler

(Geman and Geman, 1984; Gelman et al., 2014). For example, in each iteration of the Gibbs sampler, we draw the missing indication times

from the conditional posterior predictive distribution of

given covariates and the current draw of the parameter .

The completed indication times can then be used to classify untreated patients into distinct groups of true controls and ineligible controls based on eligibility

, where the true control group consists of patients with and . For the true controls, we can then calculate values for the potential outcomes given the generated values of . These values are then regarded as observed potential outcomes, denoted , which are equal to the calculated for units classified as true controls and equal to for treated units. Given the observed potential outcomes , the completed times of indication and the corresponding assignment vector , we can then update the parameters and by drawing from the conditional posterior distribution,


Posterior inference on the causal effects of interest can be obtained by computing the values of the constructed estimator within each MCMC iteration and then summarizing their distribution from the posterior sample. Thus, in each iteration, we construct a dataset consisting of all observed indication times, the simulated indication times, and all observed potential outcomes, and then use these completed data to calculate an estimate of the treatment effect as in Equation (1). Alternatively, we could specify a joint distribution for the potential outcomes

that we could then use to impute the missing potential outcomes

for each patient in each iteration by drawing from the conditional distribution with density function . Repeating this process over many such simulated datasets produces the approximate posterior distribution for all causal effects of interest. In the same way, posterior samples of can provide posterior estimates of the parameters that characterize the data-generating process; this is described in greater detail in Section 4.

4 State-space model for time of treatment indication

Based on the conceptual framework presented in Section 3, we propose a specific and pragmatic model for predicting the time of indication for treatment - the earliest time at which a patient presents with indications for clinical intervention - as a function of both fixed and time-varying covariates. In particular, we hypothesize that observed covariate measurements that reflect worsening health and diminished functional capacity will be predictive of indication times. Similarly, we assume that covariates capturing provider characteristics or temporal features (e.g., month or year when measurements are recorded) are independent of the indication times but may influence the probability of assignment to treatment upon indication. For instance, in the application presented in Section 5, we expect the probability of treatment to decline over time as clinicians become more informed about populations where the medication of interest may be contraindicated. We capture these separate dependencies through separate components of a hierarchical model; the first component characterizes the stochastic process that governs patients’ indication times, and the second component describes the conditional probability of assignment to treatment given the indication times. Specifically, we model a patient-level health process generated by time-varying covariates together with random fluctuations over time and adopt a so-called “threshold approach” (Albert and Chib, 1993), which views the indication time as the first hitting time of this latent process. Similarly, we assume that the conditional probability of assignment to treatment versus control given the indication time for each patient varies based on institutional preferences, which may systematically change over time with widespread changes in the established guidelines for treatment.

4.1 Model formulation

Suppose that the time series of covariate measurements for each unit, , is independent from the measurements of other units, and let be a state variable representing fluctuations in unit

’s overall health over the course of the study that are not explained by the covariates. We assume a normal distribution for the daily health fluctuations for unit

between time periods and , such that


where denotes the normal distribution. To make the model identifiable in our setting with a binary observation process, we fix . We assume (i.e., the latent state process

follows a stationary autoregressive model of order one). Thus, the model balances short-term changes in health status with information from long-term health trajectories.

To initialize this process, we assume a standard normal prior distribution for . Here, the stochastic component captures the unexplained variation of unit ’s health over time. Conditional on the latent states , we then define the observation process , which relates the overall health trajectory of unit to their indication time according to a probit regression model as



denotes the cumulative distribution function of a standard normal random variable and

is a vector of regression coefficients. Under the latent variable representation of (5) and (6) (also referred to as a state-space mixed model; Czado and Song, 2008), the indication time for each unit can be expressed as


Thus, the model for indication times induces a conditional distribution for the time of indication given a vector of longitudinal, pre-treatment covariates. One of the main advantages of this latent state representation is that it can flexibly accommodate both fixed and time-varying covariates. The proposed modeling approach can also be extended to settings with non-linear covariate effects by replacing the linear effects in (6) with nonlinear contributions, such as those described in Denison et al. (2002).

We consider a separate model for the assignment mechanism, which determines assignment to treatment versus control upon indication. Here, our model formulation is based on the assumption that, in many settings, variation in treatment practices may be due to clinician and/or institutional preferences rather than differences in patient characteristics (Slaughter et al., 2017). In particular, we assume that each unit is assigned to treatment with a probability that depends on their indication time. Conditional on , and on exogenous covariates , the assignment mechanism for our specific model can be expressed as


where is a parameterized, possibly non-linear, function. deterministic transformation of the indication time (e.g., calendar year or month). This model can be easily extended to accommodate other artifacts of the study that are believed to influence the probability of treatment assignment. For example, in Section 5, we let where is calendar time, and is restricted to be negative so that the probability of receiving the treatment is monotonically decreasing over time.

4.2 Inferential procedure

Our procedure uses the Kalman Filter

(Carlin and Polson, 1992) to marginalize out the latent state parameters for more efficient inferences. The full likelihood of the parameters can be written as



Letting and be the vectors of the and across the patients, the associated posterior density is therefore

Our modeling framework permits a flexible choice of prior distributions. For example, we can assume a prior distribution that factors into independent densities as

Many distributional choices are possible for each prior component. One flexible set of choices might be

Posterior sampling is straightforward to implement using standard software for implementing MCMC methods such as JAGS (Plummer et al., 2003). Rather than sampling each , , conditional on the rest, it is more efficient to sample them jointly via MCMC using the Kalman filter (Carlin and Polson, 1992). As previously described, these posterior samples allow us to measure the missing times of treatment assignment and assignment to control according to (7), where for units with . Given the inferred indication times, our framework implies that for units with . Recall that treatment assignment is undefined for those whose time of treatment assignment was censored by the end of the study (i.e. units with ). Finally, we can calculate the observed outcomes for units inferred to belong to the group of true controls by computing the difference between the observed event times and the inferred indication times for those units.

4.3 Bayesian analysis with inferred indication times

One of the benefits of the proposed approach for estimating treatment effects in this setting is that it allows us to make inferences about the missing indication times that are free from the outcome analysis. Our approach allows for flexible specification of the causal estimands of interest and also allows researchers to choose any mode of inference for analysis of the outcomes that they see appropriate.

For example, one might use the posterior mode of inferred times of indication for each unexposed unit calculated over a large number of MCMC samples as the point estimate for that unit’s indication time. This can then be viewed as a single imputation of the missing values, and conditional on these estimates one could estimate the treatment effects by simple comparisons of means of the time-to-event outcomes (e.g., using a Neymanian or Fisherian mode of inference). Alternatively, our framework also allows for more sophisticated analysis of outcomes. For instance, we could first obtain a large number of posterior samples for the missing times of indication across for all untreated units and use these samples to form unit-level empirical posterior distributions of the missing indication times. Then, one might iteratively impute values from these distributions and calculate the corresponding observed potential outcomes in each iteration. Taking the difference in means between treatment and control groups across all iterations would then produce a distribution of treatment effect estimates (given the potential outcomes) that incorporates uncertainty about the missing indication times.

It is important to note is that this type of approach is only partially Bayesian in that we are using Bayesian methods to infer missing values that we need in order to measure the observed outcomes. To make this a fully Bayesian approach, as previously mentioned, one could also specify a model for the missing potential outcomes (i.e., the missing time-to-event outcomes under alternative assignment) conditional on the partially observed outcomes and the assignment vector.

5 Application

To illustrate our proposed methods, we analyzed data from a recent study using electronic medical records from the VA health system evaluating the impacts of inappropriate prescribing practices for the treatment of pulmonary hypertension (Freiman et al., 2015). Pulmonary hypertension is a condition of high blood pressure that affects arteries in the lungs and heart. One common treatment for PH is a class of medications called phosphodiesterase-5-inhibitors (PDE5Is), which act on enzymes causing blood vessels to relax in order to lower blood pressure. While PDE5Is have been shown to be effective for treating some rare forms of PH, Freiman et al. (2015) identified the use of these drugs as wasteful, ineffective, and potentially harmful for treating patients with more common types of PH caused by left heart disease (WHO Group 2) or hypoxemic lung disease (WHO Group 3). Despite its contraindication, a study of veterans diagnosed with these types of PH over the years of 2005 to 2012 identified over 2,000 prescriptions for PDE5Is that were inappropriately administered to patients in the VA health system. To understand the impact of these inappropriate treatment practices on patient outcomes, it is of interest to measure the causal effects of prescribing PDE5Is to patients with PH Groups 2 and 3 on the time lag between the application of the intervention and the occurrence of a clinical event of interest (e.g., time from treatment to death).

The primary question that we sought to address through this analysis concerned the extent to which treatment with PDE5Is for patients diagnosed with PH Groups 2 and 3 impacted the one-year survival rate compared to patients who had similar indications for treatment but were not prescribed PDE5Is during the study period. As a secondary objective, we were also interested in inferring which among a large set of observed patient and provider characteristics were strongly associated with an increased risk of receiving a contraindicated PDE5I for treatment of PH.

5.1 Data

Our data set contains demographic and laboratory measurements as well as records of the utilization of medications, inpatient and outpatient services for over 350,000 veterans who were diagnosed with PH Groups 2 and 3 and received prescription medications from the VA from 2005 to 2016. For all patients, baseline health-related measurements were collected at the time of first PH diagnosis based on ICD-9 diagnosis codes within either the VA health system or Medicare. Subsequent health-related measurements were collected at intermittent observation times corresponding to patient-provider clinical interactions (e.g., an inpatient or outpatient visit). The exact number of measurements recorded and the time elapsed between subsequent measurements varied by patient. Observations with implausible values for lab measurements or demographic variables were excluded prior to analysis.

For the present analysis, we considered patients who at the time of PH diagnosis (Group 2 or 3) were between 65 to 95 years of age, were eligible for Medicare benefits, and who had not received prescriptions for a PDE5I medication prior to PH diagnosis. Because female patients comprised less than 3% of the patient population who met these initial eligibility criteria, we further limited our analysis to only male patients. For data integrity purposes, we also excluded any patients who were receiving Medicare part C at the time of diagnosis as well as patients who did not fill any prescriptions within the VA health system during the one year prior to their diagnosis, since these patients may have received PH related care from private providers that we could not track. Finally, we excluded any patients who received a prescription for nitrates after PH diagnosis and prior to a PDE5I prescription because nitrates are contraindicated for treatment with PDE5Is. Among patients who met all initial eligibility criteria, we defined the “treatment group” as the set of patients who filled at least one prescription during the study period for a PDE5I medication primarily indicated for the treatment of PH. In particular, we excluded medications with secondary or off-label indications for the treatment of PH such as sildenafil, tadalafil, and verdenafil.

Because we observed the prescription dispense dates rather than the dates when the medications were first prescribed, we excluded from our analyses any treated patient whose first PDE5I prescription was dispensed more than 60 days after a hospital visit. After applying all exclusion restrictions, the remaining sample was comprised of 534 patients who received treatment for PH with a PDE5I medication within a one-year period following their diagnosis date and 167,701 potential controls who did not receive a PDE5I during that period.

Our outcome of interest is survival time in the period following indication for treatment, which we observed for treated units and must be inferred for units in the control group. We consider the date of PH diagnosis as the time of “earliest eligibility” for indication. For each potential control patient, we base our inferences on clinical data observed at intermittent intervals from the time of earliest eligibility until death or the end of follow-up in December 2016, whichever occurred first.

5.2 Constructing comparison groups and time-varying covariates

To make the assumptions of our proposed framework described in Section 3 more plausible in this setting, we selected a set of potential control units who appeared similar to the treatment group at baseline based on health-related measurements collected in the 1 year prior to PH diagnosis (excluding the date of diagnosis). We considered a set of baseline covariates identified as predictive of the time of assignment to treatment within the treatment group. See Appendix A for details of this variable selection procedure. Potential controls were selected using 1:1 nearest-neighbor matching with replacement (Rubin, 1973), whereby each treated unit is matched to its closest control unit based on the Mahalanobis distance calculated over baseline covariates.

This produced a final sample of 534 treated units and 531 matched control units who were similar to the treatment group at their times of PH diagnosis but did not receive a PDE5I at any point during the observation period. Diagnostics performed after matching confirmed that the covariate distributions were adequately balanced between the treatment group and the matched potential control group. Table 2

shows descriptive statistics on baseline variable for the final matched samples. After matching, we proceeded under the assumption that overall health status was unconfounded given baseline covariates such that patients with similar covariates at baseline were expected to have similar health trajectories and therefore, similar indication times over the course of study.

Potential Control Treatment
PH Group 2 88% 88%
Recently Hospitalized 13% 12%
Recent Procedure 65% 75%
Recent ER Visit 52% 53%
Age 74.8 74.5
Weight 203.3 204.5
Height 69.5 69.5
Resting Heart Rate 74.2 75.1
Systolic Blood Pressure 130.1 130.5
Diabolic Blood Pressure 71.2 71.2
Inpatient Days 1.19 1.22
Outpatient Days 27.3 27.9
Number of Comorbidities 0.52 0.54
Cardiac Events 1.60 1.49
Pulmonary Events 0.61 0.69
Organ Failure Events 1.09 1.16
Number of Medications 11.4 11.8
Table 2: Baseline covariate data for potential controls and treatment group.

In addition to baseline covariates, we also included in our analyses a number of time-varying covariates collected at intermittent observation times throughout the study. Specifically, we considered six time-varying covariates that indicate changes in binary variables over time, including an indicator for whether the patient was most recently observed in an outpatient versus inpatient setting, an indicator for the presence of new comorbidities, and separate indicators for recent hospitalization, organ failure events, cardiac events or pulmonary procedures recorded during the 30 days prior to each visit. We also included as a single time-varying covariate the Mahalanobis distance calculated between each patient’s laboratory measurements at baseline (e.g., heart rate and blood pressure) and values of the same laboratory variables collected at each follow-up visit. This allowed us to greatly reduce the dimension of the covariate space and operationalize the laboratory variables in terms of gain scores.

We observed a strong positive correlation between the empirical probabilities of survival and indication times for patients in the treatment group. Among 77 patients who received treatment within 7 days following their PH diagnosis (i.e., patients with ), only 53 patients (68.8%) were alive one year post-diagnosis. Averaged over all 534 patients whose indication times occurred within one year of diagnosis (i.e., the treatment group), this survival rate increased to 80.6%. In contrast, the survival rate at one-year after PH diagnosis for the 531 matched potential controls was approximately 81.7%; however, because we do not observe the indication times for these patients, this is a naive estimate that is likely negatively biased. Also note that these survival probabilities are defined relative to diagnosis times, which we regard as a fixed pre-treatment covariate. As a result, these measurements are purely descriptive and should not be regarded as treatment effects.

5.3 Results

Using the approach described in Section 3, we fit the model defined by equations (5)–(8) using MCMC posterior simulation. We constrained

, the effect of calendar time on the log-odds of PDE5I initiation, to be negative for this application since the probability of receiving PDE5Is was believed to be monotonically decreasing over the course of the study. This is because PDE5Is were believed to be the standard, first-line therapy for treatment of PH at the beginning of the study with use steadily decreasing as knowledge of its contraindication for some PH patients spread throughout the medical community.

To estimate the treatment effects of interest, we fit eight distinct models. Each model assumed differing time windows within which the indication time might occur, or that (for some untreated units) the indication time occurred beyond the time window. Specifically, we assumed time windows corresponding to indication times occurring within the first 14 days, 30 days, 60 days, 90 days, 120 days, 180 days, 270 days, and 365 days (1 year) after PH diagnosis. Our health process model component assumed that changes could occur daily, so that the number of time periods varied from to

. For each interval, we estimated the average causal effects of treatment versus control on a binary measure of survival at follow-up one year after indication. In addition, we estimated the conditional average treatment effect (CATE) for each of the eight time intervals, where in each interval the average is over the subset of patients whose indication times were within a specified range (e.g., the conditional average treatment effect for patients whose indication times occurred within 15-30 days following PH diagnosis). To estimate these effects, we first identified the subset of units whose indication times (either observed or inferred) were within the specified study period based on the current values sampled using MCMC. Within this subset, we then calculated survival rates at one year post-indication for treated units as well as for units classified as true controls. Finally, we measured the treatment effect as the difference between these two proportions. For each of the specified study periods, we ran the MCMC sampler with four parallel chains each run for 20,000 iterations, where the first 5,000 draws of each chain were discarded as a burn-in period. With the resulting 60,000 samples, we calculated the posterior means and 95% credible intervals (CI) for all model parameters. In all cases, the MCMC simulated model parameters and quantities of interest raised no concerns using standard diagnostics including those by

Geweke (1992) and Gelman et al. (1992)

. As an additional sensitivity check, we evaluated the performance of the proposed model in each setting under different choices of hyperparameters using the deviance information criterion (DIC;

Spiegelhalter et al., 2002) and found the results to be generally unaffected.

Posterior means and 95% credible intervals for the treatment effects of interest are presented in Table 3. Here, the causal estimand at each time point is defined as the average effect of treatment on one-year survival for all patients with indication times occurring at or before that time (i.e., for ). Table 3 also summarizes the cumulative number of potential control units identified as “true controls” at each time point based on the posterior median of the 60,000 MCMC samples (i.e., ). Figure 1 shows the smoothed effects of treatment over a one year period following PH diagnosis, calculated using a cubic smoothing spline (Marsh and Cormier, 2001), which we specified with eight knots located at each posterior mean treatment effect estimated using our model.

Study Number Inferred number One-year survival Estimated impact
period treated of “true controls” after indication of PDE5I
() () 95% CI Treatment Control 95% CI
14 Days 109 48 (34, 62) 71.6% 82.0% -10.4% (-18.6%, -1.6%)
30 Days 164 88 (72, 103) 73.2% 82.8% -9.6% (-14.5%, -3.8%)
60 Days 233 113 (96, 127) 75.1% 83.3% -8.2% (-12.8%, -5.0%)
90 Days 280 127 (111, 141) 72.5% 83.3% -10.8% (-14.7%, -6.8%)
120 Days 318 134 (119, 147) 73.0% 85.1% -10.1% (-14.1%, -6.4%)
180 Days 380 150 (135, 162) 73.6% 84.3% -11.0% (-14.1%, -7.5%)
270 Days 456 165 (151, 180) 74.0% 84.5% -10.5% (-13.9%, -7.5%)
365 Days 534 183 (170, 195) 72.5% 84.5% -12.0% (-14.6%, -9.8%)
Table 3: Posterior median number of true control units inferred from matched sample of potential controls at each time point and estimated effects of treatment compared to control with 95% posterior intervals.
Figure 1: Cubing smoothing spline curve for estimated effects of PDE5I compared to control on one-year survival rate based on time from diagnosis to indication. Shaded area shows pointwise 95% posterior credible interval.

These results suggest that the majority of the matched potential controls were ineligible to receive treatment during the one year period following their PH diagnosis. For these patients, lack of treatment can therefore be interpreted as a lack of indication for treatment. On the other hand, potential controls with inferred indication times occurring at or before each specified time point are regarded as “true controls” for whom, upon indication for treatment, PDE5Is were actively withheld, possibly in favor of alternative medication or treatment strategy. Given the inferred times of indication, these patients provide a credible comparison group with which we can make causal inferences. In particular, our findings indicate that among patients with PH (Groups 2 and 3) who are indicated for treatment at any point in the one year following their diagnosis, treatment with PDE5Is has a large, negative effect on one-year survival probability.

Table 4 shows posterior estimates for the baseline and time-varying covariate effects, and , as well as for the correlation between patients’ latent health states, , and parameters governing the probability of assignment to treatment upon indication, and . Here again, our results are largely consistent across the settings with credible intervals shrinking as study period widens. In general we find a small negative posterior correlation between latent health measurements, which decreases in magnitude over time. This suggests there may be some variation in patients’ overall health that is not explained by the observed covariates in the time period immediately following PH diagnosis, but this systematic variation dissipates over time.

Posterior estimates for parameters of the treatment assignment mechanism are also relatively stable across the different study periods. Baseline probabilities of treatment upon indication range from for the study period of 14 days to for the study period of 365 days, with the probability of treatment slowly decreasing over time. Figure 2 shows this trend over time for eligible patients in the VA health care system diagnosed with PH between 2002 and 2016 based on posterior median estimates of and within each study period.

Figure 2: Cubic smoothing spline curves showing the conditional probability of receiving PDE5I based on time of indication for eligible patients in the VA health care system diagnosed with PH between 2002 and 2016. Dashed lines show 95% posterior credible intervals.
Parameter 14 Days 30 Days 60 Days 90 Days 120 Days 180 Days 270 Days 365 Days
(Baseline covariates)
PH Type 2111Indicates binary variables. 1.588 1.759 1.851 1.901 1.958 1.961 2.04 2.18
History of organ failure111Indicates binary variables.222Measured during one-year period prior to PH diagnosis. 0.196 0.215 0.236 0.259 0.278 0.275 0.294 0.315
Recently hospitalized111Indicates binary variables.222Measured during one-year period prior to PH diagnosis. 0.042 0.017 -0.016 -0.084 -0.122 -0.126 -0.165 -0.210
Age -0.103 -0.079 -0.086 -0.095 -0.085 -0.087 -0.071 -0.050
Comorbidities 0.253 0.256 0.369 0.406 0.445 0.441 0.398 0.430
No. of recent operations222Measured during one-year period prior to PH diagnosis. 0.779 0.911 1.023 1.072 1.120 1.114 1.129 1.162
(Time-varying covariates)
Outpatient visit111Indicates binary variables. -0.184 -0.334 -0.521 -0.608 -0.654 -0.642 -0.649 -0.710
New hospitalizations111Indicates binary variables. -0.399 -0.301 -0.333 -0.331 -0.318 -0.315 -0.258 -0.235
New organ failure events111Indicates binary variables. 0.048 -0.018 -0.021 -0.020 -0.019 -0.019 -0.021 -0.016
No. of new comorbidities 0.004 0.079 0.050 0.074 0.092 0.090 0.068 0.059
No. of recent operations 0.222 0.046 -0.037 -0.092 -0.108 -0.104 -0.148 -0.232
Change in labs -0.002 -0.002 -0.001 -0.001 -0.001 -0.001 -0.001 -0.001
-0.339 -0.206 -0.130 -0.095 -0.042 -0.046 -0.017 0.011
1.097 0.760 0.878 0.924 0.988 1.008 1.045 1.098
-0.053 -0.023 -0.025 -0.015 -0.022 -0.025 -0.021 -0.014
Table 4: Posterior medians with 95% credible intervals for parameters , , , (baseline covariate effects) and (time-varying covariate effects) in each study period.

A key feature of our model is that it allows us to directly evaluate which of the baseline and time-varying covariates carry more or less information about patients’ times of indication. In the present study, posterior inferences for covariate effects within each of the study periods were similar. In general, our findings suggest that patients with PH Group 2 were more likely to have indications for treatment shortly after PH diagnosis than patients with PH Group 3. Results also indicate that occurrence of one or more incidental medical procedures (e.g., cardiac surgery or pulmonary function testing) within 30 days prior to PH diagnosis is strongly associated with earlier indication times. Further, we found that patients who regularly receive care in an inpatient setting generally have earlier indication times than patients whose follow-up visit typically occur in an outpatient setting. Among the other baseline covariates included in our analysis, occurrence of one or more pulmonary disease events (e.g., pneumonia) within 30 days prior to PH diagnosis and number of comorbidities present at baseline were also positively associated with indication for treatment during the study period. These results may offer insights for clinicians about best practices for health management of PH patients, and may also be used to guide modeling decisions in other applications.

5.4 Comparison with inferences based on risk set matching

Balanced risk set matching (RSM; Li et al., 2001) is a tool designed to facilitate causal inference in observational longitudinal studies and is an effective strategy for estimating the causal effects of delaying initiation of a particular treatment. This approach is based on comparing patients who were treated at a particular time point to a group of patients with a similar probability of receiving treatment at that time, thereby ensuring that the treatment and control groups are balanced on pre-treatment covariates. RSM appeals to the RCM philosophy of causal inference in the sense that it clearly separates all outcome data from the design of the study, as neither outcomes nor any post-treatment data are used to perform matching. However, this approach has two important limitations. First, risk set matching assumes that only one treatment is possible (i.e., there is no active comparator that may be initiated at the time of indication) and that therefore the indication time is itself the object of manipulation, which may or may not be controlled by the experimenter. Second, in studies with a time-to-event outcome defined relative to the time of indication for treatment, inferences based on comparisons of patients in similar risk sets will fail to incorporate uncertainty in the time of indication within the control group.

Despite its limitations, RSM serves as a useful reference for examining the relative performance of our proposed approach. Table 5 presents the one-year post-indication survival rates within the treatment and control groups, where controls are identified using risk-set matching. Here, for each treated patient with observed indication time , we identified the not-yet-treated patient that was most similar in terms of their probability of receiving treatment based on covariates observed up to time .

Study One-year survival Estimated
period after indication impact
Treatment Matched Control of PDE5I
14 Days 71.5% 71.5% 0.0%
30 Days 73.2% 73.6% -0.5%
60 Days 75.1% 76.7% -1.6%
90 Days 72.5% 76.9% -4.4%
120 Days 73.0% 77.4% -4.4%
180 Days 73.6% 77.7% -4.1%
270 Days 74.0% 77.1% -3.3%
365 Days 72.5% 75.8% -3.1%
Table 5: One-year survival rates following treatment indication for treated patients and controls identified using risk set matching

These findings generally agree with the results previously described. We see that estimates obtained using RSM are systematically smaller in magnitude than results based on the proposed approach. However, because the estimand targeted by RSM captures the average effect of treatment on the treated rather than the average treatment effect across the entire patient population, these differences are not surprising. In particular, estimates calculated using RSM reflect the effects of delaying the initiation of PDE5I for the subset of patients that have indications for treatment during the study period. This means that for each study period, the group of treated patients who received PDE5I within that time interval are compared to a matched control group consisting of patients who had not yet received treatment at that point but may have received treatment at a later point in the study.

6 Discussion

In this paper, we propose a novel conceptualization of observational studies with treatment by indication and provide a template for designing comparative effectiveness studies that approximate a randomized-controlled experiment with a binary treatment. This conceptualization reformulates the problem of evaluating a time-varying treatment in the presence of time-varying confounders as one of confounding due to sequential enrollment. Our hope is that this simplified representation of a traditionally complex data structure will allow for more straightforward analyses of health data in the digital age (e.g., data from electronic medical records).

The merit of the proposed model is that it allows for model-based assessments of the times of treatment indication based on disease progression that can accommodate time-varying covariates measured at intermittent observation times (i.e., when there is missing data and/or substantial variation in the timing of patient-provider visits). Our approach allows for systematic evaluation of the underlying health factors that may be most influential in determining a patient’s need for treatment. Inferences based on this modeling strategy may be useful to address a number of important questions in health services research - for example, what preventative health strategies might be most effective for delaying the onset of indications for treatment? The model also allows us to obtain conditional probabilities for a subject having indications for treatment at different points over time, which could be used to inform preventative treatment strategies. We note that the proposed model assumes that subjects’ overall health fluctuates under natural conditions.

As described in Section 4, the proposed model for time of indication for treatment can accommodate both fixed and time-varying covariates, which can be useful in explaining differences in the aspects of health associated with subject-specific characteristics and/or conditions that vary between hospital visit within a subject. Alternatively, covariate data could be excluded entirely from the model to make inferences about indication times that are viewed as fully stochastic. Another possibility for modeling variability in the parameters involves the inclusion of facility-based, provider-level, or geographic-specific random effects. However, this complicates the evaluation of the likelihood considerably, rendering model fitting more challenging.

Finally, we note that our application and inferential results should be regarded as an illustrative example of the proposed methods rather than an attempt to provide definitive answers about the causal effects of inappropriate prescribing practices on health-related outcomes for PH patients. In particular, for researchers interested in drawing causal inferences from observational studies with time-varying exposures, which are typically analyzed using marginal structural models or risk-set matching, the conceptual framework we present offers an alternative formulation of the underlying causal problem that is both intuitive and relatively straightforward to implement. The proposed approach can also be flexibly extended to accommodate a range of data structures, for instance in studies with multivariate outcomes. With this in mind, an exciting area of future work is in applying the proposed framework to new applications, including observational studies outside of the clinical domain.


  • Albert and Chib (1993) Albert, J. H. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American statistical Association, 88(422):669–679.
  • Angrist et al. (1996) Angrist, J. D., Imbens, G. W., and Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American statistical Association, 91(434):444–455.
  • Basse et al. (2016) Basse, G. W., Volfovsky, A., and Airoldi, E. M. (2016). Observational studies with unknown time of treatment. arXiv preprint arXiv:1601.04083.
  • Breiman (2001) Breiman, L. (2001). Random forests. Machine learning, 45(1):5–32.
  • Breiman and Cutler (2003) Breiman, L. and Cutler, A. (2003). Manual for setting up. Using, and Understanding Random Forest, 4.
  • Carlin and Polson (1992) Carlin, B. P. and Polson, N. G. (1992). Monte carlo bayesian methods for discrete regression models and categorical time series. Bayesian statistics, 4:577–586.
  • Czado and Song (2008) Czado, C. and Song, P. X.-K. (2008). State space mixed models for longitudinal observations with binary and binomial responses. Statistical Papers, 49(4):691–714.
  • Danaei et al. (2013) Danaei, G., Rodríguez, L. A. G., Cantero, O. F., Logan, R., and Hernán, M. A. (2013). Observational data for comparative effectiveness research: an emulation of randomised trials of statins and primary prevention of coronary heart disease. Statistical methods in medical research, 22(1):70–96.
  • Denison et al. (2002) Denison, D. G., Holmes, C. C., Mallick, B. K., and Smith, A. F. (2002). Bayesian methods for nonlinear classification and regression, volume 386. John Wiley & Sons.
  • Freiman et al. (2015) Freiman, M. R., Rose, A. J., Powell, R. W., Miller, D. R., and Wiener, R. S. (2015). Patterns of potentially inappropriate prescribing of phosphodiesterase inhibitors in pulmonary hypertension in the va. In C13. ACCOUNTING FOR COSTS AND RESOURCE UTILIZATION IN RESPIRATORY HEALTH, pages A3889–A3889. American Thoracic Society.
  • Gelman et al. (2014) Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2014). Bayesian Data Analysis, volume 2. Chapman & Hall/CRC, Boca Raton, FL.
  • Gelman et al. (1992) Gelman, A., Rubin, D. B., et al. (1992). Inference from iterative simulation using multiple sequences. Statistical science, 7(4):457–472.
  • Geman and Geman (1984) Geman, S. and Geman, D. (1984). Stochastic relaxation, gibbs distributions, and the bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721–741.
  • Geweke (1992) Geweke, J. (1992).

    Evaluating the accuracy of sampling-based approaches to the calculations of posterior moments.

    Bayesian statistics, 4:641–649.
  • Hernán et al. (2008) Hernán, M. A., Alonso, A., Logan, R., Grodstein, F., Michels, K. B., Stampfer, M. J., Willett, W. C., Manson, J. E., and Robins, J. M. (2008). Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease. Epidemiology (Cambridge, Mass.), 19(6):766.
  • Holland (1986) Holland, P. W. (1986). Statistics and causal inference. Journal of the American statistical Association, 81(396):945–960.
  • Huitfeldt et al. (2015) Huitfeldt, A., Kalager, M., Robins, J. M., Hoff, G., and Hernán, M. A. (2015). Methods to estimate the comparative effectiveness of clinical strategies that administer the same intervention at different times. Current epidemiology reports, 2(3):149–161.
  • Imbens and Rubin (2015) Imbens, G. W. and Rubin, D. B. (2015). Causal inference in statistics, social, and biomedical sciences. Cambridge University Press.
  • Kennedy et al. (2010) Kennedy, E. H., Taylor, J. M., Schaubel, D. E., and Williams, S. (2010). The effect of salvage therapy on survival in a longitudinal study with treatment by indication. Statistics in medicine, 29(25):2569–2580.
  • Li et al. (2001) Li, Y. P., Propert, K. J., and Rosenbaum, P. R. (2001). Balanced risk set matching. Journal of the American Statistical Association, 96(455):870–882.
  • Little and Rubin (2019) Little, R. J. and Rubin, D. B. (2019). Statistical analysis with missing data, volume 793. Wiley.
  • Mamdani et al. (2002) Mamdani, M., Rochon, P. A., Juurlink, D. N., Kopp, A., Anderson, G. M., Naglie, G., Austin, P. C., and Laupacis, A. (2002). Observational study of upper gastrointestinal haemorrhage in elderly patients given selective cyclo-oxygenase-2 inhibitors or conventional non-steroidal anti-inflammatory drugs. Bmj, 325(7365):624.
  • Marsh and Cormier (2001) Marsh, L. C. and Cormier, D. R. (2001). Spline regression models, volume 137. Sage.
  • McGettigan and Henry (2006) McGettigan, P. and Henry, D. (2006). Cardiovascular risk and inhibition of cyclooxygenase: a systematic review of the observational studies of selective and nonselective inhibitors of cyclooxygenase 2. Jama, 296(13):1633–1644.
  • Plummer et al. (2003) Plummer, M. et al. (2003). Jags: A program for analysis of bayesian graphical models using gibbs sampling. In Proceedings of the 3rd international workshop on distributed statistical computing, volume 124. Vienna, Austria.
  • Poses et al. (1995) Poses, R. M., Smith, W. R., McClish, D. K., and Anthony, M. (1995). Controlling for confounding by indication for treatment: Are administrative data equivalent to clinical data? Medical care, pages AS36–AS46.
  • Rubin (1973) Rubin, D. B. (1973). Matching to remove bias in observational studies. Biometrics, pages 159–183.
  • Rubin (1976) Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3):581–592.
  • Rubin (1980) Rubin, D. B. (1980). Randomization analysis of experimental data: The fisher randomization test comment. Journal of the American Statistical Association, 75(371):591–593.
  • Rubin (1984) Rubin, D. B. (1984). William g. cochran’s contributions to the design, analysis, and evaluation of observational studies. WG Cochran’s impact on statistics, pages 37–69.
  • Rubin (2010) Rubin, D. B. (2010). On the limitations of comparative effectiveness research. Statistics in medicine, 29(19):1991–1995.
  • Slaughter et al. (2017) Slaughter, J. L., Reagan, P. B., Newman, T. B., and Klebanoff, M. A. (2017). Comparative effectiveness of nonsteroidal anti-inflammatory drug treatment vs no treatment for patent ductus arteriosus in preterm infants. JAMA pediatrics, 171(3):e164354–e164354.
  • Spiegelhalter et al. (2002) Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4):583–639.
  • Yoshida et al. (2015) Yoshida, K., Solomon, D. H., and Kim, S. C. (2015). Active-comparator design and new-user design in observational studies. Nature Reviews Rheumatology, 11(7):437.
  • Zhou et al. (2005) Zhou, Z., Rahme, E., Abrahamowicz, M., and Pilote, L. (2005). Survival bias associated with time-to-treatment initiation in drug effectiveness evaluation: a comparison of methods. American journal of epidemiology, 162(10):1016–1023.

Appendix A Covariate selection procedure for VA application

Covariate measurements were collected for each patient during each visit that occurred in the study period, and each of the 534 treated patients in our analysis was observed for between one to 611 distinct visits during the study period. For variable selection, we first constructed a new dataset containing observations for each of the 534 treated patients at each of two time periods: 1) the visit associated with assignment to treatment, and 2) the preceding visit. Patients who received assignment to treatment at their first visit were included in the dataset only once. Each observation vector in the resulting dataset therefore corresponds to the covariate values of a patient who has not yet been assigned or to a patient at their time of assignment. In principle, contrasts of covariates between these two groups should capture information about how the latent health process - and the corresponding indication for assignment to either treatment or control - varies with these longitudinal measurements.

For variable selection, we performed random forest analysis (Breiman, 2001) implemented using the “randomForest” package in R (Breiman and Cutler, 2003) to evaluate the relative importance of each of the time-varying covariates for predicting the time of indication for treatment. Using this procedure, we identified the 20 most influential time-varying covariates from over 150 available variables. Among these 20 covariates were several related to the current physical characteristics of the patient (e.g., current weight and blood pressure) as well as variables that captured changes in these physical measurements (e.g., change in weight). Other important covariates include the number of comorbidities that the patient has at the time of the visit and changes in health care utilization such as recent organ failure events (e.g., right heart failure), recent inpatient visits, or receipt of incidental procedures (e.g., echocardiograph).