# Separable Effects for Causal Inference in the Presence of Competing Risks

In time-to-event settings, the presence of competing events complicates the definition of causal effects. Here we propose new separable effects to study the causal effect of a treatment on an event of interest. The separable direct effect is the treatment effect on the event of interest not mediated by its effect on the competing event. The separable indirect effect is the treatment effect on the event of interest only through its effect on the competing event. Similar to Robins and Richardson's extended graphical approach for mediation analysis, the separable effects can only be identified under the assumption that the treatment can be decomposed into two distinct components that exert their effects through distinct causal pathways. Unlike existing definitions of causal effects in the presence of competing risks, our estimands do not require cross-world contrasts or hypothetical interventions to prevent death. As an illustration, we apply our approach to a randomized clinical trial of estrogen therapy in individuals with prostate cancer.


## 1. Introduction

A competing event is any event that makes it impossible for the event of interest to occur. For example, consider a randomized trial to estimate the effect of a new treatment on the 3-year risk of death from prostate cancer, in which 1000 individuals with prostate cancer were assigned to the treatment and 1000 to placebo. All participants adhered to the protocol and remained under follow-up. After 3 years, 100 individuals in the treatment arm and 200 in the placebo arm died of prostate cancer. Also, 150 individuals in the treatment arm and 50 in the placebo arm died of other causes (e.g., cardiovascular disease). Death from cardiovascular disease is a competing event for death from prostate cancer: individuals who die of cardiovascular disease cannot subsequently die of prostate cancer. When competing events are present, several causal estimands may be considered to define the causal effect of treatment on a time-to-event outcome [1].

Consider first the total treatment effect, defined by the contrast of the cumulative incidence [2, 3] of the event of interest under different treatment values [1]. In our example, the total treatment effect on death from prostate cancer is the contrast of the cumulative incidence of death from prostate cancer under treatment, consistently estimated by $100/1000 = 0.10$, and under placebo, consistently estimated by $200/1000 = 0.20$. Therefore, the estimate of the total treatment effect on the additive scale is $0.10 - 0.20 = -0.10$, which indicates that treatment reduced the risk of death from prostate cancer.

However, in our trial, the interpretation of the total treatment effect on the event of interest is difficult because the treatment also increased the risk of the competing event. The estimate of the total effect of treatment on the competing event is $150/1000 - 50/1000 = 0.10$ on the additive scale. Thus, it is possible that the beneficial effect of treatment on death from prostate cancer is simply a consequence of the harmful effect of treatment on death from other causes: when more people die from other causes, fewer people can die from prostate cancer.
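The arithmetic behind these estimates can be reproduced in a few lines. The following is a minimal Python sketch that uses only the hypothetical counts from the example above; because there is no censoring, each cumulative incidence is simply the number of deaths divided by the number randomized. The variable and function names are illustrative.

```python
# Hypothetical counts from the introductory example: 1000 individuals per arm,
# complete 3-year follow-up, so each cumulative incidence is a simple proportion.
N_PER_ARM = 1000
prostate_cancer_deaths = {"treatment": 100, "placebo": 200}  # event of interest
other_cause_deaths = {"treatment": 150, "placebo": 50}       # competing event


def cumulative_incidence(deaths: int, n: int = N_PER_ARM) -> float:
    """3-year cumulative incidence without censoring: deaths / number randomized."""
    return deaths / n


# Total effect on the event of interest (additive scale).
total_effect_y = (cumulative_incidence(prostate_cancer_deaths["treatment"])
                  - cumulative_incidence(prostate_cancer_deaths["placebo"]))
# Total effect on the competing event (additive scale).
total_effect_d = (cumulative_incidence(other_cause_deaths["treatment"])
                  - cumulative_incidence(other_cause_deaths["placebo"]))

print(f"Total effect on death from prostate cancer: {total_effect_y:+.2f}")
print(f"Total effect on death from other causes: {total_effect_d:+.2f}")
```

The two printed contrasts have opposite signs, which is exactly the interpretational difficulty described above.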

One way to deal with this problem is to consider a second causal estimand: the (controlled) direct effect of treatment on the event of interest not mediated by the effect of treatment on the competing events, that is, the effect of treatment if we had somehow intervened to eliminate all competing events. This estimand corresponds to defining the competing events as censoring events [1]. Unlike the total effect, identification of the controlled direct effect requires untestable assumptions even in an ideal randomized trial with perfect adherence and no loss to follow-up [1]. Also, this causal estimand often introduces a new conceptual challenge: the direct effect is not sufficiently well-defined because there is no scientific agreement as to which hypothetical intervention, if any, would eliminate the competing events [4]. For example, in our prostate cancer trial, no intervention has ever been proposed that can prevent all deaths from causes other than prostate cancer. Because the intervention to prevent competing events is ill-defined, the resulting effect estimates cannot be empirically verified in a randomized experiment, not even in principle.

A third causal estimand is the survivor average causal effect (SACE) [5], which is the total treatment effect in the principal stratum of patients who would never experience the competing event under either level of treatment [1, 6, 7]. Unlike with the total effect, the presence of competing events is not a problem when interpreting the SACE, because the SACE is restricted to subjects who do not experience competing events. However, identification of the SACE requires strong untestable assumptions, e.g. about cross-world counterfactuals, even in a perfectly executed trial. Also, the SACE could never, even in principle, be confirmed in a real-world experiment, because it will never be possible to observe the status of the competing event for the same individual under two different levels of treatment.

The problems of the previous estimands can be overcome in settings in which the treatment exerts its effect on the event of interest and its effect on the competing event through different causal pathways. Here, we define the separable direct and indirect effects for settings with competing events. Like the controlled direct effect and the SACE, identification of separable effects relies on untestable assumptions even when the treatment is randomized. However, unlike the controlled direct effect and the SACE, separable effects do not require conceptual interventions on competing events or knowledge of cross-world counterfactuals. Therefore, in principle, they may be verified in a future experiment. Our definitions of separable effects and conditions for identifiability follow from the work of Robins and Richardson [8] and Didelez [9], who provided estimands for direct and indirect effects in mediation analysis which do not require cross-world assumptions.

We have organised the paper as follows. In Section 2, we describe the observed data structure. In Section 3, we present a conceptual treatment decomposition and provide explicit examples to fix ideas. In Section 4, we formulate the causal estimand and define the new separable effects. In Section 5, we present conditions that allow for identifiability of the separable effects. In Section 6, we explain how to estimate separable effects with standard statistical methods, and we use data from a randomized clinical trial to estimate a direct effect of estrogen therapy on prostate cancer mortality. In Section 7, we provide a final discussion of the new estimands.

## 2. Observed data structure

We consider a study in which individuals are randomly assigned to a binary treatment $A$ at baseline (e.g. $A = 1$ if assigned to treatment and $A = 0$ if assigned to placebo). Let $L$ denote a vector of individual pretreatment characteristics. Within each of the equally spaced discrete time intervals $k = 0, 1, \dots, K+1$, let $Y_k$ and $D_k$ denote indicators of an event of interest and a competing event, respectively. In our example, $Y_k$ denotes death due to prostate cancer and $D_k$ death from other causes by interval $k$. We adopt the convention that $D_k$ is measured right before $Y_k$. If an individual experiences the competing event at time $k$ ($D_k = 1$) without a history of the event of interest ($Y_{k-1} = 0$), then all future values of the event of interest are zero ($Y_j = 0$ for all $j \geq k$).

By definition, $D_0 = Y_0 = 0$, that is, no individual experiences any event during the initial interval. We use overbars to denote the history of a random variable, such that $\bar{Y}_k = (Y_0, \dots, Y_k)$ is the history of the event of interest through interval $k$. Similarly, we use underbars to denote future values of a random variable, such that $\underline{Y}_k = (Y_k, \dots, Y_{K+1})$. We assume full adherence to the assigned treatment without loss of generality, and, until Section 5.3, no loss to follow-up.
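To make the data structure concrete, the following minimal Python sketch converts one subject's first event into the indicator sequences $Y_0, \dots, Y_{K+1}$ and $D_0, \dots, D_{K+1}$. It assumes, for illustration only, that indicators are cumulative ("has occurred by interval $k$") and that the event of interest is also terminal; the function and argument names are hypothetical and not part of any published code.

```python
from typing import List, Optional, Tuple


def to_indicators(event_interval: Optional[int], cause: Optional[str],
                  K: int) -> Tuple[List[int], List[int]]:
    """Return (Y, D) as lists indexed by k = 0, ..., K + 1.

    event_interval: interval (1..K+1) of the subject's first event, or None if event-free.
    cause: "Y" for the event of interest or "D" for the competing event.
    Illustrative assumptions: indicators are cumulative ("has occurred by interval k")
    and both events are terminal, so at most one of them ever equals 1.
    """
    Y = [0] * (K + 2)  # Y_0 = 0 by definition
    D = [0] * (K + 2)  # D_0 = 0 by definition
    if event_interval is None:
        return Y, D
    if not 1 <= event_interval <= K + 1:
        raise ValueError("event_interval must lie in 1..K+1")
    if cause not in ("Y", "D"):
        raise ValueError('cause must be "Y" or "D"')
    target = Y if cause == "Y" else D
    for k in range(event_interval, K + 2):
        target[k] = 1  # the other indicator stays 0 at this and all later intervals
    return Y, D


def history(x: List[int], k: int) -> List[int]:
    """Overbar notation: (x_0, ..., x_k)."""
    return x[: k + 1]


def future(x: List[int], k: int) -> List[int]:
    """Underbar notation: (x_k, ..., x_{K+1})."""
    return x[k:]


# A subject who dies of other causes in interval 2, with K = 3.
Y, D = to_indicators(2, "D", K=3)
print(Y, D)  # [0, 0, 0, 0, 0] [0, 0, 1, 1, 1]
```

Once the competing event occurs, every later $Y_k$ stays at zero, which is the convention stated above.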

## 3. Decomposition of treatment effects

Our approach requires that treatment $A$ can be conceptualized as having two binary components that act through different causal pathways: one component $A_Y$ that affects the event of interest and one component $A_D$ that affects the competing event. In the observed data, $A_Y$ and $A_D$ are deterministically related,

$$A = A_Y = A_D. \tag{1}$$

Crucially, we must conceive hypothetical interventions that set $A_Y$ and $A_D$ to different values when we define the separable effects. The causal diagram in Figure 1 represents this decomposition in a setting with a single time point. The bold arrows represent the deterministic relation (1). The decomposition of treatment into two distinct components must be justified by subject-matter knowledge. Let us consider two examples.

### 3.1. Diethylstilbestrol and prostate cancer mortality

In our prostate cancer example, we assume that $A$ can be decomposed into a component $A_Y$ that directly affects death from prostate cancer and a component $A_D$ that directly affects death from other causes. Suppose that the treatment is diethylstilbestrol (DES), an estrogen which is thought to reduce mortality due to prostate cancer by suppressing testosterone production and to increase cardiovascular mortality through estrogen-induced synthesis of coagulation factors [10].

We could then consider a hypothetical treatment that has the same direct effect as DES on prostate cancer mortality (the same effect as $A_Y$), but that lacks any effect on mortality from other causes. Real-life treatments similar to such a hypothetical treatment are luteinizing hormone-releasing hormone (LHRH) antagonists or orchidectomy (castration), which reduce testosterone production (similar to $A_Y$) but lack estrogen-induced effects.

Also, we could consider a hypothetical treatment that has the same direct effect as DES on mortality from other causes (the same effect as $A_D$), but that lacks any effect on prostate cancer mortality. In practice, a drug that contains not only DES but also testosterone may resemble this hypothetical treatment, as the additional testosterone would nullify the testosterone suppression induced by DES.

### 3.2. Statins and dementia

Consider a study to quantify the effect of statins on dementia. Statins reduce cardiovascular mortality by lowering cholesterol production in the liver. As dementia may develop due to microvascular events in the small cerebral arteries, lowering cholesterol may also reduce the risk of dementia. When studying the effect of statins on dementia, death is a competing event.

In this scenario, decomposing $A$ into the distinct components $A_Y$ and $A_D$ may be scientifically implausible because statins appear to reduce mortality and dementia through the same mechanism, i.e., lowering cholesterol levels in the blood. Therefore, the approach that we describe below may be applicable to settings like the prostate cancer example but not to settings like this one.

Importantly, to have sufficiently well-defined causal estimands, the decomposition of treatment into components $A_Y$ and $A_D$ need not be possible in practice, only in principle. Thus, we might be tempted to propose elaborate procedures to justify the decomposition. For example, we could leverage the distinct localization of the microvessels in the brain when trying to decompose the statin treatment: we could imagine a bioengineered cholesterol transporter, surgically implanted to shuttle cholesterol particles from the distal cerebral arteries directly to the large cerebral veins, circumventing the cerebral microvessels. That is, if $Y$ and $D$ denote dementia and death, respectively, then carriers of the transporter will have the component of statins on dementia ($A_Y$), but they will lack the component of statins on mortality ($A_D$). Similarly, Robins and Richardson discuss the construction of scientifically plausible interventions in a mediation context, using nicotine in cigarettes as an example [8, Section 5.2].

Indeed, it is debated whether statins have a protective effect on dementia [11], and to clarify the notion of a 'protective effect' it may be helpful to consider a hypothetical trial in which subjects were randomly assigned to the cholesterol transporter or placebo. However, we make important assumptions by requiring that $A_Y$ and $A_D$ are the treatment components actually operating in the data [4]. In particular, we require that $A_Y$ has no direct effect on $D$, and that $A_D$ has no direct effect on $Y$. We may be less confident that these conditions hold when we use a convoluted argument to define $A_Y$ and $A_D$. If these conditions are violated in the data, our effect estimates may in turn be very far from those that would be obtained in an experiment in which individuals are randomized to our $A_Y$ and $A_D$.

## 4. Defining separable effects

For $a \in \{0, 1\}$, let $Y^{a}_{k+1}$ be an individual's event of interest at time $k+1$ when, possibly contrary to fact, $A$ is set to the value $a$. Then the average causal effect of treatment at time $k+1$ is $\Pr(Y^{a=1}_{k+1} = 1) - \Pr(Y^{a=0}_{k+1} = 1)$ on the additive scale.

Let $Y^{a_Y, a_D}_{k+1}$ be an individual's value of $Y_{k+1}$ when, possibly contrary to fact, $A_Y$ is set to $a_Y$ and $A_D$ is set to $a_D$, where $a_Y, a_D \in \{0, 1\}$. Hereafter we will often simplify the notation by removing $a_Y$ and $a_D$ from the superscripts, that is, $Y^{a_Y = a^*, a_D = a}_{k+1} \equiv Y^{a^*, a}_{k+1}$. We can now define the separable direct effects of treatment on the event of interest as the contrasts

$$\Pr(Y^{a^*=1, a}_{k+1} = 1) \ \text{ vs. } \ \Pr(Y^{a^*=0, a}_{k+1} = 1)$$

for $a = 0$ or $a = 1$; that is, the effect of the component $A_Y$ of treatment that affects the event of interest when the component $A_D$ of treatment that affects the competing event is set at a constant value $a$.

Analogously, we can define the separable indirect effects of treatment on the event of interest as the contrasts

$$\Pr(Y^{a^*, a=1}_{k+1} = 1) \ \text{ vs. } \ \Pr(Y^{a^*, a=0}_{k+1} = 1)$$

for $a^* = 0$ or $a^* = 1$; that is, the effect of the component $A_D$ of treatment that affects the competing event when the component $A_Y$ of treatment that affects the event of interest is set at a constant value $a^*$. In other words, the separable indirect effects are functions of the treatment component that affects the competing event, $A_D$, and they arise because the competing event makes it impossible for the event of interest to occur.

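We require that setting $A = a$ is equivalent to setting both $A_Y$ and $A_D$ to $a$, that is,

$$Y^{a}_{k+1} = Y^{a, a}_{k+1} \quad \text{and} \quad D^{a}_{k+1} = D^{a, a}_{k+1} \quad \text{for all } k. \tag{2}$$

When we consider hypothetical interventions, such as in Section 3.2, our confidence that the analysis will be valid depends on the plausibility of (2). From (2) we find that the sum of separable direct and indirect effects (on the additive scale) equals the total effect:

$$\big[\Pr(Y^{a^*=1, a=1}_{k+1} = 1) - \Pr(Y^{a^*=0, a=1}_{k+1} = 1)\big] + \big[\Pr(Y^{a^*=0, a=1}_{k+1} = 1) - \Pr(Y^{a^*=0, a=0}_{k+1} = 1)\big] = \Pr(Y^{a=1}_{k+1} = 1) - \Pr(Y^{a=0}_{k+1} = 1).$$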
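The definitions can be illustrated with a minimal Python sketch over the four arms $(a^*, a)$. The risks below are made-up numbers, not estimates from any study; the sketch only shows which arms each contrast uses, that only the arms with $a^* = a$ are observed when $A$ alone is randomized, and that both ways of pairing a separable direct effect with a separable indirect effect telescope to the total effect.

```python
from itertools import product

# Hypothetical risks Pr(Y^{a*, a}_{k+1} = 1), keyed by (a*, a) = (value of A_Y, value of A_D).
# Made-up numbers for illustration only.
risk = {
    (0, 0): 0.20,  # placebo arm: observed, since A = A_Y = A_D = 0
    (1, 1): 0.10,  # full treatment: observed, since A = A_Y = A_D = 1
    (1, 0): 0.12,  # A_Y = 1, A_D = 0: not observed when only A is randomized
    (0, 1): 0.17,  # A_Y = 0, A_D = 1: not observed when only A is randomized
}


def separable_direct(a: int) -> float:
    """Effect of A_Y on the additive scale, with A_D held at a."""
    return risk[(1, a)] - risk[(0, a)]


def separable_indirect(a_star: int) -> float:
    """Effect of A_D on the additive scale, with A_Y held at a_star."""
    return risk[(a_star, 1)] - risk[(a_star, 0)]


# By (2), setting A = a is the same as setting A_Y = A_D = a.
total = risk[(1, 1)] - risk[(0, 0)]
observed_arms = [arm for arm in product((0, 1), repeat=2) if arm[0] == arm[1]]

print(observed_arms)  # [(0, 0), (1, 1)]
print(f"{separable_direct(1) + separable_indirect(0):+.2f} vs total {total:+.2f}")
print(f"{separable_direct(0) + separable_indirect(1):+.2f} vs total {total:+.2f}")
```

The first pairing is the one written out above; the second (direct effect with $a = 0$ plus indirect effect with $a^* = 1$) decomposes the total effect equally well.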

## 5. Identification of separable effects

The identification of the separable effects requires the identification of the quantities

$$\Pr(Y^{a^*, a}_{k+1} = 1), \tag{3}$$

where $a^*, a \in \{0, 1\}$. Identifying these quantities would be straightforward if each of the treatment components could be separately intervened upon, that is, if we could conduct a randomized experiment with 4 possible treatment arms defined by the 4 combinations of values of $A_Y$ and $A_D$. However, when using data from a study like that of Section 2, in which only the treatment $A$ is randomized, we only observe 2 of the 4 treatment arms of a hypothetical trial in which $A_Y$ and $A_D$ were randomized, namely those with $a^* = a$. As a result, identification of (3) requires that the following untestable conditions hold.

### 5.1. Identifiability conditions

First, exchangeability conditional on the measured covariates $L$,

$$(\bar{Y}^{a,a}_{K+1}, \bar{D}^{a,a}_{K+1}) \perp\!\!\!\perp A \mid L \quad \text{for all } a,$$

where time $K+1$ is the end of the study. This exchangeability condition is expected to hold when $A$ is randomized.

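Second, consistency, such that if $A_Y = a$ and $A_D = a$ (that is, $A = a$), then

$$\bar{Y}_{k+1} = \bar{Y}^{a,a}_{k+1} \quad \text{and} \quad \bar{D}_{k+1} = \bar{D}^{a,a}_{k+1}$$

at all times $k$. If a subject's observed data history is consistent with the intervention in a counterfactual scenario, then the consistency assumption ensures that the subject's observed outcomes equal the corresponding counterfactual outcomes.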

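Third, positivity, such that

$$f(L) > 0 \implies \Pr(A = a \mid L) > 0 \ \text{ w.p.1 for } a \in \{0, 1\}, \tag{4}$$

which is the usual positivity condition under interventions on $A$. However, our estimand is based on hypothetical interventions on both $A_Y$ and $A_D$, and our positivity condition does not ensure the stricter condition

$$f(L) > 0 \implies \Pr(A_Y = a^*, A_D = a \mid L) > 0 \ \text{ w.p.1},$$

which, indeed, will be violated when $a^* \neq a$ in our setting where $A_Y = A_D$.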
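A rough empirical check of condition (4) with a discrete covariate can be sketched as follows. This is a hypothetical Python helper, assuming $L$ has been coded into a small number of strata (the stratum labels below are invented). It also shows that, because $A_Y = A_D = A$ in the observed data, the arms with $a^* \neq a$ are always empty, so the stricter condition fails by construction.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Sequence, Tuple


def positivity_violations(records: Sequence[Tuple[int, str]]) -> List[str]:
    """Strata l that occur in the sample but have an empty arm A = 0 or A = 1.

    records: (A, l) pairs, with l a discrete covariate stratum (hypothetical coding).
    """
    counts = Counter(records)
    strata = {l for (_, l) in counts}
    return sorted(l for l in strata if counts[(0, l)] == 0 or counts[(1, l)] == 0)


def component_arm_counts(records: Sequence[Tuple[int, str]]) -> Dict[Tuple[int, int], int]:
    """Counts of (A_Y, A_D) combinations implied by the observed data, where A_Y = A_D = A."""
    counts = Counter((a, a) for (a, _) in records)
    return {arm: counts[arm] for arm in product((0, 1), repeat=2)}


# Hypothetical sample: nobody in the "age>=75" stratum was assigned A = 1.
records = [(0, "age<75"), (1, "age<75"), (0, "age>=75"), (0, "age>=75")]
print(positivity_violations(records))  # ['age>=75']
print(component_arm_counts(records))   # arms (0, 1) and (1, 0) are always empty
```

With a continuous or high-dimensional $L$, a model-based check (for example, inspecting estimated probabilities of treatment) would be needed instead of a cross-tabulation.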

To allow for identifiability under our positivity condition in (4), we introduce two conditions that are related to conditions described by Didelez in a mediation setting [9].

### Dismissible component condition 1

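$$\Delta 1: \ \Pr(Y^{a^*, a}_{k+1} = 1 \mid Y^{a^*, a}_{k} = 0, D^{a^*, a}_{k+1} = 0, L = l) = \Pr(Y^{a^*, a^*}_{k+1} = 1 \mid Y^{a^*, a^*}_{k} = 0, D^{a^*, a^*}_{k+1} = 0, L = l),$$

at all times $k$. That is, the counterfactual (discrete-time) hazards of the event of interest are equal under all values of $a$, the value assigned to $A_D$.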

### Dismissible component condition 2

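$$\Delta 2: \ \Pr(D^{a^*, a}_{k+1} = 1 \mid Y^{a^*, a}_{k} = 0, D^{a^*, a}_{k} = 0, L = l) = \Pr(D^{a, a}_{k+1} = 1 \mid Y^{a, a}_{k} = 0, D^{a, a}_{k} = 0, L = l),$$

at all times $k$. That is, the counterfactual (discrete-time) hazard functions of the competing event are equal under all values of $a^*$, the value assigned to $A_Y$.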

By considering a hypothetical trial in which both $A_Y$ and $A_D$ are randomized, we can define conditional independences that imply the dismissible component conditions, and these conditional independences can be read directly off causal DAGs; see Appendix A for details.

The dismissible component conditions require that there are no unmeasured common causes of $Y_j$ and $D_k$ for any $j, k$. In particular, an unmeasured common cause of the event of interest and the competing event, such as the one in Figure 2, violates Δ1 and Δ2. However, the presence of unmeasured causes of $Y$ and unmeasured causes of $D$, as shown in Figure 3, does not violate Δ1 and Δ2 (see Appendix C for details); it just implies that the hazard terms in the identification formula of Section 5.2 cannot be causally interpreted due to conditioning on a collider [1, 12, 13], which is analogous to the mediation setting in Didelez [9, Figure 6]. For this reason, we have defined our causal estimands as contrasts of risks rather than as contrasts of hazards. Furthermore, adjusting for a measured common cause $L$ of $Y$ and $D$, such as in Figure 4, allows identification under Δ1 and Δ2. In subsequent figures we have omitted some of these variables to avoid clutter, but our results remain valid in their presence. We have also omitted one further arrow, which would not invalidate our results if included. Finally, we have intentionally omitted arrows from $D_k$ to $Y_j$ for $j > k$, as these arrows are redundant in our setting where the competing event is a terminating event that precludes the event of interest at all subsequent times.

The dismissible component conditions are not empirically verifiable in a trial in which the entire treatment $A$, but neither of its components $A_Y$ and $A_D$, is intervened upon. However, both conditions could be tested in a trial in which $A_Y$ and $A_D$ were randomly assigned.

### 5.2. Identification formula

Under the identifiability conditions in Section 5.1, we identify $\Pr(Y^{a^*, a}_{K+1} = 1)$ from the following g-functional [6] of the observed data described in Section 2:

$$\Pr(Y^{a^*, a}_{K+1} = 1) = \sum_{l} \Bigg[ \sum_{s=0}^{K} \Pr(Y_{s+1} = 1 \mid D_{s+1} = Y_s = 0, A = a^*, L = l) \prod_{j=0}^{s} \Big[ \Pr(D_{j+1} = 0 \mid D_j = Y_j = 0, A = a, L = l) \times \Pr(Y_j = 0 \mid D_j = Y_{j-1} = 0, A = a^*, L = l) \Big] \Bigg] f(L = l), \tag{5}$$

which can be shown by considering a hypothetical trial in which $A_Y$ and $A_D$ are randomized; see Appendix B. Alternatively, the identification formula can be found by drawing Single World Intervention Templates (SWITs) [14] for the scenarios of interest, as suggested in Figure 5.
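A minimal plug-in implementation of this g-functional can be sketched in Python, assuming the discrete-time hazards have already been estimated for each arm and each stratum of a discrete $L$. The nested-dictionary layout `hazards[a][l][t]` and the numbers in the usage example are hypothetical conventions for illustration. The key point is that the event-of-interest hazards come from arm $a^*$ while the competing-event hazards come from arm $a$.

```python
from typing import Dict, Sequence

# hazards[a][l][t] for arm a, covariate stratum l and interval t = 1..K+1 (index 0 unused).
Hazards = Dict[int, Dict[str, Sequence[float]]]


def g_formula_risk(haz_y: Hazards, haz_d: Hazards, f_l: Dict[str, float],
                   a_star: int, a: int) -> float:
    """Plug-in estimate of Pr(Y^{a*, a}_{K+1} = 1).

    haz_y[a][l][t]: Pr(Y_t = 1 | D_t = Y_{t-1} = 0, A = a, L = l)
    haz_d[a][l][t]: Pr(D_t = 1 | D_{t-1} = Y_{t-1} = 0, A = a, L = l)
    f_l[l]: probability of stratum l (weights summing to one).
    """
    risk = 0.0
    for l, weight in f_l.items():
        hy = haz_y[a_star][l]  # event of interest: hazards from arm a*
        hd = haz_d[a][l]       # competing event: hazards from arm a
        event_free = 1.0       # Pr(no event of either type through interval t - 1)
        cum_inc = 0.0
        for t in range(1, len(hy)):
            no_d = event_free * (1.0 - hd[t])  # D_t is measured right before Y_t
            cum_inc += no_d * hy[t]
            event_free = no_d * (1.0 - hy[t])
        risk += weight * cum_inc
    return risk


# Hypothetical constant hazards, a single stratum, K + 1 = 3 intervals.
haz_y = {0: {"all": [0.0, 0.10, 0.10, 0.10]}, 1: {"all": [0.0, 0.05, 0.05, 0.05]}}
haz_d = {0: {"all": [0.0, 0.02, 0.02, 0.02]}, 1: {"all": [0.0, 0.06, 0.06, 0.06]}}
f_l = {"all": 1.0}

for arm in [(0, 0), (1, 1), (1, 0)]:
    print(arm, round(g_formula_risk(haz_y, haz_d, f_l, *arm), 4))
```

When $a^* = a$, the function returns the usual plug-in cumulative incidence for arm $a$; the cross-arm combinations with $a^* \neq a$ are the ones whose causal interpretation rests on the dismissible component conditions.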

Importantly, we must measure $L$, that is, all common causes of the event of interest and the competing event, to identify the separable effects by the g-functional above, even when $A$ is randomized.

### 5.3. Separable effects in the presence of censoring

We consider a subject to be censored at time $k+1$ if the subject remained under follow-up and was event-free until $k$, but we have no information about the subject's events at $k+1$ or later. That is, censoring is a type of event that does not make it impossible for the event of interest to occur, and censoring can, at least in principle, always be prevented. When censoring is independent of future counterfactual events given $A$ and $L$, as illustrated in Figure 6, we can identify the separable effects from

$$\Pr(Y^{a^*, a}_{K+1} = 1) = \sum_{l} \Bigg[ \sum_{s=0}^{K} \Pr(Y_{s+1} = 1 \mid D_{s+1} = Y_s = \bar{C}_{s+1} = 0, A = a^*, L = l) \prod_{j=0}^{s} \Big[ \Pr(D_{j+1} = 0 \mid D_j = Y_j = \bar{C}_{j+1} = 0, A = a, L = l) \times \Pr(Y_j = 0 \mid D_j = Y_{j-1} = \bar{C}_j = 0, A = a^*, L = l) \Big] \Bigg] f(L = l), \tag{6}$$

where $C_k$ is an indicator of being censored at $k$ and $\bar{C}_k$ is its history; see Appendix B for details. Alternatively, the identification formula can be derived by drawing a SWIT for the scenario of interest, as suggested in Figure 7.
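With a discrete $L$ (that is, within strata of $A$ and $L$), the hazards in this formula can be estimated nonparametrically by restricting each interval's risk set to subjects who are still uncensored and event-free. The following Python sketch assumes a hypothetical record encoding of one `(t, kind)` pair per subject; it is one way to compute the empirical hazards, not the authors' code.

```python
from typing import List, Sequence, Tuple

# One record per subject (hypothetical encoding):
#   (t, "Y"): event of interest in interval t
#   (t, "D"): competing event in interval t
#   (t, "C"): censored at t, i.e. event-free and observed through t - 1 only
#   (t, "E"): event-free and observed through the end of follow-up, t = K + 1
Record = Tuple[int, str]


def discrete_hazards(records: Sequence[Record], K: int) -> Tuple[List[float], List[float]]:
    """Empirical hazards h_y[t], h_d[t] for t = 1..K+1 (index 0 unused).

    h_d[t] estimates Pr(D_t = 1 | D_{t-1} = Y_{t-1} = 0, C_bar_t = 0)
    h_y[t] estimates Pr(Y_t = 1 | D_t = Y_{t-1} = 0, C_bar_t = 0)
    """
    h_y = [0.0] * (K + 2)
    h_d = [0.0] * (K + 2)
    for t in range(1, K + 2):
        # Uncensored and event-free at the start of interval t.
        n_at_risk = sum(1 for (s, kind) in records if s > t or (s == t and kind != "C"))
        n_d = sum(1 for (s, kind) in records if s == t and kind == "D")
        n_y = sum(1 for (s, kind) in records if s == t and kind == "Y")
        if n_at_risk > 0:
            h_d[t] = n_d / n_at_risk
        if n_at_risk - n_d > 0:
            # D_t is measured right before Y_t, so the Y risk set excludes D_t = 1.
            h_y[t] = n_y / (n_at_risk - n_d)
    return h_y, h_d


records = [(1, "D"), (2, "Y"), (2, "C"), (3, "E"), (3, "E")]
h_y, h_d = discrete_hazards(records, K=2)
print(h_y, h_d)
```

Estimating these hazards separately for $A = 0$ and $A = 1$ (and for each stratum of $L$) and combining them as in the formula above gives a nonparametric plug-in estimate; with many or continuous covariates, parametric hazard models such as those in Section 6.1 are needed instead.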

## 6. Estimation of separable effects

To estimate the separable effects, we emphasize that the identification formulas in Sections 5.2 and 5.3 are functionals of hazard functions and the density of $L$. Indeed, $\Pr(Y_{k+1} = 1 \mid D_{k+1} = Y_k = 0, A, L)$ and $\Pr(D_{k+1} = 1 \mid D_k = Y_k = 0, A, L)$ are often called "cause-specific hazard functions" in the statistical literature. Although the term "cause-specific" is confusing because the causal interpretation of these hazard functions is ambiguous [1], we can nevertheless estimate these functions using classical methods from survival analysis, such as multiplicative or additive hazard models. Provided that these hazard models, along with the density of $L$, are correctly specified [15], we can consistently estimate the identification formula in Section 5.3. In the next section, we use this approach to analyse a randomized trial on prostate cancer therapy.

### 6.1. Example: A randomized trial on prostate cancer therapy

We consider the effect of DES on prostate cancer mortality, as described in Section 3.1. Data from a trial that randomly assigned DES to prostate cancer patients are freely available (http://biostat.mc.vanderbilt.edu/DataSets) [16], and the data have been used in several methodological articles on competing risks [17, 18, 19, 20]. In total, 502 patients were assigned to 4 different treatment arms, and we restrict our analysis to the placebo arm (127 patients) and the high-dose DES arm (125 patients).

As suggested in Section 3.1, we consider a hypothetical drug that has the same direct effect as DES on prostate cancer mortality ($A_Y = 1$), but lacks any effect on mortality due to other causes ($A_D = 0$). We can then estimate a separable direct effect of treatment by comparing the risk under the hypothetical drug with the risk under placebo, and a separable indirect effect by comparing the risk under full DES treatment with the risk under the hypothetical drug. To adjust for common causes of death due to prostate cancer and death due to other causes ($L$ in Figure 6), we used pooled logistic regression models to estimate the hazard terms in the identification formula of Section 5.3, including daily activity function, age group, hemoglobin level and previous cardiovascular disease as covariates, that is,

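$$\operatorname{logit}[\Pr(Y_t = 1 \mid D_t = Y_{t-1} = \bar{C}_t = 0, A, L)] = \theta_{0,t} + \theta_1 A + \theta_2 A t + \theta_3 A t^2 + \theta_4' L,$$

$$\operatorname{logit}[\Pr(D_t = 1 \mid D_{t-1} = Y_{t-1} = \bar{C}_t = 0, A, L)] = \beta_{0,t} + \beta_1 A + \beta_2 A t + \beta_3 A t^2 + \beta_4' L,$$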

where $\theta_{0,t}$ and $\beta_{0,t}$ are time-varying intercepts modeled as third-degree polynomials in $t$. Since the validity of the causal effect estimates hinges on correct specification of the hazard models, we included the product terms $At$ and $At^2$ to allow for time-varying treatment effects.
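The pooled logistic regression approach can be sketched in Python with NumPy. This is an illustrative sketch, not the authors' R code: it fits the model for the event of interest by Newton-Raphson on simulated person-interval data, with a single continuous covariate standing in for $L$, and it omits the competing event and censoring for brevity (the competing-event model has the same form). The design columns correspond to the cubic intercept $\theta_{0,t}$, then $A$, $At$, $At^2$ and $L$; time is rescaled to $(0, 1]$ to keep the cubic terms numerically stable, and all simulation parameters are arbitrary.

```python
import numpy as np


def design_matrix(s: np.ndarray, a: np.ndarray, l: np.ndarray) -> np.ndarray:
    """Columns: 1, s, s^2, s^3 (cubic time-varying intercept), A, A*s, A*s^2, L."""
    return np.column_stack([np.ones_like(s), s, s**2, s**3, a, a * s, a * s**2, l])


def fit_pooled_logistic(X: np.ndarray, y: np.ndarray,
                        max_iter: int = 100, tol: float = 1e-10) -> np.ndarray:
    """Logistic regression by Newton-Raphson; column 0 of X must be the intercept."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean() / (1.0 - y.mean()))  # start at the marginal log-odds
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        score = X.T @ (y - p)
        information = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(information, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Simulated person-interval data: one row per subject and interval at risk.
rng = np.random.default_rng(2019)
n, K = 2000, 9
A = rng.integers(0, 2, size=n)
L = rng.normal(size=n)
rows = []
for i in range(n):
    for t in range(1, K + 2):
        s = t / (K + 1)  # time rescaled to (0, 1]
        logit = -3.0 + 0.5 * s - 0.7 * A[i] + 0.4 * L[i]
        event = int(rng.random() < 1.0 / (1.0 + np.exp(-logit)))
        rows.append((s, A[i], L[i], event))
        if event:
            break  # the subject leaves the risk set after the event

data = np.array(rows, dtype=float)
X = design_matrix(data[:, 0], data[:, 1], data[:, 2])
y = data[:, 3]
beta = fit_pooled_logistic(X, y)
fitted_hazard = 1.0 / (1.0 + np.exp(-(X @ beta)))
print("coefficient of L:", round(float(beta[7]), 2))
```

In practice one would use a standard routine, for example R's `glm` with `family = binomial`, on the person-interval data. The fitted hazards, evaluated with $A$ set to $a^*$ in the model for $Y$ and to $a$ in the model for $D$, are then plugged into the identification formula.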

The cumulative incidence under the hypothetical drug was similar, but not identical, to the cumulative incidence under DES treatment, as shown in Figure 8B. In Table 1, we have also displayed estimates of the cumulative incidence with 95% bootstrap confidence intervals after 3 years of follow-up (computer code in R is provided in the supplementary material).
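Bootstrap intervals of this kind are typically obtained by resampling individuals, not person-intervals, so that each subject's event history stays intact. A minimal, generic Python sketch of a nonparametric percentile bootstrap follows; the function name and defaults are assumptions, and in the analysis above the statistic would be the full estimation pipeline (hazard models plus g-formula) rather than the simple proportion used in the usage example.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def bootstrap_percentile_ci(subjects: Sequence[T],
                            statistic: Callable[[Sequence[T]], float],
                            n_boot: int = 500, level: float = 0.95,
                            seed: int = 1) -> Tuple[float, float]:
    """Nonparametric percentile bootstrap CI, resampling whole subjects with replacement."""
    rng = random.Random(seed)
    n = len(subjects)
    estimates: List[float] = []
    for _ in range(n_boot):
        resample = [subjects[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    alpha = 1.0 - level
    lower = estimates[int(alpha / 2 * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Usage with a toy statistic: the proportion of subjects who had the event by 3 years.
had_event = [1] * 30 + [0] * 70
lo, hi = bootstrap_percentile_ci(had_event, lambda xs: sum(xs) / len(xs))
print(f"95% CI: ({lo:.2f}, {hi:.2f})")
```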

Our analysis suggests that DES reduces prostate cancer mortality mostly through testosterone suppression; that is, the total effect of DES on prostate cancer mortality is not simply a consequence of a harmful effect on death from other causes. In particular, the point estimate of the separable indirect effect on death due to prostate cancer after 3 years can be interpreted as the reduction in prostate cancer mortality under DES compared to placebo that is due to the effect of DES on mortality from other causes. Our results rely on the assumption that all common causes of $Y$ and $D$ are measured in $L$, which is a bold assumption because other factors, such as unmeasured comorbidities, may affect both $Y$ and $D$.

### 6.2. Scenarios with considerable separable direct effects

We can provide some intuition to illuminate scenarios in which the separable direct effect on the event of interest at time $k+1$ would be substantially different from the total effect. First, consider a hypothetical set of subjects who all share a particular characteristic, say a genetic variant that interacts with treatment: each such subject would experience the competing event at some time up to $k+1$ under full treatment ($A_Y = A_D = 1$), but would experience the event of interest at some time up to $k+1$ under the hypothetical treatment in which the separable indirect effect is lacking ($A_Y = 1$, $A_D = 0$); see Table 2. Heuristically, this happens if the hypothetical treatment delays the competing event such that the event of interest occurs first. If this set comprises a large fraction of the population, we would expect the total effect and the separable direct effect to be different, because competing events would make it impossible for the event of interest to occur under full treatment, but not under the hypothetical treatment.

On the other hand, consider a second hypothetical set of subjects who all share another characteristic, say another genetic variant that interacts with treatment: each subject in this set would experience the competing event at some time up to $k+1$ under full treatment, but under the hypothetical treatment would either experience the competing event at some time up to $k+1$ or not experience any event by time $k+1$. That is, the subjects in this set will not experience the event of interest by time $k+1$ under the hypothetical treatment, regardless of the time at which the competing event occurs. If this second set comprises a large fraction of the population, the total effect and the separable direct effect on the event of interest are likely to be similar.
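The two scenarios can be made concrete with a toy example in Python. The subject types and their potential outcomes below are entirely hypothetical. Because the placebo risk cancels, the gap between the total effect and the separable direct effect with $A_D$ held at 0 equals $\Pr(Y^{1,1}_{k+1} = 1) - \Pr(Y^{1,0}_{k+1} = 1)$, which the sketch computes directly for a horizon $k+1 = 5$.

```python
from typing import Dict, List, Optional, Tuple

Regime = Tuple[int, int]             # (a_Y, a_D)
Outcome = Optional[Tuple[int, str]]  # (interval, "Y" or "D") of the first event, or None
Subject = Dict[Regime, Outcome]

# Hypothetical subject types; only the two regimes needed here are specified.
TYPE_1: Subject = {(1, 1): (2, "D"), (1, 0): (3, "Y")}  # without the A_D effect, Y occurs first
TYPE_2: Subject = {(1, 1): (2, "D"), (1, 0): (4, "D")}  # the competing event is only delayed
TYPE_3: Subject = {(1, 1): (3, "Y"), (1, 0): (3, "Y")}  # not affected by A_D


def risk_of_y(population: List[Subject], regime: Regime, horizon: int) -> float:
    """Proportion of subjects with the event of interest by `horizon` under `regime`."""
    hits = 0
    for subject in population:
        outcome = subject[regime]
        if outcome is not None and outcome[1] == "Y" and outcome[0] <= horizon:
            hits += 1
    return hits / len(population)


def total_minus_direct(population: List[Subject], horizon: int) -> float:
    """Total effect minus the separable direct effect with a = 0 (placebo risk cancels)."""
    return risk_of_y(population, (1, 1), horizon) - risk_of_y(population, (1, 0), horizon)


mostly_type_1 = [TYPE_1] * 8 + [TYPE_3] * 2
mostly_type_2 = [TYPE_2] * 8 + [TYPE_3] * 2
print(f"{total_minus_direct(mostly_type_1, horizon=5):+.1f}")  # large gap
print(f"{total_minus_direct(mostly_type_2, horizon=5):+.1f}")  # no gap
```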

## 7. Discussion

We have defined new estimands to promote causal reasoning in competing risk settings. These estimands are motivated by hypothetical interventions, in which a time-fixed treatment is decomposed into distinct components, and each component can be assigned different values. The fact that our estimands are defined with respect to manipulable parameters is appealing, and it allows us to study causal effects under conditions that can be empirically assessed, at least in principle.

The separable direct and indirect effects reveal mechanisms by which a treatment influences the outcome of interest. Indeed, the separable effects are conceptually similar to single-world alternatives to pure (natural) effects in mediation analysis [8, 9]. In particular, the total effect can be expressed as a sum of separable direct and indirect effects.

We have provided conditions that allow for identifiability of separable effects in the presence of censoring and pretreatment variables that are common causes of the event of interest and the competing event. In the prostate cancer example, we showed how standard time-to-event methods can be used to estimate the separable effects. However, in many settings there will be time-varying common causes of the event of interest and the competing event, and we aim to generalize our approach to allow for time-varying covariates in future work, such that conditions Δ1 and Δ2 become more plausible.

In this article, we have considered settings in which an exposure is randomized. This was our intention, as we focused on defining, identifying and interpreting the estimand. Nevertheless, the decomposition can also be useful in analyses of observational studies, assuming that we can adjust for confounding between the exposure and both competing outcomes.

Finally, the idea of separable effects is not only relevant to settings in which the outcome of interest is a time-to-event. Many practical settings involve intermediate outcomes that are ill-defined after the occurrence of a terminating event. For example, we may be interested in treatment effects on outcomes such as quality of life or cognitive function, and these outcomes are meaningless after death. We aim to study separable effects in such settings in future research.

## References

• [1] Jessica G Young, Eric J Tchetgen Tchetgen, and Miguel A Hernán. The choice to define competing risk events as censoring events and implications for causal inference. arXiv preprint arXiv:1806.06136, 2018.
• [2] Ross L Prentice, John D Kalbfleisch, Arthur V Peterson Jr, Nancy Flournoy, Vern T Farewell, and Norman E Breslow. The analysis of failure times in the presence of competing risks. Biometrics, pages 541–554, 1978.
• [3] Per Kragh Andersen, Ronald B Geskus, Theo de Witte, and Hein Putter. Competing risks in epidemiology: possibilities and pitfalls. International journal of epidemiology, 41(3):861–870, 2012.
• [4] Miguel A Hernán. Does water kill? a call for less casual causal inferences. Annals of epidemiology, 26(10):674–680, 2016.
• [5] Eric J Tchetgen Tchetgen. Identification and estimation of survivor average causal effects. Statistics in medicine, 33(21):3601–3628, 2014.
• [6] James Robins. A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Mathematical modelling, 7(9-12):1393–1512, 1986.
• [7] Constantine E Frangakis and Donald B Rubin. Principal stratification in causal inference. Biometrics, 58(1):21–29, 2002.
• [8] James M Robins and Thomas S Richardson. Alternative graphical causal models and the identification of direct effects. Causality and psychopathology: Finding the determinants of disorders and their cures, pages 103–158, 2010.
• [9] Vanessa Didelez. Defining causal mediation with a longitudinal mediator and a survival outcome. Lifetime Data Analysis, DOI: 10.1007/s10985-018-9449-0, 2018.
• [10] Rafal Turo, Michal Smolski, Rachel Esler, Magda L Kujawa, Stephen J Bromage, Neil Oakley, Adebanji Adeyoju, Stephen CW Brown, Richard Brough, Andrew Sinclair, et al. Diethylstilboestrol for the treatment of prostate cancer: past, present and future. Scandinavian journal of urology, 48(1):4–14, 2014.
• [11] Melinda C Power, Jennifer Weuve, A Richey Sharrett, Deborah Blacker, and Rebecca F Gottesman. Statins, cognition, and dementia—systematic review and methodological commentary. Nature Reviews Neurology, 11(4):220, 2015.
• [12] Miguel A Hernán. The hazards of hazard ratios. Epidemiology (Cambridge, Mass.), 21(1):13, 2010.
• [13] Mats Julius Stensrud, Morten Valberg, Kjetil Røysland, and Odd O Aalen. Exploring selection bias by causal frailty models: The magnitude matters. Epidemiology, 28(3):379–386, 2017.
• [14] Thomas S Richardson and James M Robins. Single world intervention graphs (SWIGs): A unification of the counterfactual and graphical approaches to causality. Center for the Statistics and the Social Sciences, University of Washington Series. Working Paper, 128(30):2013, 2013.
• [15] Jessica G Young, Lauren E Cain, James M Robins, Eilis J O’Reilly, and Miguel A Hernán. Comparative effectiveness of dynamic treatment regimes: an application of the parametric g-formula. Statistics in biosciences, 3(1):119, 2011.
• [16] DP Byar and SB Green. The choice of treatment for cancer patients based on covariate information. Bulletin du cancer, 67(4):477–490, 1980.
• [17] Frank E Harrell, Kerry L Lee, and Daniel B Mark. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in medicine, 15(4):361–387, 1996.
• [18] Richard Kay. Treatment effects in competing-risks analysis of prostate cancer data. Biometrics, pages 203–211, 1986.
• [19] JP Fine. Analysing competing risks data with transformation models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(4):817–830, 1999.
• [20] Ravi Varadhan, Carlos O Weiss, Jodi B Segal, Albert W Wu, Daniel Scharfstein, and Cynthia Boyd. Evaluating health outcomes in the presence of competing risks: a review of statistical methods and clinical applications. Medical care, pages S96–S105, 2010.

## Appendix A Conditional independences that imply the dismissible component conditions

We expressed the dismissible component conditions Δ1 and Δ2 in terms of equalities of hazards. We now show that these hazard equalities are implied by certain counterfactual independences that can be read directly off successive single world transformations of a causal DAG.

#### a.0.1. Hypothetical trial

Let each component of $A$ be randomly assigned in a hypothetical trial $G$. Heuristically, $G$ is the 4 arm trial that we do not observe. To indicate that the random variables are defined with respect to $G$, let $A_Y(G)$ and $A_D(G)$ be the values of $A_Y$ and $A_D$ observed under $G$, respectively. We assume that $A_Y(G)$ and $A_D(G)$ are randomized independently of each other to values in $\{0,1\}$, that is, $A_Y(G) \perp\!\!\!\perp A_D(G)$. Assume no losses to follow-up. The dismissible component conditions are

$$
Y_{k+1}(G) \perp\!\!\!\perp A_D(G) \mid A_Y(G), Y_k(G)=0, D_{k+1}(G)=0, L(G), \tag{7}
$$

$$
D_{k+1}(G) \perp\!\!\!\perp A_Y(G) \mid A_D(G), D_k(G)=0, Y_k(G)=0, L(G). \tag{8}
$$
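To make the hypothetical trial $G$ and condition (7) concrete, the Python sketch below simulates a 4 arm trial. In it, $A_Y$ and $A_D$ are assigned by independent fair coin flips, and the discrete-time hazard of $Y$ depends on $A_Y$ only. Several pieces are illustrative assumptions, not part of the original analysis: the hazard values, taking $L$ to be the empty set, and the names `simulate_trial_g` and `y_hazard`. Under this data-generating model, take subjects with $Y_k = 0$ and $D_{k+1} = 0$. For each level of $A_Y$, their estimated hazard of $Y_{k+1}$ should be approximately the same in the arms $A_D = 0$ and $A_D = 1$. That is what (7) asserts.

```python
import random
from collections import defaultdict

# Illustrative (assumed) discrete-time hazards. The hazard of the event of
# interest depends on A_Y only, the competing-event hazard on A_D only, so
# condition (7) holds by construction. L is taken to be the empty set.
HAZ_Y = {0: 0.10, 1: 0.05}  # Pr(Y_{k+1}=1 | A_Y, Y_k=0, D_{k+1}=0)
HAZ_D = {0: 0.08, 1: 0.15}  # Pr(D_{k+1}=1 | A_D, Y_k=0, D_k=0)


def simulate_trial_g(n, K, seed=1):
    """Simulate n subjects over intervals 1..K+1 of a 4 arm trial G.

    Returns counts[(a_y, a_d, k)] = [n_at_risk, n_events]: the number of
    subjects with Y_k = 0 and D_{k+1} = 0, and how many of them have Y_{k+1} = 1.
    """
    rng = random.Random(seed)
    counts = defaultdict(lambda: [0, 0])
    for _ in range(n):
        # A_Y and A_D are randomized independently of each other.
        a_y, a_d = rng.randint(0, 1), rng.randint(0, 1)
        for k in range(K + 1):
            # Within interval k+1 the competing event D_{k+1} is realized first.
            if rng.random() < HAZ_D[a_d]:
                break  # D_{k+1} = 1: no longer at risk for Y
            cell = counts[(a_y, a_d, k)]
            cell[0] += 1
            if rng.random() < HAZ_Y[a_y]:
                cell[1] += 1  # Y_{k+1} = 1
                break
    return counts


def y_hazard(counts, a_y, a_d, k):
    """Empirical Pr(Y_{k+1}=1 | A_Y=a_y, A_D=a_d, Y_k=0, D_{k+1}=0)."""
    n_at_risk, n_events = counts[(a_y, a_d, k)]
    return n_events / n_at_risk


if __name__ == "__main__":
    counts = simulate_trial_g(100_000, K=2)
    for a_y in (0, 1):
        for k in range(3):
            print(a_y, k, round(y_hazard(counts, a_y, 0, k), 3),
                  round(y_hazard(counts, a_y, 1, k), 3))
```

Suppose `HAZ_Y` were made to depend on `a_d` as well. Condition (7) would then fail, and the two estimated hazards for a given `a_y` would differ by more than sampling error.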

### a.1. Conditions that ensure Δ1 and Δ2

Since $A_Y(G)$ and $A_D(G)$ are randomized, conditional exchangeability holds in the trial $G$, such that

$$
\left(\bar{Y}^{a^*,a}_{K+1}(G), \bar{D}^{a^*,a}_{K+1}(G)\right) \perp\!\!\!\perp \left(A_Y(G), A_D(G)\right) \mid L(G),
$$

where . In the special case where , this conditional exchangeability condition is the same as the conditional exchangeability condition in the main text.

Furthermore, we assume consistency in $G$, that is, if $A_Y(G) = a^*$ and $A_D(G) = a$, then

$$
Y^{a^*,a}_{k+1}(G) = Y_{k+1}(G) \quad \text{and} \quad D^{a^*,a}_{k+1}(G) = D_{k+1}(G),
$$

where . This consistency condition is identical to the consistency condition in the main text when .

We assume positivity, that is

Let , where . Using exchangeability and consistency we find that

Similarly, using (7), exchangeability and consistency we find

The two derivations above show that Δ1 is satisfied if condition (7) holds, assuming conditional exchangeability, positivity and consistency. The same argument shows that Δ2 holds under conditional exchangeability, positivity, consistency and condition (8). Conditions (7) and (8) are useful in practice because these independences can be read off causal graphs. In particular, both conditions hold in Figure 9, which describes a trial in which $A_Y$ and $A_D$ are randomly assigned independently of each other.
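The chain of equalities behind the argument for Δ1 can be sketched as follows. This is a reconstruction under conditional exchangeability, consistency, positivity and condition (7). It need not match the original displays line by line. Positivity guarantees that every conditioning event below has positive probability.

$$
\begin{aligned}
&\Pr\left(Y^{a^*,a}_{k+1}(G)=1 \mid Y^{a^*,a}_{k}(G)=0, D^{a^*,a}_{k+1}(G)=0, L(G)=l\right) \\
&\quad= \Pr\left(Y^{a^*,a}_{k+1}(G)=1 \mid Y^{a^*,a}_{k}(G)=0, D^{a^*,a}_{k+1}(G)=0, L(G)=l, A_Y(G)=a^*, A_D(G)=a\right) && \text{(exchangeability)} \\
&\quad= \Pr\left(Y_{k+1}(G)=1 \mid Y_{k}(G)=0, D_{k+1}(G)=0, L(G)=l, A_Y(G)=a^*, A_D(G)=a\right) && \text{(consistency)} \\
&\quad= \Pr\left(Y_{k+1}(G)=1 \mid Y_{k}(G)=0, D_{k+1}(G)=0, L(G)=l, A_Y(G)=a^*, A_D(G)=a^*\right) && \text{(condition (7))} \\
&\quad= \Pr\left(Y^{a^*,a^*}_{k+1}(G)=1 \mid Y^{a^*,a^*}_{k}(G)=0, D^{a^*,a^*}_{k+1}(G)=0, L(G)=l\right) && \text{(consistency, exchangeability)}
\end{aligned}
$$

The argument for Δ2 has the same structure. It uses condition (8) to replace $A_Y(G)=a^*$ by $A_Y(G)=a$ in the hazard of $D_{k+1}(G)$.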

## Appendix B Proof of identifiability

We assume a Finest Fully Randomized Causally Interpretable Structured Tree Graph (FFRCISTG) model [6]. The aim is to identify the counterfactual risk of the event of interest under $A_Y = a^*$ and $A_D = a$ as a function of the factual data, in which $A$ is randomized. To do this, we initially consider a scenario in which both $A_Y$ and $A_D$ are randomized, that is, the 4 arm trial $G$ described in Appendix A. Hereafter we omit the string '$(G)$' after the random variables, e.g. writing $Y_k$ for $Y_k(G)$, to avoid clutter. We provide a proof for the scenario with a measured pretreatment covariate $L$ and censoring $C$. The results immediately hold in simpler scenarios, e.g. by defining $L$ or $C$ to be the empty set.

### b.1. Identifiability conditions in the presence of censoring

First, we generalize the identifiability assumptions to allow for censoring. Assume that subjects may be lost to follow-up, and that the losses to follow-up can depend on measured variables, as suggested in Figure 6. Further, assume that the losses to follow-up are independent of future counterfactual events ('independent censoring'). More precisely, we consider a setting in which we intervene such that no subject is lost to follow-up. Let $C_k$ be an indicator of loss to follow-up at time $k$. Let $Y^{a^*,a,\bar{c}=0}_k$ and $D^{a^*,a,\bar{c}=0}_k$ be the counterfactual values of $Y_k$ and $D_k$ when $A_Y$ is set to $a^*$, $A_D$ is set to $a$, and follow-up is ensured at all times.

In a continuous time setting, it is usually assumed that two events cannot occur at the same point in time. In our discrete time setting with pretreatment covariates $L$ and censoring $C$, we define a temporal order

For all $k$ we consider the following conditions. First, we extend the exchangeability conditions from Section 5.1,

Here, as in Section 5.1, E1 holds when $A_Y$ and $A_D$ are randomized. E2 requires that losses to follow-up are independent of future counterfactual events, given the measured past. This condition is similar to the 'independent censoring' condition that is assumed to hold in classical randomized trials [1].

Furthermore, we require a consistency condition: if $A_Y = a^*$, $A_D = a$ and $\bar{C}_{k+1} = 0$, then $Y_{k+1} = Y^{a^*,a,\bar{c}=0}_{k+1}$ and $D_{k+1} = D^{a^*,a,\bar{c}=0}_{k+1}$, and still we only observe scenarios where . The consistency condition ensures that if an individual has a data history consistent with the intervention under a counterfactual scenario, then the observed outcome equals the counterfactual outcome.

As in Section 5.1, the exchangeability and consistency conditions are conventional in the causal inference literature. We also require an additional positivity condition

$$
f(A, Y_k=0, D_k=0, \bar{C}_k=0, L) > 0 \implies \Pr(C_{k+1}=0 \mid Y_k=0, D_k=0, \bar{C}_k=0, L, A) > 0 \quad \text{w.p. } 1,
$$

which ensures that, for any possible history of treatment assignments and covariates among those who are event-free and uncensored at $k$, some subjects remain uncensored at $k+1$.
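In data, a crude empirical counterpart of this positivity condition is a simple tabulation. Among subjects who are event-free and uncensored at $k$, compute the proportion that remains uncensored at $k+1$ within each stratum of treatment and covariates. The sketch below assumes a hypothetical long-format record layout `(a, l, k, c_next)` and uses a hypothetical function name, `censoring_positivity`. An empirical proportion of zero only flags a possible violation, since the condition concerns the population distribution rather than a finite sample.

```python
from collections import defaultdict


def censoring_positivity(records):
    """Empirical check of the censoring positivity condition.

    records: iterable of (a, l, k, c_next) tuples (hypothetical layout), one per
    subject-interval in which the subject is event-free (Y_k = D_k = 0) and
    uncensored through k; c_next = 1 if the subject is censored at k+1.
    Returns (probs, violations): probs maps (a, l, k) to the estimated
    Pr(C_{k+1} = 0 | history), and violations lists strata where it is zero.
    """
    tally = defaultdict(lambda: [0, 0])  # (n at risk, n remaining uncensored)
    for a, l, k, c_next in records:
        cell = tally[(a, l, k)]
        cell[0] += 1
        cell[1] += 1 - c_next
    probs = {key: kept / n for key, (n, kept) in tally.items()}
    violations = sorted(key for key, p in probs.items() if p == 0.0)
    return probs, violations
```

Strata that never occur in the data do not appear in the output at all. A stratum that is possible in the population but unobserved therefore needs separate attention.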

Finally, we rely on two dismissible component conditions that generalize those in Section 5 by allowing for a hypothetical intervention that eliminates censoring at all times.

Dismissible component conditions:

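$$
\begin{aligned}
\Delta 1c:\ &\Pr\left(Y^{a^*,a,\bar{c}=0}_{k+1}=1 \mid Y^{a^*,a,\bar{c}=0}_{k}=0, D^{a^*,a,\bar{c}=0}_{k+1}=0, L\right) \\
&= \Pr\left(Y^{a^*,a^*,\bar{c}=0}_{k+1}=1 \mid Y^{a^*,a^*,\bar{c}=0}_{k}=0, D^{a^*,a^*,\bar{c}=0}_{k+1}=0, L\right), \\
\Delta 2c:\ &\Pr\left(D^{a^*,a,\bar{c}=0}_{k+1}=1 \mid Y^{a^*,a,\bar{c}=0}_{k}=0, D^{a^*,a,\bar{c}=0}_{k}=0, L\right) \\
&= \Pr\left(D^{a,a,\bar{c}=0}_{k+1}=1 \mid Y^{a,a,\bar{c}=0}_{k}=0, D^{a,a,\bar{c}=0}_{k}=0, L\right).
\end{aligned}
$$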

Under these conditions, the counterfactual risk $\Pr(Y^{a^*,a,\bar{c}=0}_{K+1}=1)$ is identified by the identification formula for settings with censoring and covariates $L$.

### b.2. Proof of identifiability

We consider the counterfactual outcomes from the 4 arm trial $G$ in which $A_Y$ and $A_D$ are randomized, and we use the laws of probability as well as Δ1c and Δ2c to express

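$$
\begin{aligned}
&\Pr\left(Y^{a^*,a,\bar{c}=0}_{K+1}=1\right) \\
&= \sum_{l} \Pr\left(Y^{a^*,a,\bar{c}=0}_{K+1}=1 \mid L=l\right) f(L=l) \\
&= \sum_{l} \Bigg[ \sum_{s=0}^{K} \Pr\left(Y^{a^*,a,\bar{c}=0}_{s+1}=1 \mid D^{a^*,a,\bar{c}=0}_{s+1}=Y^{a^*,a,\bar{c}=0}_{s}=0, L=l\right) \\
&\qquad \times \prod_{j=0}^{s} \Big[ \Pr\left(D^{a^*,a,\bar{c}=0}_{j+1}=0 \mid D^{a^*,a,\bar{c}=0}_{j}=Y^{a^*,a,\bar{c}=0}_{j}=0, L=l\right) \\
&\qquad\qquad \times \Pr\left(Y^{a^*,a,\bar{c}=0}_{j}=0 \mid D^{a^*,a,\bar{c}=0}_{j}=Y^{a^*,a,\bar{c}=0}_{j-1}=0, L=l\right) \Big] \Bigg] f(L=l) \\
&= \sum_{l} \Bigg[ \sum_{s=0}^{K} \Pr\left(Y^{a^*,a^*,\bar{c}=0}_{s+1}=1 \mid D^{a^*,a^*,\bar{c}=0}_{s+1}=Y^{a^*,a^*,\bar{c}=0}_{s}=0, L=l\right) \\
&\qquad \times \prod_{j=0}^{s} \Big[ \Pr\left(D^{a,a,\bar{c}=0}_{j+1}=0 \mid D^{a,a,\bar{c}=0}_{j}=Y^{a,a,\bar{c}=0}_{j}=0, L=l\right) \\
&\qquad\qquad \times \Pr\left(Y^{a^*,a^*,\bar{c}=0}_{j}=0 \mid D^{a^*,a^*,\bar{c}=0}_{j}=Y^{a^*,a^*,\bar{c}=0}_{j-1}=0, L=l\right) \Big] \Bigg] f(L=l),
\end{aligned}
$$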

where $Y_{-1}$ (in each counterfactual world) is the empty set.
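The last line of the display combines hazards of $Y$ from the world $(a^*, a^*)$ with hazards of $D$ from the world $(a, a)$. Treat these counterfactual discrete-time hazards as known inputs; expressing them in terms of observed data is the subject of the remainder of the proof. The expression can then be evaluated as in the Python sketch below. Three elements are our own assumptions for illustration: the function name `separable_risk`, the dictionary layout, and the convention that the $j = 0$ factor $\Pr(Y_0 = 0 \mid \cdot)$ equals 1 because everyone is event-free at baseline.

```python
def separable_risk(f_l, h_y, h_d, a_star, a, K):
    """Evaluate the last line of the derivation for Pr(Y^{a*, a, cbar=0}_{K+1} = 1).

    f_l: dict l -> Pr(L = l)
    h_y: dict (l, a_y) -> [h_Y(1), ..., h_Y(K+1)], where
         h_Y(t) = Pr(Y_t = 1 | D_t = 0, Y_{t-1} = 0, L = l) in world (a_y, a_y)
    h_d: dict (l, a_d) -> [h_D(1), ..., h_D(K+1)], where
         h_D(t) = Pr(D_t = 1 | D_{t-1} = 0, Y_{t-1} = 0, L = l) in world (a_d, a_d)
    Y-hazards are taken from arm a_star and D-hazards from arm a.
    """
    risk = 0.0
    for l, p_l in f_l.items():
        hy = h_y[(l, a_star)]
        hd = h_d[(l, a)]
        surv = 1.0     # running product over j = 0, ..., s
        cum_inc = 0.0  # inner sum over s = 0, ..., K
        for s in range(K + 1):
            # Factor j = s: Pr(D_{s+1} = 0 | ...) times Pr(Y_s = 0 | ...);
            # the latter is 1 for s = 0 (Y_{-1} is the empty set, Y_0 = 0).
            no_y = 1.0 if s == 0 else 1.0 - hy[s - 1]
            surv *= (1.0 - hd[s]) * no_y
            cum_inc += hy[s] * surv  # hy[s] = h_Y(s + 1)
        risk += p_l * cum_inc
    return risk
```

For example, hold `a` fixed and compare `a_star=1` with `a_star=0`. The two risks differ only in the $A_Y$ component, in line with the separable direct effect described in the abstract.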

For , let us consider the term