# Learning Causal Hazard Ratio with Endogeneity

Cox's proportional hazards model is one of the most popular statistical models to evaluate associations of a binary exposure with a censored failure time outcome. When confounding factors are not fully observed, the exposure hazard ratio estimated using a Cox model is not causally interpretable. To address this, we propose novel approaches for identification and estimation of the causal hazard ratio in the presence of unmeasured confounding factors. Our approaches are based on a binary instrumental variable and an additional no-interaction assumption. We derive, to the best of our knowledge, the first consistent estimator of the population marginal causal hazard ratio within an instrumental variable framework. Our estimator admits a closed-form representation, and hence avoids the drawbacks of estimating equation based estimators. Our approach is illustrated via simulation studies and a data analysis.


## 1 Introduction

In observational studies with a possibly right censored outcome, the Cox proportional hazards model is by far the dominant analysis tool to infer the association between a binary treatment and an outcome. The associational measure here is the hazard ratio, that is, the ratio of instantaneous incidence rates between treatment groups. It is well-known that the hazard ratio estimated with a Cox model may be subject to residual confounding bias if, as is the case in many observational studies, the treatment variable is endogenous, i.e. subject to unmeasured confounding.

A classical approach to deal with unmeasured confounding uses an instrumental variable. Intuitively, conditional on baseline covariates, an instrumental variable is an exogenous variable that is associated with the outcome only through its association with the treatment. The instrumental variable approach has been well developed for the analysis of continuous and binary outcomes (e.g. Wright and Wright, 1928; Goldberger, 1972; Angrist et al., 1996; Abadie, 2003; Hernán and Robins, 2006; Tan, 2006; Wooldridge, 2010; Clarke and Windmeijer, 2012; Wang and Tchetgen Tchetgen, 2018) but less so for a right-censored survival outcome, particularly within the dominant Cox regression framework. This is principally because the hazard ratio is non-collapsible, so the commonly used two-stage methods for instrumental variable estimation fail to provide consistent estimates. In this paper, we fill this gap by proposing a consistent estimator of the population-average causal hazard ratio in the case of an endogenous treatment variable, which, to the best of our knowledge, is the first in the literature.

Similar in spirit to a Cox regression, we make the proportional causal hazard ratio assumption, which results in the so-called marginal structural Cox model (Hernán et al., 2000). If the treatment is exogenous so that there is no unmeasured confounding, the marginal structural Cox model parameters are identifiable and can be interpreted as the causal hazard ratio. To identify the causal hazard ratio with a binary endogenous treatment variable, in addition to a valid binary instrument, we require a no-interaction assumption that the instrument and unmeasured confounders do not interact on the additive scale in their effect on the exposure. Our identification result extends that of Wang and Tchetgen Tchetgen (2018), who establish identifiability of treatment effects on the additive scale under the same assumption. We allow the outcome model to be completely unrestricted other than the marginal structural Cox model assumption, in sharp contrast to various treatment effect homogeneity assumptions previously used in the literature to identify population-average treatment effects with an instrument (e.g. Aronow and Carnegie, 2013; Hernán and Robins, 2006). Our identification formula readily leads to an estimating equation for the causal hazard ratio. To ease computation, we also develop a closed-form representation of the causal hazard ratio under our identification assumption. This is particularly appealing because, without a closed-form representation, it can be difficult in practice to find a solution to an estimating equation, and even when one solution is found, it can be difficult to check its uniqueness.
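To make the non-collapsibility of the hazard ratio mentioned above concrete, here is a small numerical sketch. It is illustrative only and not taken from the paper: it assumes a randomized binary treatment, a two-level unmeasured factor with hypothetical stratum-specific baseline hazards (`base_rates`, `probs`), and a conditional hazard ratio of 2 within each stratum. Even without confounding, the marginal hazard ratio agrees with the conditional one only at time zero and varies with time afterwards.

```python
import math

def marginal_hazard(t, d, psi, base_rates, probs):
    """Hazard at time t of the mixture survival curve
    S_d(t) = sum_u p_u * exp(-h_u * exp(psi * d) * t)."""
    mult = math.exp(psi * d)
    surv = sum(p * math.exp(-h * mult * t) for h, p in zip(base_rates, probs))
    dens = sum(p * h * mult * math.exp(-h * mult * t) for h, p in zip(base_rates, probs))
    return dens / surv

psi = math.log(2.0)       # conditional log hazard ratio in every stratum (assumed)
base_rates = [0.2, 2.0]   # hypothetical stratum-specific baseline hazards
probs = [0.5, 0.5]        # hypothetical stratum probabilities

for t in (0.0, 0.5, 2.0):
    ratio = (marginal_hazard(t, 1, psi, base_rates, probs)
             / marginal_hazard(t, 0, psi, base_rates, probs))
    print(f"t={t:.1f}  marginal HR={ratio:.3f}  conditional HR={math.exp(psi):.3f}")
```

The marginal ratio printed for later times differs from the conditional value, which is why plugging first-stage predictions into a conditional Cox model does not in general recover a marginal causal hazard ratio.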

Our target of inference is different from most previous developments for instrumental variable estimation in a survival context, which are motivated by randomized survival studies with non-compliance. The treatment effects considered by these proposals are defined within the so-called complier stratum, consisting of individuals who would comply with the assigned treatment under both active treatment and control. Such estimands include the complier hazard difference (Baker, 1998), the complier hazard ratio (Loeys and Goetghebeur, 2003; Cuzick et al., 2007), the complier quantile causal effect (Frandsen, 2015; Yu et al., 2015), the complier survival probabilities (Nie et al., 2011; Yu et al., 2015) and the complier average causal effect (Abadie, 2003; Cheng et al., 2009; Yu et al., 2015). However, in practice the complier causal effects are often only of secondary interest as they concern a highly selective unknown subset of the population (Robins and Greenland, 1996). Furthermore, their definition depends on the particular instrument that is available (Wooldridge, 2010). This could potentially be a serious limitation outside of the non-compliance setting, especially when there is no natural choice of instrument such as a randomised treatment assignment.

Our work instead contributes to the literature on instrumental variable estimation of population-average treatment effects in a survival context. Prior to our work, Robins and Tsiatis (1991) parameterize the treatment effect under a structural accelerated failure time model; Li et al. (2015), Tchetgen Tchetgen et al. (2015) and Martinussen et al. (2017) consider estimating the conditional hazard difference under a structural cumulative survival model; Martinussen et al. (2017) consider estimating the causal hazard ratio among the treated; and Choi and O’Malley (2017) consider estimating the average treatment effect on the survival time. None of these methods, however, was designed to estimate the population-average causal hazard ratio, which is a natural target of inference given the popularity of the Cox model in practice. Although MacKenzie et al. (2014) have also considered instrumental variable estimation of the population-average causal hazard ratio, their estimating equation is only approximately unbiased, so the resulting estimator is not consistent. Furthermore, their approach does not allow for observed confounders and is limited to a somewhat artificial causal model (Tchetgen Tchetgen et al., 2015).

## 2 Framework and notation

Consider an observational study where interest lies in estimating the effect of a binary treatment $D$ on a possibly censored continuous survival outcome $T$. The effect of interest is subject to confounding by observed variables $X$ as well as unobserved variables $U$. Let $C$ denote the censoring time and $\Delta = I(T \le C)$ be the death indicator. The observed time on study is $Y = \min(T, C)$. Let $Z$ denote a binary instrumental variable with a 0-1 coding scheme. Using the notion of potential outcome (Neyman, 1923; Rubin, 1974), we assume $D(z)$, the potential exposure if the instrument had taken value $z$, to be well-defined (the Stable Unit Treatment Value Assumption, Rubin, 1980). Similarly, we assume $T(z,d)$ and $C(z,d)$, the potential survival and censoring time if a unit were exposed to $d$ and the instrument had taken value $z$, to be well-defined. Under Assumption 1 described later, we may define the potential survival function as $S^T_d(t) = P\{T(d) > t\}$ and the potential hazard function as $\lambda^T_d(t) = -\partial \log S^T_d(t)/\partial t$. Let $Y(d) = \min\{T(d), C(d)\}$; we may then similarly define $S^Y_d(t)$ and $\lambda^Y_d(t)$.

We assume the marginal structural Cox model:

$$\lambda^T_d(t) = \lambda^T_0(t)\, e^{\psi d}.$$

We are interested in estimating $\psi$, the log of the causal hazard ratio.

We make the following assumptions commonly invoked in an instrumental variable analysis.

.

###### Assumption 3 (Instrumental variable relevance):

for all in the support of .

###### Assumption 5 (Independent censoring):

Figure 1 gives causal graph representations (Pearl, 2009; Richardson and Robins, 2013) of the conditional instrumental variable model. Assumptions 1–5 can be read off from the single world intervention graph (Richardson and Robins, 2013) in Figure 1(b) via d-separation (Pearl, 2009).

One can see from the bi-directed arrows in Figure 1 that we allow for latent common causes of $Z$ and $D$, so that the instrument and exposure can be associated because $Z$ has a causal effect on $D$, or because they share a common cause, or both. This is important because in observational study settings it may not be realistic to assume that one has measured all common causes of $Z$ and $D$.

Even with a valid instrument, in general population level causal effects are not identifiable from observed data. In the next section, we consider additional assumptions to identify the causal hazard ratio.

## 3 Identification and estimation of the causal hazard ratio

### 3.1 Estimating equation based approach

We consider the identification problem of causal hazard ratio:

$$\psi = \log \frac{\lambda^T_1(t)}{\lambda^T_0(t)}.$$

Our identification result is based on the following no-interaction assumption.

###### Assumption 6:

There is no additive $U$-$Z$ interaction in $E[D \mid Z, X, U]$:

$$E[D \mid Z=1, X, U] - E[D \mid Z=0, X, U] = E[D \mid Z=1, X] - E[D \mid Z=0, X] \equiv \delta^D(X). \tag{1}$$

Assumption 6 has been used previously in Wang and Tchetgen Tchetgen (2018) to identify the average treatment effect on the additive scale with an uncensored outcome. It states that conditional on measured covariates $X$, no unmeasured confounder in $U$ interacts with the instrument on the additive scale in causing exposure.
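A small sketch may help fix ideas about what the no-interaction condition (1) does and does not allow. Both exposure models below are hypothetical and chosen only for illustration: in the first, the unmeasured variable shifts the baseline exposure probability but the instrument adds the same amount for every value of $U$; in the second, a logistic model, the instrument effect on the probability scale changes with $U$, so (1) fails.

```python
import math

def expit(v):
    return 1.0 / (1.0 + math.exp(-v))

def additive_exposure(z, x, u):
    """Hypothetical P(D=1 | Z=z, X=x, U=u): u moves the baseline only,
    and the instrument adds delta(x) = 0.2 + 0.2 * expit(x) regardless of u."""
    return 0.1 + 0.3 * expit(x + u) + (0.2 + 0.2 * expit(x)) * z

def logistic_exposure(z, x, u):
    """Hypothetical logistic model: additive on the logit scale,
    hence Z and U interact on the probability (additive) scale."""
    return expit(-1.0 + z + 0.5 * x + u)

def instrument_effect(model, x, u):
    """E[D | Z=1, X=x, U=u] - E[D | Z=0, X=x, U=u] under the given model."""
    return model(1, x, u) - model(0, x, u)

for u in (-1.0, 0.0, 2.0):
    print(u,
          round(instrument_effect(additive_exposure, 0.5, u), 4),
          round(instrument_effect(logistic_exposure, 0.5, u), 4))
```

In the first column of effects the value does not move with `u`, which is what (1) requires; in the second it does.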

###### Remark 1:

When the instrument is randomized, (6) is equivalent to . Let be the compliance type (Wang and Tchetgen Tchetgen, 2018, Table 1). Then Assumption 6 holds as long as there are no unmeasured confounders that also predict compliance type. As pointed out in Wang and Tchetgen Tchetgen (2018), this assumption has an important design implication: even if a randomized instrument is available, investigators should still try to collect covariates that may predict compliance type.

###### Remark 2:

In general, $U$ is not uniquely defined by Assumptions 2 and 4. We say Assumption 6 holds if there exists one set of variables $U$ that satisfies Assumptions 2, 4 and 6 simultaneously.

With a valid instrument satisfying Assumption 6, the causal hazard ratio is identifiable, as shown in Theorem 1.

###### Theorem 1:

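Under Assumptions 1–6, the causal hazard ratio is identifiable and satisfies $E\{U(\psi)\} = 0$, where

$$U(\psi) = \int dN(y)\, \omega(Z,X) \left[ D - \frac{E\{D e^{\psi D} I(Y \ge y)\, \omega(Z,X)\}}{E\{e^{\psi D} I(Y \ge y)\, \omega(Z,X)\}} \right], \tag{2}$$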

in which is the counting process of observed failure events, and is any function of such that (2) is well-defined.

To provide intuition into equation (2), recall that the partial score equation in a regular Cox model (Cox, 1972) takes the following form:

$$U(\beta) = P_n \int \left[ W - \frac{P_n\{W e^{\beta W} I(Y \ge y)\}}{P_n\{e^{\beta W} I(Y \ge y)\}} \right] dN(y), \tag{3}$$

where $P_n$ denotes empirical average and $W$ denotes the covariates in a regular Cox model. To account for unmeasured confounding, we consider a weighted version of (3), applying the weight function $\omega(Z,X)$ to the at-risk process $I(Y \ge y)$ at each time point $y$. Our weighted analysis is similar in spirit to inverse probability weighting techniques commonly used in survival analysis to account for censoring (Robins and Rotnitzky, 1992), observed confounding (Hernán et al., 2000) and to detect early differences in survival times (weighted log-rank test, e.g. Fleming and Harrington, 2011).

The weight function has been used to incorporate information on an instrument in previous analyses; see for example, Wang and Tchetgen Tchetgen (2018). However, it will make (2) ill-defined since under the null that . Instead, we add a stabilization term to the weight function. It can be shown that any choice of with ensures that (2) is well-defined.

Identification formula (2) directly leads to a weighting estimator for . Suppose and are finite-dimensional models on and respectively. The parameter can be estimated using the maximum likelihood estimator The conditional risk difference model , however, does not give rise to a likelihood by itself, so estimation of relies on additional nuisance models. The estimation problem of has been longstanding in the statistical literature and has recently been solved by Richardson et al. (2017)

by specifying a nuisance model on the odds product

where and Remarkably the nuisance function is variation independent of the conditional risk difference so that with proper model specifications for and , the parameter space of is an unconstrained space in The MLE can then be obtained by unconstrained maximization. We refer interested readers to Richardson et al. (2017) for detailed discussions on the nuisance function Alternatively, one may model

directly using say, a logistic regression and then obtain a plug-in estimate for

Equation (2) then motivates an inverse probability weighting estimator, defined as a solution to the following equation:

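$$\sum_{i=1}^{n} \Delta_i\, \hat\omega(Z_i, X_i) \left[ D_i - \frac{\sum_{j=1}^{n} \{D_j e^{\psi D_j} I(Y_j \ge Y_i)\, \hat\omega(Z_j, X_j)\}}{\sum_{j=1}^{n} \{e^{\psi D_j} I(Y_j \ge Y_i)\, \hat\omega(Z_j, X_j)\}} \right] = 0, \tag{4}$$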

where Under suitable regularity conditions, one can show that the solution to (4) is asymptotically linear using standard empirical process theory. In practice, however, it may be computationally cumbersome to solve equation (4). We address this problem in the next subsection by proposing an alternative estimator that is available in closed form.
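Equation (4) can be evaluated and solved numerically along the following lines. This is a minimal sketch, not the authors' implementation: the function names `weighted_score` and `solve_by_bisection` are hypothetical, the fitted weights $\hat\omega(Z_i, X_i)$ are assumed to be supplied as a list `w`, and the toy data use unit weights, in which case (4) reduces to the ordinary Cox partial score for a binary covariate. With general weights the score need not be monotone in $\psi$, so a sign change on the bracketing interval may not exist, which is one way the computational difficulty described above can show up.

```python
import math

def weighted_score(psi, Y, Delta, D, w):
    """Left-hand side of (4) for given weights w[i] = omega_hat(Z_i, X_i)."""
    total = 0.0
    for i in range(len(Y)):
        if Delta[i] == 0:
            continue  # censored observations contribute no event term
        num = den = 0.0
        for j in range(len(Y)):
            if Y[j] >= Y[i]:  # subject j is still at risk at time Y_i
                r = math.exp(psi * D[j]) * w[j]
                num += D[j] * r
                den += r
        total += w[i] * (D[i] - num / den)
    return total

def solve_by_bisection(f, lo, hi, tol=1e-10):
    """Find a root of f on [lo, hi]; raises if f has the same sign at both ends."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo * f_hi > 0:
        raise ValueError("score does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

# Toy data (hypothetical); unit weights stand in for omega_hat.
Y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Delta = [1, 1, 1, 1, 0, 1]
D = [1, 0, 1, 1, 0, 0]
w = [1.0] * len(Y)

psi_hat = solve_by_bisection(lambda p: weighted_score(p, Y, Delta, D, w), -5.0, 5.0)
print(psi_hat)
```

The double loop costs O(n²) per evaluation and is kept for readability; sorting by `Y` and using cumulative sums would be the usual way to speed it up.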

### 3.2 A closed-form representation of the causal hazard ratio

The development in this subsection is based on the observation that one may replace the term $D$ in (2) with an arbitrary measurable function $\tilde g(D, y, \psi)$ while maintaining the unbiasedness of the resulting estimating equation.

###### Theorem 2:

Let

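$$\tilde U(\psi) = \int dN(y)\, \omega(Z,X) \left[ \tilde g(D, y, \psi) - \frac{E\{\tilde g(D, y, \psi)\, e^{\psi D} I(Y \ge y)\, \omega(Z,X)\}}{E\{e^{\psi D} I(Y \ge y)\, \omega(Z,X)\}} \right]. \tag{5}$$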

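We have $E\{\tilde U(\psi)\} = 0$.

Without loss of generality we can write $\tilde g(D, y, \psi) = e^{-\psi D} g(D, y, \psi)$, so that (5) becomes

$$\tilde U(\psi, g) = \int dN(y)\, \omega(Z,X) \left[ e^{-\psi D} g(D, y, \psi) - \frac{E\{g(D, y, \psi)\, I(Y \ge y)\, \omega(Z,X)\}}{E\{e^{\psi D} I(Y \ge y)\, \omega(Z,X)\}} \right]. \tag{6}$$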

To find a closed-form representation of $\psi$, we only need to find $g$ such that it does not depend on $\psi$ and satisfies

$$E\{g(D, y, \psi)\, I(Y \ge y)\, \omega(Z,X)\} = 0.$$

All such functions may be represented as

$$\left\{ m(D,y)\, E\{I(Y \ge y)\, \omega(Z,X)\} - E\{m(D,y)\, I(Y \ge y)\, \omega(Z,X)\} : m(D,y) \text{ is measurable} \right\}. \tag{7}$$
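As a short check on (7), write $\gamma_1(y) = E\{I(Y \ge y)\,\omega(Z,X)\}$ and $\gamma^m_2(y) = E\{m(D,y)\, I(Y \ge y)\,\omega(Z,X)\}$, shorthand chosen here to match the empirical quantities used for the estimator. The sketch below assumes that the estimating equation (6) has mean zero at the true $\psi$ and that $D$ is binary; it verifies that any member of the class satisfies the constraint above and shows what (6) then reduces to.

```latex
\begin{align*}
g(D,y) &= m(D,y)\,\gamma_1(y) - \gamma^m_2(y), \\
E\{g(D,y)\, I(Y \ge y)\, \omega(Z,X)\}
  &= \gamma_1(y)\,\gamma^m_2(y) - \gamma^m_2(y)\,\gamma_1(y) = 0, \\
0 = E\{\tilde U(\psi, g)\}
  &= E \int dN(y)\, \omega(Z,X)\, e^{-\psi D} g(D,y) \\
  &= e^{-\psi}\, E \int dN(y)\, D\, \omega(Z,X)\,\{m(1,y)\gamma_1(y) - \gamma^m_2(y)\} \\
  &\quad + E \int dN(y)\, (1-D)\, \omega(Z,X)\,\{m(0,y)\gamma_1(y) - \gamma^m_2(y)\}.
\end{align*}
```

Solving the last display for $e^{\psi}$ moves the first expectation to the numerator with a factor $-D$ and leaves the second in the denominator, provided that denominator is nonzero.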

Combining (6) and (7), we obtain the following class of closed-form representations of the causal hazard ratio.

###### Theorem 3:

We have

$$\exp(\psi) = \frac{E \int dN(y)\, (-D)\, \omega(Z,X)\, \{m(1,y)\gamma_1(y) - \gamma^m_2(y)\}}{E \int dN(y)\, (1-D)\, \omega(Z,X)\, \{m(0,y)\gamma_1(y) - \gamma^m_2(y)\}}, \tag{8}$$

where and are any measurable functions such that is well-defined,

A natural choice is that and . Under the modeling assumptions described in Section 3.1, (8) gives rise to the following estimator:

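$$\hat\psi = \log \frac{\sum_{i=1}^{n} \Delta_i D_i\, \hat\omega_1(Z_i, X_i)\, \{\hat\gamma_{1,i} - \hat\gamma^{m_0}_{2,i}\}}{\sum_{i=1}^{n} \Delta_i (1 - D_i)\, \hat\omega_1(Z_i, X_i)\, \hat\gamma^{m_0}_{2,i}},$$

where $\hat\gamma_{1,i} = \hat\gamma_1(Y_i)$, $\hat\gamma^{m_0}_{2,i} = \hat\gamma^{m_0}_2(Y_i)$, with

$$\hat\gamma_1(y) = n^{-1} \sum_{j=1}^{n} I(Y_j \ge y)\, \hat\omega_1(Z_j, X_j), \qquad \hat\gamma^{m_0}_2(y) = n^{-1} \sum_{j=1}^{n} D_j\, I(Y_j \ge y)\, \hat\omega_1(Z_j, X_j).$$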
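The closed-form estimator above involves only weighted sums over risk sets, so it can be computed in one pass without root finding. The following is a minimal sketch under stated assumptions rather than the authors' code: the function name `closed_form_log_hr` is hypothetical, the fitted weights $\hat\omega_1(Z_i, X_i)$ are taken as given in the list `w`, and the toy data use unit weights purely to exercise the arithmetic (unit weights carry no instrumental variable information).

```python
import math

def closed_form_log_hr(Y, Delta, D, w):
    """Closed-form estimate of psi from weighted risk-set sums.

    w[i] plays the role of omega_hat_1(Z_i, X_i) and is assumed to be given.
    """
    n = len(Y)
    num = den = 0.0
    for i in range(n):
        if Delta[i] == 0:
            continue  # only observed failures enter the sums
        at_risk = [j for j in range(n) if Y[j] >= Y[i]]
        gamma1 = sum(w[j] for j in at_risk) / n          # gamma_hat_1(Y_i)
        gamma2 = sum(w[j] * D[j] for j in at_risk) / n   # gamma_hat_2^{m0}(Y_i)
        num += D[i] * w[i] * (gamma1 - gamma2)
        den += (1 - D[i]) * w[i] * gamma2
    # math.log raises ValueError if the ratio is not positive, and a zero
    # denominator raises ZeroDivisionError; both can occur in small samples.
    return math.log(num / den)

# Toy data (hypothetical), unit weights.
Y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Delta = [1, 1, 1, 1, 0, 1]
D = [1, 0, 1, 1, 0, 0]
w = [1.0] * len(Y)

psi_hat = closed_form_log_hr(Y, Delta, D, w)
print(psi_hat)
```

Because the estimate is an explicit ratio, existence and uniqueness reduce to checking that the numerator and denominator have the same sign and that the denominator is nonzero.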

### 3.3 Large sample properties

The estimator solves the equation , where and

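$$U(\psi, \hat\theta) = \int \left[ \{\hat\gamma_1(y) - \hat\gamma^{m_0}_2(y)\} D - (1 - D)\, \hat\gamma^{m_0}_2(y) \right] \hat\omega(Z,X)\, e^{-\psi D}\, dN(y).$$

It follows from Van der Vaart (2000, Lemma 5.10) that $\hat\psi$ is a consistent estimator of $\psi_0$. We may further write

$$P_n U(\psi_0, \theta_0) = P_n U^c(\psi_0, \theta_0) + o_p(1/\sqrt{n}),$$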

where

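$$U^c(\psi_0, \theta_0) = \int \left[ \{\gamma_1(y) - \gamma^{m_0}_2(y)\} D - (1 - D)\, \gamma^{m_0}_2(y) \right] \omega(Z,X)\, \{e^{-\psi D}\, dN(y) - R(y)\, d\Lambda_0(y)\}$$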

with and . The ’s are zero-mean terms that are independent and identically distributed. Following standard M-estimation theory, the influence function of is given by

$$IF_{\hat\psi} = -E\left\{ \partial U(\psi, \theta_0)/\partial \psi \,\big|_{\psi = \psi_0} \right\}^{-1} \tilde U(\psi_0, \theta_0),$$

where

with being the influence function of .

A consistent estimator of is

$$P_n \{\widehat{IF}_{\hat\psi}\}^2,$$

where $\widehat{IF}_{\hat\psi}$ is obtained from $IF_{\hat\psi}$ by replacing unknown quantities with their empirical counterparts.

## 4 Simulation studies

We now evaluate the finite sample performance of our proposed estimator . In our simulations, the baseline covariates include an intercept, a continuous variable

generated from an exponential distribution with mean

and . The unmeasured confounder is generated from an independent exponential distribution with mean . Conditional on and , the instrument and treatment are generated from the following models:

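$$\begin{aligned} P(Z=1 \mid X) &= \operatorname{expit}(-1/\lambda_2 + X_2); \\ \delta^D(X) &= \tanh(\gamma_0 + \gamma_1 X_2 + \gamma_2 X_3 + \gamma_3 U); \\ \log(OP^D(X)) &= \delta_0 + \delta_1 U + \delta_2 X_2, \end{aligned}$$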

where . We let so that is bounded away from 0. Moreover, the first set of parameter values is compatible with the commonly used monotonicity assumption that almost surely, as is always positive. The censoring time was generated from an exponential distribution with mean . As discussed in detail in Richardson et al. (2017), our specifications of and give rise to a unique model on Visualizations of such a model can be found in Richardson et al. (2017, Supplementary Materials, upper panels of Figure 1). To make the observed data models compatible with a marginal structural Cox model with parameter , as explained in the supplementary material, we let the survival outcome be the unique root of the following function:

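$$f(t) = \frac{1}{\lambda_1}(\lambda_1 - \beta_1 t)\, \frac{1}{\lambda_2}(\lambda_2 - \beta_2 t)\, \exp\{(\beta_1 u + \beta_2 x - \lambda_0 e^{\psi d})\, t\} - 1 + A,$$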

where and

is uniformly distributed on the interval

To illustrate the bias due to unmeasured confounding, we also implement the marginal structural Cox model that only adjusts for measured confounders
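The treatment model in the simulation is specified through a risk difference and an odds product rather than through the exposure probabilities directly. Assuming the odds product is defined as in Richardson et al. (2017), that is $p_0 p_1 / \{(1-p_0)(1-p_1)\}$ with $p_z$ the probability of exposure when $Z = z$, the pair $(p_0, p_1)$ can be recovered from given values of the two quantities. The sketch below does this by bisection, using the fact that the log odds product is increasing in $p_0$ once the difference $p_1 - p_0$ is fixed; the function name and the numeric inputs are hypothetical.

```python
import math

def probs_from_rd_and_op(delta, log_op, tol=1e-12):
    """Return (p0, p1) with p1 - p0 = delta and
    log[p0 * p1 / ((1 - p0) * (1 - p1))] = log_op, for -1 < delta < 1."""
    lo = max(0.0, -delta)        # p0 > 0 and p1 = p0 + delta > 0
    hi = min(1.0, 1.0 - delta)   # p0 < 1 and p1 = p0 + delta < 1

    def gap(p0):
        p1 = p0 + delta
        return (math.log(p0) + math.log(p1)
                - math.log(1.0 - p0) - math.log(1.0 - p1)) - log_op

    # gap is strictly increasing on (lo, hi), from -inf to +inf
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    p0 = 0.5 * (lo + hi)
    return p0, p0 + delta

# Hypothetical values of the risk difference and log odds product.
p0, p1 = probs_from_rd_and_op(math.tanh(0.3), 0.5)
print(p0, p1)
```

The existence of exactly one solution for every admissible pair of inputs is the property that lets the two models be specified separately.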

All simulation results are based on 1000 Monte-Carlo runs of n = 1000 units each. Table 1 summarizes the simulation results for , in which case Assumption 6 holds. The proposed estimator has very small bias and achieves the nominal coverage rate in all the scenarios considered here, confirming our theoretical results. In contrast, the estimates produced by the marginal structural Cox model ignoring unmeasured confounders have absolute bias times 100 ranging from 2.26 to 3.40. Given a fixed data generating mechanism for the censoring time , the censoring rate only increases with slightly. In the supplementary material we show the corresponding results with in which case Assumption 6 fails to hold. The proposed estimator has significant bias only when the monotonicity condition fails and

. The 95% confidence intervals, however, are only slightly conservative.

## 5 Application to the Health and Lifestyle Study

In this section, we illustrate the proposed method by estimating the causal hazard ratio of smoking on survival. The negative associations between smoking and survival have been well established through numerous observational studies. These studies, however, may not collect all the confounding factors, leaving the causal interpretability of associational measures in question. The data we use in our analysis come from one such study, namely the Health and Lifestyle Study, which is a population-based prospective cohort study conducted in England, Scotland and Wales (Cox et al., 1987). The baseline survey was conducted in 1984-1985 and interviewed 9003 participants on their health statuses, attitudes to health and other measurements related to health and lifestyles. In June 2009, some 97.8% of these participants were flagged on the National Health Service Central Register at the Office for National Statistics Southport. This provides information on the final outcome, that is death.

In our analysis, we consider ever smoking as the exposure variable, and age at death as the outcome variable. We use an indicator that the respondent's mother ever smoked as an instrument. We adjust for household income, socioeconomic status, education, gender and an indicator that the respondent's mother died of lung/chest cancer. Among them, household income and socioeconomic status are included to make Assumption 1 plausible as they may be affected by the instrument and may affect the outcome, and the rest are included to make Assumption 6 plausible as they may modify the effect of the instrument on exposure. For illustrative purposes, we leave out observations with missing values, ending up with 6991 subjects in the analysis sample.

We first assess the plausibility of the instrumental relevance assumption using a logistic regression model with covariates as linear terms. Analysis results show that after adjusting for baseline covariates, the instrumental variable is highly associated with the smoking behavior (p-value 0.001), thus confirming the instrumental relevance assumption. We then apply the proposed estimator to estimate the causal hazard ratio of smoking on survival. In addition to the estimator described in Section 3, we report estimates from a crude Cox model that does not account for any confounding, an adjusted Cox model conditioning on the baseline covariates listed above, and a marginal structural Cox model that adjusts for the same set of covariates. We also implement MacKenzie et al. (2014)’s method using code provided in their appendix.

Table 2 summarizes the results. Not surprisingly, the naive and adjusted Cox regression models and the marginal structural Cox model all suggest that smoking is negatively associated with survival. Such findings, however, may be subject to bias from unmeasured confounding. Both the method of MacKenzie et al. (2014) and the proposed method employ an instrumental variable based approach. The estimating-equation-based method of MacKenzie et al. (2014) fails to produce a valid estimate, because their estimating equation does not admit a solution. Furthermore, their method does not allow for adjustment of baseline covariates. Mother smoking may fail to be a valid instrument if, for example, it has an impact on household income, which may in turn affect survival of the respondent. In contrast, the instrument is more likely to be valid with the proposed approach, as we assume an instrumental variable model conditional on the aforementioned baseline covariates. Our analysis suggests that smoking increases the hazard of death by a factor of 1.86 (1.31–2.66).

We conclude this part with several caveats. First, as is the case with most prospective cohort studies, our results may suffer from survivor bias (Vansteelandt et al., 2017, 2018). This occurs because smokers and non-smokers have different lengths of survival, and thus have different probabilities of being alive at the time of study recruitment. Second, the validity of the proposed instrument relies on several untestable assumptions, which are hardly watertight (French and Popovici, 2011). For example, the exclusion restriction assumption might be violated because of health hazards caused by secondhand smoking, the independence assumption may be violated if there are genes for nicotine dependence that are shared between $Z$ and $D$ and also affect mortality, and the no-interaction assumption may be violated if we have not adjusted for all the modifiers of the instrumental effect on exposure. Third, we have assumed that the missing values are missing completely at random, which is hard to verify for this data set.

## 6 Discussion

In this article, we considered the identification and estimation of the marginal causal hazard ratio under the proportional hazards assumption. Under our framework, as shown in the supplementary material, the cumulative baseline hazard function may be identified through the following identity:

$$\Lambda_0(t) = \int_0^t \frac{E\{\omega(Z,X)\, dN(y)\}}{E\{\omega(Z,X)\, e^{\psi D} I(Y \ge y)\}}. \tag{9}$$

Identification formula (9) directly leads to a weighted version of the Breslow estimator (Breslow, 1972).
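Replacing the expectations in (9) by sample averages gives a weighted Breslow-type estimate of the cumulative baseline hazard. The following is a hedged sketch of that plug-in step: `weighted_breslow` is a hypothetical name, the weights `w` stand for fitted values of $\omega(Z_i, X_i)$ and are assumed to be supplied, and `psi` would in practice be replaced by an estimate of $\psi$. With unit weights and `psi = 0` the function reduces to the Nelson-Aalen estimator, which gives a simple sanity check.

```python
import math

def weighted_breslow(t, psi, Y, Delta, D, w):
    """Plug-in version of (9): sum over observed failures up to time t of
    w_i / sum_j { w_j * exp(psi * D_j) * I(Y_j >= Y_i) }."""
    n = len(Y)
    total = 0.0
    for i in range(n):
        if Delta[i] == 1 and Y[i] <= t:
            at_risk = sum(w[j] * math.exp(psi * D[j])
                          for j in range(n) if Y[j] >= Y[i])
            total += w[i] / at_risk
    return total

# Toy data (hypothetical), unit weights, psi = 0.
Y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Delta = [1, 1, 1, 1, 0, 1]
D = [1, 0, 1, 1, 0, 0]
w = [1.0] * len(Y)

print(weighted_breslow(3.5, 0.0, Y, Delta, D, w))
```

The factors of $1/n$ in the numerator and denominator averages cancel, which is why they do not appear in the code.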

The proposed methods can also be applied to contexts beyond the proportional hazards framework. Consider the following extension of the marginal structural Cox model:

$$\lambda^T_d(t) = \lambda^T_0(t)\, e^{\boldsymbol{\psi}' \boldsymbol{\phi}(t)\, d},$$

where is a polynomial spline basis of degree , and denotes transposition of . The analogy of estimating equation (2) in this context is

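$$U(\boldsymbol{\psi}) = \int dN(y)\, \omega(Z,X) \left[ D \boldsymbol{\phi}(y) - \frac{E\{\boldsymbol{\phi}(y)\, D \exp(\boldsymbol{\psi}' \boldsymbol{\phi}(y) D)\, I(Y \ge y)\, \omega(Z,X)\}}{E\{\exp(\boldsymbol{\psi}' \boldsymbol{\phi}(y) D)\, I(Y \ge y)\, \omega(Z,X)\}} \right].$$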

The representation in (8) can be extended in a similar fashion.

So far we have assumed independent censoring as in Assumption 5. This assumption is plausible in our data application because the censoring is administrative, but may not hold in other studies. To account for possibly dependent censoring, one may weight the proposed estimating equations by inverse probability of censoring (Robins and Rotnitzky, 1992).

Our framework can also be extended in the following directions. First, in longitudinal studies, it is often the case that both the treatment and confounding variables are time dependent. It would be interesting to extend the proposed methods to estimate parameters in a marginal structural Cox model with time-varying treatments. Second, with an uncensored outcome, one can construct a locally efficient estimator for the population treatment effect of interest that is also multiply robust in the sense that such an estimator is consistent in the union of three different observed data models (Wang and Tchetgen Tchetgen, 2018). We leave the derivation of a locally semiparametric efficient estimator for the causal hazard ratio under our identification assumptions as future work.

## Acknowledgments

Wang and Tchetgen Tchetgen were supported by the National Institutes of Health. Wang is also affiliated with the Department of Computer and Mathematical Sciences, University of Toronto Scarborough. Vansteelandt is also affiliated with the Department of Medical Statistics at the London School of Hygiene and Tropical Medicine, UK.

## Supplementary material

Supplementary material available at Biometrika online includes proofs of theorems and propositions in the paper, as well as additional simulation results.

## References

• Abadie (2003) Abadie, A. (2003). Semiparametric instrumental variable estimation of treatment response models. Journal of Econometrics, 113(2):231–263.
• Angrist et al. (1996) Angrist, J. D., Imbens, G. W., and Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91:444–455.
• Aronow and Carnegie (2013) Aronow, P. M. and Carnegie, A. (2013). Beyond LATE: Estimation of the average treatment effect with an instrumental variable. Political Analysis, 21(4):492–506.
• Baker (1998) Baker, S. G. (1998). Analysis of survival data from a randomized trial with all-or-none compliance: estimating the cost-effectiveness of a cancer screening program. Journal of the American Statistical Association, 93(443):929–934.
• Breslow (1972) Breslow, N. E. (1972). Contribution to discussion of paper by DR Cox. J. Roy. Statist. Soc., Ser. B, 34:216–217.
• Cheng et al. (2009) Cheng, J., Qin, J., and Zhang, B. (2009). Semiparametric estimation and inference for distributional and general treatment effects. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(4):881–904.
• Choi and O’Malley (2017) Choi, J. and O’Malley, A. J. (2017). Estimating the causal effect of treatment in observational studies with survival time end points and unmeasured confounding. Journal of the Royal Statistical Society: Series C (Applied Statistics), 66(1):159–185.
• Clarke and Windmeijer (2012) Clarke, P. S. and Windmeijer, F. (2012). Instrumental variable estimators for binary outcomes. Journal of the American Statistical Association, 107(500):1638–1652.
• Cox et al. (1987) Cox, B., Blaxter, M., Buckle, A., Fenner, N., Golding, J., Gore, M., Huppert, F., Nickson, J., Roth, S. M., Stark, J., et al. (1987). The health and lifestyle survey. Preliminary report of a nationwide survey of the physical and mental health, attitudes and lifestyle of a random sample of 9,003 British adults. Health Promotion Research Trust.
• Cox (1972) Cox, D. R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society, 34:187–220.
• Cuzick et al. (2007) Cuzick, J., Sasieni, P., Myles, J., and Tyrer, J. (2007). Estimating the effect of treatment in a proportional hazards model in the presence of non-compliance and contamination. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(4):565–588.
• Fleming and Harrington (2011) Fleming, T. R. and Harrington, D. P. (2011). Counting processes and survival analysis, volume 169. John Wiley & Sons.
• Frandsen (2015) Frandsen, B. R. (2015). Treatment effects with censoring and endogeneity. Journal of the American Statistical Association, 110(512):1745–1752.
• French and Popovici (2011) French, M. T. and Popovici, I. (2011). That instrument is lousy! In search of agreement when using instrumental variables estimation in substance use research. Health Economics, 20(2):127–146.
• Goldberger (1972) Goldberger, A. S. (1972). Structural equation methods in the social sciences. Econometrica, 40(6):979–1001.
• Hernán et al. (2000) Hernán, M. Á., Brumback, B., and Robins, J. M. (2000). Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology, 11(5):561–570.
• Hernán and Robins (2006) Hernán, M. A. and Robins, J. M. (2006). Instruments for causal inference: An epidemiologist’s dream? Epidemiology, 17(4):360–372.
• Li et al. (2015) Li, J., Fine, J., and Brookhart, A. (2015). Instrumental variable additive hazards models. Biometrics, 71(1):122–130.
• Loeys and Goetghebeur (2003) Loeys, T. and Goetghebeur, E. (2003). A causal proportional hazards estimator for the effect of treatment actually received in a randomized trial with all-or-nothing compliance. Biometrics, 59(1):100–105.
• MacKenzie et al. (2014) MacKenzie, T. A., Tosteson, T. D., Morden, N. E., Stukel, T. A., and O’Malley, A. J. (2014). Using instrumental variables to estimate a Cox’s proportional hazards regression subject to additive confounding. Health Services and Outcomes Research Methodology, 14(1-2):54–68.
• Martinussen et al. (2017) Martinussen, T., Nørbo Sørensen, D., and Vansteelandt, S. (2017). Instrumental variables estimation under a structural Cox model. Biostatistics.
• Neyman (1923) Neyman, J. (1923). Sur les applications de la théorie des probabilités aux expériences agricoles: Essai des principes. English translation of excerpts by Dabrowska, D. and Speed, T. (1990). Statistical Science, 5:463–472.
• Nie et al. (2011) Nie, H., Cheng, J., and Small, D. S. (2011). Inference for the effect of treatment on survival probability in randomized trials with noncompliance and administrative censoring. Biometrics, 67(4):1397–1405.
• Pearl (2009) Pearl, J. (2009). Causality. Cambridge, England: Cambridge University Press.
• Richardson and Robins (2013) Richardson, T. S. and Robins, J. M. (2013). Single world intervention graphs (SWIGs): A unification of the counterfactual and graphical approaches to causality. Center for the Statistics and the Social Sciences, University of Washington Series. Working Paper, 128.
• Richardson et al. (2017) Richardson, T. S., Robins, J. M., and Wang, L. (2017). On modeling and estimation for the relative risk and risk difference. Journal of the American Statistical Association, 112(519):1121–1130.
• Robins and Greenland (1996) Robins, J. M. and Greenland, S. (1996). Identification of causal effects using instrumental variables: Comment. Journal of the American Statistical Association, 91(434):456–458.
• Robins and Rotnitzky (1992) Robins, J. M. and Rotnitzky, A. (1992). Recovery of information and adjustment for dependent censoring using surrogate markers. In AIDS Epidemiology, pages 297–331. Springer.
• Robins and Tsiatis (1991) Robins, J. M. and Tsiatis, A. A. (1991). Correcting for non-compliance in randomized trials using rank preserving structural failure time models. Communications in Statistics-Theory and Methods, 20(8):2609–2631.
• Rubin (1974) Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688.
• Rubin (1980) Rubin, D. B. (1980). Comment. Journal of the American Statistical Association, 75(371):591–593.
• Tan (2006) Tan, Z. (2006). Regression and weighting methods for causal inference using instrumental variables. Journal of the American Statistical Association, 101(476):1607–1618.
• Tchetgen Tchetgen et al. (2015) Tchetgen Tchetgen, E. J., Walter, S., Vansteelandt, S., Martinussen, T., and Glymour, M. (2015). Instrumental variable estimation in a survival context. Epidemiology (Cambridge, Mass.), 26(3):402–410.
• Van der Vaart (2000) Van der Vaart, A. W. (2000). Asymptotic statistics, volume 3. Cambridge University Press.
• Vansteelandt et al. (2017) Vansteelandt, S., Dukes, O., and Martinussen, T. (2017). Survivor bias in Mendelian randomization analysis. Biostatistics, (just-accepted).
• Vansteelandt et al. (2018) Vansteelandt, S., Walter, S., and Tchetgen Tchetgen, E. (2018). Eliminating survivor bias in two-stage instrumental variable estimators. Epidemiology (Cambridge, Mass.), (just-accepted).
• Wang and Tchetgen Tchetgen (2018) Wang, L. and Tchetgen Tchetgen, E. (2018). Bounded, efficient and multiply robust estimation of average treatment effects using instrumental variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80:531–550.
• Wooldridge (2010) Wooldridge, J. M. (2010). Econometric analysis of cross section and panel data. Cambridge, MA: MIT Press.
• Wright and Wright (1928) Wright, P. G. and Wright, S. (1928). The tariff on animal and vegetable oils. New York: The Macmillan Co.
• Yu et al. (2015) Yu, W., Chen, K., Sobel, M. E., and Ying, Z. (2015). Semiparametric transformation models for causal inference in time-to-event studies with all-or-nothing compliance. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77(2):397–415.

## 1 Proof of Theorem 1

Let Note that

where

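$$
\begin{aligned}
&\sum_{z=0,1}(2z-1)E\{dN(y)D\mid Z=z,X\} \\
&\quad=\sum_{z=0,1}(2z-1)E\{dN_1(y)D\mid Z=z,X\} && \text{(consistency)} \\
&\quad=\sum_{z=0,1}(2z-1)E_{U\mid X}E\{dN_1(y)D\mid X,U,Z=z\} && (Z \perp\!\!\!\perp U\mid X) \\
&\quad=\sum_{z=0,1}(2z-1)E_{U\mid X}\left[E\{dN_1(y)\mid X,U,Z=z\}E\{D\mid X,U,Z=z\}\right] && (D \perp\!\!\!\perp T(1),C(1)\mid Z,X,U) \\
&\quad=\sum_{z=0,1}(2z-1)E_{U\mid X}\left[E\{dN_1(y)\mid X,U\}E\{D\mid X,U,Z=z\}\right] && (Z \perp\!\!\!\perp T(1),C(1)\mid U,X) \\
&\quad=\delta^D(X)E_{U\mid X}\left[E\{dN_1(y)\mid X,U\}\right] && \text{(due to (1))} \\
&\quad=\delta^D(X)E\{dN_1(y)\mid X\},
\end{aligned}
$$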

so that

$$
E\left\{dN(y)\frac{2Z-1}{f(Z\mid X)\delta^D(X)}D\right\}=E\{dN_1(y)\}=\lambda^T_1(y)S^Y_1(y)\,dy.
$$

By an argument similar to the one above, for any measurable function $H$,

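$$
\begin{aligned}
E\left\{H(Y(1))\frac{2Z-1}{f(Z\mid X)\delta^D(X)}D\right\} &= E\{H(Y(1))\}; && \text{(S1)} \\
E\left\{H(Y(0))\frac{2Z-1}{f(Z\mid X)\delta^D(X)}(1-D)\right\} &= -E\{H(Y(0))\}; && \text{(S2)} \\
E\left\{H(Y)\frac{2Z-1}{f(Z\mid X)\delta^D(X)}\right\} &= E\{H(Y(1))-H(Y(0))\}. && \text{(S3)}
\end{aligned}
$$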

We shall use these equations repeatedly in the following proof.
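As a numerical sanity check, identities (S1) to (S3) can be illustrated by Monte Carlo in a toy model. The sketch below is not the paper's simulation design: it assumes no covariates $X$, no censoring (so $Y=T$), a known instrument probability, and an exposure model in which $Z$ and $U$ act additively, so that the no-interaction condition holds with $\delta^D=0.4$. All numerical values and the function name `simulate` are hypothetical.

```python
import math
import random


def simulate(n=300_000, seed=0):
    """Monte Carlo check of (S1)-(S3) in a toy model without X or censoring."""
    rng = random.Random(seed)
    p_z = 0.5       # f(Z = 1); assumed known by design
    delta_d = 0.4   # additive effect of Z on D, constant in U (no interaction)

    def H(y):
        # One particular measurable function: H(y) = I(y >= 1)
        return 1.0 if y >= 1.0 else 0.0

    s1 = s2 = s3 = m1 = m0 = 0.0
    for _ in range(n):
        u = rng.random()                         # unmeasured confounder U
        z = 1 if rng.random() < p_z else 0       # instrument, independent of U
        p_d = 0.2 + delta_d * z + 0.3 * u        # P(D = 1 | Z, U), additive in Z and U
        d = 1 if rng.random() < p_d else 0
        t1 = rng.expovariate(0.5 * math.exp(u))  # potential outcome T(1)
        t0 = rng.expovariate(1.0 * math.exp(u))  # potential outcome T(0)
        y = t1 if d == 1 else t0                 # consistency
        f_z = p_z if z == 1 else 1.0 - p_z
        w = (2 * z - 1) / (f_z * delta_d)        # weight (2Z - 1) / {f(Z) delta^D}
        s1 += H(y) * w * d                       # left-hand side of (S1)
        s2 += H(y) * w * (1 - d)                 # left-hand side of (S2)
        s3 += H(y) * w                           # left-hand side of (S3)
        m1 += H(t1)                              # E{H(Y(1))}
        m0 += H(t0)                              # E{H(Y(0))}
    return {"s1": s1 / n, "s2": s2 / n, "s3": s3 / n, "m1": m1 / n, "m0": m0 / n}


if __name__ == "__main__":
    r = simulate()
    print("S1:", r["s1"], "vs", r["m1"])
    print("S2:", r["s2"], "vs", -r["m0"])
    print("S3:", r["s3"], "vs", r["m1"] - r["m0"])
```

Because $U$ drives both $D$ and the potential outcomes, an unweighted comparison of exposed and unexposed subjects would be confounded here, whereas the weighted averages should agree with the potential-outcome means up to Monte Carlo error.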

Due to (S3),

$$
E\left\{dN(y)\frac{2Z-1}{f(Z\mid X)\delta^D(X)}\right\}=E\{dN_1(y)-dN_0(y)\}=\{\lambda^T_1(y)S^Y_1(y)-\lambda^T_0(y)S^Y_0(y)\}\,dy. \qquad \text{(S4)}
$$

Due to (S1),

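$$
\begin{aligned}
E\left\{De^{\psi D}I(Y\geq y)\frac{2Z-1}{f(Z\mid X)\delta^D(X)}\right\}
&=e^{\psi}E\left\{DI(Y(1)\geq y)\frac{2Z-1}{f(Z\mid X)\delta^D(X)}\right\} \\
&=e^{\psi}E\{I(Y(1)\geq y)\}=e^{\psi}S^Y_1(y).
\end{aligned}
$$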

Finally, due to (S1) and (S2),

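$$
\begin{aligned}
E\left\{e^{\psi D}I(Y\geq y)\frac{2Z-1}{f(Z\mid X)\delta^D(X)}\right\}
&=e^{\psi}E\left\{DI(Y(1)\geq y)\frac{2Z-1}{f(Z\mid X)\delta^D(X)}\right\}+E\left\{(1-D)I(Y(0)\geq y)\frac{2Z-1}{f(Z\mid X)\delta^D(X)}\right\} \\
&=e^{\psi}E\{I(Y(1)\geq y)\}-E\{I(Y(0)\geq y)\}=e^{\psi}S^Y_1(y)-S^Y_0(y). && \text{(S5)}
\end{aligned}
$$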

Therefore

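$$
\begin{aligned}
E\{U(\psi)\}
&=\int\left[E\left\{dN(y)h(D)\frac{2Z-1}{f(Z\mid X)\delta^D(X)}D\right\}-E\left\{dN(y)h(D)\frac{2Z-1}{f(Z\mid X)\delta^D(X)}\right\}\frac{E\left\{De^{\psi D}h(D)I(Y\geq y)\frac{2Z-1}{f(Z\mid X)\delta^D(X)}\right\}}{E\left\{e^{\psi D}h(D)I(Y\geq y)\frac{2Z-1}{f(Z\mid X)\delta^D(X)}\right\}}\right] \\
&=\int dy\left[\lambda^T_1(y)S^Y_1(y)h(1)-\{h(1)\lambda^T_1(y)S^Y_1(y)-h(0)\lambda^T_0(y)S^Y_0(y)\}\frac{h(1)e^{\psi}S^Y_1(y)}{h(1)e^{\psi}S^Y_1(y)-h(0)S^Y_0(y)}\right] && \text{(S6)} \\
&=0,
\end{aligned}
$$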

where as long as and . Furthermore, we have

$$
\frac{\partial E\{U(\psi)\}}{\partial\psi}=\int\frac{e^{\psi}\{h(1)\lambda^T_1(y)S^Y_1(y)-h(0)\lambda^T_0(y)S^Y_0(y)\}h(1)S^Y_1(y)h(0)S^Y_0(y)}{\{h(1)e^{\psi}S^Y_1(y)-h(0)S^Y_0(y)\}^2}\,dy,
$$

the sign of which does not depend on $\psi$ and is non-zero as long as and . Hence the solution to the equation $E\{U(\psi)\}=0$ is unique.
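The cancellation behind the final equality in (S6) can be made explicit. The step below is a sketch that assumes the marginal structural Cox model takes the form $\lambda^T_1(y)=e^{\psi}\lambda^T_0(y)$ at the true $\psi$ and that the denominator $h(1)e^{\psi}S^Y_1(y)-h(0)S^Y_0(y)$ is non-zero; the shorthand $A(y)$ is introduced here only for illustration.

$$
\begin{aligned}
A(y)&:=h(1)\lambda^T_1(y)S^Y_1(y)-h(0)\lambda^T_0(y)S^Y_0(y)
=\lambda^T_0(y)\{h(1)e^{\psi}S^Y_1(y)-h(0)S^Y_0(y)\}, \\
A(y)\,\frac{h(1)e^{\psi}S^Y_1(y)}{h(1)e^{\psi}S^Y_1(y)-h(0)S^Y_0(y)}
&=\lambda^T_0(y)h(1)e^{\psi}S^Y_1(y)=\lambda^T_1(y)S^Y_1(y)h(1).
\end{aligned}
$$

The second term of the integrand in (S6) therefore equals the first, so the integrand vanishes for every $y$.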

## 2 Proof of Theorem 2

The proof is very similar to the proof of Theorem 1, except that (S6) now becomes

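$$
\int dy\left[\lambda^T_1(y)S^Y_1(y)\tilde g(1)h(1)-\lambda^T_0(y)S^Y_0(y)\tilde g(0)h(0)-\frac{\{h(1)\lambda^T_1(y)S^Y_1(y)-h(0)\lambda^T_0(y)S^Y_0(y)\}\{S^Y_1(y)e^{\psi}h(1)\tilde g(1)-S^Y_0(y)h(0)\tilde g(0)\}}{h(1)e^{\psi}S^Y_1(y)-h(0)S^Y_0(y)}\right]=0,
$$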

in which for notational convenience, we write for .

## 3 Proof that our data generating mechanism marginalizes to a marginal structural Cox model

Let We specify our observed survival model from the following formulation:

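$$
\begin{aligned}
S_{T\mid D,L}(t\mid D=d,L=l)
&=\frac{\dfrac{S_{T\mid D,L}(t\mid D=d,L=l)}{S_{T\mid D,L}(t\mid D=d,L=l_0)}}{\displaystyle\int\frac{S_{T\mid D,L}(t\mid D=d,L=l')}{S_{T\mid D,L}(t\mid D=d,L=l_0)}\,dF_L(l')}\int S_{T\mid D,L}(t\mid D=d,L=l')\,dF_L(l') \\
&=\frac{\dfrac{S_{T\mid D,L}(t\mid D=d,L=l)}{S_{T\mid D,L}(t\mid D=d,L=l_0)}}{\displaystyle\int\frac{S_{T\mid D,L}(t\mid D=d,L=l')}{S_{T\mid D,L}(t\mid D=d,L=l_0)}\,dF_L(l')}\,S^T_d(t),
\end{aligned}
$$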

where the second equality is an application of the g-formula (Robins, 1986).
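The construction can be illustrated with a small numerical sketch: fix a target marginal survival function that follows a Cox model, choose a ratio of conditional survival functions relative to a reference level $l_0$, and combine them as in the display above. Everything specific in the code is an assumption made for illustration and is not the paper's simulation setting: a binary $L$ with $P(L=1)=0.5$, reference level $l_0=0$, a unit baseline hazard, the ratio $\exp(-\gamma t l)$, and the values of `PSI` and `GAMMA`.

```python
import math

PSI = 0.5     # log causal hazard ratio (hypothetical value)
P_L1 = 0.5    # P(L = 1) for a binary L (hypothetical value)
GAMMA = 0.5   # strength of the L effect in the ratio (hypothetical value)


def marginal_surv(t, d):
    """Target marginal survival S_d^T(t): Cox model with unit baseline hazard."""
    return math.exp(-t * math.exp(PSI * d))


def ratio(t, d, l):
    """Assumed ratio S(t | d, l) / S(t | d, l0), with reference level l0 = 0."""
    return math.exp(-GAMMA * t * l)


def conditional_surv(t, d, l):
    """S(t | d, l) = ratio / E_L[ratio] * S_d^T(t), following the displayed identity."""
    mean_ratio = (1 - P_L1) * ratio(t, d, 0) + P_L1 * ratio(t, d, 1)
    return ratio(t, d, l) / mean_ratio * marginal_surv(t, d)


def g_formula(t, d):
    """Average the conditional survival over the distribution of L."""
    return (1 - P_L1) * conditional_surv(t, d, 0) + P_L1 * conditional_surv(t, d, 1)


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0):
        for d in (0, 1):
            print(t, d, g_formula(t, d), marginal_surv(t, d))
```

Averaging the conditional survival functions over $L$ returns the target marginal survival function, and the ratio of cumulative hazards $\log S^T_1(t)/\log S^T_0(t)$ equals $e^{\psi}$ at every $t$, which is the marginal Cox structure the section aims for. The ratio has to be chosen so that each conditional survival function is non-increasing in $t$; the values above satisfy this.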

In our simulation, we let