Adjusting for bias introduced by instrumental variable estimation in the Cox Proportional Hazards Model

11/07/2017 ∙ by Pablo Martinez-Camblor, et al. ∙ Dartmouth College 0

Instrumental variable (IV) methods are widely used for estimating average treatment effects in the presence of unmeasured confounders. However, the capability of existing IV procedures, and most notably the two-stage residual inclusion (2SRI) procedure recommended for use in nonlinear contexts, to account for unmeasured confounders in the Cox proportional hazard model is unclear. We show that instrumenting an endogenous treatment induces an unmeasured covariate, referred to as an individual frailty in survival analysis parlance, which if not accounted for leads to bias. We propose a new procedure that augments 2SRI with an individual frailty and prove that it is consistent under certain conditions. The finite sample-size behavior is studied across a broad set of conditions via Monte Carlo simulations. Finally, the proposed methodology is used to estimate the average effect of carotid endarterectomy versus carotid artery stenting on the mortality of patients suffering from carotid artery disease. Results suggest that the 2SRI-frailty estimator generally reduces the bias of both point and interval estimators compared to traditional 2SRI.



There are no comments yet.


page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Notation and models

Conventionally, the right-censored framework assumes the sample , where and ( stands for the usual indicator function, taking value if and otherwise), and and are the censoring and the survival times for the th subject (), respectively. Let denote the survival function of . The Cox proportional hazard model is given by


where is the baseline hazard function,

is a random variable representing the study treatment,

is a random vector of exogenous measured predictors and

is a random vector of unmeasured predictors.

The goal is to estimate the value of ; that is, the average change in the risk of an individual caused by a change in the received treatment.

We assume that an individual’s received treatment is the result of the selective process depicted by the equation,


with a measured random vector and a random vector of unmeasured variables that may be correlated with unmeasured variables in . The term is an independent random component representing the difference between the variable and the predicted values obtained in the linear model. Note that the uncontrolled relationship between the unmeasured covariates affecting survival and treatment selection operate through the relationship between and and/or . If , the survival model just contains unmeasured covariates, , that are not unmeasured confounders.

If the random vector satisfies:

  1. ,

  2. (exclusion restriction assumption),

  3. (randomization assumption),

then can be considered an instrumental variable. The strength of the instrument is reflects the strength of the relationship between and . In a randomized trial with perfect compliance, assigned treatment is a perfect instrument. Assumptions , and can be reformulated and combined with the stable unit treatment value assumption, SUTVA, and the monotonicity assumption (Hernán and Robins, 2006) between the treatment to complete the conditions under which the IV identifies the causal effect of non-parametrically (i.e., without relying on (1) and (2)). Common instruments include prior institutional affinity for using a particular procedure, geographic region of residence, an individual’s differential access to certain treatments, and an individuals genes (aka Mendelian randomization, Thanasassoulis and O’Donnell (2009)).

2 Proposed methodology

The proposed methodology considers a standard first stage in which the relationship among the treatment, , the instrument variable, , and the measured confounding, , is estimated by any consistent method. We use simple linear regression models with standard least squares estimation but more flexible procedures could also be implemented. The first stage procedure is:

  1. From the data, estimate the parameters to obtain the predicted values

    Then, compute the residuals, .

It is worth noting that, under model (1) and assuming , is a consistent estimator of . If , estimates . Almost sure convergence of to is not guaranteed. The residual contains all the information about the unmeasured vector related with the treatment assignment and unrelated with ; that is, all the available information about unmeasured confounding is contained in the residuals provided by the model (1). However, also contains white noise pertaining to idiosyncratic or purely random factors affecting an individual’s treatment selection, which corresponds to the difference between the unmeasured covariates in the two models, and , and to the independent random term in model (2). We conjecture that the component of

due to white noise can be handled by specifying an individual

frailty in the outcome model in order to allow to more fully be able to perform its intended task of controlling for . The proposed second stage is:

  1. Estimate the Cox proportional hazard regression with individual frailty:

    where is the individual frailty term. A distribution should be specified for (e.g., log-Gaussian, Gamma). The parameter estimate of that results from this procedure is denoted .

Standard algorithms for estimating Cox models with frailties may be used to implement the procedure. For example, Therneau et al. (2003) proved that maximum likelihood estimation for the Cox model with a Gamma frailty can be accomplished using a general penalized routine, and Ripatti and Palmgren (2000) derived a similar argument for the Cox model with a Gaussian frailty.

2.1 Asymptotic properties of

We derive the asymptotic distribution of the 2SRI-frailty estimator for the case in which . If a similar derivation can be performed by decomposing and into common and orthogonal terms and making standard reliability assumptions on the distributions of these terms.

By adapting the convergence results for Cox’s partial likelihood (see, for instance, Theorem 5.3.1 in Kalbfleisch and Prentice (2002)) we obtain the following convergence results.

Theorem. Assume the causal models


with the random variables (the treatment), (measured covariate) and (unmeasured covariate). In addition, assume that

is normally distributed, that

is independently normally distributed random noise, and that (the instrument) is a random variable satisfying -. Then, if the censoring time, , satisfies , we have the weak convergence,


where (

stands for the variance operator) can be consistently estimated from the survival and at-risk counting processes.

Proof. In absence of the frailty term, it is well-known that the estimator of that maximizes the Cox-model partial likelihood function obeys the asymptotic law


where , and is the ( stands for dimension of the vector ) information matrix Kalbfleisch and Prentice (2002), which can be consistently estimated by .

In the presence of an individual frailty, different estimation methods have been proposed. Particularly, Ripatti and Palmgren (2000) and Vaida and Xu (2000)

studied the case of multiplicative log-normal distributed frailties. The proposed methodology obtains maximun-likelihood estimates of the regression parameters, the variance components and the baseline hazard, as well as empirical Bayes estimates of the random effects (frailties). Therefore, it suffices to prove that the second stage of our proposed 2SRI-frailty procedure is a Cox proportional hazard model with a gaussian frailty term in which the coefficient related with the treatment is unchanged from the original model (i.e.,


Given the causal equation in (4) and , yields

Hence, . If , the linear part of the model in (3) can be rewritten as,

where , and . Due to and being independent normally distributed variables, is also asymptotically independent and normally distributed. If , then

with , and where, in this case, , and . Therefore, is asymptotically independent and normally distributed due to being independent and normally distributed. Hence, the survival model is given by

which has the form of a Cox proportional hazards model with frailty . Therefore, if is the estimator resulting from step , invoking the censoring time assumptions and using the convergence of the partial maximum-likelihood method given a consistent method of estimating the product of the baseline risk and the frailty (Ripatti and Palmgren, 2000; Vaida and Xu, 2000), it follows that


with the component in the matrix corresponding to .

Remark. Normality of is required only when . In this case, the survival model does not contain unmeasured confounders, just unmeasured covariates. Such white noise can be omitted in standard linear models, but not in Cox regression models where it underestimates the treatment effect. The key point is that the first stage residual adds individual variability (a frailty) in the Cox model estimated in the second-stage.

3 Monte Carlo Simulation Study

To evaluate the behavior of the proposed methodology in finite samples, we conducted a range of Monte Carlo simulations. We found that, beyond the expected effect on precision of estimation, neither the baseline shape of the survival times nor the censorship distribution have any meaningful effect on the observed results. Therefore, we only show results when the baseline survival times follow a Weibull distribution with shape parameter two and scale parameters coherent with the proportional hazard assumption; that is, the scale parameter equates to . Here is the target. We subtract from , as opposed to drawing from a distribution with mean 0, to ensure that we obtain realistic survival times with binary without needing to alter the intercept. and denote the measured and unmeasured confounders, respectively. Both the measured () and unmeasured () covariates follow independent standard normal distributions, . Censoring was independently drawn from a Weibull distribution such that the expected censorship was 20%. Treatment assignment is based on the linear equation , where is the instrument, is the unmeasured covariate, and is the random noise. We set for a continuous exposure and for a binary exposure. All of , , and are drawn from independent standard-normal distributions. Notice that, after fixing the rest of the parameters, increasing yields an instrumental variable of lesser quality. Sample size was fixed at .

3.1 All Omitted Covariation is Unmeasured Confounding

We first suppose , and follow possibly correlated standard normal distributions. That is, the endogenous variable is continuous and there are no unmeasured predictors of survival time unrelated with treatment selection (i.e., while the true Cox model of the survival times may include a shared covariate with the treatment selection process, it does not include a frailty).

Figure 3 shows the median of the bias observed in 2000 Monte Carlo iterations for a stronger, , and weaker, , instrument. The Naive Cox model, which ignores the presence of omitted covariates, only performed well for (there are no omitted covariates and the Cox model is correct). The proposed algorithm, 2SRI-frailty (2SRI-F), reduced bias the most when the omitted covariates had strong effects. When the presence of unmeasured confounding was weaker ( close to zero), and when there was no effect of the treatment, , both 2SRI and 2SRI-F obtained similar results. Median-bias appeared to be invariant to the strength of the instrument.

Figure 3: Median bias for Weibull(, 2) times and , where , , and follow independent standard normal distributions. Gray-dotted, naive Cox model (omitted covariate is ignored); black-dashed 2SRI, procedure; blue-continuous, 2SRI-F (2SRI plus Gaussian frailty). Continuous black thin line stands for the zero-bias situation.
Naive 2SRI 2SRI-F Naive 2SRI 2SRI-F
0.466 0.671 0.871 0.019 0.867 0.891
0.000 0.931 0.931 0.000 0.917 0.922
0.000 0.670 0.884 0.000 0.868 0.898
0.952 0.951 0.912 0.952 0.851 0.912
0.940 0.948 0.949 0.940 0.948 0.949
0.943 0.947 0.915 0.943 0.947 0.915
0.000 0.617 0.857 0.000 0.861 0.874
0.000 0.935 0.940 0.000 0.902 0.907
0.396 0.685 0.992 0.012 0.878 0.898
Table 1: Observed coverage of 95% confidence intervals for the Cox model treatment effect covariate, Naive, 2SRI algorithm and the proposed 2SRI algorithm with gaussian individual frailty (2SRI-F).

Table 1 reports the coverage of the 95% confidence intervals computed from the standard asymptotic variance obtained from the second-stage Cox regression models of the respective procedures, ignoring the first stage variability.The proposed algorithm achieved coverage close to the nominal level in all cases, suggesting that it will be able to be implemented easily in practice. In contrast, the coverage of the naive Cox methods and the basic 2SRI algorithm was poor. As expected, the naive method performed correctly for , the case when the model is correct. As is well-known, two-stage instrumental variable methods lead to estimators () with the degree of variance inflation depending on the strength of the instrumental variable. We found that the length of the 95% confidence intervals ranged between 0.09 and 0.17 for the naive models, between 0.20 and 0.23 for 2SRI and between 0.21 and 0.26 for the proposed 2SRI-F method, respectively, for both and . The fact that the 2SRI-F procedure imposes a greater amount of inflation compared to 2SRI is helpful in terms of its ability to maintain the nominal level of coverage. Crucially, it appears that the 2SRI-F’s assumption of the distribution of the frailty is helpful in addressing bias and does not suffer from the excessive and inappropriate gains in precision that accompany many procedures with parametric components.

3.2 Omitted Covariation is a Mixture of Unmeasured Confounding and a Pure Individual Frailty

Figure 4 depicts the observed median bias for the previous scenario for and cor. Note that implies that the survival model does not contain unmeasured confounders, only unmeasured covariates, while implies that all omitted covariation manifests as unmeasured confounding (i.e., is also related to treatment assignment). In the case, it is reassuring that the 2SRI and the 2SRI-F procedures perform nearly as well as the naive Cox regression model, the true model in this scenario. The advantage of using 2SRI-F versus 2SRI was larger for close to zero, which makes sense as the pure frailty variation is at its maximum, whereas at values close to the frailty has all but disappeared.

Figure 4: Median bias for Weibull(, 2) times and , , , and follow standard normal distributions. Correlation between and is . Gray-dotted, naive Cox model with a Gaussian frailty term (omitted covariate is ignored but a Gaussian frailty is included in the model); black-dashed 2SRI, procedure; blue, 2SRI-F (2SRI plus Gaussian frailty). Continuous black thin line stands for the zero-bias situation.

In order to check the robustness of the recommended gaussian frailty with respect to the unmeasured covariates distribution, we study the case where the unmeasured covariates, and , are not normally distributed. In particular, we considered the following scenarios:

M-I.     .


M-III. .


where , and follow independent centered Gamma(1,1) and , and follow independent standard normal distribution. Note that parameter determines both the distribution and the relationship between and and it is chosen in order to keep constant the marginal variance. Figure 5 shows the median of the bias of 2SRI-F algorithm observed in 2000 Monte Carlo iterations under the previous models with . Results suggest minimal impact of the unmeasured covariates distribution. As expected, when the frailty has the assumed distribution the bias is smaller but, crucially, observed biases were always smaller than under the 2SRI procedure.

Figure 5: Median bias for Weibull(, 2) times and , and follow standard normal distributions and and following previous models always using the 2SRI-F algorithm. Continuous black thin line stands for the zero-bias situation.

3.3 Nonlinear Treatment Selection Model: Binary Exposure

The second scenario supposes , a binary exposure. Because other values of produced similar results, we only report results for which the correlation between and was fixed at .

Figure 6 depicts the median bias over 2000 Monte Carlo iterations when was directly estimated from Cox regression with Gaussian frailty, (due to the presence of an unmeasured covariate unrelated with treatment assignment, this model is the correct model in the absence of unmeasured confounding), the 2SRI procedure, and 2SRI-F. A stronger () and a weaker () scenario was considered for the instrument. Not surprisingly, ignoring the presence of the frailty and estimating a standard Cox regression model results in larger bias, even in the absence of an unmeasured confounder. The 2SRI algorithm helps us to control just the part of the bias related with the treatment assignment but also fails to handle the frailty and its performance suffers as a result. The naive Cox model with a frailty performs much better than both the naive Cox model with no frailty and 2SRI, implying that accounting for the frailty may be more important than dealing with unmeasured confounding. However, the proposed 2SRI-F methodology produces a yet further reduction in bias, to close to zero in all scenarios, which we conjecture is due to separating the idiosyncratic and confounding effects of . These results reveal that there is clear benefit to be gained in practice from using 2SRI-F as an IV procedure for time-to-event data.

Figure 6: Median bias assuming Weibull(, 2) survival times and , with , , , and following standard normal distributions. Correlation between and is 1/2. Legend: Gray-dotted, naive Cox model with Gaussian frailty (omitted covariate is ignored but a Gaussian frailty is included); black-dashed 2SRI, procedure; blue, 2SRI-F (2SRI plus Gaussian frailty). Continuous black thin line stands for the zero-bias situation.
Naive-F 2RSI 2RSI-F Naive-F 2RSI 2RSI-F
0.325 0.885 0.943 0.102 0.916 0.947
0.189 0.936 0.944 0.041 0.944 0.947
0.158 0.885 0.938 0.037 0.925 0.946
0.941 0.954 0.949 0.936 0.944 0.943
0.946 0.943 0.947 0.950 0.949 0.947
0.936 0.946 0.944 0.939 0.905 0.947
0.158 0.883 0.936 0.036 0.943 0.942
0.212 0.940 0.946 0.038 0.927 0.942
0.321 0.880 0.933 0.099 0.914 0.948
Table 2: Observed coverage of 95% confidence intervals for the Cox model treatment effect covariate, Naive, 2SRI algorithm and the proposed 2SRI algorithm with gaussian individual frailty (2SRI-F).

Both the 2SRI and the proposed algorithms achieved coverage close to the nominal level in all scenarios; the naive Cox model with Gaussian frailty (Naive-F) understandably yields good results only when (Table 2). As expected, the strength of the instrument affects the confidence interval widths. The width of the 95% confidence interval ranged between 0.44 and 0.57 for the Naive-F models. Under interval estimatior width ranged between 0.86 and 0.88 and between 0.91 and 1.07 for the 2SRI and 2SRI-F, respectively. For the widths ranged between 1.23 and 1.31 and between 1.30 and 1.53 for 2SRI and 2SRI-F, respectively. These results are consistent with the results for continuous exposures; the variance inflation under the 2SRI-F procedure exceeds that under 2SRI which in-turn exceeds that under Naive-F.

4 Real-world application: The Vascular Quality Initiative dataset

We apply 2SRI-frailty to nationwide data from the Vascular Quality Initiative (VQI) ( on patients diagnosed with carotid artery disease (carotid stenosis). These data contain comprehensive information on all patients suffering from carotid stenosis and is continually updated over time to facilitate determination of the best procedure or treatment approach to use on average and to determine which type of patients benefit the most from each procedure. However, the data are exposed to a plethora of selection biases raising concerns that naive analyses will yield biased results. Because the outcomes of most interest are events such as stroke or death that can occur at any point during follow-up, these data are ideal for application of the 2SRI-frailty procedure.

We employed 2SRI-F to estimate the comparative effectiveness of carotid endarterectomy (CEA) versus carotid angioplasty and stenting (CAS), the two surgical procedures used to intervene on patients with carotid stenosis. The data consist of 28712 patients who received CEA and 8117 who received CAS, between 15 and 89 years of age, over 2003-2015. During follow-up, there were 3955 and 807 deaths in the CEA and CAS groups, respectively. Table 3

shows descriptive statistics for the measured covariates by procedure.

n=28,712 n=8,117
Age, meansd 70.29.4 69.110.3
Male, n (%) 17,180 (59.8) 5,119 (63.1)
Race, n (%)
blaWhite 27,033 (94.2) 7,382 (90.9)
blaBlack 922 (3.2) 433 (5.3)
blaOther 757 (2.6) 302 (3.8)
Elective, n (%) 24,906 (86.7) 6,587 (81.1)
Symptomatic, n (%) 11,168 (38.9) 4,282 (52,7)
TIA or amaurosis, n (%) 6,405 (22.3) 2,001 (24.6)
Stroke, n (%) 4,763 (16.6) 2,281 (28.1)
Hypertension, n (%) 25,452 (88.6) 7,235 (89.1)
Smoking History, n (%) 22,098 (77.0) 6,168 (76.0)
Positive Stress Test, n (%) 2,655 (9.2) 677 (8.3)
Coronary Disease, n (%) 8,586 (29.9) 2,790 (34.4)
Heart Failure, n (%) 2,669 (9.3) 1,215 (15.0)
Diabetes, n (%) 9,749 (33.9) 2,942 (36.2)
COPD, n (%) 6,229 (21.7) 2,083 (25.7)
Renal Insufficiency, n (%) 1,649 (5.7) 446 (5.5)
HD, n (%) 263 (0.9) 114 (1.4)
Prior ipsilateral CEA, n (%) 4,472 (15.6) 2,857 (35.2)
Antiplatelet therapy, n (%)
blaAspirin 23,960 (83.4) 6,932 (85.4)
blaP2y12 inhibitor 6,980 (24.3) 6,173 (76.0)
Beta-blocker, n (%) 18,269 (63.6) 4,602 (56.7)
Statin, n (%) 22,418 (78.1) 6,408 (78.9)
Table 3: Descriptive statistics for measured covariates both overall and by carotid endarterectomy (CES) versus carotid angioplasty and stenting (CAS) recipients.

Figure 7 depicts two Kaplan-Meier survival curves: crude (left) and adjusted by patient age, gender, ethnicity, race. type of surgery (elective/not elective), symptons (yes/no), hypertension diabetes, smoking history (yes/no), positive stress test, coronary disease, heart failure, diabetes, COPD, renal insufficiency, dialysis status (HD), prior ipsolateral CEA, and the use of antiplatelet therapy, beta-blokers and statin by using the weighted inverse propensity procedure MacKenzie et al. (2012). The crude hazard ratio (HR) comparing CEA to CAS was 0.719 (95% CI of (0.666; 0.777)). Adjusted HR was 0.693 (0.633; 0.760). The last HR is slightly modified when a frailty term is included: 0.685 (0.624; 0.753), and 0.676 (0.613; 0.745) for the Gaussian and Gamma cases, respectively.

Figure 7: Kaplan-Meier estimations for both CEA and CAS group with 95% confidence bands: crude (left) and adjusting by the covariates described in table 3 using the weighted inverse propensity procedure (right).

As the instrumental variable we used the center level frequency of CEA versus CAS procedures over the twelve months prior to the current patient; that is, total CEA divided by total of CEA and CAS procedures in the twelve months prior to the current patient. This variable is justified as an instrument due to: 1) hospitals that performed a high relative amount of a certain procedure in the past are likely to keep doing so; 2) there should be no effect of the relative frequency of CEA vs CAS on a patient’s outcome except through its effect on treatment choice for that patient; 3) we know of no factors that would influence both this frequency and a patient’s outcome. Reasons 2) and 3) are contingent on adjusting for the total number of CEA and CAS procedures performed at the center over the past 12 months.

On the VQI data the IV is highly associated with treatment choice. The probability that a randomly selected subject undergoing CEA has a larger value of the instrument than a randomly selected subject undergoing CAS, was 0.881 (95% confidence interval of (0.876; 0.885)). This IV was unrelated with all of the measured confounders suggesting anecdotally that it may also be uncorrelated with any unmeasured confounders. Hence, it is reasonable to assume that the relationship of the instrument with mortality is solely due to its relationship with the treatment. Figure 8 (left side) shows the histogram of in both CEA and CAS groups, at right, we show the boxplot for the IV by surgical procedure.

Figure 8: Histograms for the instrument variable (left), , and boxplot (right), by received treatment.
HR (95% CI)
Crude 0.719 (0.666; 0.777)
Cox model (Naive) 0.693 (0.633; 0.760)
Naive - frailty (gaussian) 0.685 (0.624; 0.753)
Naive - frailty (gamma) 0.676 (0.613: 0.745)
2SRI 0.901 (0.737; 1.100)
2SRI - frailty (gaussian) 0.887 (0.724: 1.087)
2SRI - frailty (gamma) 0.882 (0.716; 1.086)
Table 4: Hazard ratios and 95% confidence intervals by estimation method.

The treatment effect almost disappears when 2SRI is applied on the dataset: HR of 0.901 with a 95% confidence interval (0.737; 1.100). When the proposed 2SRI-frailty algorithm is used a similar result obtains: a HR of 0.887 (

) with a 95% confidence interval (0.724; 1.087). Similar results were also obtained under a gamma distributed frailty instead of the gaussian frailty. Table

4 shows the hazard ratios and 95% confidence intervals.

5 Discussion

Instrumental variables methods are often used to account for unmeasured confounders. Although these methods have been widely studied in a variety of situations, their suitability for estimating Cox proportional hazard models is unclear. It is well-known that, in this case, model misspecification can produce bias even when the omitted variables are unrelated with the treatment assignment (Aalen et al., 2015); that is, when they only affect the survival time. As suggested by our structural argument in the Introduction, an individual frailty appears able to solve this problem. We showed that the presence of idiosyncratic variation affecting treatment selection may induce a frailty in the instrumented survival time model even if there is no frailty in the original survival model. In practice, the most likely scenario is that both a true frailty and unmeasured confounding factors affect survival. For these reasons, we were motivated to develop and evaluate an IV procedure that, in the second stage, incorporates a frailty term.

Because the Cox model is nonlinear, our base strategy for dealing with unmeasured confounders was to use the two-stage residual inclusion algorithm, 2SRI, adapted to the Cox model. As noted above, even when the true survival model does not contain omitted covariates, the 2SRI procedure induces a frailty in the second-stage Cox regression model from the inclusion of the residuals computed in the first stage. To account for this phenomenon, we added an individual frailty in the second-stage (instrumented) statistical model. Under standard reliability conditions, we proved the asymptotic consistency of the estimator defined under our 2SRI-F procedure for the case when the univariate frailty distribution is correctly assumed to be Gaussian.

Monte Carlo simulations suggested that the proposed methodology (2SRI-F) produces an important bias reduction and is superior to the 2SRI, particularly in the presence of an individual frailty due to unmeasured covariates unrelated with the treatment assignment. A very important finding is that the bias of the 2SRI-F method was always close to zero even when the residuals from the treatment selection equation were not normally distributed. The Gaussian distribution can be directly justified when each individual frailty is the sum of different independent sources of heterogeneity. Furthermore, because the procedure with the Gaussian frailty was surprisingly robust to erroneously assumed frailty distributions, we recommend using a Gaussian frailty.

A controversial feature of our procedure is the inclusion of the individual frailty term. Although there exists a vast literature for the case where the frailty is common to a group of individuals (shared frailty), the number of references dealing with individual frailties is minimal. Consistency properties of the common estimation algorithm for Cox models with frailties were proved previously (Nielsen et al., 1992; Murphy, 1995). We adapted these theoretical results in deriving the consistency results presented herein. By specifying a distribution for its values, the individual frailty accounts for the omitted covariates unrelated with the treatment assignment – the extra variability introduced in the survival model from the first stage of the algorithm () and the portion of independent of , freeing the augmented first-stage residual (the control function) to deal with unmeasured confounding. The resulting procedure estimates the average treatment effect conditional on the unmeasured confounder and the frailty (Yashin et al., 2001). Because in practice specification of the distribution of the frailty can be arbitrary, the observed results should be handled with caution (Hougaard, 1995) and they should be supported by sensitivity analyses considering different frailty distributions. In the VQI application, the empirical results were found to only have a slight dependence on the distribution of the frailty (see Figure 5). This is a key finding that justifies the use of the 2SRI-F procedure and represents a major advance in the instrumental variables literature for the analysis of time-to-event outcomes in observational settings.

In the real-world application, a small but significant (at the 0.05 level) effect of the treatment is detected when the presence of omitted covariates on the Cox model is ignored. This effect almost disappears under 2SRI. When the 2SRI-F method is used, the estimated effect of CEA over CAS is slightly larger. This result confirms that the effect of the procedure a patient receives is underestimated when unmeasured confounding is ignored (Chamberlain, 1985; Gail et al., 1984). It is worth noting that different patient enrollment rates were observed by treatment: while CAS patient censorship is constant across the follow-up, most of the CEA patients have follow-up above two years with an important number censored between the second and the fourth years. To the extent these differences are caused by an unmeasured confounder, this can introduce additional bias in the standard naive estimates and strongly motivates the use of an adequate instrumental variable procedure.

The method we developed conditions on all omitted covariates and assumes they have multiplicative effects on the hazard function under the Cox model, unlike recently developed methods that make unusual additive hazard assumptions in order to more simply account for unmeasured confounding (MacKenzie et al., 2014; Tchetgen Tchetgen et al., 2015). Therefore, we anticipate that our proposed and proven procedure will hold extensive appeal and be widely used in practice.

While it is encouraging that our null results cohere with those of recent RCTs (Rosenfield et al., 2016; Brott et al., 2016)

, thereby overcoming the unfavorable CAS results of the non-IV analyses, an effect close to 0 makes it difficult to distinguish our proposed IV procedure from the incumbent two-stage residual inclusion method. However, when the true effect is 0 (HR of 1), the bias from ignoring the frailty is 0 due to the fact that omitting a frailty shrinks the true coefficient towards 0. Therefore, the differences between the 2SRI-F and standard 2SRI procedure for the Cox model are expected to converge to 0 as the true treatment effect approaches 0. In this sense, the lack of extensive differences between the various 2SRI (frailty and standard) procedures is a real-data endorsement that our proposed 2SRI-F procedure for the Cox model performs as it should by not rejecting the null hypothesis when the RCT results and the 2SRI results suggest that the true effect is close to 0.


This work was supported by a Patient-Centered Outcomes Research Institute (PCORI) Award ME-1503-28261. All statements in this paper, including its findings and conclusions, are solely those of the authors and do not necessarily represent the views of the Patient-Centered Outcomes Research Institute (PCORI), its Board of Governors or Methodology Committee. The authors want to show their most sincerely grateful to the PCORI Patient Engagement and Governance Committee and specially to Jon Skinner for reading a draft of the manuscriptfor their efforts in the manuscripts revision and their help in developing the research proposal. The authors have no conflicts of interest to report.


  • Aalen et al. (2015) Aalen, O. O., R. J. Cook, and K. Røysland (2015). Does cox analysis of a randomized survival study yield a causal treatment effect? Lifetime Data Analysis 21(4), 579–593.
  • Anderson (2005) Anderson, T. (2005). Origins of the limited information maximum likelihood and two-stage least squares estimators. Journal of Econometrics 127(1), 1–16.
  • Angrist et al. (1996) Angrist, J., G. Imbens, and D. Rubin (1996). Identification of causal effects using instrumental variables identification of causal effects using instrumental variables. Journal of the American Statistical Association 91(434), 444–455.
  • Brott et al. (2016) Brott, T. G., G. Howard, G. S. Roubin, J. F. Meschia, A. Mackey, W. Brooks, W. S. Moore, M. D. Hill, V. A. Mantese, W. M. Clark, C. H. Timaran, D. Heck, P. P. Leimgruber, A. J. Sheffet, V. J. Howard, S. Chaturvedi, B. K. Lal, J. H. Voeks, and R. W. I. Hobson (2016). Long-term results of stenting versus endarterectomy for carotid-artery stenosis. New England Journal of Medicine 374(11), 1021–1031.
  • Cai et al. (2011) Cai, B., D. S. Small, and T. R. T. Have (2011). Two-stage instrumental variable methods for estimating the causal odds ratio: Analysis of bias. Statistics in Medicine 30(15), 1809–1824.
  • Chamberlain (1985) Chamberlain, G. (1985). Heterogeneity, omitted variable bias, and duration dependence. In J. J. Heckman and B. S. Singer (Eds.), Longitudinal Analysis of Labor Market Data:. Cambridge: Cambridge University Press.
  • Cox (1972) Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistics Society (B) 34(1), 187–220.
  • Gail et al. (1984) Gail, M. H., S. Wieand, and S. Piantadosi (1984). Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates. Biometrika 71(3), 431–444.
  • Greene and Zhang (2003) Greene, W. and G. Zhang (2003). Econometric analysis. Prentice Hall, New Jersey, USA.
  • Hausman (1978) Hausman, J. A. (1978). Specification tests in econometrics. Econometrica 46(6), 1251–1271.
  • Hernán (2010) Hernán, M. (2010). The hazards of hazard ratios. Epidemiology 21(1), 13–15.
  • Hernán and Robins (2006) Hernán, M. and J. Robins (2006). Instruments for causal inference: an epidemioligist’s dream? Epidemiology 17(4), 360–372.
  • Hougaard (1995) Hougaard, P. (1995). Frailty models for survival data. Lifetime Data Analysis 1(3), 255–273.
  • Kalbfleisch and Prentice (2002) Kalbfleisch, J. D. and R. L. Prentice (2002). The statistical analysis of failure time data. John Wiley & Sons, Inc.
  • Klungel et al. (2015) Klungel, U., de Boer A., B. S.V., G. R.H., and K. Roes (2015). Instumental variable analysis in epidemiologic studies: an overview of the estimation methods. Pharmaceutica Analytica Acta 6(4), 1–9.
  • Li et al. (2015) Li, J., J. Fine, and A. Brookhart (2015). Instrumental variable additive hazards models. Biometrics 71(1), 122–130.
  • MacKenzie et al. (2012) MacKenzie, T. A., J. R. Brown, D. S. Likosky, Y. Wu, and G. L. Grunkemeier (2012). Review of case-mix corrected survival curves. The Annals of Thoracic Surgery 93(5), 1416 – 1425.
  • MacKenzie et al. (2016) MacKenzie, T. A., M. Lberg, and A. J. O’Malley (2016). Patient centered hazard ratio estimation using principal stratification weights: application to the norccap randomized trial of colorectal cancer screening. Observational Studies 2, 29–50.
  • MacKenzie et al. (2014) MacKenzie, T. A., T. D. Tosteson, N. E. Morden, T. A. Stukel, and A. J. O’Malley (2014). Using instrumental variables to estimate a cox’s proportional hazards regression subject to additive confounding. Health Services and Outcomes Research Methodology 14(1), 54–68.
  • Martens et al. (2006) Martens, E. P., W. R. Pestman, A. de Boer, S. V. Belitser, and O. H. Klungel (2006). Instrumental variables: Application and limitations. Epidemiology 17(3), 261–267.
  • Murphy (1995) Murphy, S. A. (1995). Asymptotic theory for the frailty model. The Annals of Statistics 23(1), 182–198.
  • Nielsen et al. (1992) Nielsen, G. G., R. D. Gill, P. K. Andersen, and T. I. A. Sørensen (1992). A counting process approach to maximum likelihood estimation in frailty models. Scandinavian Journal of Statistics 19(1), 25–43.
  • Normand et al. (2011) Normand, S.-L. T., R. G. Frank, and A. J. O’Malley (2011). Estimating cost-offsets of new medications: Use of new antipsychotics and mental health costs for schizophrenia. Statistics in Medicine 30(16), 1971–1988.
  • Pearl (1995) Pearl, J. (1995). Causal diagrams for empirical research. Biometrika 82(4), 669.
  • Ripatti and Palmgren (2000) Ripatti, S. and J. Palmgren (2000). Estimation of multivariate frailty models using penalized partial likelihood. Biometrics 56(4), 1016–1022.
  • Robins and Tsiatis (1991) Robins, J. M. and A. A. Tsiatis (1991). Correcting for non-compliance in randomized trials using rank preserving structural failure time models. Communications in Statistics - Theory and Methods 20(8), 2609–2631.
  • Rosenfield et al. (2016) Rosenfield, K., J. S. Matsumura, S. Chaturvedi, T. Riles, G. M. Ansel, D. C. Metzger, L. Wechsler, M. R. Jaff, and W. Gray (2016). Randomized trial of stent versus surgery for asymptomatic carotid stenosis. New England Journal of Medicine 374(11), 1011–1020.
  • Schmoor and Schumacher (1997) Schmoor, C. and M. Schumacher (1997). Effects of covariate omission and categorization when analysing randomized trials with the cox model. Statistics in Medicine 16(3), 225–237.
  • Tchetgen Tchetgen et al. (2015) Tchetgen Tchetgen, E., S. Walter, S. Vansteelandt, T. Martinussen, and M. Glymour (2015). Instrumental variable estimation in a survival context. Epidemiology 26(3), 402–410.
  • Terza et al. (2008) Terza, J. V., A. Basu, and P. J. Rathouz (2008). Two-stage residual inclusion estimation: Addressing endogeneity in health econometric modeling. Journal of Health Economics 27(3), 531 – 543.
  • Terza et al. (2008) Terza, J. V., W. D. Bradford, and C. E. Dismuke (2008). The use of linear instrumental variables methods in health services research and health economics: a cautionary note. Health Research and Educational Trust 43(3), 1102–1120.
  • Thanasassoulis and O’Donnell (2009) Thanasassoulis, P. and T. O’Donnell (2009). Mendelian randomization. Journal of American Medical Association 301(22), 2386–2388.
  • Therneau et al. (2003) Therneau, T. M., P. M. Grambsch, and V. S. Pankratz (2003). Penalized survival models and frailty. Journal of Computational and Graphical Statistics 12(1), 156–175.
  • Vaida and Xu (2000) Vaida, F. and R. Xu (2000). Proportional hazards model with random effects. Statistics in Medicine 19(24), 3309–3324.
  • Wan et al. (2015) Wan, F., D. Small, J. E. Bekelman, and N. Mitra (2015). Bias in estimating the causal hazard ratio when using two-stage instrumental variable methods. Statistics in Medicine 34(14), 2235–2265.
  • Wienke (2010) Wienke, A. (2010). Frailty models in survival analysis. Florida: Chapman & Hall/CRC Biostatistics Series.
  • Yashin et al. (2001) Yashin, A., I. MAchine, A. Begun, and J. Vaupel (2001). Hidden frailty: myths and reality. Research Report, Department of Statistics and Demography, Odense University.