A causal model for subgroup effects in randomized screening trials

by   Sudipta Saha, et al.

The primary analysis of randomized cancer screening trials for cancer typically adheres to the intention-to-screen (ITT) principle, measuring cancer-specific mortality reductions between screening and control arms. These mortality reductions result from a combination of the screening regimen, screening technology and the effect of the early, screening-induced, treatment. This motivates addressing these different aspects separately. Here we are interested in the causal effect of early versus delayed treatments on cancer mortality among the screening-detectable subgroup, which under certain assumptions is estimable from conventional randomized screening trial using instrumental variable type methods. To define the causal effect of interest, we formulate a simplified causal multi-state model for screening trials, based on a hypothetical intervention trial where screening detected individuals would be randomized into early versus delayed treatments. The cancer-specific mortality reductions after screening detection are quantified by a cause-specific hazard ratio. For this, we propose two estimators, based on an estimating equation and a likelihood expression. The methods extend existing instrumental variable methods for time-to-event and competing risks outcomes to time-dependent intermediate variables. Using the causal model as a data generating mechanism, we investigate the performance of the new estimators, and compare them to two previously proposed ones. In addition, we illustrate the proposed method in the context of CT screening for lung cancer using the US National Lung Screening Trial (NLST) data.



page 1

page 2

page 3

page 4


A Bayesian Nonparametric Approach for Evaluating the Effect of Treatment in Randomized Trials with Semi-Competing Risks

We develop a Bayesian nonparametric (BNP) approach to evaluate the effec...

Causal Proportional Hazards Estimation with a Binary Instrumental Variable

Instrumental variables (IV) are a useful tool for estimating causal effe...

Causal Inference for Comprehensive Cohort Studies

In a comprehensive cohort study of two competing treatments (say, A and ...

Doubly Robust Nonparametric Instrumental Variable Estimators for Survival Outcomes

Instrumental variable (IV) methods allow us the opportunity to address u...

A Bayesian accelerated failure time model for interval-censored three-state screening outcomes

Women infected by the Human papilloma virus are at an increased risk to ...

Statistical approaches using longitudinal biomarkers for disease early detection: A comparison of methodologies

Early detection of clinical outcomes such as cancer may be predicted bas...

Towards automatic pulmonary nodule management in lung cancer screening with deep learning

The introduction of lung cancer screening programs will produce an unpre...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

The benefits of cancer screening are ideally studied through randomized screening trials, where asymptomatic participants are assigned to either undergo a prespecified regimen of screening examinations, or to control without screening, and then followed up for cancer specific mortality (Hu and Zelen, 1997). The primary analysis of such a trial usually follows the intention to screen (ITT) principle, based on comparison of cancer specific mortality between the screening and control arms at the end of the follow-up period. Any mortality reductions in the screening arm are due to the combination of earlier (asymptomatic) detections due to the screening technology, and the effectiveness of the subsequent early (versus delayed, following symptomatic diagnosis) treatments. This motivates studying these two aspects separately. Herein we are interested in the effectiveness of the early treatments in the subpopulation who would be screening detectable if offered screening. The subgroup effectiveness can be studied based on data collected through conventional screening trials, using instrumental variable type estimators, where the instrument is the randomized assignment.

Instrumental variable (IV) approach has been primarily introduced to adjust for non-compliance with the treatment assignments in therapeutic trials (Angrist et al., 1996; Baiocchi et al., 2014). Several authors have utilized the IV approach in screening trials for estimating the effect of screening assignment among the compliers, a latent subgroup of participants who adhere to the assigned screening regimen (Baker, 1998; Roemeling et al., 2007). This is different from the present context where our interest is in the effectiveness of early treatments in the screening detectable subgroup; IV estimators are applicable also for this purpose, and can estimate the mortality risk reduction in the screening detectable subgroup. For this purpose, McIntosh (1999) formulated instrumental variable type estimators in a cohort diagnosed with cancer, assembled after sufficiently long follow-up period to have similar numbers of cancer cases in both screening and control arms to avoid overdiagnosis bias. In Saha et al. (2018) we defined the effect measures and estimators as functions of time, and formulated the estimators in the entire trial cohort, which avoids any selection bias due to conditioning on intermediate variables in instrumental variable analysis (Swanson et al., 2015).

If the screening is discontinued sufficiently early in the trial and the subsequent follow-up is sufficiently long, the estimators proposed in Saha et al. (2018) also estimate the case-fatality reduction due to the early treatments, though they are valid for subgroup mortality risk reduction even without this interpretation, which is important if the screening regimen continues for the duration of the trial. We connected the effect measures to a hypothetical intervention trial proposed by Miettinen (2014, 2015), where screening-detected individuals would be randomized into early versus delayed treatments. Although such a trial could not be implemented in practice, it is helpful in formulating meaningful causal estimands that can then be estimated from conventional screening trials. In a different context, treatment effect in a latent subgroup has been studied by Altstein et al. (2011); Altstein and Li (2013), in a diagnostic setting where immediate lymph node removal surgery is received by patients with a positive biopsy test while those with a negative biopsy test receive a control regimen of observation, with the causal effect of the immediate surgery among the positive biopsy subgroup as the quantity of interest. The screening trial context differs from this due to the repeated rounds of screening, making the latent subgroup membership time-dependent.

Recent methodological literature has seen several proposals for instrumental variable estimation with time-to-event and competing risks outcomes. The latter is relevant for the screening context as the interest is in cancer-specific mortality. Instrumental variable estimators generalize to time-to-event outcomes most straightforwardly by considering time-specific risk or survival probability as the outcome and using plug-in estimators that can accomodate censoring such as the Kaplan-Meier estimator

(Nie et al., 2011). This approach generalizes to competing risks outcomes by using non-parametric cumulative incidence estimators (Richardson et al., 2017). Instrumental variable approaches to estimate causal effect parameters in causal proportional hazards (Martinussen et al., 2019) and additive hazards (Zheng et al., 2017; Martinussen and Vansteelandt, 2020) models have also been proposed. The current setting differs from these because the latent subgroup is time-dependent, accumulated through repeated screening examinations. In Saha et al. (2018) we demonstrated that the plug-in type estimation approach to instrumental variable estimation is still applicable in this setting, connecting the causal quantities to the hypothetical intervention trial. We also formulated the estimators under competing risks, and allowing for inclusion of covariates to account for covariate dependent censoring. Herein we aim to extend the approach from estimating subgroup mortality risk reduction to estimating a causal hazard ratio. We broadly follow the approach of (Martinussen et al., 2019) for instrumental variable estimation for causal Cox proportional hazards models, but extend this approach for the time-dependent latent subgroup. The challenges in formulating causal estimands with time-dependent exposure in a multi-state model setting have been discussed by von Cube et al. (2019). We make sure the quantities are well-defined by linking them with the hypothetical intervention trial.

To test the methods, our second objective is to propose a causal model for randomized screening trial that can be used as a data generating mechanism in simulation studies. We aim for a simplified mechanism that avoids modeling the properties of the screening technology (sensivity, specifity), sojourn time (the period where asymptomatic cancer is screening-detectable), lead time (the period between screening detection and clinical detection in the absence of screening) and stage transition of the cancer. This is in line with our focus on the causal subgroup treatment effects among the screening detectable. In contrast, the well-known Hu-Zelen model and its extensions which have been used for planning screening trials (Zelen, 1993; Hu and Zelen, 1997; Lee and Zelen, 2006), parameter estimation (Shen and Zelen, 1999; Sung et al., 2019) and could also serve as a simulation model. These models are based on modeling the natural history of the disease and the properties of the screening test. We aim to avoid specifying these parts of the model by basing the causal model on the hypothetical intervention trial, where the causal effects of early treatments following an early detection are quantified through a hazard ratio. We will apply the simulation mechanism to compare the performance of the different measures suggested for subgroup mortality reduction, and further use simulation to answer the question of whether conventional screening trials are powered to detect such subgroup effects.

The remainder of the paper is structured as follows: in Section 2 we introduce the necessary notation, define the causal model, and describe the algorithm to simulate data mimicking screening trials. In Section 3, we discuss the identifying assumptions for estimation under this model, and introduce a new estimator to quantify the effect of early treatments in the screening-detectable subgroup as a hazard ratio. We presented a simulation study to compare all the estimators in Section 4, in terms of bias, power and coverage probability. In Section 5, we illustrate the use of the new estimator using data from the US National Lung Screening Trial (NLST). Finally, a brief discussion and future directions are presented in Section 6.

2 A causal model for randomized screening trials

2.1 Conventional screening trial

We introduce the necessary notation in the context of the US National Lung Screening Trial, where heavy smokers, currently smoking or having quit within the last years, with a smoking history of pack years, aged between , were assigned at random to undergo three annual screening examination or control. The participants were asymptomatic of lung cancer at the time of randomization. Participants in the screening arm received three annual low-dose helical CT scans, whereas participants in the control arm received a standard chest X-ray, and both arms were followed up for years for the main analysis (NLST Research Team, 2011). In the following, we assume the X-ray screening to have negligible mortality benefits, as has been demonstrated by Oken et al. (2011), and equate it to no screening. We characterize the causal effect of screening-induced early treatments through a simplified causal multi-state model where the mortality benefits manifest after an early detection through a CT scan. From the healthy state (numbered as state 1), participants in the screening arm have three possible transitions: early diagnosed through CT screening (state 2), cancer-specific death without early diagnosis (state 3) and other-cause death (state 4). In addition, from the early diagnosed state, the participants can move to cancer-specific death or other-cause death states, with potentially different transition rates. Since the control arm does not have early diagnosis by definition, only transitions from state 1 to states 3 and 4 are possible. For concise notation, we use counting processes. We note that since each particular type of transition in a multi-state model can be characterized through a competing risks model, the transition intensities are analogous to the causal cause-specific hazards, as outlined by Young et al. (2020), but conditional on the event histories taken place so far.

Let be the potential (underlying, in the absence of censoring) counting process that counts the transitions of a given individual from state to by time in arm , where and , or indicate the screening or control arms, respectively. In the current context each transition can take place at most once, and thus each counting process only counts to one, with states 3 and 4 being absorbing states. The corresponding observed process is formulated as , where is the arm that the patient actually was assigned to in the randomized screening trial. The counting processes are characterized by the transition intensities defined as

where indicates the history of states until time . We note that the model could be alternatively specified in discrete time through the conditional probabilities , but we will use continuous time notation for simplicity. The corresponding cumulative transition intensity function is defined as . The transition intensities can be collected into a matrix

In the conventional screening trials, the ITT estimand would be defined by comparing the outcomes between the two arms at time after sufficiently long follow-up, in terms of the cumulative incidences for cancer-specific mortality. For example, the absolute reduction in cancer-specific mortality risk would be measured by


and the proportional reduction by


The estimation of these quantities does not requite a multi-state model, as the components could be estimated through empirical cumulative incidences in the two arms. However, the multi-state model is needed to specify and estimate the causal effect of the early treatment among the early detectable subgroup. This subgroup is latent at baseline and accumulates in the screening arm during the follow-up of the trial through the repeated screening examinations. Since screening itself is not an intervention, the mortality benefits can manifest only after an early detection. To specify the correponding causal effect, we need to introduce a well-defined intervention.

2.2 Hypothetical intervention trial

Following and extending (Miettinen, 2015), we can envision a trial that has similar screening regimen as the screening arm of the conventional trial, but instead of randomizing into screening and non-screening, screens everyone, and at the time of an early detection, randomizes into referral to early treatment vs delayed treatment through withholding the screening result. Delayed treatment refers treatments following a later symptomatic diagnosis. Since the early detection can take place only once for a given individual, the randomization for the assignment in the event of early detection can already take place at study baseline, which is relevant for the indexing of the corresponding potential outcomes. A schematic illustration of the hypothetical trial is presented in Figure 1, and the corresponding multi-state model in Figure 2. While this kind of trial could not be implemented in practice, it is important as a thought experiment to define the causal quantities that we are interested in.

Among we define , to be potential (underlying) counting processes for subsequent death corresponding to early () versus delayed () treatment, characterized by transition intensities

The modified transition matrix is now

Under this model, the absolute cancer mortality mortality risk reduction in the subgroup could be measured by


and the proportional reduction by


In a trial where the screening is discontinued at some point (as it was in NLST after three annual rounds), and the follow-up is long enough so that all the mortality benefits have been realized, these quantities approximate the absolute and proportional case-fatality reduction. We discussed the estimation of these quantities in (Saha et al., 2018).

We note that the above does not yet assume anything about the early treatment effect, for example generally this effect could be a function of time and/or function of the time of the early diagnosis. However, we can simplify the model by characterizing the effect as a constant ratio , characterizing the early treatment effect on cancer-specific mortality through a single parameter. In addition, although this is not generally required, we take , meaning that the early treatments do not have effect on other-cause mortality. We take the time scale for all of the transition intensities to be the time since the baseline/study entry. However, it would also be equally possible to use the time since early detection as the time scale for the transitions after the early detection, and to define proportionality of the effect on this time scale.

Since the transition intensities are not directly estimable from data observed under a conventional randomized screening trial, the estimation requires instrumental variable type methods, using the randomized assignment in the conventional trial as an instrument. We will propose estimation methods in Section 3, but will first discuss the use of the causal model for simulation.

Eligible for screening and randomized

(screening arm)

(control arm)

(a) Conventional screening trial Contrast: or

Eligible for screening

(everyone screened)

(screening-detected and randomized)

(referral to early treatment)

(no referral to early treatment)

(b) Hypothetical intervention trial Contrast: or
Figure 1: Illustration of a conventional randomized screening trial and a hypothetical intervention trial sharing the same screening regimen, and the corresponding causal contrasts.



Early detected


Cancer-specific death


Other-cause death


Figure 2: A schematic illustrating the transition intensities in the hypothetical intervention trial. Here corresponds to the transition intensity from state to state at time . The dashed arrow represents cancer mortality under delayed treatments, contrasted to early treatments, with the intensity ratio quantifying the causal effect of interest.

2.3 Description of the simulation algorithm

To simulate data from the proposed causal multi-state model, one needs to fix the transition intensities in the matrix , to simulate event histories in the two arms in the hypothetical intervention trial, corresponding to and . These event histories are also used to represent event histories in the conventional screening trial. In particular, the intervention arm () event histories can be directly taken to be event histories in the screening arm of the conventional trial, assuming that early diagnosis in practice leads to early treatment, as would be the case at least in the lung cancer context. Also, the control arm event histories can be directly taken to be event histories in the control arm by taking and , , under the assumption that the screening test and being withheld the result of it does not in itself modify the subsequent outcomes without the intervention.

For a simulation study, for example exponential or Weibull functional forms can be assumed for the intensities, or the observable ones (, and , ) can be estimated from the screening arm of an existing trial such as the NLST, with the control arm () counterparts obtained by fixing the effect parameter to a value. With the transition intensity matrix , simulation can then proceed through usual algorithms for simulating event histories from a multi-state model as described for example in Beyersmann et al. (2011, p.  45). Briefly, this proceeds by simulating the time and type of the next event from a competing risks model, moving to a new state, and continuing the same. Starting from the healthy state, this would mean simulating the time of the first event, , from the total hazard

which specifies the event time distribution

. Using the inverse cumulative distribution function method, we can take

and solve with respect to to get the time. At time the type of the event is then drawn with multinomial probabilities

for . If the new state is terminal (), the algorithm stops, otherwise it continues similarly from state 2, where the new total hazard is given by , and the event type probabilities by

for . We used the NLST as the basis of the simulation study presented in Section 4. For the observable transition intensities, we calculated smooth hazard estimates using the (Hess and Gentleman, 2019) package in R and integrated them numerically to get the corresponding cumulative intensities.

3 Hazard ratio estimation

To estimate the parameter from data collected under a conventional randomized screening trial, in addition to the modeling assumption discussed in the previous section, we need some identifying assumptions. In particular, we assume that (i) , , meaning that for someone who would get early detected by time , but not receive early treatment, the potential mortality outcome would be the same as in the control arm of the conventional trial. Further, we assume that , , meaning that for some who would not get early detected, the potential mortality outcome would be the same between the two arms of the conventional trial. Essentially, assumptions (i) and (ii) together state that screening examinations themselves (and being withheld of the result) do not have impact on mortality in the absence of the intervention. In addition, we assume that (iii) , meaning that the mortality rates under early treatment following early diagnosis can be estimated from the screening arm of the conventional trial. Essentially, this means assuming that in practice early treatment follows early diagnosis.

3.1 Estimating equation

To derive a connection between the observable and unobservable quantities, we can re-express the cancer-specific cumulative incidence in the control arm of a conventional trial as the sum of two different alternative event histories in terms of the latent subgroup membership of the individual, that is, the event history under early detection and no early treatment and the event history under no early detection by time . This gives


Here the second equality used assumptions (i) and (ii) and the third equality assumption (iii), as well as the modeling assumption of proportional hazards. A related equation was considered by McIntosh (1999), but without considering the hazard ratio as the parameter of interest. The estimating equation proposed by Martinussen et al. (2019) is also similar, but did not consider time-dependent latent subgroup. We note that all the quantities on the right hand side of can be estimated from the screening arm of the conventional trial, while the left hand size can be estimated from the control arm of the conventional trial. For semi-parametric estimation of , we use non-parametric plug-in estimates for the other quantities in (3.1).

State occupation probabilities in a multi-state model can be generally estimated using the non-parametric Aalen-Johansen approach (Borgan, 2014), which in the case of a competing risks setting reduces to the non-parametric cumulative incidence estimator for the left hand side of the equation. The required plug-in inputs for the right hand side are Nelson-Aalen/Breslow estimates for the cumulative hazards/hazard increments; using these as inputs (with one of them modified by the multiplicative factor ), the Aalen-Johansen state occupation probabilities can be calculated for example using the msfit and probtrans functions of the mstate package (Putter et al., 2007; de Wreede et al., 2011). Equation (3.1) can then be solved numerically with respect to to estimate the early treatment effect. The approach can accommodate independent censoring through the plug-in estimates. The fixed time at which the estimating equation is evaluated can be chosen as the maximum follow-up length, or, as we demonstrate in Section 5, the effect can be estimated as a function of to see if the estimates stabilize over the follow-up period.

We note that we can obtain an analogous equation


for other-cause mortality. However, this is not needed for estimation as we have only one unknown quantity to solve under the assumption that early treatments do not have effect on other-cause mortality.

3.2 Maximum likelihood estimation

Alternatively to the estimating equation approach, we can consider a multinomial likelihood expression for the parameter using data from the mortality outcomes in the conventional trial. This can be written as


where and are as in (3.1) and (3.1). The proportionality followed because, considering the transition intensities in the screening arm as fixed quantities, the likelihood contributions of the individuals in the screening arm do not depend on . In practice, for semi-parametric estimation, the transition intensities in (3.1) and (3.1) can be replaced with non-parametric plug-in estimates, and the expression then maximized numerically with respect to . Instead of evaluating the likelihood expression at a fixed time , independent censoring can be accommodated by evaluating the likelihood contributions at the minimum of the time of death or censoring time.

4 Simulation study

To compare the estimators proposed in Section 3, we generated data using the algorithm described in Section 2.3, using the NLST as the basis of the simulation study. In addition, we compared the statistical power of tests based on these to the two estimators we had proposed earlier for absolute and proportional subgroup mortality risk reduction. In the present notational framework, these estimators can be expressed as




where again expectations can be replaced with appropriate non-parametric cumulative incidence estimators. While these are estimating mortality risk reductions rather than the hazard ratio, the four estimators are comparable in terms of the power of a Wald-type or confidence interval based test on them. This will help answer the question of whether conventional screening trial such as NLST is powered to detect the subgroup early treatment effect. For this reason, we also calculated estimates for the ITT mortality reductions as


and the proportional reduction by


as the trial is powered for these.

To compare the four estimators, we generated NLST-like datasets, by fixing the transition intensities to those estimated from the NLST, and fixing the hazard ratio to . For each dataset we simulated individuals assigned randomly with equal probability to screening or control. Since in the NLST the censoring is mostly administrative at the end of the follow-up at 7 years, we did not simulate random censoring. The true values of the subgroup mortality reductions were evaluated by calculating the quantities (3) and (4) under the specified transition intensities. In the estimators, the cumulative incidences were calculated using the cuminc function of the cmprsk R package (Gray, 2019). For more complex state occupation probabilities needed for evaluating the estimating equation and the likelihood expression, we first calculated Nelson-Aalen/Breslow estimators, which were used as inputs to Aalen-Johansen estimators for state occupation probabilities using the functions available in the mstate R package. The estimating equation was solved numerically using the uniroot function and the likelihood maximized numerically using the optim function of R (R Core Team, 2017).

The mortality reductions, as well as the estimating equation (3.1) and the likelihood expression (3.2

) were evaluated at 7 years. To estimate standard errors for each estimator, we boostrapped each simulated dataset 50 times, and took the standard deviation of the point estimates as the standard error. This was compared to the Monte Carlo standard deviation of the point estimates over the simulation rounds. 95% confidence intervals were constructed using the normal approximation and compared to the nominal coverage probability. The power of the confidence interval based test was calculated as the proportion of simulation rounds where the interval did not cover the null value.

The simulation results are presented in Table 1. These indicate that all the estimators could capture the corresponding true value relatively well, based on low bias and near nominal coverage probability. The power of the subgroup treatment effect estimators was also comparable to the ITT analysis. This can be explained by the fact that all the mortality reduction due to screening is realized through the effect of the early treatments, and thus focusing on the subgroup does not lose any of the effect. The test based on the estimator (9) had the highest power. The two hazard ratio estimators gave very similar results, so we only use the estimating equation for the data analysis in the next section.

Estimator Causal Truth Point SE Power Coverage MCsd MCE
contrast estimate
(3.1) 0.4804 0.4893 0.1628 0.85 0.942 0.1657 0.005
(3.2) 0.4804 0.4892 0.1628 0.85 0.942 0.1657 0.005
(8) (3) 0.1512 0.1507 0.0521 0.83 0.945 0.0534 0.0017
(9) (4) 0.2986 0.2921 0.0804 0.90 0.944 0.0803 0.0025
(10) (1) 0.0036 0.0036 0.0012 0.85 0.940 0.0012 0.00004
(11) (2) 0.1616 0.1616 0.0500 0.87 0.938 0.0514 0.0016
Table 1: Comparison of subgroup and population-level estimators. The estimators from top to bottom are estimating equation and likelihood based estimation of the subgroup cancer mortality hazard ratio, absolute and proportional subgroup cancer mortality reduction, and absolute and proportional population cancer mortality reduction. SE stands for standard error, MCsd for Monte Carlo standard deviation and MCE for Monte Carlo error.

5 Illustration of the hazard ratio using NLST data

In the NLST, heavy smokers were randomized to either CT screening or control arms, with three annual rounds of screening with the first round initiated at the baseline and subsequently followed for years (NLST Research Team, 2011). The original analysis followed the ITT principle. Here we are interested in quantifying the effect of the early vs late treatments among the screening detectable subgroup in terms of the subgroup cancer-specific mortality hazard ratio. For this purpose, we carried out a secondary analysis of the NLST data using the estimating equation approach based on (3.1). We estimated the hazard ratio as a function of the fixed time point , to determine if this effect stabilizes or dilutes over time. The confidence intervals were generated using bootstrap replicates.

The results are presented in Figure 3. This demonstrates that the hazard ratio becomes statistically significant in the second year of follow-up, reaches a maximum of at around 6 years, and levels after that. Overall the effect estimate is relatively stable over time which may lend some support towards the proportionality of this effect. The maximum hazard ratio indicates a cancer-specific mortality hazard reduction due to the early treatments. This can be contrasted to the empirical cancer mortality rate ratio between screening and control arms in the trial which after 7 year of follow-up was , or around 15.6%. The reason for the difference between these two measures is that the latter combines the effect of early treatments after early diagnosis with factors influencing early diagnosis itself, including the screening regimen, screening technology and non-compliance. The statistical significance of the two effects is similar based on the NLST data; even though the subgroup effect is larger in magnitude, it is evaluated in a smaller subpopulation, and these two aspects counterbalance each other compared to the smaller ITT effect in the entire study population.

Figure 3: Estimated effect of delayed versus early treatment on lung cancer mortality among the screening-detectable subgroup. The dashed lines are 95% bootstrap confidence bands. Estimates are generated as a function of time, using the follow-up data up to a given time point . The estimates are noisy in the first year of follow-up and are shown from the second year.

6 Discussion

In this paper, we proposed a simplified causal multi-state model to simulate screening trials, and proposed a new measure (i.e. causal cause-specific hazard ratio) to quantify the impact of early treatments compared to delayed treatments in the screening-detectable subgroup. We note that the subgroup is specific to the screening regimen and screening technology implemented in the trial that is used for the estimation, as the effect is formulated in terms of a hypothetical intervention trial that shares the same regimen and technology. We presented two alternative estimators for the subgroup effect. In addition to comparing properties of estimators, the simulation model may have use in planning new trials targeted for the subgroup effects.

While we contend that the cause-specific hazard ratio can serve as a useful summary measure of the magnitude of the effect, and simplifies accommodating competing risks into the analysis, we do note that the use of marginal hazard ratios as a causal parameters has been criticized by several authors (e.g. Aalen et al., 2015), who point out even in randomized trials where the arms are initially balanced with respect to observed or unobserved baseline characteristics, this balance is lost over the course of the follow-up as more high risk individuals get removed from the risksets. However, as outlined by Young et al. (2020), a causal cause-specific hazards can always be formulated in terms of the underlying potential event time/event type outcomes in the two arms of a trial. Because these quantities are defined conditional on the past, it should be noted that the corresponding hazard ratio being one at a given time does not necessarily imply the absence of a causal effect. If a strictly causal interpretation is sought, one could use the subgroup mortality risk reduction as the measure instead.

With these reservations, we suggest interpreting the cause-specific hazard ratio as a summary measure of the early treatment effect over time. However, we note that the proportionality is more plausible for the subgroup early treatment effect since this effect begins from the early diagnosis, compared to the use of cancer mortality rate ratio to compare the screening and control arms, which is dependent on the screening regimen, among other things. In fact, we suggest that the present proposal can resolve some of the controversies in characterizing screening effect through a hazard/rate ratio, discussed for example by Liu et al. (2013, 2015); Hanley and Njor (2018); Habbema (2018). Rather than the screening effect in the entire population, what could be plausibly assumed proportional is the early treatment effect in the subgroup. On the other hand, the usual residual type diagnostics for proportionality violations are not applicable under the instrumental variable type estimation methods; diagnosing violations of proportionality under the current framework is a topic for further research. As one possible approach, the likelihood-based method would allow testing the significance of interaction terms with time entered into the model. The likelihood could also be used to assess on which time scale proportionality fits the data better.

The estimation methods we outlined here could accommodate independent censoring. They extend straightforwardly to the case where the censoring is conditional on baseline covariates, for example, standardizing over the empirical covariate distribution in the control arm of the trial, equation (3.1) can be expressed as

where the covariate conditional cause-specific hazards can be estimated using Cox models, and the cumulative incidences on the left hand side either with Cox or Fine & Gray models. In addition to allowing for covariate-dependent censoring, conditioning on the covariates may be useful for relaxing the instrumental variable assumptions outside of randomized trials, or improving the efficiency of the instrumental variable type estimators (Burgess et al., 2017). We are currently pursuing these extensions.


This work was supported by the Ontario Institute for Cancer Research through funding provided by the Government of Ontario (to SS) and by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (to OS). The authors thank the National Cancer Institute (NCI) for access to NCI’s data collected by the National Lung Screening Trial. The statements contained herein are solely those of the authors and do not represent or imply concurrence or endorsement by the NCI.


  • Aalen et al. (2015) Aalen, O. O., Cook, R. J., and Røysland, K. (2015). Does cox analysis of a randomized survival study yield a causal treatment effect? Lifetime data analysis, 21(4):579–593.
  • Altstein and Li (2013) Altstein, L. L. and Li, G. (2013). Latent subgroup analysis of a randomized clinical trial through a semiparametric accelerated failure time mixture model. Biometrics, 69:52–61.
  • Altstein et al. (2011) Altstein, L. L., Li, G., and Elashoff, R. M. (2011). A method to estimate treatment efficacy among latent subgroups of a randomized clinical trial. Statistics in medicine, 30:709–717.
  • Angrist et al. (1996) Angrist, J. D., Imbens, G. W., and Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American statistical Association, 91(434):444–455.
  • Baiocchi et al. (2014) Baiocchi, M., Cheng, J., and Small, D. S. (2014). Instrumental variable methods for causal inference. Statistics in medicine, 33(13):2297–2340.
  • Baker (1998) Baker, S. G. (1998). Analysis of survival data from a randomized trial with all-or-none compliance: estimating the cost-effectiveness of a cancer screening program. Journal of the American Statistical Association, 93(443):929–934.
  • Beyersmann et al. (2011) Beyersmann, J., Allignol, A., and Schumacher, M. (2011). Competing risks and multistate models with R. Springer Science & Business Media.
  • Borgan (2014) Borgan, Ø. (2014). Aalen-Johansen estimator. Wiley StatsRef: Statistics Reference Online, pages 1–13.
  • Burgess et al. (2017) Burgess, S., Small, D. S., and Thompson, S. G. (2017). A review of instrumental variable estimators for mendelian randomization. Statistical methods in medical research, 26(5):2333–2355.
  • de Wreede et al. (2011) de Wreede, L. C., Fiocco, M., and Putter, H. (2011). mstate: An R package for the analysis of competing risks and multi-state models. Journal of Statistical Software, 38(7):1–30.
  • Gray (2019) Gray, B. (2019). cmprsk: Subdistribution Analysis of Competing Risks. R package version 2.2-9.
  • Habbema (2018) Habbema, D. (2018). Statistical analysis and decision making in cancer screening. European journal of epidemiology, 33(5):433–435.
  • Hanley and Njor (2018) Hanley, J. A. and Njor, S. H. (2018). Disaggregating the mortality reductions due to cancer screening: model-based estimates from population-based data. European journal of epidemiology, 33(5):465–472.
  • Hess and Gentleman (2019) Hess, K. and Gentleman, R. (2019). muhaz: Hazard Function Estimation in Survival Analysis. R package version
  • Hu and Zelen (1997) Hu, P. and Zelen, M. (1997). Planning clinical trials to evaluate early detection programmes. Biometrika, 84:817–829.
  • Lee and Zelen (2006) Lee, S. and Zelen, M. (2006). Chapter 11: a stochastic model for predicting the mortality of breast cancer. JNCI Monographs, 2006:79–86.
  • Liu et al. (2015) Liu, Z., Hanley, J. A., Saarela, O., and Dendukuri, N. (2015). A conditional approach to measure mortality reductions due to cancer screening. International Statistical Review, 83(3):493–510.
  • Liu et al. (2013) Liu, Z., Hanley, J. A., and Strumpf, E. C. (2013). Projecting the yearly mortality reductions due to a cancer screening programme. Journal of medical screening, 20:157–164.
  • Martinussen et al. (2019) Martinussen, T., Nørbo Sørensen, D., and Vansteelandt, S. (2019). Instrumental variables estimation under a structural Cox model. Biostatistics, 20(1):65–79.
  • Martinussen and Vansteelandt (2020) Martinussen, T. and Vansteelandt, S. (2020). Instrumental variables estimation with competing risk data. Biostatistics, 21(1):158–171.
  • McIntosh (1999) McIntosh, M. W. (1999). Instrumental variables when evaluating screening trials: estimating the benefit of detecting cancer by screening. Statistics in medicine, 18:2775–2794.
  • Miettinen (2014) Miettinen, O. S. (2014). Toward Scientific Medicine. Springer.
  • Miettinen (2015) Miettinen, O. S. (2015). ‘screening’for breast cancer: Misguided research misinforming public policies. Epidemiologic Methods, 4:3–10.
  • Nie et al. (2011) Nie, H., Cheng, J., and Small, D. S. (2011). Inference for the effect of treatment on survival probability in randomized trials with noncompliance and administrative censoring. Biometrics, 67(4):1397–1405.
  • NLST Research Team (2011) NLST Research Team (2011). Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med, 365:395–409.
  • Oken et al. (2011) Oken, M. M., Hocking, W. G., Kvale, P. A., Andriole, G. L., Buys, S. S., Church, T. R., Crawford, E. D., Fouad, M. N., Isaacs, C., Reding, D. J., Weissfeld, J. L., Yokochi, L. A., O’Brien, B., Ragard, L. R., Rathmell, J. M., Riley, T. L., Wright, P., Caparaso, N., Hu, P., Izmirlian, G., Pinsky, P. F., Prorok, P. C., Kramer, B. S., Miller, A. B., Gohagan, J. K., and Berg, C. D. (2011). Screening by chest radiograph and lung cancer mortality: the Prostate, Lung, Colorectal, and Ovarian (PLCO) randomized trial. JAMA, 306:1865–1873.
  • Putter et al. (2007) Putter, H., Geskus, R. B., and Fiocco, M. (2007). Tutorial in biostatistics: Competing risks and multi-state models. Statistics in Medicine, (26):2389–2430.
  • R Core Team (2017) R Core Team (2017). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
  • Richardson et al. (2017) Richardson, A., Hudgens, M. G., Fine, J. P., and Brookhart, M. A. (2017). Nonparametric binary instrumental variable analysis of competing risks data. Biostatistics, 18(1):48–61.
  • Roemeling et al. (2007) Roemeling, S., Roobol, M. J., Otto, S. J., Habbema, D. F., Gosselaar, C., Lous, J. J., Cuzick, J., and Schröder, F. H. (2007). Feasibility study of adjustment for contamination and non-compliance in a prostate cancer screening trial. The Prostate, 67(10):1053–1060.
  • Saha et al. (2018) Saha, S., Liu, Z. A., and Saarela, O. (2018). Estimating case-fatality reduction from randomized screening trials. Epidemiologic Methods, 7.
  • Shen and Zelen (1999) Shen, Y. and Zelen, M. (1999). Parametric estimation procedures for screening programmes: stable and nonstable disease models for multimodality case finding. Biometrika, 86(3):503–515.
  • Sung et al. (2019) Sung, N. Y., Jun, J. K., Kim, Y. N., Jung, I., Park, S., Kim, G. R., and Nam, C. M. (2019). Estimating age group-dependent sensitivity and mean sojourn time in colorectal cancer screening. Journal of medical screening, 26(1):19–25.
  • Swanson et al. (2015) Swanson, S. A., Robins, J. M., Miller, M., and Hernán, M. A. (2015). Selecting on treatment: a pervasive form of bias in instrumental variable analyses. American journal of epidemiology, 181(3):191–197.
  • von Cube et al. (2019) von Cube, M., Schumacher, M., and Wolkewitz, M. (2019). Causal inference with multistate models—estimands and estimators of the population attributable fraction. Journal of the Royal Statistical Society: Series A (Statistics in Society).
  • Young et al. (2020) Young, J. G., Stensrud, M. J., Tchetgen Tchetgen, E. J., and Hernán, M. A. (2020). A causal framework for classical statistical estimands in failure-time settings with competing events. Statistics in Medicine, 39(8):1199–1236.
  • Zelen (1993) Zelen, M. (1993). Optimal scheduling of examinations for the early detection of disease. Biometrika, 80:279–293.
  • Zheng et al. (2017) Zheng, C., Dai, R., Hari, P. N., and Zhang, M.-J. (2017). Instrumental variable with competing risk model. Statistics in medicine, 36(8):1240–1255.