A matching framework for truncation by death problems

10/19/2021
by   Tamir Zehavi, et al.
0

Even in a carefully designed randomized trial, outcomes for some study participants can be missing, or more precisely, ill-defined, because participants had died prior to date of outcome collection. This problem, known as truncation by death, means that the treated and untreated are no longer balanced with respect to covariates determining survival. To overcome this problem, researchers often utilize principal stratification and focus on the Survivor Average Causal Effect (SACE). The SACE is the average causal effect among the subpopulation that will survive regardless of treatment status. In this paper, we present a new approach based on matching for SACE identification and estimation. We provide an identification result for the SACE that motivates the use of matching to restore the balance among the survivors. We discuss various practical issues, including the choice of distance measures, possibility of matching with replacement, post-matching crude and model-based SACE estimators, and non-parametric tests. Our simulation results demonstrate the flexibility and advantages of our approach. Because the cross-world assumptions needed for SACE identification can be too strong and are unfalsifiable, we also present sensitivity analysis techniques and illustrate their use in real data analysis. Finally, a recent alternative for SACE that does not demand cross-world unfalsifiable assumptions targets the conditional separable effects. We show how our approach can also be utilized to estimate these causal effects.

READ FULL TEXT VIEW PDF

Authors

page 1

page 2

page 3

page 4

06/28/2020

Conditional separable effects

Researchers are often interested in treatment effects on outcomes that a...
02/24/2022

A sensitivity analysis approach for the causal hazard ratio in randomized and observational studies

The Hazard Ratio (HR) is often reported as the main causal effect when s...
06/01/2022

The Subtype-Free Average Causal Effect for Heterogeneous Disease Etiology

Studies have shown that the effect an exposure may have on a disease can...
09/28/2021

Identification of the Heterogeneous Survivor Average Causal Effect in Observational Studies

Clinical studies are often encountered with truncation-by-death issues, ...
02/15/2019

Survivor average causal effects for continuous time: a principal stratification approach to causal inference with semicompeting risks

In semicompeting risks problems, nonterminal time-to-event outcomes such...
03/06/2021

Identifying Principal Stratum Causal Effects Conditional on a Post-treatment Intermediate Response

In neoadjuvant trials on early-stage breast cancer, patients are usually...
03/23/2020

Insights into the "cross-world" independence assumption of causal mediation analysis

Causal mediation analysis is a useful tool for epidemiological research,...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Randomized controlled trials (RCTs) are considered by many the gold standard for estimating causal effects of treatments or interventions on an outcome of interest. However, even in a carefully designed trial, outcomes for some study participants can be missing, or more precisely, ill-defined, because participants had died prior to date of outcome collection. This problem is known as “truncation by death” (Zhang and Rubin, 2003; Hayden et al., 2005; Egleston et al., 2006; Lee, 2009; Zhang et al., 2009; Ding et al., 2011; Tchetgen Tchetgen, 2014; Wang et al., 2017; Ding and Lu, 2017; Feller et al., 2017; Stensrud et al., 2020; Nevo and Gorfine, 2020).

Examples of truncation by death in practice are ubiquitous: in clinical studies, when the outcome is quality-of-life index one year after initiating treatment for a terminal disease, but some study participants die earlier (Ding et al., 2011; Ding and Lu, 2017); in labor economics, when studying the effect of a training program on salaries, but some remain unemployed (in this example, unemployment is the analogue of death) (Lee, 2009; Zhang et al., 2009); in education studies, different school programs are evaluated with respect to students’ achievements, but some students may drop out (Garcia Jaramillo and Hill, 2010).

In the presence of truncation by death, a naive comparison of the outcomes between the treated and untreated groups among the survivors does not correspond to a causal effect. Intuitively, those survived with the treatment might have died had they did not receive the treatment, hence the two treatment groups are not comparable, and are unbalanced with respect to covariates determining survival. An alternative composite outcome approach combines the two outcome types into one, often by setting the outcome to zero for those not surviving. This approach estimates a well-defined causal effect, but it entangles treatment effect on survival with treatment effect on the outcome.

To overcome the lack of causal interpretation for the comparison among the survivors, or the partial information gained by the composite outcome approach, researchers have utilized principal stratification (Robins, 1986; Frangakis and Rubin, 2002) to focus on a subpopulation among which a causal effect is well-defined. The Survivor Average Causal Effect (SACE) is the effect within the subpopulation who would survive regardless of treatment assignment; this subpopulation is often termed the always-survivors.

Identification and estimation of the SACE possess a challenge even in a RCT, as the standard assumptions of randomization and stable unit treatment value assumption (SUTVA) do not guarantee identification from the observed data distribution. As a result, there is a built-in tension between the strength of the assumptions researchers are willing to make and the degree to which causal effects can be identified from the observed data. Ordered by the increasing strength of the assumptions underpinning them, three main approaches have been considered for the SACE: constructing bounds (Zhang and Rubin, 2003; Lee, 2009; Nevo and Gorfine, 2020), conducting sensitivity analyses (Hayden et al., 2005; Egleston et al., 2006; Ding et al., 2011; Ding and Lu, 2017; Nevo and Gorfine, 2020) and leveraging additional assumptions for full identification (Hayden et al., 2005; Zhang et al., 2009; Ding et al., 2011; Ding and Lu, 2017; Feller et al., 2017; Wang et al., 2017).

A key commonly-made assumption is monotonicity (Zhang and Rubin, 2003; Egleston et al., 2006; Tchetgen Tchetgen, 2014; Ding and Lu, 2017; Feller et al., 2017; Wang et al., 2017). Monotonicity states that those surviving when untreated would also survive had they were treated. Under monotonicity, the principal stratum proportions are identifiable from the data but the SACE is not.

Towards identification of principal strata effects, one line of work utilizes conditional independence assumptions between the stratum membership and the non-survival potential outcomes, given covariates and possibly one of the survival potential outcomes (Jo and Stuart, 2009; Ding and Lu, 2017; Feller et al., 2017). Jo and Stuart (2009) investigated principal strata effects in the instrumental variable (IV) settings, where strata arise due to noncompliance, and focused on a scenario with only two strata. Ding and Lu (2017) considered IV and truncation by death being leading examples in a general study of methods based on the principal score

. The principal score is the probability of stratum membership given covariates

(Hill et al., 2002). Ding and Lu (2017) derived key properties of the principal score, and proposed a weighting-based estimator for principal causal effects. Principal score methods, however, essentially assume that there are no shared causes of survival and the outcome that are affected by the treatment. Tchetgen Tchetgen (2014) considered a special variant of the SACE, that refines the always-survivors to only include those who would also survive under the intervention that assigns the treatment to, say, , and a post-treatment covariate to its value under treatment .

The practical use of SACE has been criticized (Robins, 1986; Stensrud et al., 2020) because the underlying subpopulation for which the SACE is relevant can be an irregular subset of the population and because identification of the SACE relies on unfalsifiable assumptions. A recent alternative (Stensrud et al., 2020) extended the separable effect paradigm (Stensrud et al., 2020) to the truncation by death problem. Their proposed estimands, termed conditional separable effects (CSEs), consider, loosely speaking and under certain conditions, a mechanistic separation of the treatment to the component affecting survival and the component affecting the outcome. Interestingly, these effects can be identified even in a trial that only includes the original treatment, without its mechanistic separation, under assumptions that are not testable from that trial, but, unlike the assumptions needed for SACE identification, can be falsified by a carefully-designed future trial.

Nevertheless, in practice, while researchers across a variety of disciplines increasingly recognize the invalidity of the aforementioned naive approaches (McConnell et al., 2008; Colantuoni et al., 2018; Koenen et al., 2017), the novel methods and theory developed by causal inference researchers for truncation by death are not yet fully implemented in practice. Therefore, methods that are intuitive, simple to use, and with clear practical guidelines are of need. The recent ICH E9 addendum on estimands and sensitivity analysis in clinical trials (ICH, 2020) highlighted the importance of considering intercurrent events, including truncation by death, when designing and analyzing RCTs. These considerations include, among other issues, the target estimand for defining treatment effect, and a laid out sensitivity analysis approach for the different assumptions.

In this paper, we propose to exploit the power and flexibility of matching procedures for estimation of SACE or CSEs. Traditionally, matching has been used to adjust for pre-treatment differences in observed covariates between the treated and untreated groups in observational studies (Stuart, 2010; Rosenbaum, 2011; Dehejia and Wahba, 1999, 2002; Abadie and Imbens, 2006; Ho et al., 2007; Abadie and Spiess, 2020). Here, we propose to use matching to restore the balance originally achieved by randomization, and broken by the differential survival. We show that careful consideration of the variables matched upon leads to a flexible framework for SACE or CSEs estimation.

The rest of the paper is organised as follows. In Section 2, we present notations and the SACE causal estimand. In Section 3, we consider the main assumptions and provide an identification result for the SACE, that motivates the matching framework we then describe in detail in Section 4. In Section 5, we propose sensitivity analysis approach for the assumptions, using the matching framework, and in Section 6, we show how the matching framework is adapted to the recently-proposed CSEs. In Section 7, we present a simulation study, and in Section 8, we illustrate our framework in a real data analysis. Concluding remarks are offered in Section 9. Code and data set analysis are available from the Github account of the first author.

2 Preliminaries

Using the potential outcomes framework, for each individual in the population, we let and be the survival status and the outcome, respectively, under treatment value , at a pre-specified time after treatment assignment (e.g., one year). The outcome takes values in if . Truncation by death is introduced by setting whenever . Throughout the paper, we assume SUTVA, namely that there is no interference between units and no multiple versions of the treatment leading to different outcomes.

With these notations, one can define the following four principal strata (Frangakis and Rubin, 2002): always-survivors () , protected () , harmed () , and never-survivors () . We also let represent the stratum of unit and , be the stratum proportions. Because the causal contrast is well-defined only among the always-survivors, researchers often set the SACE,

as the target causal parameter (Rubin, 2006).

For a sample of size from the population, the observed data for each individual is , where

is a vector of pre-treatment covariates;

is the treatment indicator ( for a treated individual, and for an untreated individual); is a vector of post-treatment covariates, that are affected by ; is the survival status; is the outcome of interest. From SUTVA it follows that , and . In addition to SUTVA, we assume that treatment was randomized. Randomization for .

SUTVA and randomization suffice to identify the causal effect of on survival , but not the SACE or the CSEs (defined in Section 6). The covariates and play a key role in the identification of the SACE or CSEs. Assumptions on these sets of covariates are necessary, because even though treatment was randomized, truncation by death forces researchers to consider shared causes of the post-treatment variables and , and their relation to the treatment .

ns, pro as, har ns, har as, pro               ns, pro as ns as, pro

Table 1: Relationship between the observed data and the strata. Each cell gives the possible stratum membership of a unit observed with and , with (right) and without (left) the monotonicity assumption.

3 SACE identification

Because the SACE is not point-identified from the observed data under SUTVA and randomization, additional assumptions have been considered. Possibly the most common assumption, which seems plausible in many studies, is monotonicity (Zhang and Rubin, 2003; Egleston et al., 2006; Tchetgen Tchetgen, 2014; Ding and Lu, 2017; Feller et al., 2017; Wang et al., 2017). Monotonicity. . Monotonicity means that treatment could not cause death, that is, the harmed stratum is empty. Under monotonicity, SUTVA, and randomization, the proportion of always-survivors is identifiable from the data by . This proportion is crucial in practice, because it speaks to the proportion of the population for which the SACE is relevant. Table 1 presents the possible stratum membership for each unit according to its observed values, with and without the monotonicity assumption.

Our next assumption is strongly related to the assumptions known as Principal Ignorability (PI) and General Principal Ignorability (GPI) (Jo and Stuart, 2009; Feller et al., 2017; Ding and Lu, 2017). These assumptions, and their variants, state conditional independence between the potential non-survival outcomes and the principal stratum membership, conditionally on covariates. In this section, we consider an assumption we term partial principal ignorability (PPI) stating that for study participants who would have survived under treatment, conditionally on covariates , the outcome under treatment is not informative about the survival status had those participants were untreated. Partial Principal Ignorability (PPI). . Under a nonparametric structural equation model with independent errors (Pearl, 2009), PPI holds in the direct acyclic graph (DAG) depicted in the left panel of Figure 1, but not in the right panel of the figure. In general, we will refer to as pre-treatment, although could have included post-treatment covariates, say , as long these were measured before and are not caused by . Under randomization, the latter can be falsified by testing whether . Importantly, PPI implies that the post-treatment covariates do not include shared causes of and , that is, that there are no mediators between of that also affect . An assumption analogue to PPI in the time-to-event setting was also considered by Xu et al. (2020).

Figure 1: DAGs representing the problem of truncation by death with (right) and without (left) post-treatment shared cause of survival and the outcome ()

The following identification result motivates our proposed matching framework for SACE estimation.

Proposition 1

Under SUTVA, randomization, monotonicity and PPI, the SACE is identified from the observed data by

(1)

The proof is given in Section A.1 of the Appendix. We note that although Proposition 1 requires strong assumptions, SUTVA and randomization hold in many RCTs, monotonicity is a common assumption in the SACE literature and PPI is similar to assumptions previously adopted for SACE identification (Hayden et al., 2005; Ding and Lu, 2017). Nevertheless, we present sensitivity analysis methods for each of these assumptions in Section 5 and, in Section 6, we discuss how alternative assumptions can be utilized for identification of CSEs (Stensrud et al., 2020).

PPI and monotonicity are cross-world assumptions that differ at the (in)dependence they impose. Monotonicity imposes a specific form of dependence between the survival status in the two worlds, while PPI imposes (conditional) independence between the outcome in one world and survival in the other world . Interestingly, the same identification result as (1) can be obtained by weakening the monotonicity assumption, while paying the price in assuming a stronger version of PPI. Formally, we present the following two assumptions

Strong partial principal ignorability (SPPI). for .

SPPI was also assumed by Hayden et al. (2005), who added a strong survival status independence assumption to obtain identification for the SACE. Hayden et al. (2005) also developed a sensitivity analysis method without these two assumptions.

Our next assumption uses the principal score , where (Hill et al., 2002; Jo and Stuart, 2009; Ding and Lu, 2017; Feller et al., 2017). For example, the probability of being an always-survivor given the covariates is . Weighting methods that use principal scores for identification and estimation of principal stratum effects have also been developed (Jo and Stuart, 2009; Ding and Lu, 2017; Feller et al., 2017).

Constant principal score ratio (CPSR). For all possible values for some possibly-unknown . The CPSR assumption asserts that the ratio between the proportion of harmed and always-survivors is constant at all levels of . It is weaker than the monotonicity assumption, under which . Replacing PPI with SPPI, and monotonicity with CPSR, we obtain the same identification result as in Proposition 1.

Proposition 2

Under SUTVA, randomization, SPPI and CPSR (for any ), the SACE is identified from the observed data by (1).

The proof is given in Section A.2 of the Appendix.

4 A matching framework for the SACE

Propositions 1 and 2 motivate us to consider matching for SACE estimation. Equation (1) implies that a consistent SACE estimator can be obtained by comparing the treated and untreated survivors at each level of , and then average the estimated differences according to the distribution of in the group. Note that under monotonicity, all members of the group are always-survivors, and furthermore, ignoring finite-sample variability, the conditional distribution of is identical to the conditional distribution of ; see Lemma 2 in Section A.1 of the Appendix. Therefore, our proposed framework seeks to create a matched sample with the following properties:

  1. Survivors only. The matched sample includes only survivors .

  2. Closeness. Within each matched pair or group of the matched sample, the values of should be close to each other.

  3. Distribution preservation. The matched sample should have the distribution of as in the group.

The closeness property underpins having the differences between the treated and untreated at each matched pair/group being sensible estimators of . The distribution preservation property ensures the outer expectation in (1) could be correctly estimated nonparametrically.

In observational studies, we seek to create balance between the treated and the untreated with respect to pre-treatment confounders associated with treatment and the outcome. The key notion motivating our approach is that for SACE estimation, we also seek to create balance between the treated and the untreated, but with respect to covariates that are associated with the non-mortality outcome and, due to selection bias caused by survival, are also associated with the treatment.

With the three properties in mind, we propose the following general matching procedure and analysis for SACE estimation, which parallels the use of matching in observational studies.

  1. Identify covariates for which PPI is a reasonable assumption.

  2. Choose a distance measure determining proximity of two units , with covariate vectors .

  3. Create matched pairs or groups by matching each unit from to a unit(s) from the group while minimizing the total distances within pairs/groups.

  4. Analyze the matched sample.

Step (i) pertains to SACE identification, discussed in the previous section. Regarding the distance measure (Step (ii)), for the standard use of matching for observational data, three common choices are exact distance (zero if covariate vectors are identical, infinity otherwise), Mahalanobis distance, or differences in the (logit of) the propensity score. The exact or Mahalanobis distance can also be used for achieving balance on

. However, the number of covariates needed for PPI or SPPI to hold can be non-small, making the exact and Mahalanobis distances less attractive as distance measures, especially when there are both discrete and continuous covariates on which one would like to match.

In observational studies, matching on the propensity score is justified by its balance properties, shown in the seminal work of Rosenbaum and Rubin (1983). Recently, analogous results for balance on covariates with respect to different principal strata were derived for functions of the principal scores (Ding and Lu, 2017; Feller et al., 2017). Specifically, let , and . Matching on seems attractive because given , always-survivors and protected in are balanced with respect to (Ding and Lu, 2017). Thus, due to randomization, conditionally on , the untreated always-survivors and units in are balanced with respect to . The principal score function can be estimated, e.g., using an EM algorithm (Ding and Lu, 2017). Specifically, under monotonicity, the stratum membership can be modeled using three-categories latent class multinomial regression.

Taking the principal score as the sole distance measure, i.e., , may suffer from disadvantages analogue to those of the propensity score in observational studies, namely, two units with very different covariate vectors can have similar . Furthermore, a potential disadvantage shared by both matching on and weighting using is that the analysis relies on correct model specification for the principal scores. This challenge is amplified for principal score models, as they model a latent stratum which is not directly observable for all study participants.

One potential remedy is to combine the Mahalanobis distance with a caliper (Rubin and Thomas, 2000; Stuart, 2010) on the principal score, namely

(2)

where and are subsets of key covariates from and , respectively, is the empirical covariance matrix of and is a user-specified threshold.

Turning to Step (iii) above, if all are matched, then if the closeness property was achieved, then distribution preservation holds in the matched sample. However, the number of always-survivors in the and groups might differ (e.g., if randomization probabilities are unequal). This may harm both closeness if inadequate matches are made and distribution preservation if not all group members are matched. A remedy to this problem is found by considering matching with replacement, so potential always-survivors from can be chosen as matches for more than one individual from . Finally, the matching framework offers a number of potential methods and variants, among which assigning multiple treated to each untreated unit (1: matching), or implementing full matching (Hansen, 2004).

4.1 Analysis of the matched data

Upon achieving reasonable balance, researchers can turn their attention to estimation of and inference about the SACE (Step (iv)). One key use of matching has been as a way to achieve balance so randomization-based inference and estimation can be carried out (Rosenbaum, 2011)

. Consider the sharp null hypothesis among the always-survivors

(3)

The proposed matching framework enables researchers to answer a basic question: is there evidence for a causal effect of the treatment on the non-mortality outcome for at least one person? Failure to reject the null (3) means the data do not support a positive answer to this question.

In a simple 1:1 matched set without replacement, the null hypothesis (3) can be tested using permutation tests, such as Wilcoxon signed-rank test. For 1:1 matching with replacement, the null of no causal effect can be non-parametrically tested by first grouping each treated survivor with all of their untreated matches, and then carrying out the aligned-rank test (Hodges and Lehmann, 1962). We illustrate the use of this test in Section 8.

We turn to estimation and inference based on the sampling distribution. For standard observational studies, the average treatment effect is often estimated by the crude difference between the treated and the untreated among the matched sample, and (paired or unpaired) Student’s -test is used for inference. However, because the analogue of closeness for matching in observational studies typically cannot be achieved with non-small and/or continuous confounders, bias is expected. Abadie and Imbens (2006) studied the magnitude of this bias, gave conditions under which the crude difference estimator is consistent, and derived the asymptotic distribution of this estimator.

It is generally recommended to consider covariate-adjusted treatment effects beyond the crude differences in the matched sample (Stuart, 2010). This may both reduce bias from inexact matching, and improve efficiency of estimators. For example, one may consider a regression model including the treatment and the covariates that were used for matching. In our case, this translates to the model

(4)

and non-linear terms, interactions and additional variables associated with can also be included. Easy to see from (1) that under model (4), the SACE equals to . More generally, given a model for , with parameters , we can write

(5)

for some induced by the model. Note that this model can be either non-parametric, semi-parametric or fully parametric. The analysis of the matched dataset starts with obtaining estimates , possibly using weighed estimation when matching is either 1: or with replacement. Then, the SACE can be estimated by , where .

A general variance estimator for

is unavailable, as variance estimation for post-matching estimators is an active field of research. To deal with that matching created dependence within pairs/groups, researchers in practice often include the matching pair/group as a random effect, although others recommended to ignore the uncertainty resulting from the matching procedure (Ho et al., 2007)

. Even for the simple linear regression case, only recently

Abadie and Spiess (2020)

have shown that the variance of the estimated coefficients in post-matching regression can be consistently estimated by clustered standard errors that are available from any standard software. Their results are valid only for 1:

matching without replacement. Furthermore, Abadie and Imbens (2008) have shown that when matching is implemented with replacement, the standard bootstrap cannot be used. Abadie and Imbens (2006) derived large sample properties of matching estimators (when matching is with replacement), and proposed consistent variance estimators. They have shown that matching estimators are not -consistent in general, and described the required conditions for the matching estimators to be -consistent.

As an alternative to the crude mean difference or estimators based on model fitting in the matched data set, we also consider a bias-corrected (BC) approach (Abadie and Imbens, 2011). The BC estimator combines any regression model fitted in the original dataset, with a mean difference estimator. The idea is that the BC estimator adjusts for the bias resulting from inexact matching by adding a bias term to the difference of the means estimator. Abadie and Imbens (2011) showed that the BC approach yields -consistent and asymptotically normal estimators for the average treatment effect and the average treatment effect in the treated. Their results can be naturally extended to our case, by recognizing that our proposed approach essentially estimates the average effect on the untreated among the survivors. This is reflected by the outer expectation in (1) being over .

With this in mind, we present the BC estimator for the SACE under 1:1 matching. Prior to creating the matched sample, a regression estimator for is constructed. Let index the matched pair, and let and be the observed outcome and covariates for the unit with in matched pair . Then, following Abadie and Imbens (2011), let

be the imputed outcome for the treated unit in matched pair

. The BC SACE estimator is

(6)

5 Sensitivity analyses

Causal inference identification relies on assumptions that are untestable from the data. Therefore, methods are often coupled with sensitivity analyses that relax one or more of the identifying assumptions to obtain an identification result as a function of a meaningful, but unknown, sensitivity parameter(s). Then, researchers can vary the value of this parameter to receive a curve of possible estimates as a function of this parameter. In this paper, because neither monotonicity nor PPI can be tested from the data or even falsified in a future trial, we propose two sensitivity analyses, each is driven by an identification formula for the SACE as a function of meaningful sensitivity parameters.

Our approach is inspired by the sensitivity analysis proposed by Ding and Lu (2017) for their weighting-based methods. For each of the assumptions, we present an identification result that, analogously to Propositions 1 and 2, direct us how to develop matching-based estimators. As a result, our sensitivity parameters, which are chosen to be meaningful for the truncation by death setting, sometimes differ from those of Ding and Lu (2017). Before turning to the individual sensitivity analysis, we first define to be the mean potential outcome under for units within the principal strata with covariate values .

5.1 Sensitivity analysis for PPI

Our proposed sensitivity analysis for departures from PPI uses the sensitivity parameter . In words, is the ratio between the mean outcomes under treatment of the always-survivors and of the protected at any value of . Because under PPI, values further away from one reflect a more substantial deviation from PPI. Whether values larger or smaller than one (or both) are taken for should be based on subject-matter expertise. For example, when higher outcome values reflect better health, the always-survivors might be expected to be healthier than the protected and hence the sensitivity analysis would focus on values smaller than one.

The following proposition provides the basis for the proposed sensitivity analysis for PPI using matching.

Proposition 3

Under randomization, SUTVA and monotonicity, the SACE is identified from the data as a function of by

(7)

The proof is given in Section A.3 of the Appendix. As one may expect, under PPI () we obtain the same identification formula as in Proposition 1. The larger is at all levels of , the closer (7) is to (1) for all values. Following Proposition 3, matching-based SACE estimation for each value is as follows.

  1. Estimate the principal scores and subsequently , using e.g., the EM algorithm proposed by Ding and Lu (2017).

  2. Implement a matching procedure with the chosen distance measure, possibly, but not necessarily, using the estimated .

  3. Estimate for . For example, using two separate linear regression estimators.

  4. Replace (5) with

    (8)

    and estimate

We illustrate the use of this approach in Section 8. Of note is that our proposed sensitivity parameter is identical to the one proposed by Ding and Lu (2017). Our sensitivity analysis requires estimation of , but offers the flexibility of not using it in the matching process, and thus is expected to reduce, at least to some extent, the dependence of the final results on correct specification of the principal score model.

5.2 Sensitivity analysis for monotonicity

Proposition 2 already enabled us to replace the monotonicity assumption with the weaker CPSR assumption, and obtain identification of the SACE if we also replace PPI with the stronger SPPI assumption. To develop a sensitivity analysis for monotonicity under PPI, without imposing the stronger SPPI, we present here an identification formula under PPI and CPSR, depending on two sensitivity parameters. The first is the previously-defined ratio and the second is .

Under monotonicity we have that , so larger values of reflect larger divergence from monotonicity. In Section A.4 of the Appendix we show that is bounded within the range , where . With a similar logic to the interpretation of , the sensitivity parameter represents the relative frailty of those at the harmed stratum compared to the always-survivors, as reflected by the ratio between mean outcome when untreated.

Importantly, our proposed sensitivity analysis does not formally require estimation of the principal scores. Also of note is that our sensitivity parameters and differ from those chosen by Ding and Lu (2017) as their sensitivity analyses were developed for principal stratification in general. Because we are focused on the truncation by death setting, our sensitivity parameters are simpler to interpret, and hence to choose values for. It turns out that this choice also enables an identification formula analogue to (1) which better motivates our matching approach. The following proposition provides the basis for the proposed sensitivity analysis.

Proposition 4

Under randomization, SUTVA, PPI, and CPSR, the SACE is identified from the data as a function of and by

The proof is given in Section A.2 of the Appendix. Propositions 1 and 2 can be seen as direct corollaries of Proposition 4, because under monotonicity (), and/or under SPPI (), the identification result (4) coincides with the one presented in Propositions 1 and 2.

To utilize Proposition 4 for a sensitivity analysis, we can repeat the following analysis for different combinations of .

  1. Implement a matching procedure as previously described, with or without estimating the principal scores.

  2. Estimate for . For example, using two separate regression estimators.

  3. Replace (5) with

    (9)

    and estimate

If one chooses to use the principal scores in the matching procedure (e.g., with a caliper on ), the EM algorithm for estimating the principal scores should be revised, and implemented separately for each value of . Because our sensitivity parameter differs from the one used by Ding and Lu (2017), modifications of their proposed EM algorithm are needed. The details are given in Section B of the Appendix.

6 Matching for CSEs and for SACE without PPI

CSEs have been recently proposed as an alternative to SACE. For complete motivation, definitions and theory, we refer the reader to Stensrud et al. (2020). Here we provide a summary of key points, before showing how the matching framework can be used for estimating CSEs.

Two key limitations of the SACE are often raised. First, the always-survivors stratum can be a non-trivial subset of the population, and it cannot be known, both for the observed data and for future individuals, who is an always-survivor and who is not. Second, identification of the SACE relies on cross-world assumptions, namely assumptions on the unidentifiable joint distribution of potential outcomes under different intervention values. Such assumptions cannot be tested from the data nor they can be guaranteed to hold by experimental design. Here, both monotonicity and PPI are cross-world assumptions.

Motivated by these limitations, Stensrud et al. (2020) proposed the CSEs approach. A key prerequisite for this approach is having an alternative to the single treatment , represented by two treatments , with potential outcomes and , such that joint interventions on and lead to the same results as interventions on . The latter is formalized by the modified treatment assumption, which implies that setting or lead to the same potential outcomes of and . A second critical assumption is partial isolation, that states there are no causal paths between and . Under this assumption,

Under partial isolation, the CSEs for are

(10)

Stensrud et al. (2020) make the point that unlike the members of the principal stratum , the members of the conditioning set can be observed from the data (under certain assumptions). Note that the interpretation of (10) as the effect of on is non-trivial, because, under the modified treatment assumption and partial isolation, there can still be causal paths between and which do not involve .

The following Proposition provides an identification formula for , which resembles our identification formula for the SACE, and even coincides with it under certain conditions. It additionally relies on a positivity assumption and dismissible component conditions, all described in Section 7 of Stensrud et al. (2020).

Proposition 5

Under randomization, partial isolation, the modified treatment assumption, positivity, and the dismissible component conditions

The proof follows from Theorem 1 of Stensrud et al. (2020), and a few more lines given in Section A.5 of the Appendix. Proposition 5 resembles the identification result for the SACE presented in (1). Note that for estimation of , the process is identical to SACE estimation by matching, with the only change is being that balance should be achieved with respect to both and . For the matching-based approach for estimation of , the distribution preservation property is revised to have the distribution of in the matched set to be the same as in .

7 Simulation studies

To assess the performance of the matching-based approach and to compare the proposed estimators to naive approaches and to the weighting-based method (Ding and Lu, 2017), we conducted simulation studies. The number of simulation repetitions was 1,000 for each simulation scenario. The sample size of each simulated dataset was 2,000. For the simulations and the data analysis in Section 8, we used the R package Matching (Sekhon, 2011).

7.1 Data generating mechanism

For each unit , a covariate vector of length

was simulated from the multivariate normal distribution,

, where is a vector of length with entries 0.5, and

is the identity matrix of dimension

. Then, the stratum

was generated under monotonicity according to a multinomial logistic regression model, with the protected as the reference stratum,

(11)

for and .

For the always-survivors, the potential outcomes were generated according to linear regression models with means for and additive correlated normal errors with zero mean, unit variance, and correlation . For the protected, only was generated (according to the same marginal model), and for the never-survivors, the outcomes were truncated by death under both treatment values. For the linear regression models, we considered scenarios including all - interactions and scenarios without including any interaction. The treatment assignment was randomized with probability . Finally, the observed survival status and outcomes were determined by , and .

7.1.1 Scenarios

To have a fair assessment of the finite-sample performance of the different methods, we considered a variety of scenarios under different number of covariates, stratum models and outcome models. Complete specification of all simulation parameters is given in Section C.1 of the Appendix.

In Scenario A, the always-survivors stratum comprised 50% of the population. In Scenario B, the always-survivors stratum proportion was larger and comprised 75% of the population. In each of these scenarios, we considered the option of relatively high versus relatively low proportion of protected; see Table 8 in Section C.1 of the Appendix. These scenarios were created by choosing different values for the multinomial logistic regression coefficients and , for . For each stratum, the coefficients , , were the same for all covariates, and were changed according to the number of covariates to obtain the desired stratum proportions.

In reality, the functional form of the principal score is unknown to the researchers. Therefore, we additionally conducted simulations under misspecification of the principal score model. In these scenarios, as an example, the true principal score model included squared and log terms of two of the covariates. The proportions of always-survivors were slightly different than before (Table 8).

7.2 Analyses

For each of the simulated datasets, we calculated the two naive estimators – mean difference in the survivors, and the mean difference in the composite outcomes. For the matching approach, we followed the procedure described in Section 4.1, by matching every untreated survivor to a treated survivor. We compared matching on the Mahalanobis distance; matching on under model (11), estimated using an EM algorithm; and matching on Mahalanobis distance with a caliper on , taking (Equation (2

)) to be 0.25 standard deviation (SD) of the estimated

. For each of the above options, we considered matching both with and without replacement of treated survivor units. Of note is that in practice, as we also illustrate in Section 8, one chooses the matching distance that achieves the best balance, so our assessment of the matching framework approach is likely to be a bit pessimistic. As indicated in Section 7.1.1, in some of the scenarios considered the principal score model was misspecified.

In the matched datasets, we considered the following estimators: the crude difference estimator; linear regression estimators without and with all interactions; the BC matching estimator (6) (with linear regression estimator for ). The latter was only used when matching was with replacement. For the regression estimators, least squares (LS) was used when matching was without replacement and weighted least squares (WLS) when matching was with replacement.

When matching was without replacement, we used the simple standard error (SE) estimator of the paired differences for the crude differences and clustered SE for the LS (Abadie and Spiess, 2020). For matching with replacement, we used the SE estimator proposed by Abadie and Imbens (2006) for the crude difference, clustered SEs for the WLS estimator, and the SE estimator proposed by Abadie and Imbens (2011) for the BC estimator.

We compared our matching-based approach to the model-assisted weighting-based estimator of Ding and Lu (2017) (DL), which uses the estimated principal scores. This estimator was found by the authors to be more efficient than the simple weighting estimator.

Figure 2: Bias of different estimators under correctly specified (left panel) and misspecified (right panel) principal score model. True outcome model included - interactions. The true SACE parameter ranged between 3.36 - 3.92 for , 4.86 - 5.99 for , and 8.29 - 9.48 for .

Principal score model correctly specified Principal score model misspecified Method Estimator Mean Emp.SD Est.SE MSE CP95 Mean Emp.SD Est.SE MSE CP95 Scenario A (), SACE = 5.65 Scenario A (), SACE = 4.92 Matching Crude 5.60 0.19 0.18 0.039 0.93 4.93 0.20 0.19 0.041 0.93 LS 5.63 0.17 0.17 0.028 0.95 4.94 0.16 0.16 0.027 0.95 LS-I 5.64 0.17 0.18 0.030 0.96 4.92 0.17 0.17 0.029 0.96 Matching Crude 5.62 0.19 0.18 0.038 0.93 4.92 0.20 0.18 0.041 0.92 with BC 5.64 0.18 0.20 0.032 0.98 4.92 0.18 0.20 0.032 0.97 replacement WLS 5.64 0.18 0.18 0.031 0.96 4.93 0.18 0.18 0.031 0.96 WLS-I 5.64 0.18 0.18 0.032 0.95 4.92 0.18 0.18 0.032 0.94 Composite 13.0 0.59 0.60 54.31 0.00 11.3 0.64 0.62 40.58 0.00 Naive 4.69 0.36 0.36 1.052 0.23 4.47 0.37 0.37 0.341 0.78 DL 5.65 0.16 0.027 5.14 0.17 0.077 Scenario B (), SACE = 6 Scenario B (), SACE = 4.86 Matching Crude 5.89 0.22 0.16 0.059 0.80 4.43 0.30 0.19 0.276 0.43 LS 5.96 0.11 0.13 0.013 0.97 4.78 0.14 0.14 0.026 0.92 LS-I 5.98 0.13 0.14 0.018 0.96 4.86 0.16 0.14 0.025 0.91 Matching Crude 5.80 0.15 0.14 0.058 0.72 4.75 0.17 0.15 0.041 0.86 with BC 5.99 0.14 0.17 0.019 0.98 4.86 0.16 0.18 0.027 0.96 replacement WLS 5.95 0.13 0.15 0.019 0.96 4.85 0.16 0.16 0.025 0.95 WLS-I 5.99 0.14 0.15 0.019 0.96 4.86 0.16 0.15 0.027 0.92 Composite 7.51 0.55 0.55 2.613 0.21 10.3 0.66 0.55 29.71 0.00 Naive 4.22 0.31 0.31 3.252 0.00 4.53 0.34 0.33 0.226 0.81 DL 6.00 0.12 0.015 4.95 0.16 0.035

Table 2: Selected simulation results for Scenarios A and B, with high , covariates, and when the true model did not include - interactions. Matching was carried out using Mahalanobis distance with caliper on absolute difference (0.25 estimated SD). LS: least squares; LS-I: least squares with interactions; WLS: weighted least squares; WLS-I: weighted least squares with interactions. DL: model based weighing estimator of Ding and Lu (2017)

Emp.SD: empirical standard deviation. Est.SE: empirical standard error; MSE: mean square error; CP95: empirical coverage proportion of 95% confidence interval.

7.3 Results

Results in this section are presented for the case the true outcome model included all interactions. Figure 2 presents the bias of the DL estimator and of selected post-matching estimators, when matching was done with replacement using the Mahalanobis distance with a caliper on . Table 2 presents further results for Scenarios A and B with covariates and high . Additional results are presented in Section C.2 of the Appendix.

The absolute bias of the crude estimator increased as the number of covariates increased. This bias was apparent only in part of the scenarios, and was mitigated when matching on only (Figure 5). When the principal score model was correctly specified, the regression and BC matching estimators, and the DL estimator, all performed well in terms of bias. When the principal score model was misspecfied, the DL estimator was biased, while the estimators that followed matching with replacement were generally unbiased.

In Scenario B, the bias of the crude estimator was larger when replacements were allowed. When a caliper is imposed, matching with replacement can sometimes lead to poor matches, in comparison to matching without replacement. The caliper may prevent matches with a low Mahalanobis distance, and allow matches with a large Mahalanobis distance; this phenomenon is more likely to occur when matching is carried out with replacement. Importantly, the regression estimators did not suffer from this problem.

SEs were generally well estimated, except those of the crude estimator after matching without replacement and the BC estimator. In terms of MSE, the LS estimator without interactions and DL estimator outperformed the other post-matching estimators when the principal score model was correctly specified. When the principal score model was misspecified, nearly all matching estimators had substantially lower MSE than the DL estimator (Tables 2 and 9).

The empirical coverage rates of the 95% confidence intervals for the regression-based estimators were close to the desired level, except for the LS estimators after matching without replacement in Scenario B. For the BC estimators, the coverage rate of the confidence intervals was conservative.

Generally, the empirical SDs have increased as increased and as decreased. The estimated SEs of the BC estimator became more biased as increased; see Table 9. Comparing different distance measures (Figure 5), the bias of the crude estimator was the lowest after matching on , and the largest after matching on the Mahalanobis distance. The results were similar when the true outcome model did not include interactions, although the absolute bias and the SEs of the regression estimators were lower, and the SE of the BC estimator was well estimated; see Figure 4 and Table 10.

8 Illustrative data example

The National Supported Work (NSW) Demonstration was an employment training program, carried out in the United States during the early 1970s. The NSW aimed to provide work experience for disadvantaged workers and to help them acquire capabilities required at the labor market. Eligible candidates were randomized to participate in the program (treated, ) or to not receive assistance from the program (untreated, ). Further details about the program can be found elsewhere (LaLonde, 1986; Dehejia and Wahba, 1999). The dataset consists of participants in total, (41%) treated (59%) untreated.

The outcome of interest was earnings in 1978 in terms of 1982 US dollars (). However, a challenge in considering the effect of the training program on earnings is that only participants () were employed during 1978, () in the treated and () in the untreated. Thus, unemployment at 1978 () represents “truncation by death”. Table 3 provides the number and proportion of participants within the four values of .

Total 129 (18%) 296 (41%) 425 (59%) 67 (9%) 230 (32%) 297 (41%) Total 196 (27%) 526 (73%) 722 (100%)

Table 3: Observed treatment and employment status in the NSW data

A number of pre-randomization variables that are possibly shared causes (or good proxies of shared causes) of employment status and earnings in 1978 are available

(Dehejia and Wahba, 1999). These include age, years of education, not having an high school degree, race (white/black/hispanic), marital status (married or not married), and employment status and real earnings in 1975. A description of these covariates is given in Table 11 in Section D of the Appendix.

One may expect that these covariates were generally balanced due to randomization in the original sample, and except for years of education, having an high school degree and the employment status in 1975, this seems to be the case (Table 4). Looking at those employed in 1978 (), the imbalance got more substantial. Among the employed, compared to the untreated, the treated were older, less likely to be hispanic, and more likely to be married, to own a high school degree, and to be employed in 1975. At each treatment arm, earnings and employment rates in 1975 were higher among the employed, suggesting that participants who managed to acquire a position in 1978 were possibly more skilled before entering the program.

. Full Employed Matched Untreated Treated SMD Untreated Treated SMD Untreated Treated SMD Continuous Age 24.4 (6.6) 24.6 (6.7) 0.03 24.1 (6.6) 24.6 (6.7) 0.09 24.1 (6.6) 23.7 (6.1) -0.05 Education 10.2 (1.6) 10.4 (1.8) 0.12 10.2 (1.6) 10.4 (1.9) 0.13 10.2 (1.6) 10.2 (1.4) 0.02 Earnings75 3.0 (5.2) 3.1 (4.9) 0.01 3.4 (5.7) 3.3 (5.1) -0.02 3.4 (5.7) 3.2 (5.2) -0.03 Discrete Black 340 (80%) 238 (80%) 0.00 224 (76%) 176 (77%) 0.02 224 (76%) 232 (78%) 0.06 Hispanic 48 (11%) 28 (9%) -0.06 41 (14%) 25 (11%) -0.09 41 (14%) 37 (12%) -0.04 Married 67 (16%) 50 (17%) 0.03 45 (15%) 44 (19%) 0.11 45 (15%) 42 (14%) -0.03 Nodegree 346 (81%) 217 (73%) -0.21 238 (80%) 167 (73%) -0.20 238 (80%) 243 (82%) 0.04 Employed75 247 (58%) 186 (63%) 0.09 182 (61%) 150 (65%) 0.08 182 (61%) 177 (60%) -0.03

Table 4: Covariates balance in the full, employed and the matched dataset. For each dataset, means (SDs) for continuous covariates and frequencies (proportions) for discrete covariates are presented according to treatment status, along with SMD. Matching was carried out using the Mahalanobis distance with a caliper on ()

8.1 Results

We present and describe the main results in this section. Additional information is available from Section D of the Appendix. The composite outcome approach yielded an estimated difference of 886 US dollars (CI95%: -70, 1842) between those participated in the program and those who did not. However, because the program increases employment by approximately 8% () (one-sided test), the above estimates cannot be interpreted as a causal effect on the earnings. The naive difference in the survivors, that does not correspond to a causal effect of interest is 409 (CI95%: -691, 1509).

The always-survivors (here, always-employed) stratum proportion is identifiable under monotonicity and was estimated to be , which is quite high in comparison to and . Importantly, this means the causal effect among the always-survivors stratum is of a high interest in this example, as the always-survivors comprise more than two thirds of the study population.

Turning to the matching, because we had several discrete covariates and a number of continuous covariates, neither exact matching nor the Mahalanobis distance seemed attractive. Therefore, we used the Mahalanobis distance with a caliper on , as discussed in Section 4.

In our main analysis, we included the continuous covariates age, education and earnings in 1975 in the Mahalanobis distance. We used an EM algorithm to estimate the principal score model with the continuous covariates age and earnings in 1975, and the discrete covariates black, hispanic, marital status, and employment status in 1975. The estimated multinomial regression (11) coefficients are given in Table 12. Compared to the protected, always-survivors were younger, more likely to be white, not married and not employed at 1975.

After trial and error to improve the balance, we chose for the caliper (2). As can be seen from Table 4, in the matched sample the covariates were generally balanced, and compared to employed (survivors) sample, notable improvements were achieved in the balance of the covariates education, high-school degree and employment status in 1975. Compared to our chosen distance measure, matching using the Mahalanobis distance without a caliper, the balance has been improved for some covariates (e.g. age), but has noticeably deteriorated for marital status. Matching only on provided inferior results for most covariates (Table 13).

Because there were less treated than untreated employed, we used matching with replacement, resulting in matching of all untreated employed participants, thus achieving the distribution preservation property. The alternative of matching without replacement resulted in only matches out of the () untreated (Table 13).

Turning to the analysis of the matched sample, beyond the crude difference, we also fitted a linear regression model (using WLS) for the outcome using the covariates age, education, black, hispanic, marital status, and earnings in 1975, and compared models with and without all treatment-covariates interactions. For the model with interactions, we estimated the SACE by plugging in the fitted model in (5), and then averaged the obtained quantities across the matched untreated employed. We used the same set of covariates for the regression model utilized by the BC estimator. We estimated the SEs as described in Section 7 and then calculated Wald-type 95% confidence intervals.

For the proposed approach, the crude difference in the matched data set was 435 (CI95%: -601, 1472), and 451 (CI95%: -995, 1896) and 380 (CI95%: -1021, 1782) for the weighted regression causal effects based on weighted linear regression without and with, respectively, all treatment-covariates interactions. The BC estimator (6) was 343 (CI95%: -1103, 1790). The non-parametric aligned-rank test (Hodges and Lehmann, 1962; Heller et al., 2009) revealed no evidence for rejecting the sharp null hypothesis of no individual causal effect (). The model-assisted weighting approach using the principal scores (Ding and Lu, 2017) estimated the SACE to be 387 (CI95%: -654, 1428 using Efron’s percentile method with 500 bootstrap samples). To summarize, there was no evidence for causal effect on the earnings in the always-survivors under monotonicity and PPI. In the next section, we explore how relaxing these assumptions affect the conclusions using the sensitivity analyses described in Section 5.

When matching was on the Mahalanobis distance solely, the estimates were slightly lower, while the estimates and the estimated SEs after matching on were substantially lower. Regardless of the distance measure we have used, the estimated SEs of the crude estimator were lower than estimated SEs of the other estimators (Table 14).

8.2 Sensitivity analyses

We used the same matching as the main analysis for the sensitivity analyses. We present results for the WLS estimator without interactions after matching on the Mahalanobis distance with a caliper. Results under different distance measures and using several estimators are presented in Section D of the Appendix

We start with sensitivity analysis for PPI according to the approach described in Section 5.1 that uses Proposition 3. We considered values within where values smaller than one (larger than one) might imply the always-survivors are believed to be more (less) skilled than the protected and hence are expected to have higher earnings had they participated in the program.

As can be seen from the left panel of Figure 3, as increases, the SACE decreases. For , the SACE was estimated to be negative, meaning that the program decreases the mean earnings among the always-survivors. For all values, the estimates were not significantly different from zero at the significance level. Hence, reasonable violations of PPI would not have changed our conclusions.

Figure 3: Sensitivity analyses for the SACE estimation in the NSW data. The left panel presents the SACE estimate as a function of , without assuming PPI. The right panel presents SACE estimate as a function of and without assuming monotonicity.

Turning to sensitivity analysis for monotonicity according to the approach described in Section 5.2 based on Proposition 4. A possible violation of monotonicity might be that program participants raise their reservation wages (the minimum wage they require in order to accept a job) due to the program, and therefore may refuse low-earnings job offers they would have accepted had they did not participate in the program (Zhang et al., 2009). Without monotonicity, the principal scores are still identifiable from the data, but now as a function of . In the NSW data, and were estimated to be 0.7 and 0.77, respectively, and therefore we considered . Similar to before, had we believed the always-survivors are more skilled than the harmed (had they were not participating in the program), we would have limited the to be smaller than one.

As can be seen from the right panel of Figure 3, for a fixed , as increases, the SACE increases. As increases, changes in have bigger impact on the SACE estimate. This is expected from Proposition 4. For , the positive effect of the program became significant for large enough values. For example, for , meaning there are 2.5 times more always-survivors than harmed at each level of , the SACE is significantly different from zero if the ratio between the mean outcomes of the always-survivors and the harmed when untreated is at least 1.5. The results of the sensitivity analyses under different distance measures and using several estimators were overall similar to the results presented in Figure 3 (Section A.8 of the Appendix).

9 Discussion

In this paper, we presented a matching-based approach for estimation of well-defined causal effects in the presence of truncation by death. Underpinning our approach is that the balance created by randomization and lost due to differential survival can be retrieved by a matching procedure achieving the three properties: Survivors only, closeness, and distribution preservation.

To fix ideas, we focused in this paper on 1:1 matching with or without replacement, and on classical matching distance measures that are heavily used in practice (with the principal score taking the role of the propensity score). In practice, 1:k matching can also be used. Other distance measures or newly-developed matching procedures are also possible to use, and might be more attractive, especially when rich data are available and is high-dimensional. Regardless of the quality of balance achieved on the observed covariates, researchers should be aware that the identifying assumptions for the SACE are strong and cannot be falsified. Hence, the analysis should be accompanied with the proposed sensitivity analyses, or focus on CSEs.

Matching methods are one of the most popular tools in the causal inference toolbox of practitioners. We provided in this paper a theoretical basis and discussed implementation details for adapting matching methods to adequately overcome truncation by death. We therefore expect our work to have an immediate impact on analyses when truncation by death is present.

References

  • Abadie and Imbens (2006) Abadie, A. and G. W. Imbens (2006). Large sample properties of matching estimators for average treatment effects. econometrica 74(1), 235–267.
  • Abadie and Imbens (2008) Abadie, A. and G. W. Imbens (2008). On the failure of the bootstrap for matching estimators. Econometrica 76(6), 1537–1557.
  • Abadie and Imbens (2011) Abadie, A. and G. W. Imbens (2011). Bias-corrected matching estimators for average treatment effects. Journal of Business & Economic Statistics 29(1), 1–11.
  • Abadie and Spiess (2020) Abadie, A. and J. Spiess (2020). Robust post-matching inference. Journal of the American Statistical Association just-accepted, 1–37.
  • Colantuoni et al. (2018) Colantuoni, E., D. O. Scharfstein, C. Wang, M. D. Hashem, A. Leroux, D. M. Needham, and T. D. Girard (2018). Statistical methods to compare functional outcomes in randomized controlled trials with high mortality. BMJ 360.
  • Dehejia and Wahba (1999) Dehejia, R. H. and S. Wahba (1999). Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. Journal of the American statistical Association 94(448), 1053–1062.
  • Dehejia and Wahba (2002) Dehejia, R. H. and S. Wahba (2002). Propensity score-matching methods for nonexperimental causal studies. Review of Economics and statistics 84(1), 151–161.
  • Ding et al. (2011) Ding, P., Z. Geng, W. Yan, and X.-H. Zhou (2011). Identifiability and estimation of causal effects by principal stratification with outcomes truncated by death. Journal of the American Statistical Association 106(496), 1578–1591.
  • Ding and Lu (2017) Ding, P. and J. Lu (2017). Principal stratification analysis using principal scores. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79(3), 757–777.
  • Egleston et al. (2006) Egleston, B. L., D. O. Scharfstein, E. E. Freeman, and S. K. West (2006). Causal inference for non-mortality outcomes in the presence of death. Biostatistics 8(3), 526–545.
  • Feller et al. (2017) Feller, A., F. Mealli, and L. Miratrix (2017). Principal score methods: Assumptions, extensions, and practical considerations. Journal of Educational and Behavioral Statistics 42(6), 726–758.
  • Frangakis and Rubin (2002) Frangakis, C. E. and D. B. Rubin (2002). Principal stratification in causal inference. Biometrics 58(1), 21–29.
  • Garcia Jaramillo and Hill (2010) Garcia Jaramillo, S. and J. Hill (2010). Impact of conditional cash transfers on children’s school achievement: evidence from Colombia. Journal of Development Effectiveness 2(1), 117–137.
  • Hansen (2004) Hansen, B. B. (2004). Full matching in an observational study of coaching for the sat. Journal of the American Statistical Association 99(467), 609–618.
  • Hayden et al. (2005) Hayden, D., D. K. Pauler, and D. Schoenfeld (2005). An estimator for treatment comparisons among survivors in randomized trials. Biometrics 61(1), 305–310.
  • Heller et al. (2009) Heller, R., E. Manduchi, and D. S. Small (2009). Matching methods for observational microarray studies. Bioinformatics 25(7), 904–909.
  • Hill et al. (2002) Hill, J., J. Waldfogel, and J. Brooks-Gunn (2002). Differential effects of high-quality child care. Journal of Policy Analysis and Management: The Journal of the Association for Public Policy Analysis and Management 21(4), 601–627.
  • Ho et al. (2007) Ho, D. E., K. Imai, G. King, and E. A. Stuart (2007). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political analysis 15(3), 199–236.
  • Hodges and Lehmann (1962) Hodges, J. and E. L. Lehmann (1962). Rank methods for combination of independent experiments in analysis of variance. The Annals of Mathematical Statistics 33(2), 482 – 497.
  • ICH (2020) ICH (2020). ICH E9 (R1) Addendum on estimands and sensitivity analysis in clinical trials to the guideline on statistical principles for clinical trials. Technical report.
  • Jo and Stuart (2009) Jo, B. and E. A. Stuart (2009). On the use of propensity scores in principal causal effect estimation. Statistics in medicine 28(23), 2857–2875.
  • Koenen et al. (2017) Koenen, K. C., J. A. Sumner, P. Gilsanz, M. M. Glymour, A. Ratanatharathorn, E. B. Rimm, A. L. Roberts, A. Winning, and L. D. Kubzansky (2017). Post-traumatic stress disorder and cardiometabolic disease: improving causal inference to inform practice. Psychological Medicine 47(2), 209–225.
  • LaLonde (1986) LaLonde, R. J. (1986). Evaluating the econometric evaluations of training programs with experimental data. The American economic review, 604–620.
  • Lee (2009) Lee, D. S. (2009). Training, wages, and sample selection: Estimating sharp bounds on treatment effects. The Review of Economic Studies 76(3), 1071–1102.
  • McConnell et al. (2008) McConnell, S., E. A. Stuart, and B. Devaney (2008). The truncation-by-death problem: what to do in an experimental evaluation when the outcome is not always defined. Evaluation Review 32(2), 157–186.
  • Nevo and Gorfine (2020) Nevo, D. and M. Gorfine (2020). Causal inference for semi-competing risks data. arXiv preprint arXiv:2010.04485.
  • Pearl (2009) Pearl, J. (2009). Causal inference in statistics: An overview. Statistics surveys 3, 96–146.
  • Robins (1986) Robins, J. (1986). A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Mathematical modelling 7(9-12), 1393–1512.
  • Rosenbaum (2011) Rosenbaum, P. R. (2011). Observational studies. Springer.
  • Rosenbaum and Rubin (1983) Rosenbaum, P. R. and D. B. Rubin (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70(1), 41–55.
  • Rubin (2006) Rubin, D. B. (2006). Causal inference through potential outcomes and principal stratification: application to studies with “censoring” due to death. Statistical Science 21(3), 299–309.
  • Rubin and Thomas (2000) Rubin, D. B. and N. Thomas (2000). Combining propensity score matching with additional adjustments for prognostic covariates. Journal of the American Statistical Association 95(450), 573–585.
  • Sekhon (2011) Sekhon, J. S. (2011). Multivariate and propensity score matching software with automated balance optimization: The Matching package for R. Journal of Statistical Software 42(7), 1–52.
  • Stensrud et al. (2020) Stensrud, M. J., J. M. Robins, A. Sarvet, E. J. T. Tchetgen, and J. G. Young (2020). Conditional separable effects. arXiv preprint arXiv:2006.15681.
  • Stensrud et al. (2020) Stensrud, M. J., J. G. Young, V. Didelez, J. M. Robins, and M. A. Hernán (2020). Separable effects for causal inference in the presence of competing events. Journal of the American Statistical Association (just-accepted), 1–23.
  • Stuart (2010) Stuart, E. A. (2010). Matching methods for causal inference: A review and a look forward. Statistical science 25(1), 1.
  • Tchetgen Tchetgen (2014) Tchetgen Tchetgen, E. J. (2014). Identification and estimation of survivor average causal effects. Statistics in medicine 33(21), 3601–3628.
  • Wang et al. (2017) Wang, L., X.-H. Zhou, and T. S. Richardson (2017). Identification and estimation of causal effects with outcomes truncated by death. Biometrika 104(3), 597–612.
  • Xu et al. (2020) Xu, Y., D. Scharfstein, P. Müller, and M. Daniels (2020). A Bayesian nonparametric approach for evaluating the causal effect of treatment in randomized trials with semi-competing risks. Biostatistics.
  • Zhang and Rubin (2003) Zhang, J. L. and D. B. Rubin (2003). Estimation of causal effects via principal stratification when some outcomes are truncated by “death”. Journal of Educational and Behavioral Statistics 28(4), 353–368.
  • Zhang et al. (2009) Zhang, J. L., D. B. Rubin, and F. Mealli (2009). Likelihood-based analysis of causal effects of job-training programs using principal stratification. Journal of the American Statistical Association 104(485), 166–176.

Appendix

Appendix A Additional theory and proofs

Throughout the proofs in this section, we denote

for either a probability density function, or a joint distribution function (including of a continuous and a binary variable) conditioned on the event

. For simplicity of presentation only, we assume in all proofs below that all components of are continuous.

a.1 Auxiliary lemmas and Proof of Proposition 1

We start with auxiliary lemma that would be useful for our proofs, and then in Section A.1.1 turn to the proof of Proposition 1. Our first lemma states that under randomization and SUTVA the proportion of always-survivors among the group is the relative size of out of the proportion .

Lemma 1

Under randomization and SUTVA

By SUTVA and randomization, we have

The proof for the other claim about is analogous.

The following Lemma would be useful for our proofs, and is also interesting by itself. It states that under randomization, SUTVA and monotonicity, the distribution of the covariates in the untreated survivors () is the same as the distribution of in the always-survivors ().

Lemma 2

Under randomization, SUTVA and monotonicity

We have

where the first move is by monotonicity, the second by randomization, and the third by SUTVA.

Our third Lemma shows that result in Lemma 2 holds when monotonicity is replaced by CPSR. Lemma 2 is a direct corollary of Lemma 3, since under monotonicity .

Lemma 3

Under randomization, SUTVA and CPSR

First, note that is also the ratio between the marginal stratum proportions. That is,

(12)

Next, by Bayes theorem and CPSR, the distribution of

in the harmed and in the always-survivors strata is the same,