# On the identification of individual principal stratum direct, natural direct and pleiotropic effects without cross-world independence assumptions

The analysis of natural direct and principal stratum direct effects has a controversial history in statistics and causal inference as these effects are commonly identified with either untestable cross-world independence or graphical assumptions. This paper demonstrates that the presence of individual level natural direct and principal stratum direct effects can be identified without cross-world independence assumptions. We also define a new type of causal effect, called pleiotropy, that is of interest in genomics, and provide empirical conditions to detect such an effect as well.


## 1 Introduction

### 1.1 Natural and principal stratum direct effects

The analysis of natural direct and principal stratum direct effects has a long history in statistics and causal inference (Pearl, 2009; VanderWeele, 2015; Frangakis & Rubin, 2002; Robins & Greenland, 1992). These effects are well defined in a counterfactual or potential outcome framework, but they are commonly identified using either untestable cross-world independence assumptions or graphical assumptions. Consequently, the identification and use of these effects in research has elicited controversy from authorities across varied disciplines. The first contribution of this paper is to demonstrate that individual level natural direct and principal stratum direct effects can sometimes be empirically detected without cross-world independence assumptions.

The second contribution of this paper is the derivation of novel constraints on the counterfactual distribution of different direct effects should one reject a null hypothesis corresponding to no direct effect. These constraints provide useful information on the magnitude of the corresponding direct effect in comparison to the other direct effects. Such constraints enable statisticians to uncover mechanisms of the effect of a treatment on an outcome. Related results when one assumes that the treatment has a monotonic effect on the potential mediator are also derived.

As a consequence of our work, the detection of direct effects can sometimes be carried out with no stronger identification assumptions than the assumptions needed to analyze total effects in randomized clinical trials (VanderWeele, 2015; Imbens & Rubin, 2015). Finally, we also define “individual level pleiotropic effects” to be present when a treatment is shown to cause two different outcomes in a single individual in the population. Such an effect is of interest in understanding the etiology of different phenotypes from a single gene or allele.

## 2 Identification of individual level direct effects

### 2.1 Notation and assumptions

Let $Y$ denote a binary outcome. In addition, suppose a binary treatment $X$ is randomized at baseline, and a binary variable $M$ is also observed at a time between the assignment of treatment $X$ and the measurement of $Y$. The individuals in our study, denoted by the symbol $\omega$, compose a finite population, denoted $\Omega$. Define counterfactuals $Y_x(\omega)$ and $M_x(\omega)$ to be the value of $Y$ and $M$ respectively had we set the value of the treatment $X$ to $x$ for individual $\omega$. The manipulated counterfactual $Y_{x,m}(\omega)$ denotes the value of $Y$ had we set $X$ to $x$ and $M$ to $m$. Finally, define the counterfactual $Y_{x,M_{x'}}(\omega)$ to be the value of $Y$ had we set the value of the treatment $X$ to $x$ and set the value of the variable $M$ to the value it would take had we set $X$ to some value $x'$, that could be different to $x$. Denote $p_c(y_1, y_0, m_1, m_0) = \mathrm{pr}\{Y_1(\omega)=y_1, Y_0(\omega)=y_0, M_1(\omega)=m_1, M_0(\omega)=m_0\}$.

The individual level total effect is defined as $Y_1(\omega) - Y_0(\omega)$. The individual pure direct effect is defined as $Y_{1,M_0}(\omega) - Y_{0,M_0}(\omega)$ and the total indirect effect is defined as $Y_{1,M_1}(\omega) - Y_{1,M_0}(\omega)$ (Robins & Greenland, 1992). Similarly, the total direct effect is defined as $Y_{1,M_1}(\omega) - Y_{0,M_1}(\omega)$ and the individual pure indirect effect is defined as $Y_{0,M_1}(\omega) - Y_{0,M_0}(\omega)$ (Robins & Greenland, 1992). These pure and total direct and indirect effects are sometimes also referred to as natural direct and indirect effects (Pearl, 2009). The individual controlled direct effect is defined as $Y_{1,m}(\omega) - Y_{0,m}(\omega)$. We say there is a principal stratum direct effect if for some individual $\omega$ for whom $M_1(\omega) = M_0(\omega) = m$ we have $Y_1(\omega) - Y_0(\omega) \neq 0$. Population total, total direct, pure direct, total indirect, pure indirect and controlled direct effects are derived through taking expectations of the relevant individual level effects. The population level principal stratum direct effect of stratum $m$ is defined as $E\{Y_1(\omega) - Y_0(\omega) \mid M_1(\omega) = M_0(\omega) = m\}$ for some $m \in \{0,1\}$ (Frangakis & Rubin, 2002).

With treatment randomized at baseline, one may make the ‘weak ignorability’ assumption $\{Y_x(\omega), M_x(\omega)\} \perp\!\!\!\perp X$ for $x \in \{0,1\}$. Unlike typical approaches to identify natural direct and indirect effects, no cross-world independence assumptions of the form $Y_{x,m}(\omega) \perp\!\!\!\perp M_{x'}(\omega)$ (Pearl, 2009) are used to derive our results. Throughout, we require the consistency assumption for both $Y$ and $M$, which means that when $X(\omega)=x$ then $Y_x(\omega)=Y(\omega)$ and $M_x(\omega)=M(\omega)$. This assumption states that the values of $Y$ and $M$ that would have been observed if $X$ had been set to the value it in fact took are equal, respectively, to the values of $Y$ and $M$ that were observed. Additionally, the consistency assumption for $Y_{x,m}$ means that when $X(\omega)=x$ and $M(\omega)=m$ then $Y_{x,m}(\omega)=Y(\omega)$. Finally, we assume the composition axiom, which states that when $M_x(\omega)=m$ for some $m$, then $Y_{x,m}(\omega)=Y_x(\omega)$. The randomization, consistency, and composition assumptions are the only assumptions necessary to derive our results. The assumptions needed to identify such direct effects in observational studies are provided in the online supplement. Proofs of all results are given in the Appendix.

### 2.2 Identification of direct effects

###### Theorem 1

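Suppose $X$ is randomized at baseline. If $\mathrm{pr}(Y=y, M=m \mid X=1) + \mathrm{pr}(Y=1-y, M=m \mid X=0) > 1$ for some $y \in \{0,1\}$ and $m \in \{0,1\}$, then there exists a non-empty subpopulation such that for every individual $\omega$ in it, $Y_{1,M_1}(\omega)=y$, $Y_{0,M_0}(\omega)=1-y$ and $M_1(\omega)=M_0(\omega)=m$.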

###### Corollary 1

Suppose is randomized at baseline. If for some and then there exists a non-empty subset such that for every , with and i.e. the total effect, total direct effect, pure direct effect, principal stratum direct effect with are all equal to , and the total indirect and pure indirect effects for individual are both zero.

Theorem 1 and Corollary 1 allow for the empirical detection of individual natural and principal stratum direct effects. A related set of results is the collection of empirical conditions to detect individual level sufficient cause interaction (VanderWeele & Robins, 2008; VanderWeele et al., 2012b; Ramsahai, 2013). Theorem 1 demonstrates that such logic can also be used to detect individual level natural and principal stratum direct effects. Each of the four empirical conditions provided in Theorem 1 will correctly identify when an individual principal stratum direct effect is present in the population.
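As a minimal sketch of how the four conditions might be checked in practice, the snippet below takes the form of the condition from the last line of the proof of Theorem 1, namely that pr(Y=y, M=m | X=1) + pr(Y=1−y, M=m | X=0) exceeds one. The function names and the numerical inputs are hypothetical and chosen only for illustration; sampling variability is ignored.

```python
from itertools import product


def theorem1_sums(p_x1, p_x0):
    """Return pr(Y=y, M=m | X=1) + pr(Y=1-y, M=m | X=0) for each (y, m).

    p_x1[(y, m)] is pr(Y=y, M=m | X=1) and p_x0[(y, m)] is
    pr(Y=y, M=m | X=0); both are dicts keyed by (y, m) in {0, 1}^2.
    """
    return {
        (y, m): p_x1[(y, m)] + p_x0[(1 - y, m)]
        for y, m in product((0, 1), repeat=2)
    }


def detected_types(p_x1, p_x0):
    """Return the (y, m) pairs whose sum is strictly greater than one."""
    return [key for key, value in theorem1_sums(p_x1, p_x0).items() if value > 1]


# Hypothetical conditional distributions, made up for illustration only.
p_x1 = {(1, 0): 0.70, (0, 0): 0.10, (1, 1): 0.10, (0, 1): 0.10}
p_x0 = {(1, 0): 0.15, (0, 0): 0.65, (1, 1): 0.10, (0, 1): 0.10}

print(detected_types(p_x1, p_x0))
```

With these illustrative inputs only the pair y = 1, m = 0 exceeds one, which would point to individuals whose outcome changes from 0 to 1 under treatment while the intermediate variable stays at 0.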

Readers familiar with the literature on the falsification of the binary instrumental variable model (Balke & Pearl, 1997; Ramsahai & Lauritzen, 2011) will recognize each of the empirical conditions in Theorem 1 as the complement of one of four ‘instrumental inequalities’ (Balke & Pearl, 1997; Swanson et al., 2018). The result closest to ours is that of Cai et al. (2008), who demonstrate that non-zero population average controlled direct effects can be identified under the ‘strong ignorability’ assumption through rejecting the ‘instrumental inequality’, but their result does not make any statement about individual or population principal stratum or natural direct effects. Richardson & Robins (2010) also discuss implications of falsifying the instrumental variable model, but their results do not imply our results on individual principal stratum or natural direct effects. They also make no mention of individual level principal stratum direct effects.

Sjölander (2009) provides bounds for the population natural direct effect. These bounds are different to the Cai et al. (2008) bounds on the population controlled direct effect. It is simple to demonstrate that if the Sjölander (2009) lower bound of the natural direct effect is greater than zero, or the upper bound is less than zero, then one of the inequalities associated with Theorem 1 holds. The converse fails: an inequality associated with Theorem 1 can hold while the Sjölander (2009) bounds for the population natural direct effect include zero. The difference stems from Sjölander (2009) presenting bounds on the population average natural direct effects, while we present lower bounds on the proportion of individuals that display a particular form of a principal stratum direct effect and consequently a natural direct effect.

Other authors derive bounds on the principal stratum direct effects assuming that the relevant principal stratum exists (Zhang & Rubin, 2003; Hudgens et al., 2003; Imai, 2008; Richardson & Robins, 2010), whereas our results demonstrate that the relevant principal stratum must in fact exist, as shown below, and provide information on the proportion of individuals that display such a principal stratum direct effect in comparison to other counterfactual responses.

Natural direct and indirect effects are also identified if one assumes an underlying nonparametric structural equation model that implies the cross-world independence assumption (Pearl, 2009). We do not make that assumption here. Moreover, under a nonparametric structural equation model framework, these natural direct effects fail to be identified if there exists a post baseline confounder between $M$ and $Y$ that is also affected by treatment (Avin et al., 2005). However, Theorem 1 will still correctly identify non-zero direct effects even in the presence of a confounder between $M$ and $Y$ that is affected by treatment $X$. Imai et al. (2013) use a more elaborate experimental design in which both treatment and mediator are randomized to identify the presence of indirect effects, but we are not assuming randomization of the mediator, only of the treatment. We now give an additional result on the distribution of counterfactual response types corresponding to direct effects.

###### Proposition 1

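Suppose $X$ is randomized at baseline. If $\mathrm{pr}(Y=y, M=m \mid X=1) + \mathrm{pr}(Y=1-y, M=m \mid X=0) > 1$ for $y \in \{0,1\}$ and $m \in \{0,1\}$, then we have the following counterfactual result: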

$$
\begin{aligned}
p_c(y,1-y,m,m) > {} & p_c(1-m,1-m,1-m,1-m) + p_c(1-y,y,1-m,1-m) + p_c(1-y,y,y,1-y) \\
& + p_c(1-y,y,1-y,y) + p_c(y,1-y,1-m,1-m) + p_c(m,m,1-y,y) \\
& + p_c(m,m,1-m,1-m) + p_c(1-m,1-m,y,1-y) + p_c(1-y,y,m,m).
\end{aligned}
\tag{1}
$$

Theorem 1 demonstrates that individual level principal stratum direct effects are identified solely assuming randomization and consistency of counterfactuals. Corollary 1 then demonstrates, using the composition axiom, that this individual level principal stratum direct effect is also a natural direct effect. Finally, Proposition 1 demonstrates that if one rejects a null hypothesis of the form $\mathrm{pr}(Y=y, M=m \mid X=1) + \mathrm{pr}(Y=1-y, M=m \mid X=0) \leq 1$ for some $y$ and $m$, then the counterfactual distribution of $\{Y_1(\omega), Y_0(\omega), M_1(\omega), M_0(\omega)\}$ is severely constrained. This constraint on the counterfactual distribution has implications for understanding mechanisms in the population. It informs us that the proportion of individuals for whom the treatment has a positive (or negative) total effect on the outcome while the treatment does not change the mediator value from some fixed value $m$ is greater than the proportion of individuals for whom the treatment has a negative (or positive, respectively) total effect while the treatment does not change the mediator value from the same fixed value $m$. Proposition 1 also informs us whether a direct effect of treatment is greater than other direct effects. Also note that, because every term on the right-hand side of (1) is non-negative, (1) implies that $p_c(y,1-y,m,m) > p_c(1-y,y,m,m)$.

## 3 Identification of direct effects with monotonicity assumptions

### 3.1 Positive monotonicity of exposure on mediator

Now consider the setting of positive monotonicity, defined as $M_1(\omega) \geq M_0(\omega)$ for all $\omega$; that is, there is no individual for whom treatment prevents the mediator from occurring. While such monotonic effects are never verifiable, they are falsifiable and can sometimes be justified with subject matter knowledge. Under the assumption of monotonicity, there is a gain in the capacity to detect direct effects in comparison to the results from the previous section. Additionally, if such assumptions are reasonable, the tighter bounds enable detection of two different principal stratum direct effects in a population.

### 3.2 Identification of individual level direct effects under positive monotonicity

###### Theorem 2

Suppose is randomized at baseline. In addition suppose that has a positive monotonic effect on . If for and then there exists a non-empty subpopulation such that for every and

###### Corollary 2

Suppose is randomized at baseline. In addition suppose that has a positive monotonic effect on . If , then there exists a non-empty subpopulation such that for every the total effect, total direct effect, pure direct effect, principal stratum direct effect with are all equal to , and the total indirect and pure indirect effects for are both zero.

###### Proposition 2

Suppose is randomized at baseline. In addition suppose that has a positive monotonic effect on . If for and , then we have the following counterfactual result:

Proposition 2 enables similar interpretations of distribution of the counterfactual contrasts that were provided for Proposition 1. If , then the proportion of individuals for whom treatment has a beneficial effect on and and are still zero is greater than the proportion of individuals for whom treatment has a harmful effect on and and are still zero. The ‘instrumental inequalities’ assuming having a positive monotonic effect on are for and (Balke & Pearl, 1997). These ‘instrumental inequalities’ serve as null hypotheses of the corresponding principal stratum direct effect under investigation. At most two of the four inequalities associated with Theorem 2 can hold.

###### Example 1

Consider a population denoted for which half of the individuals follow counterfactual response type and and the other half of individuals follow counterfactual response type and We have for all . For such a population, the average pure direct effect, and the total direct effect, are both zero. However, the two null hypotheses associated with Theorem 2 are both falsified, indicating that the two principal stratum direct effects associated with individual response types and are both present in our population. As noted in Richardson & Robins (2010), at most one of the four inequalities associated with Theorem 1 can hold, which means that without the monotonicity assumption, one can detect at most one individual level principal stratum direct effect. However, two of the inequalities associated with Theorem 2 hold. We see that and in this example. This means that both individual level principal stratum direct effects can be detected using our results from Theorem 2.

Earlier, for the setting where one does not assume $X$ has a positive monotonic effect on $M$, we saw that Theorem 1 would identify individual level principal stratum and natural direct effects even when the Sjölander (2009) bounds for the natural direct effect included zero. The key difference between our results and the results of Sjölander (2009) for the monotonicity setting is that we only assume that $X$ has a monotonic effect on $M$, whereas Sjölander (2009) makes additional monotonicity assumptions. We are able to detect two different individual level principal stratum direct effects even under the scenario where the total effect or natural direct effect is zero. Detection of two different principal stratum direct effects has not previously been described in the literature (Sjölander, 2009; Cai et al., 2008; Balke & Pearl, 1997; Swanson et al., 2018; Richardson & Robins, 2010).

## 4 Pleiotropic Effects

In addition to the variables and assumptions provided in Section 2, let $Z$ denote an additional binary outcome. As before, assume treatment is randomized, so that $\{Y_x(\omega), Z_x(\omega)\} \perp\!\!\!\perp X$ for $x \in \{0,1\}$. We also require the consistency assumption for $Y$ and $Z$. This section defines pleiotropic effects, which are of importance when making inferences about the effect of a single gene or treatment on two distinct outcomes or phenotypes (Solovieff et al., 2013). The results provided herein enable scientists to discover such effects. This section does not use notation or from the previous three sections.

###### Definition 1 (Pleiotropic Effects)

A treatment or exposure $X$ is defined to have an individual level pleiotropic effect on outcomes $Y$ and $Z$ if there exists an individual $\omega$ for whom one of the following four counterfactual response types holds:

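$$
\begin{aligned}
& Y_1(\omega)=1,\ Y_0(\omega)=0,\ Z_1(\omega)=1,\ Z_0(\omega)=0; \\
& Y_1(\omega)=1,\ Y_0(\omega)=0,\ Z_1(\omega)=0,\ Z_0(\omega)=1; \\
& Y_1(\omega)=0,\ Y_0(\omega)=1,\ Z_1(\omega)=1,\ Z_0(\omega)=0; \\
& Y_1(\omega)=0,\ Y_0(\omega)=1,\ Z_1(\omega)=0,\ Z_0(\omega)=1.
\end{aligned}
$$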

For the first condition, the treatment is causative for both $Y$ and $Z$. For the second condition, the treatment is causative for $Y$ and preventative for $Z$. For the third condition, the treatment is preventative for $Y$ and causative for $Z$. Finally, for the last condition, $X$ is preventative for both $Y$ and $Z$.

###### Theorem 3

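Suppose $X$ is randomized at baseline. If $\mathrm{pr}(Y=1, Z=1 \mid X=1) + \mathrm{pr}(Y=0, Z=0 \mid X=0) > 1$, then there exists a non-empty subpopulation such that for all individuals $\omega$ in it, $Y_1(\omega)=1$, $Y_0(\omega)=0$, $Z_1(\omega)=1$ and $Z_0(\omega)=0$.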

The other three counterfactual response types can likewise be detected using analogous results collected in the online supplement. Theorem 3 enables the detection of individual level pleiotropic effects, but does not distinguish between two different forms of individual level pleiotropy. The first situation is where $X$ causes $Y$, which in turn causes $Z$ (or $X$ causes $Z$, which in turn causes $Y$), and the second situation is where $X$ causes $Y$ through a pathway not through $Z$ and similarly causes $Z$ through a pathway not through $Y$. The first situation is called ‘mediated pleiotropy’ and the second situation is known as ‘biologic pleiotropy’ (Solovieff et al., 2013). Theorem 3 allows an investigator to detect whether at least one of these two types of pleiotropy is present.
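The check for pleiotropy can be sketched in the same way. Only the condition for the first response type, pr(Y=1, Z=1 | X=1) + pr(Y=0, Z=0 | X=0) > 1, appears in the main text (it is the complement of the last line of the proof of Theorem 3). The other three conditions used below are assumed here by symmetry and are not copied from the supplement, and all numbers are hypothetical.

```python
from itertools import product


def pleiotropy_sums(p_x1, p_x0):
    """Return pr(Y=y, Z=z | X=1) + pr(Y=1-y, Z=1-z | X=0) for each (y, z).

    A key (y, z) corresponds to the response type
    Y_1 = y, Y_0 = 1 - y, Z_1 = z, Z_0 = 1 - z.
    Only (1, 1) is stated in the main text; the rest are assumed analogues.
    """
    return {
        (y, z): p_x1[(y, z)] + p_x0[(1 - y, 1 - z)]
        for y, z in product((0, 1), repeat=2)
    }


def detected_pleiotropy(p_x1, p_x0):
    """Return the (y, z) pairs whose sum is strictly greater than one."""
    sums = pleiotropy_sums(p_x1, p_x0)
    return [key for key, value in sums.items() if value > 1]


# Hypothetical joint distributions of (Y, Z) in each arm, for illustration.
p_x1 = {(1, 1): 0.60, (1, 0): 0.15, (0, 1): 0.15, (0, 0): 0.10}
p_x0 = {(1, 1): 0.10, (1, 0): 0.15, (0, 1): 0.15, (0, 0): 0.60}

print(detected_pleiotropy(p_x1, p_x0))
```

In this illustration only the pair (1, 1) is flagged, which corresponds to the first response type of Definition 1, where treatment is causative for both outcomes.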

## 5 Data Analysis

Yerushalmy (2014), and other authors since (Hernández-Díaz et al., 2006; VanderWeele et al., 2012a), have noted that maternal smoking is associated with lower infant mortality among low birth weight infants, sometimes referred to as the birth weight paradox. In Yerushalmy’s data of white mothers, the difference in infant mortality comparing low birth weight infants whose mothers smoke versus those who do not is . This might be interpreted as evidence for a direct effect of maternal smoking on lowering infant mortality through pathways independent of low birth weight. However, such analyses ignore the possibility of unmeasured factors that are common causes of both low birth weight and infant mortality, such as possibly malnutrition or a birth defect (Hernández-Díaz et al., 2006; VanderWeele et al., 2012a). Even if the relationship between maternal smoking and birth weight is unconfounded, and also that between maternal smoking and infant mortality, an unmeasured common cause of low birth weight and infant mortality could bias the analysis. The results provided in our paper do not require having data on, or having controlled for, potential common causes of low birth weight and infant mortality. The relevant condition from our results for testing for a controlled, natural, or principal stratum direct effect is which gives, using Yerushalmy’s data, with a

percent confidence interval of

and so there is in fact no statistical evidence for a protective direct effect of maternal smoking on infant mortality not through birthweight for the low birthweight infants using our conditions.
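The source does not say how its confidence interval was computed. As one possible sketch, a Wald interval for a condition of the Theorem 1 form (a sum of two conditional probabilities compared with one) can be obtained by treating the two exposure groups as independent binomial samples. The function name, the normal approximation, and the counts below are all assumptions made for illustration; they are not Yerushalmy's data.

```python
import math


def condition_estimate_with_ci(k1, n1, k0, n0, z=1.96):
    """Estimate p1 + p0 - 1 with an approximate Wald interval.

    k1 / n1 estimates pr(Y=y, M=m | X=1) and k0 / n0 estimates
    pr(Y=1-y, M=m | X=0). The two groups are assumed independent,
    so the variances of the two sample proportions add.
    """
    p1 = k1 / n1
    p0 = k0 / n0
    estimate = p1 + p0 - 1
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    return estimate, (estimate - z * se, estimate + z * se)


# Hypothetical counts, made up for illustration only.
estimate, (lower, upper) = condition_estimate_with_ci(k1=120, n1=400, k0=90, n0=600)

# Evidence for the direct effect would require the whole interval above zero.
print(lower > 0)
```

Under this sketch, an interval lying entirely above zero would support the condition, while an interval lying below zero, as with these made-up counts, would give no evidence for the corresponding direct effect.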

## 6 Conclusion

This work demonstrates that testing for individual level natural direct effects, principal stratum direct effects, and pleiotropic effects can be implemented with no stronger identification assumptions than testing for a non-zero total effect. The identifiability assumptions that are needed to test the efficacy of a drug in a randomized clinical trial are exactly the identifiability assumptions needed to test for the relevant individual level direct effects. This embeds testing for these direct effects firmly within the Neyman-Pearson testing paradigm. Previous literature studied the ‘instrumental inequalities’ and made significant contributions to understanding some of their consequences, including that falsification of the ‘instrumental inequality’ provides evidence for a non-zero population average controlled direct effect (Cai et al., 2008). Our paper is the first to establish tests for individual level principal stratum direct effects and natural direct effects without cross-world or graphical assumptions. Under monotonicity, our conditions can be used to detect individual level natural direct effects even when the bounds for the population average natural direct effects include zero. We also generate results that provide information on the magnitude of the corresponding principal stratum direct effects. Additionally, we define a new causal effect, individual level pleiotropy, in the counterfactual or potential outcome framework, and derive the associated empirical conditions which could be used to detect such an effect in a population.

## 7 Acknowledgements

We thank James M. Robins, Eric J. Tchetgen Tchetgen and Linbo Wang for helpful discussions. This research was supported by the National Institutes of Health.

## 8 Supplementary Materials

The online supplement contains theorems and proofs of related results, and guidance for conducting inference for these direct effects in various study designs, including observational studies. The online supplement also explains how to derive similar results for the case of principal stratification when denotes survival.

## Appendix

### Proofs of Theorems

###### Proof (of Theorem 1)

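We prove the contrapositive. Assume that no individual of response type $Y_{1,M_1}(\omega)=y$, $Y_{0,M_0}(\omega)=1-y$ and $M_1(\omega)=M_0(\omega)=m$ exists in our population. Then for all individuals $\omega$ in our population, $I\{Y_{1,M_1}(\omega)=y, M_1(\omega)=m\} + I\{Y_{0,M_0}(\omega)=1-y, M_0(\omega)=m\} \leq 1$. Here, $I(\cdot)$ denotes the usual indicator function. Taking expectations,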

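$$
\begin{aligned}
& E\left[I\{Y_{1,M_1}(\omega)=y, M_1(\omega)=m\} + I\{Y_{0,M_0}(\omega)=1-y, M_0(\omega)=m\}\right] \leq 1 \\
\iff{} & \mathrm{pr}\{Y_{1,M_1}(\omega)=y, M_1(\omega)=m\} + \mathrm{pr}\{Y_{0,M_0}(\omega)=1-y, M_0(\omega)=m\} \leq 1 \\
\iff{} & \mathrm{pr}\{Y_{1,M_1}(\omega)=y, M_1(\omega)=m \mid X=1\} + \mathrm{pr}\{Y_{0,M_0}(\omega)=1-y, M_0(\omega)=m \mid X=0\} \leq 1 \\
\iff{} & \mathrm{pr}(Y=y, M=m \mid X=1) + \mathrm{pr}(Y=1-y, M=m \mid X=0) \leq 1.
\end{aligned}
$$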

The second to third line uses the randomization of $X$ and the third to last line uses consistency of counterfactuals.
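The pointwise bound that drives this proof can be checked by brute force. The sketch below enumerates all sixteen binary response types, written here as tuples (Y_{1,M_1}, Y_{0,M_0}, M_1, M_0), and confirms that the sum of the two indicators reaches 2 only for the targeted type (y, 1−y, m, m). The tuple encoding and function names are choices made for this illustration.

```python
from itertools import product


def indicator_sum(y1, y0, m1, m0, y, m):
    """I{Y_{1,M_1}=y, M_1=m} + I{Y_{0,M_0}=1-y, M_0=m} for one response type."""
    return int(y1 == y and m1 == m) + int(y0 == 1 - y and m0 == m)


def types_reaching_two(y, m):
    """Response types (y1, y0, m1, m0) for which the indicator sum equals 2."""
    return [
        t for t in product((0, 1), repeat=4)
        if indicator_sum(*t, y, m) == 2
    ]


for y, m in product((0, 1), repeat=2):
    # Only the targeted type can make the sum exceed 1, so a population
    # without that type has an average indicator sum of at most 1.
    assert types_reaching_two(y, m) == [(y, 1 - y, m, m)]
```

Because the expectation over a finite population is an average of these individual sums, a population mean above one is possible only if the targeted response type is present.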

###### Proof (of Corollary 1)

From Theorem 1, we have that there exists a non-empty subpopulation such that for every and . From the composition axiom and , we have that and , which provides the following results for all : (1) (2) (3) (4) (5) (6) (7)

###### Proof (of Proposition 1)

Take expectation of and to generate the result. A table with the relevant frequencies of individuals is provided in the online supplement.

###### Proof (of Theorem 2)

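We prove the contrapositive. Assume that no individual of response type $Y_{1,M_1}(\omega)=1$, $Y_{0,M_0}(\omega)=0$ and $M_1(\omega)=M_0(\omega)=0$ exists in our population. Then for all individuals $\omega$ in our population, $I\{Y_{1,M_1}(\omega)=1, M_1(\omega)=0\} - I\{Y_{0,M_0}(\omega)=1, M_0(\omega)=0\} \leq 0$. This last assertion is true after examining a counterfactual table that is provided in the online supplement. Taking expectations, we have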

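$$
\begin{aligned}
& \mathrm{pr}\{Y_{1,M_1}(\omega)=1, M_1(\omega)=0\} - \mathrm{pr}\{Y_{0,M_0}(\omega)=1, M_0(\omega)=0\} \leq 0 \\
\iff{} & \mathrm{pr}\{Y_{1,M_1}(\omega)=1, M_1(\omega)=0 \mid X=1\} - \mathrm{pr}\{Y_{0,M_0}(\omega)=1, M_0(\omega)=0 \mid X=0\} \leq 0 \\
\iff{} & \mathrm{pr}(Y=1, M=0 \mid X=1) - \mathrm{pr}(Y=1, M=0 \mid X=0) \leq 0.
\end{aligned}
$$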

The first to second line uses the randomization of $X$ and the second to third line uses consistency of counterfactuals.
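For the case treated in this proof (Y = 1 and M = 0), the empirical contrast whose sign is at issue can be computed directly from arm-specific counts. The sketch below is a plain plug-in calculation with hypothetical counts. It presumes positive monotonicity holds and ignores sampling variability, and the function name is an assumption of this illustration.

```python
def contrast_y1_m0(counts_x1, counts_x0):
    """Return pr(Y=1, M=0 | X=1) - pr(Y=1, M=0 | X=0) from counts.

    counts_x1[(y, m)] and counts_x0[(y, m)] hold the number of individuals
    with Y=y and M=m in the treated and control arms respectively.
    """
    n1 = sum(counts_x1.values())
    n0 = sum(counts_x0.values())
    return counts_x1[(1, 0)] / n1 - counts_x0[(1, 0)] / n0


# Hypothetical counts, made up for illustration only.
counts_x1 = {(1, 0): 60, (0, 0): 20, (1, 1): 10, (0, 1): 10}
counts_x0 = {(1, 0): 30, (0, 0): 50, (1, 1): 10, (0, 1): 10}

contrast = contrast_y1_m0(counts_x1, counts_x0)
print(contrast > 0)
```

A strictly positive contrast is the complement of the final inequality in the derivation, so under monotonicity it would indicate that the targeted response type is present.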

###### Proof (of Proposition 2)

Take expectation of , , and . A table with the relevant frequencies of individuals is provided in the online supplement.

###### Proof (of Corollary 2)

Similar to Corollary 1.

###### Proof (of Theorem 3)

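We prove the contrapositive. Assume that no individual of response type $Y_1(\omega)=1$, $Y_0(\omega)=0$, $Z_1(\omega)=1$ and $Z_0(\omega)=0$ exists in our population. Then for all individuals $\omega$ in our population, $I\{Y_1(\omega)=1, Z_1(\omega)=1\} + I\{Y_0(\omega)=0, Z_0(\omega)=0\} \leq 1$. Taking expectations,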

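$$
\begin{aligned}
& \mathrm{pr}\{Y_1(\omega)=1, Z_1(\omega)=1\} + \mathrm{pr}\{Y_0(\omega)=0, Z_0(\omega)=0\} \leq 1 \\
\iff{} & \mathrm{pr}\{Y_1(\omega)=1, Z_1(\omega)=1 \mid X=1\} + \mathrm{pr}\{Y_0(\omega)=0, Z_0(\omega)=0 \mid X=0\} \leq 1 \\
\iff{} & \mathrm{pr}(Y=1, Z=1 \mid X=1) + \mathrm{pr}(Y=0, Z=0 \mid X=0) \leq 1.
\end{aligned}
$$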

The first to second line uses the randomization of $X$ and the second to third line uses consistency of counterfactuals.

## References

• Avin et al. (2005) Avin, C., Shpitser, I. & Pearl, J. (2005). Identifiability of path-specific effects. In Proceedings of the 19th International Joint Conference on Artificial Intelligence. Morgan Kaufmann Publishers Inc.
• Balke & Pearl (1997) Balke, A. & Pearl, J. (1997). Bounds on treatment effects from studies with imperfect compliance. Journal of the American Statistical Association 92, 1171–1176.
• Cai et al. (2008) Cai, Z., Kuroki, M., Pearl, J. & Tian, J. (2008). Bounds on direct effects in the presence of confounded intermediate variables. Biometrics 64, 695–701.
• Frangakis & Rubin (2002) Frangakis, C. E. & Rubin, D. B. (2002). Principal stratification in causal inference. Biometrics 58, 21–29.
• Hernández-Díaz et al. (2006) Hernández-Díaz, S., Schisterman, E. F. & Hernán, M. A. (2006). The birth weight “paradox” uncovered? American journal of epidemiology 164, 1115–1120.
• Hudgens et al. (2003) Hudgens, M. G., Hoering, A. & Self, S. G. (2003). On the analysis of viral load endpoints in HIV vaccine trials. Statistics in Medicine 22, 2281–2298.
• Imai (2008) Imai, K. (2008). Sharp bounds on the causal effects in randomized experiments with “truncation-by-death”. Statistics & Probability Letters 78, 144–149.
• Imai et al. (2013) Imai, K., Tingley, D. & Yamamoto, T. (2013). Experimental designs for identifying causal mechanisms. Journal of the Royal Statistical Society: Series A (Statistics in Society) 176, 5–51.
• Imbens & Rubin (2015) Imbens, G. W. & Rubin, D. B. (2015). Causal inference in statistics, social, and biomedical sciences. Cambridge University Press.
• Pearl (2009) Pearl, J. (2009). Causality. Cambridge University Press.
• Ramsahai & Lauritzen (2011) Ramsahai, R. & Lauritzen, S. (2011). Likelihood analysis of the binary instrumental variable model. Biometrika 98, 987–994.
• Ramsahai (2013) Ramsahai, R. R. (2013). Probabilistic causality and detecting collections of interdependence patterns. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 75, 705–723.
• Richardson & Robins (2010) Richardson, T. S. & Robins, J. M. (2010). Analysis of the binary instrumental variable model. Heuristics, Probability and Causality: A Tribute to Judea Pearl , 415–444.
• Robins & Greenland (1992) Robins, J. M. & Greenland, S. (1992). Identifiability and exchangeability for direct and indirect effects. Epidemiology , 143–155.
• Sjölander (2009) Sjölander, A. (2009). Bounds on natural direct effects in the presence of confounded intermediate variables. Statistics in Medicine 28, 558–571.
• Solovieff et al. (2013) Solovieff, N., Cotsapas, C., Lee, P. H., Purcell, S. M. & Smoller, J. W. (2013). Pleiotropy in complex traits: challenges and strategies. Nature Reviews Genetics 14, nrg3461.
• Swanson et al. (2018) Swanson, S. A., Hernán, M. A., Miller, M., Robins, J. M. & Richardson, T. (2018). Partial Identification of the Average Treatment Effect Using Instrumental Variables: Review of Methods for Binary Instruments, Treatments, and Outcomes. Journal of the American Statistical Association .
• VanderWeele (2015) VanderWeele, T. J. (2015). Explanation in causal inference: methods for mediation and interaction. Oxford University Press.
• VanderWeele et al. (2012a) VanderWeele, T. J., Mumford, S. L. & Schisterman, E. F. (2012a). Conditioning on intermediates in perinatal epidemiology. Epidemiology (Cambridge, Mass.) 23, 1.
• VanderWeele et al. (2012b) VanderWeele, T. J., Richardson, T. S. et al. (2012b). General theory for interactions in sufficient cause models with dichotomous exposures. The Annals of Statistics 40, 2128–2161.
• VanderWeele & Robins (2008) VanderWeele, T. J. & Robins, J. M. (2008). Empirical and counterfactual conditions for sufficient cause interactions. Biometrika 95, 49–61.
• Wang et al. (2017) Wang, L., Robins, J. M. & Richardson, T. S. (2017). On falsification of the binary instrumental variable model. Biometrika 104, 229–236.
• Yerushalmy (2014) Yerushalmy, J. (2014). The relationship of parents’ cigarette smoking to outcome of pregnancy–implications as to the problem of inferring causation from observed associations. International journal of epidemiology 43, 1355–1366.
• Zhang & Rubin (2003) Zhang, J. L. & Rubin, D. B. (2003). Estimation of Causal Effects via Principal Stratification When Some Outcomes are Truncated by “Death”. Journal of Educational and Behavioral Statistics 28, 353–368.

## Appendix A Notation and Assumptions

### A.1 Notation

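The individual level total effect is defined as $Y_1(\omega) - Y_0(\omega)$. The individual level pure direct effect is defined as $Y_{1,M_0}(\omega) - Y_{0,M_0}(\omega)$ and the total indirect effect is defined as $Y_{1,M_1}(\omega) - Y_{1,M_0}(\omega)$. Similarly, the total direct effect is defined as $Y_{1,M_1}(\omega) - Y_{0,M_1}(\omega)$ and the individual level pure indirect effect is defined as $Y_{0,M_1}(\omega) - Y_{0,M_0}(\omega)$. The individual level controlled direct effect is defined as $Y_{1,m}(\omega) - Y_{0,m}(\omega)$. We say there is a principal stratum direct effect if for some individual $\omega$ for whom $M_1(\omega) = M_0(\omega) = m$ we have $Y_1(\omega) - Y_0(\omega) \neq 0$. Population total, natural direct, pure direct, natural indirect, pure indirect and controlled direct effects are derived through taking expectations of the relevant individual level effects. The population level principal stratum direct effect of stratum $m$ is defined as $E\{Y_1(\omega) - Y_0(\omega) \mid M_1(\omega) = M_0(\omega) = m\}$ for some $m \in \{0,1\}$ (Frangakis & Rubin, 2002).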

Since treatment is randomized at baseline, we assume which is the weakest possible randomization assumption. Throughout, we require the consistency assumption for both and , which means that when then and . This assumption states that the value of and that would have been observed if had been set to what in fact they were observed to be is equal respectively to the value of and that was observed. Additionally, the consistency assumption for means that when and then . We assume the composition axiom, which states that when for some , then . The randomization, consistency, and composition are the only assumptions necessary to derive our results. If one insisted, the randomization and consistency assumption for and would suffice, without the need for the composition axiom, for deriving the results provided in Theorem 1 and 2 in the main text, but we have chosen to keep these assumptions throughout the online supplement for the sake of clarity and concision. Let denote the usual indicator function. Also denote

## Appendix B Pleiotropy

We provide the full statement of Theorem 3 in the main text and the associated proof. Denote and due to space constraints.

###### Theorem 4 (Complete Statement of Theorem 3)

Suppose is randomized.

If then there exists a non-empty subpopulation such that for all and

If then there exists a non-empty subpopulation such that for all and

If then there exists a non-empty subpopulation such that for all and

If then there exists a non-empty subpopulation such that for all and

###### Proof

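We prove the contrapositive for the first statement. Assume that no individual of response type $Y_1(\omega)=1$, $Y_0(\omega)=0$, $Z_1(\omega)=1$ and $Z_0(\omega)=0$ exists in our population. Then for all individuals $\omega$ in our population, $I\{Y_1(\omega)=1, Z_1(\omega)=1\} + I\{Y_0(\omega)=0, Z_0(\omega)=0\} \leq 1$. Taking expectations, we have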

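$$
\begin{aligned}
& E\left[I\{Y_1(\omega)=1, Z_1(\omega)=1\} + I\{Y_0(\omega)=0, Z_0(\omega)=0\}\right] \leq 1 \\
\iff{} & E\left[I\{Y_1(\omega)=1, Z_1(\omega)=1\}\right] + E\left[I\{Y_0(\omega)=0, Z_0(\omega)=0\}\right] \leq 1 \\
\iff{} & \mathrm{pr}\{Y_1(\omega)=1, Z_1(\omega)=1\} + \mathrm{pr}\{Y_0(\omega)=0, Z_0(\omega)=0\} \leq 1 \\
\iff{} & \mathrm{pr}\{Y_1(\omega)=1, Z_1(\omega)=1 \mid X=1\} + \mathrm{pr}\{Y_0(\omega)=0, Z_0(\omega)=0 \mid X=0\} \leq 1 \\
\iff{} & \mathrm{pr}(Y=1, Z=1 \mid X=1) + \mathrm{pr}(Y=0, Z=0 \mid X=0) \leq 1.
\end{aligned}
$$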

The third to fourth line uses the randomization of $X$ and the fourth to last line uses consistency of counterfactuals. The other inequalities are proved through similar arguments.

## Appendix C Complete tables and results for direct effects without monotonicity

Table C provides a list of individual response types and the associated frequencies in the population . Table C provides information on the same individual response types but examines different functions of the counterfactuals or potential outcomes. Denote and Due to space limitations, we were unable to append the column that is present in Table C to Table C. As the ordering of in both tables is kept the same, the last column in Table C can be appended to Table C without any changes.