1 Introduction
Assessments of the treatment effect in specific subgroups often guide decisions regarding product labeling, reimbursement and clinical practice. Yusuf et al. (1991) distinguish between two types of subgroups in randomized clinical trials: “proper subgroups” characterized by baseline data, and “improper subgroups” characterized by postrandomization data. For “improper subgroups”, a naive analysis would compare test and control treatment in patients observed to fall into the subgroup. Such analyses may be misleading as postrandomization data are potentially affected by treatment. Nevertheless, subgroups characterized by postrandomization data continue to be of scientific, regulatory and practical interest (Hirji and Fagerland, 2009; Bohula et al., 2015; Ridker et al., 2018).
In a seminal paper, Frangakis and Rubin (2002) proposed principal stratification to overcome the issues with “improper subgroups”. In this framework, questions related to such subgroups are expressed in the language of potential outcomes (Neyman, 1923; Rubin, 1974). For each patient, we envisage his/her potential postrandomization data when randomized to test and control treatment, respectively. The principal strata then consist of those patients who would fall into subgroups defined by different combinations of their potential outcomes. As potential outcomes can be seen as baseline covariates, the principal stratum is a “proper subgroup”, and hence the treatment effect in this subgroup, the principal stratum effect, has a causal interpretation.
Statistical inference is challenging as the principal stratum effect is not fully identifiable from the observed data without additional assumptions. In the following, we use a Bayesian model-based approach for which partial identifiability causes no real difficulty (Lindley, 1972). In the Bayesian approach, substantive assumptions are transparently introduced through priors on model parameters, which can be changed in sensitivity analyses. A tutorial on principal stratification is provided by Egleston et al. (2010), and recent research includes Baccini et al. (2017), Ding and Lu (2017), and Egleston et al. (2017).
Principal stratification is one of five strategies described in the draft regulatory guidance document ICH E9(R1) on estimands and sensitivity analyses in clinical trials (ICH, 2017). The approach is also discussed in the NRC (2010) report commissioned by the FDA. However, experience with the principal stratification framework in a regulatory context is currently very limited. In the following, we describe a case study where this framework was used to address a key regulatory question regarding product labeling. This should facilitate further innovative applications of this framework by practitioners.
Our motivating example is the EXPAND study, a large placebo-controlled trial of siponimod in patients with secondary progressive multiple sclerosis (NCT01665144 on ClinicalTrials.gov). The primary objective of this trial was to demonstrate the efficacy of siponimod relative to placebo in delaying disability progression, which was achieved (Kappos et al., 2018). However, the treatment effect in the subgroup of patients who would not relapse during the trial is relevant from both a scientific and regulatory perspective. Assessing this subgroup treatment effect is challenging as there is strong evidence that siponimod reduces relapses (Selmaj et al., 2013; Kappos et al., 2018).
In Section 2 we discuss in more detail the EXPAND study and the scientific question of interest. The principal stratum estimand and scientific assumptions for identification are described in Section 3. Section 4 provides the Bayesian analysis model and methods for sensitivity analysis. Case study results are then shown in Section 5. The article closes with a discussion (Section 6). Software code for the main analysis is provided as supplementary information.
2 Motivating case study – the EXPAND trial
EXPAND was a randomized, double-blind, placebo-controlled, event- and exposure-driven phase 3 study evaluating the efficacy of siponimod in patients with secondary progressive multiple sclerosis (SPMS). A total of 1651 patients were randomized in a 2:1 ratio to receive once-daily 2 mg siponimod or placebo. The primary objective was to demonstrate efficacy of siponimod relative to placebo in delaying the time to 3-month confirmed disability progression (CDP) as measured by the Expanded Disability Status Scale (EDSS). The EDSS is an ordinal scale used for assessing neurologic impairment in MS based on a neurological examination. It combines scores in seven functional systems and an ambulation score, and ranges from 0 (no impairment) to 10 (death due to MS). A 3-month CDP is defined by a prespecified increase from EDSS baseline that is subsequently sustained for at least 3 months. The study achieved its objective with an estimated hazard ratio of 0.79 (95% CI: 0.65–0.95). Full study results are reported in Kappos et al. (2018).
While relapses in SPMS are relatively infrequent compared to the preceding relapsing-remitting disease stage, some patients did experience relapses during the study. During these relapses, patients experience an increased EDSS score from which they may fully or partly recover (a CDP confirmation may not take place during a relapse). As expected based on prior trials (Selmaj et al., 2013), siponimod also substantially reduced the annualized relapse rate with a rate ratio of 0.45 (95% CI: 0.34–0.59). This raised the question of siponimod’s ability to delay CDP unrelated to its effect on relapses. Of particular interest was the effect of siponimod among the subgroup of patients for whom relapses would be absent during the study.
Special care must be taken when defining a treatment effect through such a subgroup, since naive classification based simply on absence of on-study relapses would constitute an “improper” subgroup, defined by postrandomization outcomes that are affected by treatment. To overcome this issue, a principal stratum estimand was defined as the relative effect of treatment on the occurrence of 3-month CDP within a time interval from time 0 (randomization) to time $T$ (e.g. 2 years) among the subgroup of EXPAND patients who would not relapse between randomization and time $T$ regardless of treatment assignment. The critical distinction between this estimand and one based purely on observed occurrence of on-study relapse is that the principal stratum estimand is defined in terms of potential outcomes, so that the subgroup it refers to is a proper subgroup.
3 Principal stratum estimand
In this section, we provide a mathematical representation of the estimand defined in Section 2 and discuss the extent to which it can be identified from the observed data.
3.1 Principal stratification
The principal strata are defined in terms of potential outcomes; see, for example, Neyman (1923) and Rubin (1974). Let $Z$ be an indicator variable denoting treatment assignment, with 0 and 1 corresponding to control and active treatment, respectively. For this exposition, we consider a fixed time interval of interest from time $0$ (randomization) to time $T$, e.g. 12 or 24 months after randomization. Now, define for each subject the following binary potential outcomes:

$S(z)$: occurrence of relapse over the period $[0, T]$ under treatment $z$;

$Y(z)$: occurrence of CDP over the period $[0, T]$ under treatment $z$.
Note that all subjects have a pair of potential outcomes, only one of which is observed (i.e. the potential outcome corresponding to the treatment actually assigned). Also note that in a randomized trial, the potential outcomes are independent of the treatment assignment. Finally, we denote by $S$ and $Y$ the corresponding observed outcomes for the intercurrent event and the primary outcome, respectively.
We now define four principal strata based on the potential outcomes $(S(0), S(1))$:

Immune ($I$): Subjects that would not experience relapse regardless of treatment assignment, i.e. $S(0) = S(1) = 0$;

Doomed ($D$): Subjects that would experience relapse regardless of treatment assignment, i.e. $S(0) = S(1) = 1$;

Benefiter ($B$): Subjects that would only experience relapse if treated with control, i.e. $S(0) = 1$ and $S(1) = 0$;

Harmed ($H$): Subjects that would only experience relapse if treated with the active treatment, i.e. $S(0) = 0$ and $S(1) = 1$.
As discussed above, randomization ensures that the potential outcomes $S(0)$ and $S(1)$ are jointly independent of treatment assignment, i.e. $Z \perp (S(0), S(1))$. In other words, the value of the pair $(S(0), S(1))$ is not affected by treatment and can thus be regarded as a pretreatment characteristic.
We are now ready to define the estimand of interest:
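$$\mathrm{RR} = \frac{P(Y(1) = 1 \mid S(0) = 0, S(1) = 0)}{P(Y(0) = 1 \mid S(0) = 0, S(1) = 0)} \tag{1}$$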
This is the principal stratum causal effect of treatment on CDP (expressed as a risk ratio) in the population of immune patients.
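To make the definition concrete, the following minimal sketch evaluates the risk ratio in the immune stratum on a small, entirely hypothetical table of potential outcomes. In a real trial only one potential outcome per patient is observed, so such a table can never be collected; the names `Patient`, `principal_stratum` and `immune_risk_ratio` are illustrative assumptions and are not taken from the study software.

```python
from typing import NamedTuple


class Patient(NamedTuple):
    """All four potential outcomes of one hypothetical patient."""
    s0: int  # relapse under control, S(0)
    s1: int  # relapse under active treatment, S(1)
    y0: int  # CDP under control, Y(0)
    y1: int  # CDP under active treatment, Y(1)


def principal_stratum(s0: int, s1: int) -> str:
    """Map the pair (S(0), S(1)) to its principal stratum label."""
    labels = {(0, 0): "immune", (1, 1): "doomed",
              (1, 0): "benefiter", (0, 1): "harmed"}
    return labels[(s0, s1)]


def immune_risk_ratio(patients: list[Patient]) -> float:
    """Risk ratio P(Y(1)=1 | immune) / P(Y(0)=1 | immune)."""
    immune = [p for p in patients
              if principal_stratum(p.s0, p.s1) == "immune"]
    risk_active = sum(p.y1 for p in immune) / len(immune)
    risk_control = sum(p.y0 for p in immune) / len(immune)
    return risk_active / risk_control


# Hypothetical population: 10 immune patients, 2 benefiters, 2 doomed.
population = (
    [Patient(0, 0, 1, 1)] * 3    # immune, CDP under either treatment
    + [Patient(0, 0, 1, 0)] * 1  # immune, CDP only under control
    + [Patient(0, 0, 0, 0)] * 6  # immune, no CDP under either treatment
    + [Patient(1, 0, 1, 1)] * 2  # benefiters
    + [Patient(1, 1, 1, 1)] * 2  # doomed
)
rr_immune = immune_risk_ratio(population)
```

Because stratum membership depends only on the pair of relapse potential outcomes, the same set of patients enters the numerator and the denominator, which is what gives the ratio its causal interpretation.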
We consider inference about the estimand in Equation (1) using the Bayesian framework; see Section 4 for the model specification. One could also proceed with estimation using the frequentist framework, although doing so would likely require several identifying assumptions in order to produce a point estimate (or one could resort to estimating bounds). It is nevertheless illuminating to consider the extent to which the estimand is identifiable. We hence discuss this topic in Section 3.3.
3.2 Assumptions
We make two main assumptions:
Assumption 1 – Joint Exchangeability: $Z \perp (S(0), S(1), Y(0), Y(1))$. That is, the treatment assignment is jointly independent of the potential outcomes. In our setting, the trial is randomized so this assumption holds by design.
Assumption 2 – Monotonicity: For any patient, $S(1) \le S(0)$ or, equivalently, $P(S(0) = 0, S(1) = 1) = 0$.
The monotonicity assumption rules out the “harmed” principal stratum and allows some patients to be classified as belonging to exactly one stratum. For example, a patient on the control arm ($Z = 0$) with $S = 0$ must be “immune”, and a patient on the active arm ($Z = 1$) with $S = 1$ must be “doomed”.
The monotonicity assumption warrants careful consideration. It is a partially testable assumption as it implies that
$$P(S = 1 \mid Z = 1) \le P(S = 1 \mid Z = 0),$$
and these probabilities are identified under Assumption 1. In fact, we can rule out Assumption 2 if $P(S = 1 \mid Z = 1) > P(S = 1 \mid Z = 0)$. However, failure to rule out Assumption 2 does not mean that it is true. Thus, one must ultimately justify the assumption based on substantive considerations. Analyses that assess sensitivity to departures from monotonicity may be warranted. We discuss options for such sensitivity analyses in Section 4.
3.3 Estimand identification
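In this section we discuss identifiability of the estimand defined in Equation (1). First, Assumptions 1 and 2 allow identification of the principal strata proportions. For the doomed stratum,
$$P(S(0) = 1, S(1) = 1) = P(S(1) = 1) = P(S(1) = 1 \mid Z = 1) = P(S = 1 \mid Z = 1),$$
where the first equality uses monotonicity and the second uses exchangeability.
The last equation follows from the consistency assumption that is fundamental to causal inference, see for example Hernán and Robins (2018). Hence, the proportion of doomed patients can be estimated by the proportion of patients experiencing the event in the active arm. Similarly, the proportion of immune patients, $P(S = 0 \mid Z = 0)$, is estimated by the proportion of patients not experiencing the event in the control arm. Finally, the proportion of benefiters is simply obtained as one minus the proportions of immune and doomed patients, i.e. $P(S = 1 \mid Z = 0) - P(S = 1 \mid Z = 1)$.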
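As a rough illustration of these identification results, the sketch below computes simple plug-in estimates of the strata proportions and checks the testable implication of monotonicity. It uses the relapse counts of covariate stratum 1 in Table 2 and assumes that the "# available" column is the appropriate denominator; it ignores covariates, missing data and posterior uncertainty, so it is not the Bayesian analysis of Section 4. The function name `strata_proportions` is hypothetical.

```python
def strata_proportions(relapse_active, n_active, relapse_control, n_control):
    """Plug-in estimates of the principal strata proportions under monotonicity."""
    p_active = relapse_active / n_active      # estimates P(S=1 | Z=1)
    p_control = relapse_control / n_control   # estimates P(S=1 | Z=0)
    doomed = p_active                         # relapse even under active treatment
    immune = 1.0 - p_control                  # no relapse even under control
    benefiter = 1.0 - immune - doomed         # equals p_control - p_active
    return {
        "immune": immune,
        "doomed": doomed,
        "benefiter": benefiter,
        # Monotonicity implies P(S=1 | Z=1) <= P(S=1 | Z=0); a violation
        # would show up as a negative benefiter estimate.
        "consistent_with_monotonicity": p_active <= p_control,
    }


# Covariate stratum 1 of Table 2: relapses / available patients per arm.
est = strata_proportions(relapse_active=22, n_active=167,
                         relapse_control=13, n_control=81)
```

A `True` flag only means that the data do not contradict monotonicity; as noted in Section 3.2, it does not establish the assumption.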
Now, we briefly discuss the extent to which the estimand of interest can be identified from the data. By applying the conditional probability formula, followed by the monotonicity and exchangeability assumptions, the denominator of Equation (1) is identified as follows:
$$P(Y(0) = 1 \mid S(0) = 0, S(1) = 0) = P(Y(0) = 1 \mid S(0) = 0) = P(Y = 1 \mid S = 0, Z = 0).$$
Hence the denominator is estimated as the proportion of CDP outcomes among control patients for whom $S = 0$ in the period of interest. However, the numerator of Equation (1) is not identifiable because patients on the active arm for which $S = 0$ could belong to either the immune stratum or the benefiter stratum. We will hence derive bounds on the numerator which lead to a range of feasible values for the estimand of interest.
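We can use the law of total probability (making no further assumptions) to write
$$P(Y(1) = 1 \mid S(1) = 0) = P(Y(1) = 1 \mid I)\, P(I \mid S(1) = 0) + P(Y(1) = 1 \mid B)\, P(B \mid S(1) = 0),$$
where $I$ and $B$ denote membership in the immune and benefiter strata, respectively. This can be rearranged as
$$P(Y(1) = 1 \mid I) = \frac{P(Y(1) = 1 \mid S(1) = 0) - P(Y(1) = 1 \mid B)\, P(B \mid S(1) = 0)}{P(I \mid S(1) = 0)}. \tag{2}$$
On the right-hand side of this equation, all quantities except $P(Y(1) = 1 \mid B)$ are identifiable from the observed data. Specifically, $P(Y(1) = 1 \mid S(1) = 0) = P(Y = 1 \mid S = 0, Z = 1)$, and $P(B \mid S(1) = 0)$ and $P(I \mid S(1) = 0)$ are functions of the identifiable principal strata proportions. When we evaluate the right-hand side over the theoretical range of values for $P(Y(1) = 1 \mid B)$ (i.e. 0 to 1), we obtain the range of feasible values for $P(Y(1) = 1 \mid I)$. If $P(I \mid S(1) = 0)$ is large relative to $P(B \mid S(1) = 0)$ (i.e. the proportion of benefiters is small) then the range of feasible values will be narrow; conversely if $P(B \mid S(1) = 0)$ is large relative to $P(I \mid S(1) = 0)$ then this range will be wide.
In order to further identify the numerator, additional substantive assumptions would be required. In some settings, for example, it might be reasonable to assume that $P(Y(1) = 1 \mid B) \le P(Y(1) = 1 \mid D)$, where $D$ denotes membership in the doomed stratum. This untestable assumption, which says that the probability of the outcome under treatment is lower in the benefiter stratum than in the doomed stratum, would narrow the range further but will not lead to full identifiability. Full identifiability will ultimately require several additional untestable assumptions which may call into question the reliability of any conclusions drawn from such an analysis. Rather than imposing additional assumptions to obtain full identifiability, we will use the Bayesian framework to draw inference about the target estimand.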
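The bounds described in this section can be sketched in a few lines. The function below varies the unidentified probability of CDP among benefiters under active treatment over its full range from 0 to 1 and returns the implied range for the risk ratio in the immune stratum. The function name, argument names and input values are hypothetical; sampling variability is ignored, so this illustrates the algebra only and is not an inferential procedure.

```python
def immune_rr_bounds(p_cdp_active_norelapse, p_cdp_control_norelapse,
                     pi_immune, pi_benefiter):
    """Feasible range of the immune-stratum risk ratio under monotonicity.

    p_cdp_active_norelapse:  P(Y=1 | S=0, Z=1), identified from the active arm
    p_cdp_control_norelapse: P(Y=1 | S=0, Z=0), the identified denominator
    pi_immune, pi_benefiter: identified principal strata proportions
    """
    # Shares of immune and benefiter patients among those with S(1) = 0.
    w_immune = pi_immune / (pi_immune + pi_benefiter)
    w_benefiter = 1.0 - w_immune
    # Numerator P(Y(1)=1 | immune) when the benefiter risk is 1 (lower end)
    # or 0 (upper end), clipped to the unit interval.
    lower = max(0.0, (p_cdp_active_norelapse - w_benefiter) / w_immune)
    upper = min(1.0, p_cdp_active_norelapse / w_immune)
    return lower / p_cdp_control_norelapse, upper / p_cdp_control_norelapse


# Hypothetical inputs, not values from the EXPAND trial.
rr_low, rr_high = immune_rr_bounds(0.20, 0.25, pi_immune=0.8, pi_benefiter=0.1)
```

With these inputs the feasible range is roughly 0.4 to 0.9; shrinking `pi_benefiter` towards zero collapses the range to the single value `p_cdp_active_norelapse / p_cdp_control_norelapse`.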
4 Modeling and inference
4.1 Statistical model
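Let $G \in \{I, D, B, H\}$ be a random variable indicating principal stratum membership and define $\pi_g = P(G = g)$ to be the probability of belonging to stratum $g$. Note that we include the harmed ($H$) stratum in the model specification; the monotonicity assumption is realized with the use of a strongly informative prior.
Next, define $\theta_{g,z} = \mathrm{logit}\, P(Y(z) = 1 \mid G = g)$ as the stratum-specific probability (on the logit scale) of the primary outcome under treatment $z$. The estimand of interest is then expressed mathematically as $\mathrm{expit}(\theta_{I,1}) / \mathrm{expit}(\theta_{I,0})$.
Let $\Theta$ be a vector of all model parameters ($\pi_g$ and $\theta_{g,z}$). Working in the Bayesian framework, the posterior distribution of $\Theta$ is
$$p(\Theta \mid Y, S, Z) \propto p(\Theta) \prod_{i} p(Y_i \mid S_i, Z_i, \Theta)\, p(S_i \mid Z_i, \Theta). \tag{3}$$
The relapse model is given as
$$P(S = 1 \mid Z = 0) = \pi_D + \pi_B, \qquad P(S = 1 \mid Z = 1) = \pi_D + \pi_H.$$
Because each combination of $(Z, S)$ implies membership in one of two principal strata, the disability model for $Y$ given $(Z, S)$ is represented by a mixture of two Bernoulli distributions. For example, the distribution of $Y$ given $(Z, S) = (0, 0)$ is (under consistency and randomization) a mixture of two Bernoullis with respective success probabilities $\mathrm{expit}(\theta_{I,0})$ and $\mathrm{expit}(\theta_{H,0})$, and mixing proportions $\pi_I / (\pi_I + \pi_H)$ and $\pi_H / (\pi_I + \pi_H)$. Table 1 spells out the likelihood for every combination of $Z$ and $S$.
$(Z, S)$  Implied PS  Disability model $P(Y = 1 \mid Z, S)$

(0,0)  $I$ or $H$  $[\pi_I\, \mathrm{expit}(\theta_{I,0}) + \pi_H\, \mathrm{expit}(\theta_{H,0})] / (\pi_I + \pi_H)$
(0,1)  $D$ or $B$  $[\pi_D\, \mathrm{expit}(\theta_{D,0}) + \pi_B\, \mathrm{expit}(\theta_{B,0})] / (\pi_D + \pi_B)$
(1,0)  $I$ or $B$  $[\pi_I\, \mathrm{expit}(\theta_{I,1}) + \pi_B\, \mathrm{expit}(\theta_{B,1})] / (\pi_I + \pi_B)$
(1,1)  $D$ or $H$  $[\pi_D\, \mathrm{expit}(\theta_{D,1}) + \pi_H\, \mathrm{expit}(\theta_{H,1})] / (\pi_D + \pi_H)$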
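The product of the relapse model and the mixture disability model can be written as a single sum over the strata compatible with each observed pair of treatment and relapse status, as laid out in Table 1. The sketch below evaluates this per-patient likelihood contribution in plain Python. It is not the Stan model from the supplement; the symbols `pi` (stratum probabilities) and `theta` (logit-scale outcome parameters indexed by stratum and arm), the stratum labels and the numerical values are all assumptions made for illustration.

```python
import math


def expit(x: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


# Principal strata compatible with each observed (z, s) pair:
# I = immune, D = doomed, B = benefiter, H = harmed.
COMPATIBLE = {
    (0, 0): ("I", "H"),  # no relapse on control
    (0, 1): ("D", "B"),  # relapse on control
    (1, 0): ("I", "B"),  # no relapse on active treatment
    (1, 1): ("D", "H"),  # relapse on active treatment
}


def joint_prob(y: int, s: int, z: int, pi: dict, theta: dict) -> float:
    """P(Y=y, S=s | Z=z): sum over compatible strata g of
    pi[g] * Bernoulli(y; expit(theta[(g, z)]))."""
    total = 0.0
    for g in COMPATIBLE[(z, s)]:
        p = expit(theta[(g, z)])
        total += pi[g] * (p if y == 1 else 1.0 - p)
    return total


# Hypothetical parameter values; pi["H"] = 0 corresponds to exact monotonicity.
pi = {"I": 0.8, "D": 0.1, "B": 0.1, "H": 0.0}
theta = {(g, z): -1.0 + 0.2 * z for g in "IDBH" for z in (0, 1)}

# Log-likelihood contribution of a control patient without relapse or CDP.
loglik_example = math.log(joint_prob(y=0, s=0, z=0, pi=pi, theta=theta))
```

Dividing `joint_prob` by the relapse-model probability of the observed relapse status recovers the mixture weights of the disability model, so the factorized and the joint form give the same likelihood.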
To complete the model specification within the Bayesian framework we need to assign prior distributions for the parameters. For the stratum probabilities we use a log-odds scaled categorical distribution with real-valued parameters $\alpha = (\alpha_I, \alpha_D, \alpha_B, \alpha_H)$. This transformation allows straightforward incorporation of covariates (see Section 4.2) and improves sampling efficiency (Carpenter et al., 2017). The principal stratum probabilities are then recovered with the softmax function:
$$\pi_g = \frac{\exp(\alpha_g)}{\sum_{g'} \exp(\alpha_{g'})}.$$
The softmax function maps the 4-dimensional vector of real numbers to a 4-dimensional simplex, i.e. ensures that $\sum_g \pi_g = 1$. We fix one component of $\alpha$ at a constant because the softmax function is invariant under adding a constant to each component of its input.
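The invariance property that motivates fixing one log-odds parameter is easy to verify numerically. In the sketch below, the ordering of the strata, the choice of which component is pinned at zero and the numerical values are assumptions made purely for illustration; the very negative last entry mimics a harmed-stratum parameter located far in the tail.

```python
import math


def softmax(alpha):
    """Map real-valued log-odds parameters to probabilities that sum to one."""
    m = max(alpha)                       # subtract the maximum for stability
    weights = [math.exp(a - m) for a in alpha]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical parameters; first component pinned at 0, last one far negative.
alpha = [0.0, -1.0, -1.5, -20.0]
pi = softmax(alpha)

# Adding the same constant to every component leaves the probabilities
# unchanged, so one component can be fixed without loss of generality.
pi_shifted = softmax([a + 3.0 for a in alpha])
```

The last element of `pi` is of the order of 1e-9, which shows how an extreme location on the log-odds scale effectively removes a stratum from the model.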
We parameterize the active-arm outcome parameter as $\theta_{g,1} = \theta_{g,0} + \Delta_g$. We then assume independent weakly informative normal priors for the principal stratum parameters $\alpha_g$, and the outcome parameters $\theta_{g,0}$ and $\Delta_g$, $g \in \{I, D, B, H\}$. The specific priors used in our case study are shown in Section 5.1.
The monotonicity assumption is encoded through the use of a strongly informative prior on $\alpha_H$ with extreme location (relative to a plausible range of the other $\alpha$ parameters) and small standard deviation. Use of such a prior will essentially imply that $P(\pi_H \approx 0) = 1$, and that this probability will remain equal to 1 in the posterior distribution. Because monotonicity is enforced through a strongly informative prior, a natural way to assess sensitivity to the assumption is to gradually relax the prior towards “weaker” forms of monotonicity. To this end, one could both shift the location of the prior closer to zero and increase the prior standard deviation to be closer to that used for the other principal strata.
4.2 Inclusion of covariates and handling of missing data
Let denote a vector of baseline covariates and be an indicator variable that denotes whether is missing, i.e., if is missing and if is observed. We assume that is independent of given . We also note that is independent of , due to randomization. To include covariates and handle missing data, we further index the model parameters from the previous section by , so that and expit. In (4.1), the disability and relapse models are conditioned on and , where conditioning on follows from our assumption on the missingness mechanism. Further, , and is a Bernoulli mixture with parameters indexed by ; for example, if , we have a mixture with parameters and with mixture weight .
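To recover the marginal quantities $\pi_g$ and $P(Y(z) = 1 \mid G = g)$ from the covariate-specific parameters $\pi_g(x)$ and $\theta_{g,z}(x)$, we use the following formulae:
$$\pi_g = \int \pi_g(x)\, dF(x), \qquad P(Y(z) = 1 \mid G = g) = \frac{1}{\pi_g} \int \mathrm{expit}(\theta_{g,z}(x))\, \pi_g(x)\, dF(x), \tag{4}$$
where $F$ denotes the joint cumulative distribution function of the covariates $X$.
In our case study, we use two dichotomous covariates (see Section 5). We thus obtain estimates of the covariate-specific parameters $\pi_g(x)$ and $\theta_{g,z}(x)$ by fitting the model of Section 4.1 separately within each of the four covariate combinations. In general, if covariates are continuous or have too many categories to be treated independently, models could be fitted with regression techniques. For example, the principal stratum parameters could be estimated with multinomial logistic regression with
$$\alpha_g(x) = x^\top \beta_g,$$
where $\beta_g$ is a vector of covariate coefficients corresponding to principal stratum $g$.
5 The EXPAND trial – Principal stratum analysis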
In the context of the EXPAND trial, the principal stratum of non-relapsers corresponds to the immune stratum defined in Section 3. We conducted separate analyses for three different choices of the time $T$ (in months). Two covariates were used: baseline EDSS score dichotomized to high and low, and occurrence of relapses within 2 years prior to study (yes/no). Hence four covariate strata were obtained.
5.1 Prior distributions
We used the following prior distributions in our analysis:

For the free principal stratum parameters $\alpha_g$ other than $\alpha_H$, we used normal priors with the following rationale. Without covariates, our choice of prior for these parameters would imply a prior median of approximately 0.31 for the corresponding stratum probabilities $\pi_g$, with a prior 95% CI of (0.04, 0.80). This prior does not overly favor extreme parameter values, nor does it assign excess prior probability to one principal stratum over another. With covariates, the effective variance reduces by a factor approximately proportional to the number of covariate combinations (assuming roughly uniform distribution across covariate groups). To account for this, we increase the prior variance within each covariate combination by a factor of 4.

For $\alpha_H$, we used a strongly informative prior. This prior ensures that $\pi_H \approx 0$ with prior probability essentially equal to 1.

For $\theta_{g,0}$, we used the same normal prior for all $g$, i.e. identical and independent priors in all principal strata. The mean of this prior was chosen to reflect the expected two-year disability rate among untreated patients as described in the EXPAND study protocol. The variance was motivated with similar reasoning as above; in the absence of covariates, our choice of prior would imply a prior median disability rate of 0.3 with a 95% CI of (0.06, 0.75), which covers the range of plausible values well. Since we use two dichotomous covariates, the prior variance within each covariate combination is increased by a factor of 4.

With $\theta_{g,1}$ parameterized as $\theta_{g,0} + \Delta_g$, $\Delta_g$ represents the difference from placebo (on a log-odds-ratio scale) when treated with active treatment. For $\Delta_g$ we used a normal prior centered at zero to represent a prior expectation of no treatment effect, with the variance motivated exactly as above.
Importantly, we avoided the use of flat non-informative priors because such distributions can place a large amount of prior probability on regions of the parameter space that are highly implausible. For example, a diffuse normal prior for a logit-scale parameter would yield an implied bimodal prior for the corresponding probability, with most of the prior mass concentrated near 0 or 1. Similarly, even a moderately diffuse prior implies only a 0.03 prior probability for the range 0.03 to 0.75 (which is the actual 95% credible interval obtained with the prior specified above).
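The implied prior summaries for a probability can be checked directly, because the inverse logit is monotone and therefore maps quantiles of a normal prior on the logit scale to quantiles on the probability scale. The sketch below does this for a single logit-scale parameter. The hyperparameters are assumptions: a mean of logit(0.3) with standard deviation 1 was chosen because it reproduces the summaries quoted above for the outcome prior without covariates (median 0.3, 95% CI roughly 0.06 to 0.75), and the standard deviation of 10 for the diffuse case is purely illustrative. Neither is a value confirmed by the text.

```python
import math


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def implied_summary(mu: float, sigma: float):
    """Lower 2.5% quantile, median and upper 97.5% quantile of expit(theta)
    when theta ~ Normal(mu, sigma^2)."""
    z = 1.959964  # 97.5% quantile of the standard normal
    return expit(mu - z * sigma), expit(mu), expit(mu + z * sigma)


def prior_mass(mu: float, sigma: float, lo: float, hi: float) -> float:
    """Prior probability that expit(theta) falls between lo and hi."""
    return normal_cdf(logit(hi), mu, sigma) - normal_cdf(logit(lo), mu, sigma)


# Assumed weakly informative prior on the logit scale.
low, median, high = implied_summary(mu=logit(0.3), sigma=1.0)

# Assumed diffuse prior: share of prior mass outside (0.05, 0.95).
mass_near_bounds = 1.0 - prior_mass(mu=0.0, sigma=10.0, lo=0.05, hi=0.95)
```

Under the assumed diffuse prior, roughly three quarters of the prior mass lies below 0.05 or above 0.95, which illustrates why such priors are informative on the probability scale.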
To assess sensitivity to departures from monotonicity, we investigated two alternative prior distributions for $\alpha_H$. For “weak monotonicity”, we used a prior which implies (in the absence of covariates) a prior median of 0.04 for $\pi_H$ and a prior 95% CI of (0.01, 0.13). In other words, this prior allows for the possibility that some patients might belong to the harmed stratum, but assigns low prior probability to this relative to the other principal strata. For “no monotonicity”, we used the same prior for $\alpha_H$ as for the other free principal stratum parameters. In this setting, the prior probability of belonging to the harmed stratum is not expected to be any larger or smaller than for the other strata.
Finally, we note that because one component of $\alpha$ is held fixed, the implied prior for the corresponding stratum probability is not identical to those of the strata with free parameters. Indeed, its prior median is 0.29 and its prior 95% CI is somewhat narrower. Different priors could be used for the free parameters that would result in a more “symmetric” prior distribution for the stratum probabilities ($\pi_H$ notwithstanding). For example, a correlated bivariate normal prior for the two free parameters would result in approximately equal prior distributions. Working on the natural parameter scale and placing a Dirichlet prior on the stratum probabilities would also achieve the same goal. Both of these prior configurations were investigated and did not produce results substantially different from those shown in Section 5.3.
5.2 Details on estimation
Cov. stratum  Treatment  # randomized  # available  # relapses  # CDPs 

1  Siponimod  208  167  22  30 
1  Placebo  107  81  13  22 
2  Siponimod  300  236  15  51 
2  Placebo  155  126  17  35 
3  Siponimod  180  145  20  20 
3  Placebo  95  74  11  18 
4  Siponimod  408  317  13  61 
4  Placebo  188  137  7  20 
The model described in Section 4 was fitted using the probabilistic programming language Stan. This language provides Bayesian inference through Markov chain Monte Carlo (MCMC) methods using the No-U-Turn sampler (NUTS); see Carpenter et al. (2017) for an overview. Four chains were simulated, each with 1000 warmup (tuning) iterations and 1000 sampling iterations, which were saved for inference. Chains were randomly seeded, and mixing and convergence were assessed using graphical methods (e.g. trace plots), diagnostics such as Rhat (Gelman et al., 2013, p. 285) and checks for divergent transitions. Table 2 shows the summary statistics for the month time point, grouped by covariate stratum and treatment assignment. The Stan model file is provided in an online supplement.
5.3 Results
Estimated principal strata proportions are shown in Figure 1. We see that the posterior probability of belonging to the non-relapser stratum is substantially larger than that of any other stratum, with median probability of at least 0.8 under monotonicity. As the monotonicity assumption is relaxed, the posterior probability of belonging to the non-relapser stratum remains the largest among all strata, and that for the harmed stratum is at most (under no monotonicity) roughly similar to that for the definite-relapser stratum.
Figure 2 shows inference for the estimand of interest (see Equation (1)). A risk ratio below 1 implies a beneficial effect of treatment. Under monotonicity we see a consistent benefit of treatment with posterior median risk ratios ranging from 0.80 to 0.85, depending on the time point. Further, there is at least approximately 70–75% posterior probability that the risk ratio is below 1 for both endpoints and all time points. We note that the 95% credible intervals get wider as $T$ increases. This is largely explained by the decreasing number of patients available for analysis later in the trial (median follow-up time was approximately 18 months).
The estimates remain fairly consistent as the monotonicity assumption is relaxed; posterior medians shift slightly but credible intervals are largely overlapping. While there is a slight increase in uncertainty (wider credible intervals) with weak or no monotonicity, it appears that the qualitative conclusions of the analysis do not depend strongly on this assumption.
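Posterior summaries of this kind are obtained by transforming each posterior draw of the logit-scale outcome parameters for the immune stratum into a risk ratio and then summarizing the transformed draws. The sketch below shows the computation on four made-up draws; the function name, argument names and values are hypothetical and are not output from the EXPAND analysis.

```python
import math
from statistics import median


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def summarize_risk_ratio(theta_control, theta_active):
    """Posterior median risk ratio and posterior probability of benefit.

    theta_control, theta_active: equal-length sequences of posterior draws of
    the immune-stratum outcome parameters (logit scale) under control and
    active treatment.
    """
    rr = [expit(a) / expit(c) for c, a in zip(theta_control, theta_active)]
    return {
        "median": median(rr),
        "prob_benefit": sum(r < 1.0 for r in rr) / len(rr),
    }


# Four hypothetical posterior draws.
theta_control = [-0.8, -1.0, -0.9, -1.1]
theta_active = [-1.0, -1.2, -0.8, -1.4]
summary = summarize_risk_ratio(theta_control, theta_active)
```

Computing the ratio draw by draw, rather than taking the ratio of two posterior medians, preserves the posterior dependence between the two parameters.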
6 Discussion
We have proposed a principal stratum estimand to quantify the effect of treatment in the population of patients that would not experience a postrandomization event based on potential outcomes. Our work is motivated by an interest in quantifying the efficacy of siponimod in a nonrelapsing population of patients with secondary progressive multiple sclerosis. We used a Bayesian approach for statistical inference on the principal stratum estimand. This has the benefit of allowing some structural assumptions to be encoded using informative priors, which can be easily changed in sensitivity analyses. Furthermore, while the estimand of interest would be nonidentifiable if estimated in the frequentist framework (see Section 3.3), working in the Bayesian framework allows straightforward calculation of the posterior distribution and derivation of inferential posterior summaries as long as sensible priors are used. Care is needed when specifying priors as results could be sensitive to particular choices. Very diffuse priors may be unintentionally informative in this context, and can also lead to computational difficulties. We used priors that capture the plausible range of parameter values. While not central to the methodology developed in this paper, we included covariates in the context of handling missing data. We used a missing at random assumption here, while a missing completely at random assumption was made in the companion article by Cree et al. (2018). Results for both missing data assumptions were comparable.
The draft regulatory guidance document ICH E9(R1) on estimands and sensitivity analyses in clinical trials discusses five strategies (i.e. treatment-policy, hypothetical, while on treatment, composite, principal stratum) for defining estimands in the presence of postrandomization events, referred to as “intercurrent” events (ICH, 2017). In this article, we developed methods associated with the principal stratum strategy in which the intercurrent event was the occurrence of relapse over a fixed time period and the outcome was disability progression over the same period. Alternatively, the treatment-policy strategy would define an estimand that focuses solely on the occurrence of disability progression over a fixed time period and ignores the intercurrent event (i.e. the intent-to-treat effect). This was the primary estimand in the EXPAND study. The composite strategy would define a new variable that combines the intercurrent event and the outcome into a single variable, e.g. no relapse and no disability progression over a fixed period of time. The estimand would then be the intention-to-treat effect based on the composite endpoint. The hypothetical strategy would envisage a setting where relapses do not occur. This would require a precise description of the conditions under which this setting may be realistic. Finally, the while relapse-free strategy (corresponding to the while on treatment strategy in ICH (2017)) would define a new variable, e.g. disability progression prior to relapse.
The estimand framework has been discussed in the literature (Akacha et al., 2017; Leuchs et al., 2015; Mallinckrodt et al., 2012; Mehrotra et al., 2016). Relatively speaking, the principal stratum strategy has received less attention. Permutt (2016) in his taxonomy of estimands highlights that the effect in the principal stratum is “easy to define precisely, if not to estimate”. Akacha et al. (2017) note that for the principal stratum strategy “more formal training of clinical statisticians and practical experience is needed”. The EXPAND study example described in this article provides such a practical example.
The methods developed in this paper apply to binary variables, both for the outcome and for the intercurrent event. Extensions for continuously distributed outcome variables (e.g. normally distributed) are straightforward. Another possible extension would be to treat both disability and relapse as time-to-event variables, for which one would define $T^S(z)$ and $T^Y(z)$ as the time to relapse and disability under treatment $z$, respectively. A patient would then be considered immune up to time $t$ if both $T^S(0) > t$ and $T^S(1) > t$. Survival distributions for these event times would need to be defined, for example with parametric or semiparametric models. The joint survival distribution could be modeled with a copula. Other approaches include that of Ding and Lu (2017), who develop methodology that constructs weighted samples based on principal scores, defined as the conditional probabilities of the latent principal strata given covariates. Their analysis does not rely on any modeling assumptions on the outcome variable. However, identification of the principal causal effects relies on several assumptions, including exclusion-restriction (no effect on the intermediate variable implies zero effect on the outcome, see e.g. Angrist et al. (1996)), which would not be an appropriate assumption in our setting.
In this article, we considered the use of principal stratification to assess the treatment effect in a subgroup who would not relapse regardless of treatment assignment. The methods developed can be applied, for binary outcomes, to draw inference about the causal effect among compliers (Little and Kang, 2015) and the causal effect among survivors (Rubin et al., 2006). The former estimand is highly relevant in the context of non-inferiority or equivalence trials and the latter estimand in the context of trials in which functional outcomes may be truncated by death (ICH, 2017).
References
 Akacha et al. (2017) Akacha, M., F. Bretz, D. Ohlssen, G. Rosenkranz, and H. Schmidli (2017). Estimands and their role in clinical trials. Statistics in Biopharmaceutical Research 9(3), 268–271.
 Akacha et al. (2017) Akacha, M., F. Bretz, and S. Ruberg (2017). Estimands in clinical trials–broadening the perspective. Statistics in Medicine 36(1), 5–19.
 Angrist et al. (1996) Angrist, J. D., G. W. Imbens, and D. B. Rubin (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association 91(434), 444–455.
 Baccini et al. (2017) Baccini, M., A. Mattei, and F. Mealli (2017). Bayesian inference for causal mechanisms with application to a randomized study for postoperative pain control. Biostatistics 18(4), 605–617.
 Bohula et al. (2015) Bohula, E. A., R. P. Giugliano, C. P. Cannon, J. Zhou, S. A. Murphy, J. A. White, A. M. Tershakovec, M. A. Blazing, and E. Braunwald (2015). Achievement of dual low-density lipoprotein cholesterol and high-sensitivity C-reactive protein targets more frequent with the addition of ezetimibe to simvastatin and associated with better outcomes in IMPROVE-IT. Circulation 132(13), 1224–1233.
 Carpenter et al. (2017) Carpenter, B., A. Gelman, M. Hoffman, D. Lee, B. Goodrich, M. Betancourt, M. Brubaker, J. Guo, P. Li, and A. Riddell (2017). Stan: A probabilistic programming language. Journal of Statistical Software, Articles 76(1), 1–32.
 Cree et al. (2018) Cree, B., B. Magnusson, N. Rouyrre, R. Fox, G. Giovannoni, P. Vermersch, A. Bar-Or, R. Gold, D. Piani Meier, G. Karlsson, D. Tomic, C. Wolf, F. Dahlke, and L. Kappos (2018). Disentangling treatment effects on disability and relapses: analysis of siponimod in secondary progressive multiple sclerosis. In preparation.
 Ding and Lu (2017) Ding, P. and J. Lu (2017). Principal stratification analysis using principal scores. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79(3), 757–777.
 Egleston et al. (2010) Egleston, B. L., K. L. Cropsey, A. B. Lazev, and C. J. Heckman (2010). A tutorial on principal stratification-based sensitivity analysis: application to smoking cessation studies. Clinical Trials 7(3), 286–298.
 Egleston et al. (2017) Egleston, B. L., R. G. Uzzo, and Y.-N. Wong (2017). Latent class survival models linked by principal stratification to investigate heterogeneous survival subgroups among individuals with early stage kidney cancer. Journal of the American Statistical Association 112(518), 534–546.
 Frangakis and Rubin (2002) Frangakis, C. E. and D. B. Rubin (2002). Principal stratification in causal inference. Biometrics 58(1), 21–29.
 Gelman et al. (2013) Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian Data Analysis. Chapman & Hall/CRC.
 Hernán and Robins (2018) Hernán, M. A. and J. M. Robins (2018). Causal Inference. Boca Raton: Chapman & Hall/CRC, forthcoming.
 Hirji and Fagerland (2009) Hirji, K. F. and M. W. Fagerland (2009). Outcome based subgroup analysis: a neglected concern. Trials 10(1), 33.
 ICH (2017) ICH (2017). Draft ICH E9(R1) addendum on estimands and sensitivity analysis in clinical trials to the guideline on statistical principles for clinical trials. International Conference on Harmonisation. http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2017/08/WC500233916.pdf, accessed August 31, 2017.
 Kappos et al. (2018) Kappos, L., A. Bar-Or, B. Cree, R. J. Fox, G. Giovannoni, R. Gold, P. Vermersch, D. L. Arnold, S. Arnould, T. Scherz, C. Wolf, E. Wallström, and F. Dahlke (2018). Siponimod versus placebo in secondary progressive multiple sclerosis (EXPAND): a double-blind, randomised, phase 3 study. The Lancet 391(10127), 1263–1273.
 Leuchs et al. (2015) Leuchs, A.-K., J. Zinserling, A. Brandt, D. Wirtz, and N. Benda (2015). Choosing appropriate estimands in clinical trials. Therapeutic Innovation & Regulatory Science 49(4), 584–592.
 Lindley (1972) Lindley, D. (1972). Bayesian Statistics, A Review. Society for Industrial and Applied Mathematics.
 Little and Kang (2015) Little, R. and S. Kang (2015). Intention-to-treat analysis with treatment discontinuation and missing data in clinical trials. Statistics in Medicine 34(16), 2381–2390.
 Mallinckrodt et al. (2012) Mallinckrodt, C. H., Q. Lin, I. Lipkovich, and G. Molenberghs (2012). A structured approach to choosing estimands and estimators in longitudinal clinical trials. Pharmaceutical Statistics 11(6), 456–461.
 Mehrotra et al. (2016) Mehrotra, D. V., R. J. Hemmings, and E. Russek-Cohen (2016). Seeking harmony: estimands and sensitivity analyses for confirmatory clinical trials. Clinical Trials 13(4), 456–458.
 Neyman (1923) Neyman, J. (1923). On the application of probability theory to agricultural experiments. Essay on principles. Section 9 (translated). Statistical Science 5, 465–472.
 NRC (2010) NRC (2010). The Prevention and Treatment of Missing Data in Clinical Trials. Washington, DC: The National Academies Press.
 Permutt (2016) Permutt, T. (2016). A taxonomy of estimands for regulatory clinical trials with discontinuations. Statistics in Medicine 35(17), 2865–2875.
 Ridker et al. (2018) Ridker, P. M., J. G. MacFadyen, B. M. Everett, P. Libby, T. Thuren, and R. J. Glynn (2018). Relationship of C-reactive protein reduction to cardiovascular event reduction following treatment with canakinumab: a secondary analysis from the CANTOS randomised controlled trial. The Lancet 391(10118), 319–328.
 Rubin (1974) Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66(5), 688–701.
 Rubin et al. (2006) Rubin, D. B. et al. (2006). Causal inference through potential outcomes and principal stratification: Application to studies with “censoring” due to death. Statistical Science 21(3), 299–309.
 Selmaj et al. (2013) Selmaj, K., D. K. Li, H.-P. Hartung, B. Hemmer, L. Kappos, M. S. Freedman, O. Stüve, P. Rieckmann, X. Montalban, T. Ziemssen, L. Z. Auberson, H. Pohlmann, F. Mercier, F. Dahlke, and E. Wallström (2013). Siponimod for patients with relapsing-remitting multiple sclerosis (BOLD): an adaptive, dose-ranging, randomised, phase 2 study. The Lancet Neurology 12(8), 756–767.
 Yusuf et al. (1991) Yusuf, S., J. Wittes, J. Probstfield, and H. A. Tyroler (1991). Analysis and interpretation of treatment effects in subgroups of patients in randomized clinical trials. JAMA 266(1), 93–98.