1 Sampling properties and the observed data
For a well-defined causal question, investigators can specify a set of eligibility criteria that define an actual population of individuals to whom research findings would be applicable, in the sense that we can in principle identify all individuals who meet the criteria. For instance, when designing a randomized trial, the trial eligibility criteria define an actual population of all trial-eligible individuals. In this paper, we view the actual population as a simple random sample from an (infinite) superpopulation of individuals; we refer to this superpopulation as the target population [10]. We are interested in causal quantities that pertain to the target population or to its subsets (e.g., defined by trial participation status).
To introduce some notation, let $X$ denote a vector of baseline covariates; $A$ the treatment assignment indicator; $Y$ the observed outcome; and $S$ the trial participation indicator, with $S = 1$ for randomized individuals and $S = 0$ for nonrandomized individuals (individuals who are either not invited to participate in the trial or who are invited but decline). To capture the notion that some nonrandomized individuals in the actual population ($S = 0$) may not be sampled, let $D$ be an indicator for whether an individual in the actual population is sampled and contributes data to the analyses, with $D = 1$ for sampled individuals and $D = 0$ for nonsampled individuals.
We can now describe the sampling properties that underlie nested and nonnested study designs. These properties describe how the observed study sample relates to the actual population; the underlying actual population and (hypothetical) target population are the same. Figure 1 illustrates the conceptual relationships between designs, their sampling properties, and the observed data.
In the main text of this paper, we consider simple random samples, with known or unknown sampling probabilities, from the actual population or from the nonrandomized subset of the actual population. As we discuss below, our main results, with minor modifications, hold when the sampling probability is a known function of auxiliary baseline covariates rather than a known constant (i.e., when we have random sampling, not simple random sampling). Allowing the sampling probabilities to depend on auxiliary covariates, however, does not lead to additional insights regarding study design [11]; for this reason, in the main text, we assume that the sampling probability does not depend on covariates.
1.1 Nested trial designs
We consider two variants of the nested trial design: one in which a census of the actual population is taken and one in which the nonrandomized individuals are subsampled.
Census of the actual population: In this variant of the nested trial design, the individuals contributing data to the analysis are assumed to be a census of the actual population, that is,
$$\Pr[D = 1 \mid X, S, A, Y] = 1;$$
thus, nested trial designs can be viewed as simple random samples from the superpopulation. In this design, it is common to define the target population implicitly, based on the actual population in which the trial is nested. For example, in comprehensive cohort studies [12], investigators nest a trial within a cohort of all individuals who met the trial’s eligibility criteria and were invited to participate in the trial. They then define the target population as the population from which cohort members (i.e., the actual population of trial-eligible individuals invited to participate in the trial) could have been a simple random sample. Thus, in this design, investigators need to ensure that the cohort represents the target population they are interested in; that is, the trial eligibility criteria need to be broad enough to address the research question and the individuals invited to participate in the trial (who form the cohort in which the trial is nested) need to represent the target population of interest.
Subsampling of nonrandomized individuals: In this variant, we collect data from all randomized individuals in the actual population but only collect baseline covariate data from a subsample of the nonrandomized individuals in the actual population, with sampling probability that is a known constant. The sampling properties can be summarized as
$$\Pr[D = 1 \mid X, S = 1, A, Y] = 1 \quad \text{and} \quad \Pr[D = 1 \mid X, S = 0] = c,$$
where $c$ is a known constant, with $0 < c \leq 1$. Note that the nested trial design with a census of the actual population can be viewed as a special case of the subsampling design, with $c = 1$. Using $c < 1$ is statistically less efficient than using $c = 1$, but may improve research economy, for example, if the collection of covariate data among nonrandomized individuals is expensive [11]. Furthermore, as noted, a variant of the nested trial design with subsampling allows the selection of nonrandomized individuals to depend on auxiliary baseline covariates; we show how our results extend to that case in the Appendix.
1.2 Nonnested trial designs
In nonnested trial designs, data from randomized and nonrandomized individuals are obtained separately. Investigators assume that data from all randomized individuals can be combined with data from a simple random sample of nonrandomized individuals from the actual population, with a sampling probability that is an unknown constant (e.g., [4]). The sampling properties can be summarized as
$$\Pr[D = 1 \mid X, S = 1, A, Y] = 1 \quad \text{and} \quad \Pr[D = 1 \mid X, S = 0] = u,$$
where $u$ denotes the unknown sampling probability for nonrandomized individuals.
An example of a nonnested trial design is the composite dataset design [4, 7]. Here, investigators append the data from a randomized trial to data from a convenience sample of nonrandomized individuals, often obtained from routinely collected data sources (such as claims or electronic medical records databases, or prospective cohort studies). The assumption, often left unstated, is that the sample of nonrandomized individuals is a simple random sample from the population of nonrandomized individuals (or a well-defined subset thereof) to whom the investigators wish to extend the trial results [4, 7].
1.3 The observed data
In both nested and nonnested designs, we collect data on baseline covariates, treatment, and outcome from randomized individuals; in contrast, as we show in Section 3, only baseline covariate data are needed from nonrandomized individuals.
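More specifically, for nested designs the observed data consist of realizations of
$$(D_i, \; D_i S_i, \; D_i X_i, \; D_i S_i A_i, \; D_i S_i Y_i), \quad i = 1, \ldots, n,$$
where $n$ is the number of individuals in the actual population. Because all randomized individuals are sampled, we have that $S = 1$ implies $D = 1$. No covariate, treatment, or outcome data are available for nonsampled nonrandomized individuals, $\{S = 0, D = 0\}$. Note also that in nested trial designs with a census of the actual population, the subset $\{S = 0, D = 0\}$ does not exist.
In nonnested trial designs (e.g., composite dataset designs), we typically do not know the number of nonsampled nonrandomized individuals; thus, the observed data consist of realizations only of
$$(S_i, \; X_i, \; S_i A_i, \; S_i Y_i) \quad \text{for individuals with } D_i = 1.$$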
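To make the two observed-data structures concrete, the following Python sketch masks a hypothetical actual population according to each design. The record layout, the function names `observe_nested` and `observe_nonnested`, and the toy population are illustrative assumptions rather than part of the original analysis; the sketch only encodes the sampling properties described in this section.

```python
import random

def observe_nested(population, c, rng):
    """Nested design: one record per member of the actual population.

    Randomized individuals (s == 1) are always sampled; nonrandomized
    individuals are sampled with known probability c (c == 1 is a census).
    """
    observed = []
    for person in population:
        sampled = person["s"] == 1 or rng.random() < c
        if not sampled:
            # Nonsampled nonrandomized individual: only D = 0 is recorded.
            observed.append({"d": 0})
        elif person["s"] == 1:
            # Randomized: covariates, treatment, and outcome are observed.
            observed.append({"d": 1, "s": 1, "x": person["x"],
                             "a": person["a"], "y": person["y"]})
        else:
            # Sampled nonrandomized: only baseline covariates are recorded.
            observed.append({"d": 1, "s": 0, "x": person["x"]})
    return observed

def observe_nonnested(population, u, rng):
    """Nonnested design: nonsampled individuals leave no record, so the
    analyst sees only rows with d == 1 (and does not know u)."""
    return [r for r in observe_nested(population, u, rng) if r["d"] == 1]

# Hypothetical actual population: 3 randomized and 5 nonrandomized individuals.
population = (
    [{"x": x, "s": 1, "a": a, "y": y} for x, a, y in [(0, 1, 1), (1, 0, 0), (1, 1, 1)]]
    + [{"x": x, "s": 0, "a": None, "y": None} for x in [0, 0, 1, 1, 1]]
)
rng = random.Random(0)
census = observe_nested(population, c=1.0, rng=rng)
subsample = observe_nested(population, c=0.4, rng=rng)
composite = observe_nonnested(population, u=0.4, rng=rng)
```

In the nested outputs the number of records always equals the size of the actual population, so the number of nonsampled individuals is known; in the nonnested output that count is lost, which is the feature that drives the identification results that follow.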
2 Causal quantities of interest and identifiability conditions
2.1 Causal quantities of interest
In order to define causal quantities, let $Y^a$ be the potential (counterfactual) outcome under intervention to set treatment to $a$ [13, 14]. We are interested in the mean of these potential outcomes in the target population, $\mathrm{E}[Y^a]$, or in the nonrandomized subset of the target population, $\mathrm{E}[Y^a \mid S = 0]$. For example, $\mathrm{E}[Y^a]$ captures the mean outcome under the strategy of treating all individuals in the target population with $a$. It is also often scientifically and methodologically interesting to compare $\mathrm{E}[Y^a \mid S = 1]$ against $\mathrm{E}[Y^a \mid S = 0]$, to examine whether the potential outcome mean under treatment $a$ differs between trial participants and nonparticipants in the target population [3].
2.2 Identifiability conditions
For all study designs, the following identifiability conditions are sufficient to extend inferences from a clinical trial to a target population [3, 7]:
(1) Consistency of potential outcomes: interventions are well-defined, so that if $A = a$ then $Y = Y^a$. Implicit in this notation is that the offer to participate in the trial and trial participation itself do not have an effect on the outcome except through treatment assignment (e.g., there are no Hawthorne effects).
(2) Mean exchangeability among trial participants: $\mathrm{E}[Y^a \mid X, S = 1, A = a] = \mathrm{E}[Y^a \mid X, S = 1]$ for each $a$. This condition is expected to hold because of randomization (marginal or conditional on $X$).
(3) Positivity of treatment assignment in the trial: for each $a$ and each $x$ with positive density among randomized individuals, $\Pr[A = a \mid X = x, S = 1] > 0$. This condition is also expected to hold because of randomization.
(4) Mean generalizability (exchangeability over $S$): $\mathrm{E}[Y^a \mid X, S = 1] = \mathrm{E}[Y^a \mid X]$ for each $a$. For binary $S$, this condition implies the mean transportability condition $\mathrm{E}[Y^a \mid X, S = 1] = \mathrm{E}[Y^a \mid X, S = 0]$ (provided both conditional expectations are well-defined).
(5) Positivity of trial participation: for each $x$ with positive density in the target population, $\Pr[S = 1 \mid X = x] > 0$.
In the conditions listed above, we have used $X$ generically to denote baseline covariates. It is possible, however, that strict subsets of $X$ are adequate to satisfy different exchangeability conditions. For example, in a marginally randomized trial the mean exchangeability among trial participants holds unconditionally. Furthermore, to focus on issues related to selective trial participation, we will assume complete adherence to the assigned treatment and no loss to follow-up.
2.3 Trial eligibility criteria and choice of target population
Now that we have specified the causal quantities of interest and listed identifiability conditions, we can consider the choice of target population in more detail. As noted, the target population should be determined by the scientific question investigators hope to address. In many cases, when using the methods described in this paper, it is sensible to limit the target population to the population of individuals meeting the trial eligibility criteria or to a subset of that population. To the extent that the variables used to define the trial eligibility criteria are needed to satisfy the mean generalizability condition, the restriction to trial-eligible individuals is needed for the positivity of trial participation condition to hold, because individuals not meeting the criteria are not allowed to enter the trial. In some cases, however, investigators may be able to argue that only a subset of the variables used to determine trial eligibility is necessary for the mean generalizability condition to hold. In such cases, the target population can be broader than the population of trial-eligible individuals. The essential requirement is that the distributions of covariates needed to satisfy the mean generalizability condition among randomized and nonrandomized individuals should have common support.
3 Identification via the g-formula
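We begin by considering identification by the g-formula [15]. Using the identifiability conditions of Section 2, it is straightforward to show that the potential outcome mean in the target population [3] can be re-expressed as
$$\mathrm{E}[Y^a] = \mathrm{E}\big[ \mathrm{E}[Y \mid X, S = 1, A = a] \big] = \int \mathrm{E}[Y \mid X = x, S = 1, A = a] \, dF_X(x), \tag{1}$$
where $F_X$ denotes the distribution of $X$ in the target population.
The potential outcome mean among nonrandomized individuals in the target population [7] can be re-expressed as
$$\mathrm{E}[Y^a \mid S = 0] = \mathrm{E}\big[ \mathrm{E}[Y \mid X, S = 1, A = a] \,\big|\, S = 0 \big] = \int \mathrm{E}[Y \mid X = x, S = 1, A = a] \, dF_{X \mid S}(x \mid S = 0), \tag{2}$$
where $F_{X \mid S}(x \mid S = 0)$ denotes the distribution of $X$ among nonrandomized individuals in the target population (i.e., the subset with $S = 0$).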
First, we note that both results involve the conditional expectation of the outcome among trial participants assigned to treatment $a$, $\mathrm{E}[Y \mid X, S = 1, A = a]$. Because both nested and nonnested designs assume that all randomized individuals are sampled, this expectation is identifiable in both designs.
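Next, we turn our attention to the identification of the covariate distributions $F_X$ (in the target population) and $F_{X \mid S}(x \mid S = 0)$ (among nonrandomized individuals), which are necessary to identify $\mathrm{E}[Y^a]$ and $\mathrm{E}[Y^a \mid S = 0]$, respectively. There are interesting differences between the designs when it comes to identifying these distributions, and we consider each design individually below.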
3.1 Nested trial designs
Census of the actual population: Identification is most straightforward in this design, because data are available from all members of the actual population (both randomized and nonrandomized) and the actual population is a simple random sample from the target population. Thus, $F_X$ is identifiable. Furthermore, in this design, every subgroup of the actual population defined on the basis of baseline covariates or trial participation is a simple random sample from the corresponding subgroup in the target population. Thus, the distribution of covariates among nonrandomized individuals, $F_{X \mid S}(x \mid S = 0)$, can also be identified. It follows that all the components on the right-hand sides of (1) and (2) are identifiable, establishing that $\mathrm{E}[Y^a]$ and $\mathrm{E}[Y^a \mid S = 0]$ are identifiable.
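As an illustration of how the two g-formula expressions translate into a plug-in calculation with census data, the sketch below standardizes stratum-specific trial outcome means to the covariate distribution of either the whole sample or its nonrandomized subset. It assumes a single discrete covariate and a tiny hypothetical dataset; the function name `g_formula` and the tuple layout are assumptions made for this example, and realistic analyses would replace the stratum means with an outcome model.

```python
from collections import defaultdict

def g_formula(records, a, target="all"):
    """Plug-in g-formula for a discrete covariate x.

    records: iterable of (x, s, a, y); a and y are None when s == 0.
    target="all" standardizes to everyone (expression (1));
    target="nonrandomized" standardizes to s == 0 (expression (2)).
    Raises ZeroDivisionError if a covariate stratum in the target has no
    trial participants assigned to a (a positivity violation).
    """
    y_sum = defaultdict(float)  # sum of outcomes among s == 1, A == a, by x
    y_cnt = defaultdict(int)    # number of such trial participants, by x
    x_cnt = defaultdict(int)    # covariate counts in the standardization group
    for x, s, trt, y in records:
        if s == 1 and trt == a:
            y_sum[x] += y
            y_cnt[x] += 1
        if target == "all" or s == 0:
            x_cnt[x] += 1
    total = sum(x_cnt.values())
    # Average the trial outcome means over the target covariate distribution.
    return sum((y_sum[x] / y_cnt[x]) * (n_x / total) for x, n_x in x_cnt.items())

# Hypothetical census: six randomized and four nonrandomized individuals.
records = [
    (0, 1, 1, 1), (0, 1, 1, 0), (0, 1, 0, 0),
    (1, 1, 1, 1), (1, 1, 1, 1), (1, 1, 0, 1),
    (0, 0, None, None), (1, 0, None, None), (1, 0, None, None), (1, 0, None, None),
]
mean_all = g_formula(records, a=1)                          # 0.5 * 0.4 + 1.0 * 0.6
mean_nonrandomized = g_formula(records, a=1, target="nonrandomized")  # 0.5 * 0.25 + 1.0 * 0.75
```

The two results differ only through the covariate distribution used for standardization, which is why the remaining identification arguments focus on those distributions.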
Subsampling of nonrandomized individuals: For this design, identification of the marginal distribution of $X$ is slightly more involved because the nonrandomized individuals contributing data to the analysis are a subsample from the nonrandomized individuals in the actual population.
First, by the law of total probability, for binary $S$,
$$F_X(x) = F_{X \mid S}(x \mid S = 1) \Pr[S = 1] + F_{X \mid S}(x \mid S = 0) \Pr[S = 0].$$
Clearly, $F_{X \mid S}(x \mid S = s)$, for $s = 0, 1$, is identifiable because the randomized and nonrandomized sampled individuals are simple random samples of the target population subsets with $S = 1$ and $S = 0$, respectively. The only difficulty, then, is identification of the marginal probability of trial participation, $\Pr[S = 1]$, because $\Pr[S = 0] = 1 - \Pr[S = 1]$. As we show in the Appendix, under the sampling properties of the nested trial design with subsampling of nonrandomized individuals,
$$\Pr[S = 1] = \left\{ 1 + \frac{1}{c} \times \frac{\Pr[S = 0 \mid D = 1]}{\Pr[S = 1 \mid D = 1]} \right\}^{-1}. \tag{3}$$
The odds of nonparticipation in the trial among sampled individuals, $\Pr[S = 0 \mid D = 1] / \Pr[S = 1 \mid D = 1]$, are identifiable; and, as defined in Section 1.1, $c$ is a known constant. It follows that $\Pr[S = 1]$ is identifiable and, consequently, $\mathrm{E}[Y^a]$ is also identifiable because all the components of the integral on the right-hand side of (1) are identifiable.
Turning our attention to $F_{X \mid S}(x \mid S = 0)$, we note that it is identifiable because the sampled nonrandomized individuals are a simple random sample from the nonrandomized individuals in the actual population. It follows that $\mathrm{E}[Y^a \mid S = 0]$ is identifiable because all the components of the integral on the right-hand side of (2) are identifiable.
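A small numerical check may help fix ideas about the role of the known sampling probability. The sketch below assumes that the relationship referred to as (3) has the form $\Pr[S = 1] = \{1 + c^{-1} \times \text{odds}\}^{-1}$, where the odds are those of nonparticipation among sampled individuals; the participation probability of 0.20 and the sampling probability of 0.25 are hypothetical values chosen for illustration.

```python
def participation_probability(odds_nonparticipation_sampled, sampling_prob):
    """Marginal Pr[S = 1] recovered from the sampled-data odds of
    nonparticipation and the sampling probability of nonrandomized individuals."""
    return 1.0 / (1.0 + odds_nonparticipation_sampled / sampling_prob)

# Hypothetical target population in which 20% participate in the trial.
true_pr_s1 = 0.20
c = 0.25  # known sampling probability for nonrandomized individuals

# What the sampled data reveal: all randomized individuals are sampled,
# but only a fraction c of nonrandomized individuals.
pr_s1_sampled = true_pr_s1 / (true_pr_s1 + c * (1.0 - true_pr_s1))
odds_sampled = (1.0 - pr_s1_sampled) / pr_s1_sampled

recovered = participation_probability(odds_sampled, c)
# Plugging in a wrong sampling probability gives a different answer.
misspecified = participation_probability(odds_sampled, 0.50)
```

Here `recovered` reproduces the assumed 0.20, whereas `misspecified` does not, which previews why an unknown sampling probability prevents identification of the marginal participation probability.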
3.2 Nonnested trial designs
Using an argument parallel to that for nested trial designs with subsampling, when the probability of sampling a nonrandomized individual is unknown, the probability of trial participation, $\Pr[S = 1]$, can be expressed in the form of equation (3), substituting $u$ for $c$,
$$\Pr[S = 1] = \left\{ 1 + \frac{1}{u} \times \frac{\Pr[S = 0 \mid D = 1]}{\Pr[S = 1 \mid D = 1]} \right\}^{-1}.$$
Because, as defined in Section 1.2, $u$ is an unknown constant, $\Pr[S = 1]$ is not identifiable and consequently $\mathrm{E}[Y^a]$ is also not identifiable.
Turning our attention to $F_{X \mid S}(x \mid S = 0)$, we see that it is identifiable because the nonrandomized individuals contributing data to the analysis are a simple random sample from the nonrandomized individuals in the actual population (even though the sampling probability is unknown). It follows that $\mathrm{E}[Y^a \mid S = 0]$ is identifiable in nonnested trial designs because all the components of the integral in (2) are identifiable.
4 Identification via IP weighting
There has been considerable recent interest [1, 3, 2, 4, 7] in using weighting methods to identify the potential outcome means in equations (1) and (2), because the specification of models for the probability of trial participation is often considered a somewhat easier task than the specification of models for the outcome among trial participants.
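First, consider $\mathrm{E}[Y^a]$, which we argued is identifiable in nested trials. As shown in [3], we can re-express the right-hand side of (1) as
$$\mathrm{E}[Y^a] = \mathrm{E}\left[ \frac{I(S = 1, A = a) \, Y}{\Pr[S = 1 \mid X] \Pr[A = a \mid X, S = 1]} \right], \tag{4}$$
where $I(\cdot)$ denotes the indicator function.
Now, consider $\mathrm{E}[Y^a \mid S = 0]$, which we argued is identifiable by the g-formula in both nested and nonnested trials. As shown in [7], we can re-express the right-hand side of (2) as
$$\mathrm{E}[Y^a \mid S = 0] = \frac{1}{\Pr[S = 0]} \, \mathrm{E}\left[ \frac{I(S = 1, A = a) \, Y \, \Pr[S = 0 \mid X]}{\Pr[S = 1 \mid X] \Pr[A = a \mid X, S = 1]} \right]. \tag{5}$$
The probability of treatment among trial participants, $\Pr[A = a \mid X, S = 1]$, is under the control of the investigators and does not pose any difficulties for identification of either functional. Now, for each design, we focus our attention on the conditional probability or the conditional odds of trial participation, which appear in expressions (4) and (5), respectively.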
4.1 Nested trial designs
Census of the actual population: Identification of $\Pr[S = 1 \mid X]$ in this design is an obvious consequence of the fact that individuals contributing data to the analysis are a simple random sample from the target population. In other words, because we have sampled all individuals in the actual population, which is a simple random sample of the target population, $\Pr[S = 1 \mid X] = \Pr[S = 1 \mid X, D = 1]$.
Subsampling of nonrandomized individuals: Identification of $\Pr[S = 1 \mid X]$ is only a little more difficult when we sample nonrandomized individuals from the actual population. As we show in the Appendix, under the sampling properties of this design,
$$\Pr[S = 1 \mid X] = \left\{ 1 + \frac{1}{c} \times \frac{\Pr[S = 0 \mid X, D = 1]}{\Pr[S = 1 \mid X, D = 1]} \right\}^{-1}, \tag{6}$$
where the conditional odds of trial participation among sampled individuals, $\Pr[S = 1 \mid X, D = 1] / \Pr[S = 0 \mid X, D = 1]$, are identifiable and $c$ is a known constant defined in Section 1.1. It follows that $\Pr[S = 1 \mid X]$ is identifiable and the odds of trial participation can be written as
$$\frac{\Pr[S = 1 \mid X]}{\Pr[S = 0 \mid X]} = c \times \frac{\Pr[S = 1 \mid X, D = 1]}{\Pr[S = 0 \mid X, D = 1]}. \tag{7}$$
In sum, the IP weighting re-expressions of the functionals of interest are identifiable in nested trial designs.
4.2 Nonnested trial designs
We can use an argument parallel to that for nested trial designs with subsampling to establish that, when the sampling probability for nonrandomized individuals is unknown, the probability of trial participation, $\Pr[S = 1 \mid X]$, can be expressed as
$$\Pr[S = 1 \mid X] = \left\{ 1 + \frac{1}{u} \times \frac{\Pr[S = 0 \mid X, D = 1]}{\Pr[S = 1 \mid X, D = 1]} \right\}^{-1}. \tag{8}$$
Because, as defined in Section 1.2, $u$ is unknown, the conditional probability of trial participation, which appears on the right-hand side of (4), is not identifiable; this confirms our earlier result that $\mathrm{E}[Y^a]$ cannot be identified in nonnested trials.
Furthermore, the conditional odds of trial participation are also not identifiable because they depend on $u$. In fact, using equation (7), substituting $u$ for $c$, we see that the odds of trial participation in the target population are, up to an unknown multiplicative constant, equal to the odds of trial participation among sampled individuals,
$$\frac{\Pr[S = 1 \mid X]}{\Pr[S = 0 \mid X]} = u \times \frac{\Pr[S = 1 \mid X, D = 1]}{\Pr[S = 0 \mid X, D = 1]}. \tag{9}$$
We have come to an apparent conflict: the right-hand side of (5) involves the conditional odds of trial participation, a quantity that is not identifiable in nonnested designs. Yet, we argued in the previous section that the left-hand side of (5) is identifiable. The conflict can be easily resolved by noting that both the numerator and the denominator of (5) are scaled by the same unknown constant, which depends only on $u$ and cancels out, so that identification via IP weighting is possible (see the appendix of [7] for technical details).
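The cancellation argument can be checked numerically. The sketch below evaluates, for a discrete covariate, a sampled-data analog of the inverse odds weighting functional: the odds of nonparticipation and the normalizing probability are both computed among sampled individuals only, for several values of the sampling probability. The covariate distribution, participation probabilities, and outcome means are hypothetical, and the function name `ip_odds_functional` is an assumption made for this example.

```python
def ip_odds_functional(px, ps1, mean_y, u):
    """Inverse odds weighting functional computed from sampled-data quantities.

    px[x]     : Pr[X = x] in the target population
    ps1[x]    : Pr[S = 1 | X = x] in the target population
    mean_y[x] : E[Y | X = x, S = 1, A = a], which equals
                E[I(A = a) Y / Pr[A = a | X, S = 1] | X = x, S = 1]
    u         : sampling probability for nonrandomized individuals
    """
    # Joint probabilities of (X = x, S = s, D = 1) implied by the design.
    joint_s1 = {x: px[x] * ps1[x] for x in px}
    joint_s0 = {x: px[x] * (1.0 - ps1[x]) * u for x in px}
    pr_d1 = sum(joint_s1.values()) + sum(joint_s0.values())

    numerator = 0.0
    for x in px:
        # Odds of nonparticipation among sampled individuals with X = x.
        odds_s0_sampled = joint_s0[x] / joint_s1[x]
        numerator += (joint_s1[x] / pr_d1) * mean_y[x] * odds_s0_sampled
    pr_s0_sampled = sum(joint_s0.values()) / pr_d1
    return numerator / pr_s0_sampled

px = {0: 0.5, 1: 0.5}
ps1 = {0: 0.2, 1: 0.6}
mean_y = {0: 1.0, 1: 3.0}

# Target: the potential outcome mean among nonrandomized individuals,
# computed directly in the (hypothetical) target population.
pr_s0 = sum(px[x] * (1.0 - ps1[x]) for x in px)
target = sum(mean_y[x] * px[x] * (1.0 - ps1[x]) for x in px) / pr_s0

values = [ip_odds_functional(px, ps1, mean_y, u) for u in (0.05, 0.5, 1.0)]
```

Every entry of `values` equals `target` regardless of `u`, because the sampling probability scales the weighted numerator and the normalizing probability by the same factor.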
Table 1 summarizes the sampling properties and identification results for each study design.
5 Estimating the probability of trial participation
In realistic analyses, the dimension of $X$ will be fairly large, necessitating some modeling assumptions about $\Pr[S = 1 \mid X]$ or $\Pr[S = 1 \mid X, D = 1]$ [16]. In this section we discuss the relationship between study design and model specification and estimation approaches.
5.1 Nested trial designs
Census of the actual population: In this type of nested trial design, it is straightforward to estimate the probability of trial participation, $\Pr[S = 1 \mid X]$, in the sense that we can use the model we believe is most likely to be correctly specified for the target population.
For concreteness, suppose that we are willing to assume a parametric model,
$$\Pr[S = 1 \mid X] = p(X; \beta),$$
for the probability of trial participation in the target population, with $\beta$ a finite-dimensional parameter. In nested trial designs with a census of the actual population, we typically estimate the parameters by maximizing the likelihood function
$$\mathscr{L}(\beta) = \prod_{i=1}^{n} p(X_i; \beta)^{S_i} \{ 1 - p(X_i; \beta) \}^{1 - S_i},$$
where $i = 1, \ldots, n$ indexes individuals, and $n$ is the number of individuals in the study (i.e., the actual population). Under reasonable technical conditions [17], the usual maximum likelihood methods use a sample-size normalized objective function that converges uniformly in probability to
$$Q(\beta) = \mathrm{E}\big[ S \ln p(X; \beta) + (1 - S) \ln \{ 1 - p(X; \beta) \} \big]. \tag{10}$$
For example, when $p(X; \beta)$ is a logistic model with linear predictor $\beta_0 + \beta_1^\top X$,
$$Q(\beta) = \mathrm{E}\big[ S (\beta_0 + \beta_1^\top X) - \ln \{ 1 + \exp(\beta_0 + \beta_1^\top X) \} \big]$$
is the large sample limit of the sample-size normalized log-likelihood function for logistic regression.
Subsampling of nonrandomized individuals: When we subsample the nonrandomized individuals in the actual population, it is not possible to maximize the likelihood function above, because data are not available from all nonrandomized individuals in the actual population. A natural idea is to use equation (6), which provides an explicit formula for identifying the conditional probability of trial participation, using the probability of trial participation among sampled individuals, $\Pr[S = 1 \mid X, D = 1]$, and the sampling probability for nonrandomized individuals, $c$. When modeling the probability of trial participation among sampled individuals, however, the following difficulty arises: in general, when sampling depends on trial participation status, the correctly specified model for trial participation among sampled individuals does not have the same form as the correctly specified model in the target population, with the notable exception of the logistic regression model [18, 19]. This implies that naive estimation of the parameters of the model for trial participation among sampled individuals will typically be inconsistent for the population model.
Because the sampling probability of nonrandomized individuals is known, we can use the following weighted pseudo-likelihood function, which only uses data from sampled individuals [18, 20],
$$\mathscr{L}_w(\beta) = \prod_{i=1}^{n} \Big[ p(X_i; \beta)^{S_i} \{ 1 - p(X_i; \beta) \}^{1 - S_i} \Big]^{w(S_i, D_i)},$$
with $w(S, D) = D \{ S + (1 - S) c^{-1} \}$, where $p(X; \beta)$ is the assumed model for $\Pr[S = 1 \mid X]$. Weighted maximum likelihood methods use a sample-size normalized objective function that converges uniformly in probability to
$$Q_w(\beta) = \mathrm{E}\Big[ w(S, D) \big( S \ln p(X; \beta) + (1 - S) \ln \{ 1 - p(X; \beta) \} \big) \Big], \tag{11}$$
which is restricted to sampled individuals ($D = 1$) because $w(S, D) = 0$ when $D = 0$.
As we show in the Appendix, under the sampling properties for this design, the large sample limits of the objective functions in (10) and (11) are equal, $Q_w(\beta) = Q(\beta)$. It follows that, under reasonable technical conditions [17], weighted likelihood estimation of $\beta$ in the nested trial design with subsampling of nonrandomized individuals converges in probability to the same parameter as unweighted regression in the actual population.
In practical terms, as long as a reasonable parametric model can be specified for the target population, the model parameters can be estimated using weighted maximum likelihood methods [18] on data from sampled individuals, with individual-level weights equal to 1 for randomized individuals, $\{S = 1, D = 1\}$; $c^{-1}$ for sampled nonrandomized individuals, $\{S = 0, D = 1\}$; and 0 for unsampled individuals, $\{S = 0, D = 0\}$.
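The weighting scheme just described can be sketched with a small, self-contained weighted logistic regression. The example below is illustrative only: the Newton-Raphson routine `fit_weighted_logistic`, the single binary covariate, and the cell counts are assumptions made for this sketch, and the counts are constructed so that exactly a fraction `c` of the nonrandomized individuals in each covariate stratum is sampled.

```python
import math

def fit_weighted_logistic(cells, n_iter=50):
    """Maximize sum_i w_i * [s_i * eta_i - log(1 + exp(eta_i))],
    with eta_i = b0 + b1 * x_i, by Newton-Raphson.

    cells: list of (x, s, weight), where weight is the total weight of all
    sampled individuals sharing that (x, s) combination.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, s, w in cells:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += w * (s - p)          # score for the intercept
            g1 += w * (s - p) * x      # score for the slope
            v = w * p * (1.0 - p)      # contribution to the information matrix
            h00 += v
            h01 += v * x
            h11 += v * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^{-1} times the score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

c = 0.1  # known sampling probability for nonrandomized individuals

# Hypothetical sampled counts by (x, s): all randomized individuals, and a
# fraction c of the 900 (x = 0) and 700 (x = 1) nonrandomized individuals.
sampled_counts = {(0, 1): 100, (0, 0): 90, (1, 1): 300, (1, 0): 70}

weighted = fit_weighted_logistic(
    [(x, s, n * (1.0 if s == 1 else 1.0 / c)) for (x, s), n in sampled_counts.items()]
)
unweighted = fit_weighted_logistic(
    [(x, s, float(n)) for (x, s), n in sampled_counts.items()]
)

# Coefficients of the logistic model in the (hypothetical) actual population.
population = (math.log(100 / 900), math.log((300 / 700) / (100 / 900)))
```

In this constructed example the weighted fit reproduces both population coefficients, while the unweighted fit reproduces only the slope; its intercept is larger by $-\ln(c)$.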
5.2 Nonnested trial designs
In nonnested trial designs, the weighting approach described above is not applicable because the sampling probability of nonrandomized individuals is unknown. Provided, however, that the sampling probability does not depend on $X$ (i.e., the assumed sampling property), we can show that, if a logistic model for trial participation is correctly specified in the target population, then a logistic model is correctly specified in the nonnested trial design. To see this, suppose that we are willing to assume a logistic regression model in the population, such that
$$\ln \frac{\Pr[S = 1 \mid X]}{\Pr[S = 0 \mid X]} = \beta_0 + \beta_1^\top X.$$
Using the result in (8) (equivalently, the odds relationship in (9)) and taking logarithms, we have that
$$\ln \frac{\Pr[S = 1 \mid X]}{\Pr[S = 0 \mid X]} = \ln(u) + \ln \frac{\Pr[S = 1 \mid X, D = 1]}{\Pr[S = 0 \mid X, D = 1]}.$$
Equating the right-hand sides of the last two equations, we obtain
$$\ln \frac{\Pr[S = 1 \mid X, D = 1]}{\Pr[S = 0 \mid X, D = 1]} = \beta_0^* + \beta_1^\top X, \tag{12}$$
where $\beta_0^* = \beta_0 - \ln(u)$, a well-known result in the context of case-referent studies [21]. Thus, if a logistic model is correctly specified in the target population, then a model of the same functional form is correctly specified in the nonnested trial design. In fact, the coefficients of $X$ in the two models are equal, and only the intercept differs. Because $0 < u < 1$, $\beta_0^* > \beta_0$: the subsampling of nonrandomized patients simply results in an intercept that is “shifted” upwards. As we have shown in the section on IP weighting, the resulting shift in the odds of participation does not affect identification of the potential outcome mean in the nonrandomized individuals, $\mathrm{E}[Y^a \mid S = 0]$, which is the parameter of interest in nonnested trial designs with unknown sampling probability of nonrandomized individuals.
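The intercept shift can be verified directly. The sketch below assumes a logistic model with hypothetical population coefficients and a hypothetical sampling probability `u`, and compares two routes to the participation probability among sampled individuals: applying the sampling design to the population probability, and evaluating a logistic model whose intercept is shifted by $-\ln(u)$. The helper names are assumptions made for this example.

```python
import math

def expit(z):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-z))

def sampled_participation_prob(p_pop, u):
    """Pr[S = 1 | X, D = 1] when all randomized individuals and a fraction u
    of nonrandomized individuals are sampled, with sampling independent of X."""
    return p_pop / (p_pop + u * (1.0 - p_pop))

beta0, beta1 = -2.0, 0.8   # hypothetical population coefficients
u = 0.05                   # sampling probability (unknown to the analyst)
beta0_star = beta0 - math.log(u)

differences = []
for x in (-1.0, 0.0, 2.5):
    direct = sampled_participation_prob(expit(beta0 + beta1 * x), u)
    shifted = expit(beta0_star + beta1 * x)
    differences.append(abs(direct - shifted))
```

All entries of `differences` are zero up to floating-point error, so a logistic model fitted to the sampled data has the same slope as the population model and an intercept that absorbs the unknown sampling probability.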
6 Discussion
We presented a unified description of study designs for extending inferences from randomized trials to a well-defined target population and showed that commonly invoked identifiability conditions need to be combined with the sampling properties of each study design in order to determine which causal quantities can be identified. Our approach uses a superpopulation framework, which is a natural choice for extending trial findings beyond the sample of randomized individuals [24].
In nonnested trial designs, where the sampling probability for nonrandomized individuals is unknown, the marginal potential outcome means in the target population are not identifiable, but the potential outcome means in the subpopulation of nonrandomized individuals are identifiable. This restriction may be less severe than it appears: for most trials, we want to estimate the effect of applying the interventions to a new population, which can be represented by a well-chosen sample of nonrandomized individuals [7]. In any case, when available, knowledge of the sampling probability of nonrandomized individuals can be used to mitigate these limitations, without requiring the collection of covariate information from all nonrandomized individuals in the actual population. Thus, nested trial designs will often be the preferred approach for generalizing trial findings when it is possible to define and sample the actual population at the time a randomized trial is planned. Such nested trial designs will typically have broad (pragmatic [25]) eligibility criteria and define the target population as the population of individuals meeting the trial eligibility criteria. When that is not possible, as is the case in already completed randomized trials, nonnested trial designs might be a reasonable alternative. For example, in nonnested trial designs, the comparison of estimates for the potential outcome means among randomized individuals, $\mathrm{E}[Y^a \mid S = 1]$, and nonrandomized individuals, $\mathrm{E}[Y^a \mid S = 0]$, is of practical interest: provided the identifiability conditions hold, if the two estimates are similar, we may conclude that the trial results are likely generalizable (up to sampling variability); in contrast, if the estimates are different, trial results may not be generalizable.
We also showed that the different study designs have implications for identifying and estimating the conditional probability of trial participation. This probability is of inherent interest because it captures aspects of decision-making related to trial participation [26, 27]. We showed that the probability is identifiable in nested trial designs, but not in nonnested trial designs (e.g., composite dataset designs). Indeed, any reasonable parametric model for the probability of participation in the population can be identified in nested trial designs. In nested trial designs with sampling of nonrandomized individuals, estimation of model parameters can be facilitated by the use of weighted maximum likelihood estimation where randomized patients are given weight 1 and nonrandomized patients are given weight equal to the inverse of the probability of being sampled among nonrandomized individuals in the actual population. In nonnested trial designs, model specification is complicated by the fact that, when sampling depends on trial participation status, the model for the probability of trial participation among sampled individuals is not of the same form as the model in the population (the logistic regression model being a notable exception [18]).
The probability of trial participation in the target population is also important for identification and estimation using inverse probability (or odds) weighting methods. Our argument that the odds of participation after selection of nonrandomized individuals equal the odds of participation in the target population up to an unknown multiplicative constant clarifies how the validity of estimators based on composite dataset designs [4, 7] depends critically on the assumed sampling properties.
Astute readers will have noticed the many connections between our results and the theory of case-referent (case-control) studies [28, 21, 18, 20, 19]. Indeed, our approach can be placed in the case-base paradigm, viewing randomized individuals as “cases” in a cumulative incidence case-referent study [28] nested in the “cohort” of the actual population. There is also an interesting parallel with case-referent studies: the difficulty in specifying the population of nonrandomized individuals that should be sampled in composite dataset designs is similar in nature to the validity issues of case-referent studies with a secondary base [29, 30, 31].
In this paper, for simplicity, we focused on causal quantities that are most meaningful for point treatments with complete adherence and no loss to follow-up. Our results can be extended to address time-varying treatments using well-known extensions of the identifiability conditions for randomized trials [15, 32, 24], without any changes to the sampling properties or the modeling assumptions about the probability of trial participation. Perhaps, then, the most consequential causal assumption that we required was that the invitation to participate in the trial and trial participation itself do not have an effect on the outcome except through treatment assignment. Unless investigators are willing to contemplate much more complex study designs involving multistage data collection about (and possibly randomization of) the invitation to participate, trial participation itself, and treatment assignment [33], our results are best viewed as applying to trials where the not-through-treatment effects of the invitation to participate in the trial and of trial participation are negligible compared to the effects of treatment. For example, they are applicable to pragmatic randomized trials embedded in large healthcare systems or registries, where trial procedures other than treatment assignment can be assumed to be similar to usual medical practice [34, 25, 35].
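7 Figure
Figure 1. Conceptual relationships between study designs, their sampling properties, and the observed data [image not available].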
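8 Table
Table 1. Sampling properties and identification results for each study design.
Study design | Sampling probabilities | $\Pr[S = 1 \mid X]$ | $\mathrm{E}[Y^a]$ | $\mathrm{E}[Y^a \mid S = 0]$
Nested trial, census of the actual population | $\Pr[D = 1 \mid X, S = 1, A, Y] = 1$ and $\Pr[D = 1 \mid X, S = 0] = 1$ | Identifiable | Identifiable | Identifiable
Nested trial, subsampling of nonrandomized individuals | $\Pr[D = 1 \mid X, S = 1, A, Y] = 1$ and $\Pr[D = 1 \mid X, S = 0] = c$ (known) | Identifiable | Identifiable | Identifiable
Nonnested trial | $\Pr[D = 1 \mid X, S = 1, A, Y] = 1$ and $\Pr[D = 1 \mid X, S = 0] = u$ (unknown) | Not identifiable | Not identifiable | Identifiable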
References
 [1] S. R. Cole and E. A. Stuart, “Generalizing evidence from randomized clinical trials to target populations: the ACTG 320 trial,” American Journal of Epidemiology, vol. 172, no. 1, pp. 107–115, 2010.
 [2] A. L. Buchanan, M. G. Hudgens, S. R. Cole, K. R. Mollan, P. E. Sax, E. S. Daar, A. A. Adimora, J. J. Eron, and M. J. Mugavero, “Generalizing evidence from randomized trials using inverse probability of sampling weights,” Journal of the Royal Statistical Society. Series A (Statistics in Society), vol. 181, no. 4, pp. 1193–1209, 2018.
 [3] I. J. Dahabreh, S. E. Robertson, E. J. T. Tchetgen, E. A. Stuart, and M. A. Hernán, “Generalizing causal inferences from individuals in randomized trials to all trialeligible individuals,” Biometrics, 2018.
 [4] D. Westreich, J. K. Edwards, C. R. Lesko, E. Stuart, and S. R. Cole, “Transportability of trial results using inverse odds of sampling weights,” American Journal of Epidemiology, vol. 186, no. 8, pp. 1010–1014, 2017.
 [5] C. R. Lesko, A. L. Buchanan, D. Westreich, J. K. Edwards, M. G. Hudgens, and S. R. Cole, “Practical considerations when generalizing study results: a potential outcomes perspective,” Epidemiology, vol. 28, no. 4, pp. 553–561, 2017.
 [6] K. E. Rudolph and M. J. van der Laan, “Robust estimation of encouragement design intervention effects transported across sites,” Journal of the Royal Statistical Society. Series B (Statistical Methodology), vol. 79, no. 5, pp. 1509–1525, 2017.
 [7] I. J. Dahabreh, S. E. Robertson, E. A. Stuart, and M. A. Hernán, “Transporting inferences from a randomized trial to a new target population,” arXiv preprint arXiv:1805.00550, 2018.
 [8] N. Keiding and T. A. Louis, “Perils and potentials of self-selected entry to epidemiological studies and surveys,” Journal of the Royal Statistical Society. Series A (Statistics in Society), vol. 179, no. 2, pp. 319–376, 2016.
 [9] M. Hernán, “Discussion of “Perils and potentials of self-selected entry to epidemiological studies and surveys”,” Journal of the Royal Statistical Society. Series A (Statistics in Society), vol. 179, no. 2, pp. 346–347, 2016.
 [10] J. M. Robins, “Confidence intervals for causal parameters,” Statistics in Medicine, vol. 7, no. 7, pp. 773–785, 1988.
 [11] I. J. Dahabreh, M. A. Hernán, S. E. Robertson, A. Buchanan, and J. A. Steingrimsson, “Generalizing trial findings in nested trial designs with subsampling of nonrandomized individuals,” arXiv preprint arXiv:1902.06080, 2019.
 [12] M. Olschewski and H. Scheurlen, “Comprehensive cohort study: an alternative to randomized consent design in a breast preservation trial,” Methods of Information in Medicine, vol. 24, pp. 131–134, 1985.
 [13] D. B. Rubin, “Estimating causal effects of treatments in randomized and nonrandomized studies,” Journal of Educational Psychology, vol. 66, no. 5, p. 688, 1974.
 [14] J. M. Robins and S. Greenland, “Causal inference without counterfactuals: comment,” Journal of the American Statistical Association, vol. 95, no. 450, pp. 431–435, 2000.
 [15] J. M. Robins, “A new approach to causal inference in mortality studies with a sustained exposure period – application to control of the healthy worker survivor effect,” Mathematical Modelling, vol. 7, no. 9, pp. 1393–1512, 1986.
 [16] J. M. Robins and Y. Ritov, “Toward a curse of dimensionality appropriate (CODA) asymptotic theory for semiparametric models,” Statistics in Medicine, vol. 16, no. 3, pp. 285–319, 1997.
 [17] W. K. Newey and D. McFadden, “Large sample estimation and hypothesis testing,” Handbook of econometrics, vol. 4, pp. 2111–2245, 1994.
 [18] C. F. Manski and S. R. Lerman, “The estimation of choice probabilities from choice based samples,” Econometrica: Journal of the Econometric Society, vol. 45, no. 8, pp. 1977–1988, 1977.
 [19] A. J. Scott and C. Wild, “Fitting logistic models under case-control or choice based sampling,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 48, no. 2, pp. 170–182, 1986.
 [20] S. R. Cosslett, “Maximum likelihood estimator for choice-based samples,” Econometrica: Journal of the Econometric Society, vol. 49, no. 5, pp. 1289–1316, 1981.
 [21] N. Mantel, “Synthetic retrospective studies and related topics,” Biometrics, vol. 23, no. 3, pp. 479–486, 1973.
 [22] R. L. Prentice and R. Pyke, “Logistic disease incidence models and case-control studies,” Biometrika, vol. 66, no. 3, pp. 403–411, 1979.
 [23] N. E. Breslow, J. M. Robins, J. A. Wellner, et al., “On the semiparametric efficiency of logistic regression under case-control sampling,” Bernoulli, vol. 6, no. 3, pp. 447–455, 2000.
 [24] M. A. Hernán and J. M. Robins, Causal inference (forthcoming). Boca Raton, FL: Chapman & Hall/CRC, 2019.
 [25] I. Ford and J. Norrie, “Pragmatic trials,” New England Journal of Medicine, vol. 375, no. 5, pp. 454–463, 2016.
 [26] D. McFadden, “Conditional logit analysis of qualitative choice behavior,” in Frontiers in econometrics (P. Zarembka, ed.), ch. 4, pp. 105–142, Berkeley, CA: Institute of Urban and Regional Development, University of California Berkeley, CA, 1973.
 [27] E. A. Stuart, S. R. Cole, C. P. Bradshaw, and P. J. Leaf, “The use of propensity scores to assess the generalizability of results from randomized trials,” Journal of the Royal Statistical Society. Series A (Statistics in Society), vol. 174, no. 2, pp. 369–386, 2011.
 [28] O. S. Miettinen, “Estimability and estimation in case-referent studies,” American Journal of Epidemiology, vol. 103, no. 2, pp. 226–235, 1976.
 [29] O. S. Miettinen, “The “case-control” study: valid selection of subjects,” Journal of Chronic Diseases, vol. 38, no. 7, pp. 543–548, 1985.
 [30] O. S. Miettinen, “Response: The concept of secondary base,” Journal of Clinical Epidemiology, vol. 43, no. 9, pp. 1017–1020, 1990.
 [31] S. Wacholder, J. K. McLaughlin, D. T. Silverman, and J. S. Mandel, “Selection of controls in case-control studies: I. principles,” American Journal of Epidemiology, vol. 135, no. 9, pp. 1019–1028, 1992.
 [32] J. M. Robins, “Marginal structural models versus structural nested models as tools for causal inference,” in Statistical models in epidemiology, the environment, and clinical trials, pp. 95–133, Springer, 2000.
 [33] J. J. Heckman, “Randomization and social policy evaluation,” Tech. Rep. 107, National Bureau of Economic Research, Cambridge, Mass., USA, 1991.
 [34] T.-P. van Staa, L. Dyson, G. McCann, S. Padmanabhan, R. Belatri, B. Goldacre, J. Cassell, M. Pirmohamed, D. Torgerson, S. Ronaldson, et al., “The opportunities and challenges of pragmatic point-of-care randomised trials using routinely collected electronic records: evaluations of two exemplar trials,” Health Technology Assessment, vol. 18, no. 43, pp. 1–146, 2014.
 [35] N. K. Choudhry, “Randomized, controlled trials in health insurance systems,” New England Journal of Medicine, vol. 377, no. 10, pp. 957–964, 2017.
Appendix A Identification of the probability of trial participation in nested trial designs with subsampling
A.1 Identification of the marginal probability of trial participation
Using the definition of conditional probability and rearranging,
Taking the ratio of the above expressions and exploiting the sampling properties for nonnested trial designs,
With a bit of algebra, the above expression can be rearranged to show that
By setting we see that in the nested trial design with a census of nonrandomized individuals .
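The steps of this argument can be sketched in LaTeX under assumed notation: S is the trial participation indicator, D is the sampling indicator, and u is a hypothetical symbol for the known probability of sampling a nonrandomized individual, so that Pr[D = 1 | S = 1] = 1 and Pr[D = 1 | S = 0] = u. This is an illustrative reconstruction of the reasoning, not necessarily the exact displays of the derivation.

```latex
% Assumed notation: S = trial participation, D = sampled, u = Pr[D = 1 | S = 0] (known).
\begin{align*}
\Pr[S = 1 \mid D = 1] &= \frac{\Pr[D = 1 \mid S = 1]\,\Pr[S = 1]}{\Pr[D = 1]},
\qquad
\Pr[S = 0 \mid D = 1] = \frac{\Pr[D = 1 \mid S = 0]\,\Pr[S = 0]}{\Pr[D = 1]}, \\[4pt]
\frac{\Pr[S = 1 \mid D = 1]}{\Pr[S = 0 \mid D = 1]}
  &= \frac{1}{u}\,\frac{\Pr[S = 1]}{1 - \Pr[S = 1]}, \\[4pt]
\Pr[S = 1]
  &= \left\{ 1 + \frac{1}{u}\,\frac{\Pr[S = 0 \mid D = 1]}{\Pr[S = 1 \mid D = 1]} \right\}^{-1}.
\end{align*}
```

With u = 1 the last line reduces to Pr[S = 1] = Pr[S = 1 | D = 1], which is the census case.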
A.2 Identification of the conditional probability of trial participation
The argument for the conditional probability is parallel to the one presented above for the marginal probability. Again, using the definition of conditional probability,
Taking the ratio of the above expressions and exploiting the sampling properties for nonnested trial designs,
The above expression can be rearranged to show that
By setting we see that in the nested trial design with a census of nonrandomized individuals .
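As a numerical illustration, the rescaling of the odds of trial participation between the sampled data and the actual population can be sketched in Python. The function names and the symbol u (the known sampling probability of nonrandomized individuals) are assumptions made for this sketch; the same calculation applies within levels of the baseline covariates for the conditional probability.

```python
def _check_inputs(p, u):
    """Validate a participation probability p and a sampling probability u."""
    if not 0.0 < p <= 1.0:
        raise ValueError("participation probability must lie in (0, 1]")
    if not 0.0 < u <= 1.0:
        raise ValueError("sampling probability must lie in (0, 1]")


def sampled_participation_prob(p_population, u):
    """Probability of participation among sampled individuals.

    Assumes every randomized individual is sampled and each nonrandomized
    individual is sampled with known probability u.
    """
    _check_inputs(p_population, u)
    return p_population / (p_population + u * (1.0 - p_population))


def population_participation_prob(p_sampled, u):
    """Recover the population probability from the sampled-data probability.

    Uses the inverse of the sampled-data odds of participation, rescaled
    by 1 / u.
    """
    _check_inputs(p_sampled, u)
    inverse_sampled_odds = (1.0 - p_sampled) / p_sampled
    return 1.0 / (1.0 + inverse_sampled_odds / u)
```

For example, a population participation probability of 0.2 with u = 0.5 corresponds to a probability of 1/3 among sampled individuals, and applying `population_participation_prob(1/3, 0.5)` recovers 0.2. Setting u = 1 leaves the probability unchanged.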
Appendix B Estimating the conditional probability of trial participation
We outline the proof for the convergence in probability of the estimators for the conditional probability of trial participation described in the main text, without delving into the technical conditions needed to make the arguments rigorous.
Consider the likelihood function for the nested trial design with a census of the actual population,
and the pseudolikelihood function for the nested trial design with known sampling probability of the nonrandomized individuals,
For , the sample-size-normalized objective function to be maximized is
Provided the technical conditions for the uniform law of large numbers obtain, the above objective function converges uniformly in probability, in the sense of the definition in Section 2.1 of [17], to
By Theorem 2.1 of [17], if is uniquely maximized at , the parameter space is compact, and is continuous, then the estimator obtained by maximizing converges in probability to , that is, .
For , the sample-size-normalized objective function to be maximized is
Because is assumed to be bounded away from 0, and provided the technical conditions for the uniform weak law of large numbers obtain, the above objective function converges uniformly in probability to
We will now show that .
By design, if , then ; if , then . Thus, to establish the result we only need to show that
Starting from the right-hand side,
which establishes the result.
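The key step can be sketched in LaTeX as follows, under assumed notation: p(X; β) is the working model for the conditional probability of trial participation, D is the sampling indicator, and u is a hypothetical symbol for the known sampling probability of nonrandomized individuals, with Pr[D = 1 | X, S = 0] = u. Because every randomized individual is sampled, the contribution of randomized individuals is the same in both limiting objective functions, so only the contribution of nonrandomized individuals requires an argument.

```latex
% Assumed notation: p(X;\beta) models Pr[S = 1 | X]; u = Pr[D = 1 | X, S = 0] (known).
\begin{align*}
\mathrm{E}\!\left[ \frac{D(1 - S)}{u} \log\{1 - p(X;\beta)\} \right]
  &= \mathrm{E}\!\left[ \frac{1 - S}{u} \log\{1 - p(X;\beta)\}\; \mathrm{E}[D \mid X, S] \right] \\
  &= \mathrm{E}\!\left[ \frac{1 - S}{u} \log\{1 - p(X;\beta)\}\; \Pr[D = 1 \mid X, S = 0] \right] \\
  &= \mathrm{E}\!\left[ (1 - S) \log\{1 - p(X;\beta)\} \right].
\end{align*}
```

The first equality is the law of iterated expectations, the second uses the fact that the factor (1 - S) is nonzero only when S = 0, and the third uses the assumed sampling property.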
Because , it follows that the maximizer of , , converges in probability to , that is, .
To obtain the asymptotic distribution of the estimators, we need additional technical conditions as in Theorem 3.1 of [17]; provided these conditions hold, and are asymptotically normal.
Appendix C Nested trial design with covariate-dependent sampling probabilities
C.1 Sampling properties
As noted in the main text, a more general version of the nested trial design assumes that the sampling probabilities for nonrandomized individuals depend on baseline auxiliary covariates. Let , where represents baseline auxiliary covariates that are available on all members of the actual population in which the trial is nested, and represents covariates that are only measured among randomized individuals () and sampled nonrandomized individuals ().
The identifiability conditions and identification results remain the same as in the main text, but the sampling properties of this design are
where is a known function that only depends on , allowing the sampling of nonrandomized individuals to depend on the auxiliary covariates that are available from all members of the actual population.
C.2 Identification of the conditional probability of trial participation
Using an argument similar to the case when the sampling probability for nonrandomized individuals does not depend on covariates, we obtain
which is identifiable because the inverse of the conditional odds of trial participation in the sampled data, , is identifiable, and is known by design.
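A sketch of this identification result in LaTeX, under assumed notation: X = (X_1, X_2), and c(X_1) is a hypothetical symbol for the known sampling probability of nonrandomized individuals, so that Pr[D = 1 | X, S = 1] = 1 and Pr[D = 1 | X, S = 0] = c(X_1). The displays illustrate the reasoning and may differ in form from the original expressions.

```latex
% Assumed notation: c(X_1) = Pr[D = 1 | X, S = 0], a known function of X_1.
\begin{align*}
\frac{\Pr[S = 1 \mid X, D = 1]}{\Pr[S = 0 \mid X, D = 1]}
  &= \frac{\Pr[D = 1 \mid X, S = 1]\,\Pr[S = 1 \mid X]}
          {\Pr[D = 1 \mid X, S = 0]\,\Pr[S = 0 \mid X]}
   = \frac{1}{c(X_1)}\,\frac{\Pr[S = 1 \mid X]}{\Pr[S = 0 \mid X]}, \\[4pt]
\Pr[S = 1 \mid X]
  &= \left\{ 1 + \frac{1}{c(X_1)}\,
       \frac{\Pr[S = 0 \mid X, D = 1]}{\Pr[S = 1 \mid X, D = 1]} \right\}^{-1}.
\end{align*}
```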
C.3 Estimating the probability of trial participation by weighted regression
As before, we assume a model for with finite-dimensional parameter . The weighted pseudolikelihood function becomes
Note that the only difference between and is that the weights in the former depend on . The sample-size-normalized objective function to be maximized is
Because is assumed to be bounded away from 0, and provided the technical conditions for the uniform weak law of large numbers obtain, the above objective function converges uniformly in probability to
We will now show that .
As noted above, by design, if , then ; if , then . Thus, to establish the result we only need to show that
Starting from the right-hand side,
which establishes the result.
Because , it follows that the maximizer of , , converges in probability to , that is, .
In practical terms, this result suggests that the conditional probability of trial participation in the target population can be estimated using a weighted regression of the trial participation indicator on the baseline covariates among sampled patients, using weights equal to 1 for randomized patients (all of whom are sampled) and to the inverse of the known sampling probability for sampled nonrandomized individuals; equivalently, nonsampled nonrandomized individuals receive weight 0.
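The weighted regression can be illustrated with a small simulation in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: there is a single baseline covariate that also serves as the auxiliary covariate, the participation model is a correctly specified logistic regression, and `sampling_prob` is a hypothetical known sampling function. The fit uses a plain Newton-Raphson routine written for this example.

```python
import math
import random


def expit(z):
    """Logistic function, clipped to avoid overflow in math.exp."""
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def sampling_prob(x1):
    """Hypothetical known sampling probability for nonrandomized individuals."""
    return 0.2 if x1 < 0.0 else 0.5


def simulate_sampled_data(n, b0, b1, rng):
    """Draw an actual population of size n and keep only sampled individuals."""
    x, s, w = [], [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        if rng.random() < expit(b0 + b1 * xi):
            # Randomized individuals are always sampled and get weight 1.
            x.append(xi)
            s.append(1)
            w.append(1.0)
        elif rng.random() < sampling_prob(xi):
            # Sampled nonrandomized individuals get inverse-probability weights.
            x.append(xi)
            s.append(0)
            w.append(1.0 / sampling_prob(xi))
        # Nonsampled nonrandomized individuals contribute nothing (weight 0).
    return x, s, w


def fit_weighted_logit(x, s, w, n_iter=25):
    """Maximize the weighted logistic pseudolikelihood by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, si, wi in zip(x, s, w):
            p = expit(b0 + b1 * xi)
            r = wi * (si - p)          # weighted score contribution
            v = wi * p * (1.0 - p)     # weighted information contribution
            g0 += r
            g1 += r * xi
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * delta = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def run_example(seed=2019, n=20000, b0=-1.0, b1=0.8):
    """Compare weighted and unweighted fits on one simulated data set."""
    rng = random.Random(seed)
    x, s, w = simulate_sampled_data(n, b0, b1, rng)
    weighted = fit_weighted_logit(x, s, w)
    unweighted = fit_weighted_logit(x, s, [1.0] * len(x))
    return {"weighted": weighted, "unweighted": unweighted}
```

Calling `run_example()` returns the weighted and unweighted coefficient estimates. In this setup the weighted estimates should be close to the data-generating values (-1.0, 0.8), whereas the unweighted fit overstates the odds of participation because nonrandomized individuals are underrepresented in the sampled data.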
As above, provided the technical conditions of Theorem 3.1 of [17] hold, is asymptotically normal.
transportability_study_design, Date: August 24, 2019 Revision: 31.0