1 Study designs for transporting trial findings
The most natural study design for transporting inferences from trial participants to a new target population is a trial nested within a cohort of eligible individuals, including those who refuse to participate in the randomized component of the study. In this design, investigators collect baseline covariate information from all cohort members, but often collect treatment and outcome information only from randomized individuals. For example, this setup occurs in comprehensive cohort studies and in trials embedded in health-care systems where data are routinely collected from all members of the system.
Another design for transporting inferences to a new target population uses an artificial composite dataset created by appending the data from a completed trial to a separately obtained sample from the target population . As in the trial-nested-within-cohort design, baseline covariate information is available from all patients, but treatment and outcome information is only available from randomized participants (e.g., when the interventions examined in the trial are not commercially available and no observational data can be collected). This setup arises often in drug development and regulatory settings, because exposure and outcome data are available only from a small number of clinical trial participants prior to drug approval, but baseline covariate data can be collected from large samples of untreated individuals who would be eligible for participation in the trials. The setup also arises in public policy research, when randomized trials are conducted in selected samples but target population data are available from administrative databases or surveys.
The methods we describe in this tutorial can be used both in trials nested within cohorts of eligible individuals and in composite datasets.
2 Causal contrasts
The data generated from the designs described in the previous section consist of independent observations, indexed by $i = 1, \ldots, n$, on baseline covariates $X$; treatment $A$; outcome $Y$; and a trial participation indicator $S$ that takes the value 1 for trial participants or 0 for non-participants. The data exhibit a special missingness pattern: for trial participants we have data on $(X_i, S_i = 1, A_i, Y_i)$, but for non-participants we only have data on $(X_i, S_i = 0)$. Table 1 shows the observed data structure for the special case of binary treatment.
We use the random variable $Y^a$ to denote the potential (counterfactual) outcome under intervention to set treatment to $a$, possibly contrary to fact [18, 19]. We only consider discrete treatments in this tutorial; extensions to continuous treatments are straightforward. For any two treatments, $a \in \mathcal{A}$ and $a' \in \mathcal{A}$, where $\mathcal{A}$ is the set of treatments being compared, a causal effect of interest is the average treatment effect in the target population of eligible non-participants, $\mathrm{E}[Y^a - Y^{a'} \mid S = 0]$.
The average treatment effect among non-participants is not equal to the average treatment effect among trial participants, $\mathrm{E}[Y^a - Y^{a'} \mid S = 1]$, when the effect of treatment varies over baseline covariates that are differentially distributed between trial participants and non-participants [20, 21].
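A small worked example may make this concrete. The numbers below are hypothetical and chosen only for illustration: a single binary covariate $X$ modifies the effect, the conditional effect is assumed to be 2 when $X = 0$ and 6 when $X = 1$ in both groups, and $X = 1$ is assumed to have probability 1/3 among participants and 3/4 among non-participants.

```latex
\begin{align*}
\mathrm{E}[Y^{a} - Y^{a'} \mid S = s]
  &= \sum_{x} \mathrm{E}[Y^{a} - Y^{a'} \mid X = x, S = s] \, \Pr[X = x \mid S = s], \\
\mathrm{E}[Y^{a} - Y^{a'} \mid S = 1]
  &= 2 \times \tfrac{2}{3} + 6 \times \tfrac{1}{3} = \tfrac{10}{3} \approx 3.33, \\
\mathrm{E}[Y^{a} - Y^{a'} \mid S = 0]
  &= 2 \times \tfrac{1}{4} + 6 \times \tfrac{3}{4} = 5.
\end{align*}
```

The two averages differ only because the effect modifier is distributed differently in the two groups; if the conditional effect did not vary with $X$, they would coincide.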
When using the trials-nested-in-cohorts design, another causal effect, the average effect in the population of all eligible individuals, $\mathrm{E}[Y^a - Y^{a'}]$, may also be of interest, and we have discussed its identification and estimation in previous work. Even if the primary target of inference for the trial-nested-in-cohort design is $\mathrm{E}[Y^a - Y^{a'}]$, investigators are typically also interested in efficiently estimating the effect among non-participants, $\mathrm{E}[Y^a - Y^{a'} \mid S = 0]$. Comparing the effect among eligible non-participants against the effect among trial participants is often of substantial scientific and policy interest. Importantly, when using the composite dataset design, $\mathrm{E}[Y^a - Y^{a'}]$ is not a meaningful estimand because the population implied by the composite dataset design is a mixture of the population of trial participants and the target population, with arbitrary mixing proportions (determined by the sample size of the trial and target sample). Thus, for composite datasets, $\mathrm{E}[Y^a - Y^{a'} \mid S = 0]$ is the more reasonable target of inference.
In the remainder of this tutorial we focus on the identification and estimation of the components of the average treatment effect among eligible non-participants, $\mathrm{E}[Y^a \mid S = 0]$, under either of the designs discussed in Section 1.
3 Identifiability conditions & identification
3.1 Identifiability conditions
We now discuss sufficient conditions for identifying the mean of the potential outcomes under each treatment $a$ in the set of treatments $\mathcal{A}$, in the population of trial-eligible non-participants, $\mathrm{E}[Y^a \mid S = 0]$.
(1) Consistency of potential outcomes: The observed outcome for the $i$th individual who received treatment $a$ equals that individual’s potential outcome under the same treatment, that is, if $A_i = a$, then $Y_i = Y_i^a$.
(2) Mean exchangeability (over $A$): Among trial participants, the potential outcome mean under treatment $a$ is independent of treatment, conditional on baseline covariates, $\mathrm{E}[Y^a \mid X, S = 1, A = a] = \mathrm{E}[Y^a \mid X, S = 1]$.
We expect conditional exchangeability in the trial to hold, regardless of whether the randomization was unconditional or conditional on covariates. The mean exchangeability assumption is weaker than the assumption of exchangeability in distribution, $Y^a \perp\!\!\!\perp A \mid (X, S = 1)$.
(3) Positivity of treatment assignment: In the trial, the probability of being assigned to each of the treatments being compared, conditional on the covariates needed for exchangeability, is bounded away from zero: $\Pr[A = a \mid X = x, S = 1] > 0$ for each $a \in \mathcal{A}$.
(4) Mean transportability (exchangeability over $S$): The potential outcome mean is independent of trial participation, conditional on baseline covariates: $\mathrm{E}[Y^a \mid X, S = 1] = \mathrm{E}[Y^a \mid X, S = 0]$.
Again, this assumption is weaker than the often-invoked assumption of conditional transportability in distribution, $Y^a \perp\!\!\!\perp S \mid X$.
(5) Positivity of trial participation: The probability of participating in the trial, conditional on the covariates needed to ensure mean transportability, is positive, $\Pr[S = 1 \mid X = x] > 0$ for every covariate pattern $x$ that can occur among non-participants.
Consistency, mean exchangeability, and positivity of treatment assignment are expected to hold in (marginally or conditionally) randomized clinical trials of well-defined interventions. In contrast, mean transportability and positivity of trial participation are strong and largely untestable assumptions, and their plausibility needs to be assessed on the basis of substantive knowledge when transporting inferences from a trial to a new target population.
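Under the above assumptions, as shown in the Appendix, for each treatment $a \in \mathcal{A}$, the potential outcome mean in the target population of non-participants can be identified using the observed data,
$$\mathrm{E}[Y^a \mid S = 0] = \mathrm{E}\big[\mathrm{E}[Y \mid X, S = 1, A = a] \mid S = 0\big] \equiv \mu(a).$$
These potential outcome means are inherently interesting. Furthermore, the treatment effect among trial-eligible non-participants can be identified as
$$\mathrm{E}[Y^a - Y^{a'} \mid S = 0] = \mu(a) - \mu(a').$$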
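The Appendix derivation is not reproduced in this section. As a hedged sketch of how the five identifiability conditions combine (our own step-by-step rendering, with each line annotated by the condition it uses), one possible argument is:

```latex
\begin{align*}
\mathrm{E}[Y^a \mid S = 0]
 &= \mathrm{E}\big[\mathrm{E}[Y^a \mid X, S = 0] \mid S = 0\big]
   && \text{(iterated expectations)} \\
 &= \mathrm{E}\big[\mathrm{E}[Y^a \mid X, S = 1] \mid S = 0\big]
   && \text{(mean transportability; positivity of participation)} \\
 &= \mathrm{E}\big[\mathrm{E}[Y^a \mid X, S = 1, A = a] \mid S = 0\big]
   && \text{(mean exchangeability; positivity of treatment)} \\
 &= \mathrm{E}\big[\mathrm{E}[Y \mid X, S = 1, A = a] \mid S = 0\big]
   && \text{(consistency)}
\end{align*}
```

The two positivity conditions ensure that the inner conditional expectations are well defined for every covariate pattern that occurs among non-participants.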
In the next section we review methods for estimating the potential outcome means $\mathrm{E}[Y^a \mid S = 0]$.
4 Estimating potential outcome means in the target population
We consider three approaches for estimating the potential outcome mean in the target population of non-participants: (1) outcome modeling followed by standardization; (2) probability of trial participation modeling followed by inverse odds weighting; and (3) doubly robust approaches that combine outcome and trial participation modeling. Under the consistency assumption, these methods can be thought of as solutions to a missing data problem for the outcome, where the missing data indicator is the product $S \times I(A = a)$, and $I(\cdot)$ is the indicator function. In the Appendix, we show that the three approaches provide consistent estimators (i.e., converge in probability) for $\mathrm{E}[Y^a \mid S = 0]$ provided the identifiability conditions hold and models are correctly specified; here, we focus on developing intuition about the methods. Throughout, we assume that the models for the probability of participation and the expectation of the outcome are parametric (finite-dimensional), as is usually the case in applied work (we address more flexible models in the Discussion section).
4.1 Outcome modeling followed by standardization
The first approach transports inferences from trial participants to the population of non-participants by “extrapolating” an outcome regression model fit among the former to a sample of the latter. In essence, we use data from trial participants to estimate models for the expectation of the outcome and then standardize the model predictions to the baseline covariate distribution of the non-participants. The estimator for the potential outcome mean among eligible non-participants under treatment $a$ is
$$\hat{\mu}_{\mathrm{OM}}(a) = \Big\{ \sum_{i=1}^{n} (1 - S_i) \Big\}^{-1} \sum_{i=1}^{n} (1 - S_i) \, g_a(X_i; \hat{\beta}), \tag{1}$$
where $g_a(X; \hat{\beta})$ is a predicted value from a model (with finite-dimensional parameter $\beta$) for $\mathrm{E}[Y \mid X, S = 1, A = a]$. We usually estimate separate outcome models in each treatment group of the trial, to allow for all possible treatment-covariate interactions. When the identifiability conditions hold and the model is correctly specified, the estimator is consistent for the potential outcome mean among eligible non-participants.
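The standardization step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Appendix code: the data are a hypothetical toy dataset with one binary covariate, the working outcome model is saturated (one mean per covariate stratum and arm) rather than a regression, and the names `DATA`, `fit_outcome_model`, and `om_estimate` are ours.

```python
from collections import defaultdict

# Hypothetical toy data: each row is (x, s, a, y); a and y are None when s == 0.
DATA = [
    (0, 1, 1, 3.0), (0, 1, 1, 5.0), (0, 1, 0, 1.0), (0, 1, 0, 3.0),
    (1, 1, 1, 8.0), (1, 1, 0, 2.0),
    (0, 0, None, None), (1, 0, None, None), (1, 0, None, None), (1, 0, None, None),
]


def fit_outcome_model(data, a):
    """Saturated working model: mean of Y in each x stratum of trial arm a."""
    total, count = defaultdict(float), defaultdict(int)
    for x, s, trt, y in data:
        if s == 1 and trt == a:
            total[x] += y
            count[x] += 1
    return {x: total[x] / count[x] for x in count}


def om_estimate(data, a):
    """Average the fitted arm-a predictions over the covariates of non-participants."""
    g = fit_outcome_model(data, a)
    target_x = [x for x, s, _, _ in data if s == 0]
    return sum(g[x] for x in target_x) / len(target_x)


mu1, mu0 = om_estimate(DATA, 1), om_estimate(DATA, 0)
print("transported means:", mu1, mu0, "difference:", mu1 - mu0)
```

In this toy dataset the transported difference (5.0) is larger than the simple difference in arm means among trial participants (about 3.33), because the stratum with the larger effect is more common among non-participants.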
4.2 Trial participation modeling and odds weighting
The second approach transports inferences from trial participants to the population of non-participants using a model for the probability of trial participation [6, 15]. In essence, we are treating the randomized trial participants as a sample from the target population with sampling probabilities that depend on baseline covariates and need to be estimated, an idea that connects this approach to survey sampling. Specifically, we estimate the potential outcome mean under treatment $a$ among trial non-participants as
$$\hat{\mu}_{\mathrm{IOW1}}(a) = \Big\{ \sum_{i=1}^{n} (1 - S_i) \Big\}^{-1} \sum_{i=1}^{n} \hat{w}_a(X_i, S_i, A_i) \, Y_i, \tag{2}$$
with $\hat{w}_a(X, S, A)$ defined as
$$\hat{w}_a(X, S, A) = S \, I(A = a) \, \frac{1 - p(X; \hat{\gamma})}{p(X; \hat{\gamma}) \, e_a(X; \hat{\theta})}.$$
Here, $p(X; \hat{\gamma})$ is a predicted value from a model (with finite-dimensional parameter $\gamma$) for the probability of participation in the trial, $\Pr[S = 1 \mid X]$, and $e_a(X; \hat{\theta})$ is a predicted value from a model (with finite-dimensional parameter $\theta$) for the probability of being assigned to treatment $a$ among trial participants, $\Pr[A = a \mid X, S = 1]$. The probability of participation in the trial is, in general, unknown and has to be estimated (e.g., by fitting a logistic regression model). In contrast, the probability of treatment in the randomized trial is known (determined by the investigators) and the true values can be used instead of estimated values (estimating the probability may, however, lead to smaller standard errors). We refer to the estimator in (2) as an “odds of participation weighted” estimator and the weights as “odds of participation” weights because $\{1 - p(X; \hat{\gamma})\} / p(X; \hat{\gamma})$ is the inverse of the estimated odds of trial participation conditional on baseline covariates.
An alternative odds of participation weighted estimator normalizes the weights to sum to 1,
$$\hat{\mu}_{\mathrm{IOW2}}(a) = \Big\{ \sum_{i=1}^{n} \hat{w}_a(X_i, S_i, A_i) \Big\}^{-1} \sum_{i=1}^{n} \hat{w}_a(X_i, S_i, A_i) \, Y_i. \tag{3}$$
In survey research, this estimator is referred to as the ratio estimator; it can be obtained as the solution of the estimating equations of weighted least squares regression of the outcome on treatment, using weights equal to $\{1 - p(X; \hat{\gamma})\} / \{p(X; \hat{\gamma}) \, e_a(X; \hat{\theta})\}$ for trial participants who received treatment $a$ and 0 for non-participants.
When the identifiability conditions hold and the model for the probability of participation is correctly specified, both odds weighting estimators are consistent for the potential outcome mean among eligible non-participants. The small difference in the normalization of the weights between IOW1 and IOW2 can have a big effect when weights are highly variable, because estimator (2) is unbounded (i.e., it may produce estimates that fall outside the support of the outcome variable), whereas estimator (3) is bounded by the range of the observed outcome.
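The difference between the two normalizations can be seen in a small Python sketch. Everything here is an illustrative assumption rather than the Appendix code: the toy data are hypothetical, the participation model is saturated in a single binary covariate (stratum proportions instead of a logistic regression), the randomization probability `E_A = 0.5` is taken as known, and the function names are ours.

```python
# Hypothetical toy data: each row is (x, s, a, y); a and y are None when s == 0.
DATA = (
    [(0, 1, 1, 3.0), (0, 1, 0, 1.0), (0, 1, 0, 3.0),
     (1, 1, 1, 8.0), (1, 1, 1, 7.0), (1, 1, 0, 2.0)]
    + [(0, 0, None, None)]
    + [(1, 0, None, None)] * 6
)
E_A = 0.5  # assumed known probability of assignment to each arm


def participation_prob(data):
    """Saturated working model for Pr[S = 1 | X = x]: proportion of participants per stratum."""
    n_all, n_trial = {}, {}
    for x, s, _, _ in data:
        n_all[x] = n_all.get(x, 0) + 1
        n_trial[x] = n_trial.get(x, 0) + s
    return {x: n_trial[x] / n_all[x] for x in n_all}


def weights(data, a):
    """Inverse odds weights: nonzero only for trial participants who received treatment a."""
    p = participation_prob(data)
    return [(1.0 - p[x]) / (p[x] * E_A) if (s == 1 and trt == a) else 0.0
            for x, s, trt, _ in data]


def iow1(data, a):
    """Unnormalized estimator: weighted outcome total divided by the number of non-participants."""
    w = weights(data, a)
    n0 = sum(1 for _, s, _, _ in data if s == 0)
    return sum(wi * row[3] for wi, row in zip(w, data) if wi > 0.0) / n0


def iow2(data, a):
    """Normalized estimator: weighted outcome total divided by the sum of the weights."""
    w = weights(data, a)
    return sum(wi * row[3] for wi, row in zip(w, data) if wi > 0.0) / sum(w)


print("IOW1:", iow1(DATA, 1), "IOW2:", iow2(DATA, 1))
```

With these numbers the unnormalized estimate (about 8.86) exceeds the largest outcome observed in the treated arm (8.0), whereas the normalized estimate (about 7.15) is a weighted average of observed outcomes and so stays within their range.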
4.3 Doubly robust estimators
In practical applications, background knowledge is typically inadequate to ensure correct specification of the working models for the probability of participation or the expectation of the outcome, and misspecification of these models can lead to estimator inconsistency. We can gain some robustness to misspecification and increase efficiency by combining the two models to obtain doubly robust estimators that are consistent when either model is correctly specified [27, 28, 29]. Here, we examine three doubly robust estimators that are easy to implement in standard statistical software.
In-sample one-step doubly robust estimator:
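The first doubly robust estimator we consider relies on estimating models for the conditional expectation of the outcome, $\mathrm{E}[Y \mid X, S = 1, A = a]$; the probability of trial participation, $\Pr[S = 1 \mid X]$; and (optionally) the probability of treatment among trial participants, $\Pr[A = a \mid X, S = 1]$. Predicted values from these models are then combined to obtain the unbounded estimator
$$\hat{\mu}_{\mathrm{DR1}}(a) = \Big\{ \sum_{i=1}^{n} (1 - S_i) \Big\}^{-1} \sum_{i=1}^{n} \Big\{ \hat{w}_a(X_i, S_i, A_i) \big\{ Y_i - g_a(X_i; \hat{\beta}) \big\} + (1 - S_i) \, g_a(X_i; \hat{\beta}) \Big\}, \tag{4}$$
where $\hat{w}_a(X, S, A)$ was defined in the previous section.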
In-sample one-step doubly robust estimator with normalized weights:
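Using the same strategy of normalizing the weights as for IOW2, an alternative, bounded (provided the outcome model is well-chosen) variant of DR1 is
$$\hat{\mu}_{\mathrm{DR2}}(a) = \Big\{ \sum_{i=1}^{n} \hat{w}_a(X_i, S_i, A_i) \Big\}^{-1} \sum_{i=1}^{n} \hat{w}_a(X_i, S_i, A_i) \big\{ Y_i - g_a(X_i; \hat{\beta}) \big\} + \Big\{ \sum_{i=1}^{n} (1 - S_i) \Big\}^{-1} \sum_{i=1}^{n} (1 - S_i) \, g_a(X_i; \hat{\beta}). \tag{5}$$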
Weighted regression doubly robust estimator:
A third doubly robust estimator involves fitting a model for the outcome conditional on covariates among trial participants, using a weighted regression with the weights $\hat{w}_a(X, S, A)$ as defined above, and then standardizing the predicted values, $g_a(X; \hat{\beta}_w)$, to the covariate distribution of eligible non-participants,
$$\hat{\mu}_{\mathrm{DR3}}(a) = \Big\{ \sum_{i=1}^{n} (1 - S_i) \Big\}^{-1} \sum_{i=1}^{n} (1 - S_i) \, g_a(X_i; \hat{\beta}_w), \tag{6}$$
where $\hat{\beta}_w$ is the vector of estimated parameters from the weighted outcome regression. This estimator is bounded provided the outcome model is well-chosen and doubly robust when the outcome is modeled with a linear exponential family quasi-likelihood and the canonical link function [26, 31].
Provided the identification conditions hold, doubly robust estimators are consistent and asymptotically normal when either the model for the probability of participation or the expectation of the outcome is correctly specified. When both models are correctly specified, the large-sample variance of the doubly robust estimators is less than or equal to the variance of the inverse odds weighting estimators [28, 32]. When one of the two models is incorrectly specified, the asymptotic distribution of the doubly robust estimators remains normal (and centered on the true value) but their variance is increased. In rare cases, when misspecification of the outcome model is combined with highly variable weights, doubly robust estimators can perform worse than non-doubly robust estimators that use the same misspecified outcome model [33, 26].
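The double robustness property can be illustrated with a deliberately misspecified outcome model. The Python sketch below is our own construction on hypothetical toy data, not the Appendix code: the working outcome model is intercept-only (it ignores the covariate), the participation model is saturated and therefore correct, and the randomization probability `E_A = 0.5` is assumed known.

```python
# Hypothetical toy data: each row is (x, s, a, y); a and y are None when s == 0.
DATA = [
    (0, 1, 1, 3.0), (0, 1, 1, 5.0), (0, 1, 0, 1.0), (0, 1, 0, 3.0),
    (1, 1, 1, 8.0), (1, 1, 0, 2.0),
    (0, 0, None, None), (1, 0, None, None), (1, 0, None, None), (1, 0, None, None),
]
E_A = 0.5  # assumed known probability of assignment to each arm


def participation_prob(data):
    """Saturated (correct) working model for Pr[S = 1 | X = x]."""
    n_all, n_trial = {}, {}
    for x, s, _, _ in data:
        n_all[x] = n_all.get(x, 0) + 1
        n_trial[x] = n_trial.get(x, 0) + s
    return {x: n_trial[x] / n_all[x] for x in n_all}


def weights(data, a):
    """Inverse odds weights: nonzero only for trial participants who received treatment a."""
    p = participation_prob(data)
    return [(1.0 - p[x]) / (p[x] * E_A) if (s == 1 and trt == a) else 0.0
            for x, s, trt, _ in data]


def arm_mean(data, a, w=None):
    """Intercept-only (misspecified) outcome model: (weighted) mean of Y in arm a."""
    num = den = 0.0
    for i, (x, s, trt, y) in enumerate(data):
        if s == 1 and trt == a:
            wi = 1.0 if w is None else w[i]
            num += wi * y
            den += wi
    return num / den


def dr_estimates(data, a):
    w = weights(data, a)
    n0 = sum(1 for _, s, _, _ in data if s == 0)
    g = arm_mean(data, a)  # constant prediction, so standardizing it returns g itself
    resid = sum(wi * (row[3] - g) for wi, row in zip(w, data) if wi > 0.0)
    return {
        "OM": g,                        # standardized misspecified model
        "DR1": (resid + n0 * g) / n0,   # one-step estimator
        "DR2": resid / sum(w) + g,      # one-step estimator, normalized weights
        "DR3": arm_mean(data, a, w),    # weighted intercept-only fit, then standardized
    }


est = dr_estimates(DATA, 1)
print(est)
```

Here the misspecified outcome model alone gives about 5.33, while all three doubly robust estimators return 7.0, which equals the stratum-specific treated means standardized to the non-participants (0.25 × 4 + 0.75 × 8). The correctly specified participation model repairs the error of the outcome model in this example.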
When using parametric working models, all the estimators described above can be viewed as partial M-estimators  and it is possible to employ the usual “sandwich” approach to obtain their sampling variances (e.g., [35, 36]). Inference based on the non-parametric bootstrap , however, is easy to obtain with modern software and will often be preferred in practice.
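A percentile bootstrap for a transported effect can be sketched as follows. This is a minimal illustration under our own assumptions: the data-generating mechanism in `simulate` is hypothetical, the estimator is an outcome-model estimator with a saturated model in one binary covariate, rows are resampled with replacement from the combined dataset, and replicates with an empty stratum are skipped (a simplification we introduce so the sketch never divides by zero).

```python
import random


def om_estimate(data, a):
    """Saturated outcome-model estimator; returns None if a needed stratum is empty."""
    total, count = {}, {}
    for x, s, trt, y in data:
        if s == 1 and trt == a:
            total[x] = total.get(x, 0.0) + y
            count[x] = count.get(x, 0) + 1
    target_x = [x for x, s, _, _ in data if s == 0]
    if not target_x or any(x not in count for x in target_x):
        return None
    return sum(total[x] / count[x] for x in target_x) / len(target_x)


def simulate(rng, n_trial=400, n_target=800):
    """Hypothetical mechanism: effect is 1 when x = 0 and 3 when x = 1."""
    data = []
    for _ in range(n_trial):
        x = int(rng.random() < 0.3)
        a = int(rng.random() < 0.5)
        data.append((x, 1, a, 1.0 + a * (1.0 + 2.0 * x) + rng.gauss(0.0, 1.0)))
    for _ in range(n_target):
        data.append((int(rng.random() < 0.7), 0, None, None))
    return data


def bootstrap_ci(data, n_boot=500, level=0.95, seed=1):
    """Percentile interval for the transported difference mu(1) - mu(0)."""
    rng = random.Random(seed)
    n = len(data)
    reps = []
    for _ in range(n_boot):
        sample = [data[rng.randrange(n)] for _ in range(n)]  # resample whole rows
        m1, m0 = om_estimate(sample, 1), om_estimate(sample, 0)
        if m1 is not None and m0 is not None:
            reps.append(m1 - m0)
    reps.sort()
    alpha = 1.0 - level
    lo = reps[int((alpha / 2.0) * (len(reps) - 1))]
    hi = reps[int((1.0 - alpha / 2.0) * (len(reps) - 1))]
    return lo, hi


DATA = simulate(random.Random(2024))
lo, hi = bootstrap_ci(DATA)
print("95% percentile interval:", lo, hi)
```

Because every modeling step is repeated inside each bootstrap replicate, the interval reflects the uncertainty from estimating the working models as well as from sampling.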
5 Simulation study
We conducted a simulation study to examine the finite-sample performance of different estimators for the average treatment effect in the target population of eligible non-randomized patients.
5.1 Data generation
We ran a factorial experiment using 3 trial sample sizes × 3 target population sample sizes × 2 magnitudes of departure from additive effects in the outcome model × 2 magnitudes of selection, resulting in a total of 36 simulation scenarios.
We considered of 250, 500, or 1000 randomized participants and of 2,500, 5,000 or 10,000 non-randomized individuals. We generated baseline covariates for randomized trial participants (), as and ; ; . We then generated baseline covariates for the sample of eligible non-participants (), with ; ; . The difference in the means of the covariate distributions of trial participants and non-participants represents selection into the trial based on baseline covariates. The parameter controls selection on ; we used values 0 and 1, representing no and strong selection. Because the distribution of baseline covariates is homoskedastic over , a logistic regression model of on is correctly specified . We generated outcomes using the linear model
where is the main treatment effect, determines the magnitude of effect modification by ; , , and . We examined scenarios with different levels of effect modification by setting to 0 or 1; we set the “main” treatment effect to in all scenarios.
For each simulated dataset, we applied the estimators in equations (1) through (6), and also obtained a trial-only estimator of the treatment effect. All working models required for the different estimators were correctly specified, in the sense that the true models were nested within the parametric working models on which the estimators relied. Specifically, outcome models included main effects for all covariates and were fit separately in each arm; logistic regression models for trial participation and treatment included the main effects of all covariates; all models had intercept terms. We estimated the bias and variance for each estimator over 10,000 runs for each scenario.
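The structure of such a simulation (generate data, apply estimators, summarize bias and variance over runs) can be sketched in Python. The mechanism below is a hypothetical stand-in and not the design described above: it uses a single binary effect modifier that is more common among non-participants, a saturated outcome model for the transported estimator, and 300 runs instead of 10,000; all names and parameter values are our assumptions.

```python
import random
from statistics import mean, variance

# Effect is 1 when x = 0 and 3 when x = 1; Pr[X = 1 | S = 0] = 0.7 in this assumed mechanism.
TRUE_EFFECT = 1.0 * 0.3 + 3.0 * 0.7


def simulate(rng, n_trial=300, n_target=600):
    rows = []
    for _ in range(n_trial):
        x = int(rng.random() < 0.3)  # effect modifier is rarer in the trial
        a = int(rng.random() < 0.5)  # 1:1 randomization
        y = 1.0 + x + a * (1.0 + 2.0 * x) + rng.gauss(0.0, 1.0)
        rows.append((x, 1, a, y))
    for _ in range(n_target):
        rows.append((int(rng.random() < 0.7), 0, None, None))
    return rows


def effects(rows):
    """Return (transported effect, trial-only effect), or None if a stratum is empty."""
    cells = {}
    for x, s, a, y in rows:
        if s == 1:
            cells.setdefault((x, a), []).append(y)
    if len(cells) < 4:
        return None
    g = {k: sum(v) / len(v) for k, v in cells.items()}
    target_x = [x for x, s, _, _ in rows if s == 0]
    transported = sum(g[(x, 1)] - g[(x, 0)] for x in target_x) / len(target_x)
    arm1 = cells[(0, 1)] + cells[(1, 1)]
    arm0 = cells[(0, 0)] + cells[(1, 0)]
    return transported, sum(arm1) / len(arm1) - sum(arm0) / len(arm0)


def run(n_runs=300, seed=7):
    rng = random.Random(seed)
    results = [r for r in (effects(simulate(rng)) for _ in range(n_runs)) if r is not None]
    om = [r[0] for r in results]
    trial = [r[1] for r in results]
    return {
        "bias_om": mean(om) - TRUE_EFFECT,
        "var_om": variance(om),
        "bias_trial_only": mean(trial) - TRUE_EFFECT,
        "var_trial_only": variance(trial),
    }


summary = run()
print(summary)
```

Under this assumed mechanism the trial-only estimator targets the effect among participants (1.6) rather than among non-participants (2.4), so its bias relative to the target should be close to -0.8, while the transported estimator should be roughly unbiased.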
5.2 Simulation results
Tables 2 and 3 summarize simulation results from selected simulation scenarios for continuous, normally distributed outcomes and linear outcome models. Additional simulation results are presented in Appendix Tables A1 and A2.
When all models were correctly specified, all estimators were approximately unbiased, even with fairly small trial and target population sample sizes. The outcome-model based estimator had the lowest variance, followed closely by the three doubly robust estimators. The probability of participation-based estimators had substantially larger variance than all other estimators; that variance, though, became smaller with increasing trial sample sizes. When trial sample size was much smaller than the target sample size, in the presence of strong selection on covariates, estimators that used weights normalized to sum to one (IOW2, DR2, and DR3) had smaller variance compared to estimators that used unnormalized weights (IOW1 and DR1). As expected, in the presence of effect modification, the trial-only estimator gave different results compared to the estimators in equations (1) through (6). The trial-only estimator is biased for $\mathrm{E}[Y^a - Y^{a'} \mid S = 0]$ when selection into the trial depends on the effect modifier, but is, of course, unbiased for $\mathrm{E}[Y^a - Y^{a'} \mid S = 1]$ under very general conditions.
5.3 Code to implement the methods
In the Appendix, we provide R  code implementing the methods compared in the simulation study. Specifically, we provide a collection of basic stand-alone functions, one for each estimator in equations (1) through (6), using parametric working models estimated by standard maximum likelihood methods. Readers can modify the functions to incorporate alternative estimation approaches and to obtain bootstrap-based inference. To allow inference with standard errors obtained with the sandwich method, we also provide an implementation of the estimators using the R package geex . Lastly, we provide Stata code to reproduce the simulation study.
6 Transportability analyses for the Coronary Artery Surgery Study
The Coronary Artery Surgery Study (CASS) included a randomized trial nested within a cohort study, comparing coronary artery surgery plus medical therapy (henceforth, “surgery”) versus medical therapy alone for patients with chronic coronary artery disease. Of the 2099 eligible patients, 780 consented to randomization and 1319 declined. We excluded six patients for consistency with prior CASS analyses [41, 42] and in accordance with CASS data release notes; we therefore used data from 2093 patients. Details about the design of the CASS are available elsewhere [43, 44]. Here, we focus on estimating the survival probability and treatment effects among eligible non-participants and comparing them against estimates obtained among trial participants.
We implemented the methods described in Section 4 to estimate the 10-year risk (cumulative incidence proportion) of death from any cause in the surgery and medical therapy groups, the risk difference, and the relative risk for the population of eligible patients who did not consent to randomization. Risks are reasonable measures of incidence in CASS because no patients were censored during the first 10 years of follow-up. The working models for the outcome, the probability of participation in the trial, and the probability of treatment were logistic regression models with the following covariates: age, severity of angina, history of previous myocardial infarction, percent obstruction of the proximal left anterior descending artery, left ventricular wall motion score, number of diseased vessels, and ejection fraction. We selected variables for inclusion in the models based on a previous analysis of the CASS data; age and ejection fraction were modeled using restricted cubic splines with 5 knots. We used bootstrap re-sampling (10,000 samples of as many observations as in the dataset) to obtain percentile 95% confidence intervals.
Of the 2093 patients in the CASS dataset, 1686 had complete data on all baseline covariates (731 randomized, 368 to surgery and 363 to medical therapy; 955 non-randomized, 430 receiving surgery and 525 medical therapy); for simplicity, we only report analyses restricted to patients with complete data. Table 4 summarizes baseline covariates in trial participants (by treatment group) and non-participants. Figure 1 presents the kernel density of the estimated probability of trial participation for trial participants and non-participants and a kernel density of the estimated weights for trial participants. The sample proportion of non-participants divided by the sample average of the inverse odds of trial participation among trial participants was approximately 1.001.
Estimates of the 10-year risk (by treatment group), risk difference, and risk ratio are shown in Table 5. The outcome model-based estimator (OM), the inverse odds of participation estimators (IOW1 and IOW2), and the doubly robust estimators (DR1, DR2 and DR3) produced similar results, suggesting that findings are not driven by model specification decisions .
7 Transportability analyses in practice
We now discuss practical issues related to variable selection, other aspects of model specification, and positivity violations and highly variable weights, all of which arise in transportability analyses using the methods described above.
7.1 Practical considerations
Throughout, we have used $X$ to signify baseline covariates measured both among randomized trial participants and the sample from the target population. In principle, investigators can use any subset of the available covariates that satisfies the mean transportability assumption. When investigators are interested in estimating the potential outcome mean under each treatment (not only the average treatment effect), outcome predictors that are also associated with trial participation should be included in models for the outcome or the probability of participation (or both, when using doubly robust estimators). Including outcome predictors that are not associated with trial participation in models for the expectation of the outcome will often improve the precision of the outcome model-based and doubly robust estimators; including strong predictors of trial participation that are not associated with the outcome in regressions for the probability of participation will generally increase the variance of the odds weighted and doubly robust estimators without improving transportability. When investigators are primarily interested in the average treatment effect (instead of the potential outcome mean under each treatment), only effect modifiers (on the mean difference scale) need to be modeled [12, 8]. Because background knowledge about effect modification is typically very limited, even when interest is centered on treatment effect estimation, in practice it is probably best to include as many outcome predictors as possible in regression models for the expectation of the outcome or the probability of trial participation. We followed this strategy in our CASS re-analysis: we selected covariates for “adjustment” based on prior work on outcome modeling and used the same covariates when modeling trial participation, the outcome, and treatment in the trial.
Other aspects of model specification:
Models for the expectation of the outcome and the probability of trial participation need to be flexible in order to approximate the corresponding “true” conditional expectation/probability functions. This will often mean including non-linear terms (e.g., splines for continuous variables) or interactions between predictors. When modeling the expectation of the outcome (for outcome model-based or doubly robust estimators) we recommend fitting separate regression models in each treatment group in the trial, as we did in the CASS re-analysis (equivalent to fitting a single regression model that includes all possible treatment-covariate interactions).
More broadly, model specification for transportability analyses involves trading off bias and variance using informal or formal methods. When background knowledge suggests that a large number of covariates need to be modeled, formal model specification search methods can be particularly helpful. In our experience, especially when using composite datasets, the variables measured both among trial participants and the sample of the target population are often few and model specification is not a pressing concern (of course, such cases raise concerns about violations of the mean transportability assumption and necessitate sensitivity analyses, which we address in the discussion). When richer data are available (e.g., when trials are nested in cohorts of eligible individuals), transportability analyses need to be combined with more sophisticated strategies for model specification search (strategies developed for causal inference in observational studies rest on principles that also apply to transportability analyses).
Positivity violations and highly variable weights:
To prevent structural violations of the positivity of trial participation assumption, investigators should ensure that the sample from the target population meets the trial eligibility criteria. For example, if the trial restricted enrollment to patients under 85 years of age, it is prudent to apply the same restriction in the sample of patients from the target population. When positivity is violated, odds weighting estimators are inconsistent, whereas outcome model-based and doubly robust estimators rely heavily on the specification of the outcome model (to extrapolate from participants to non-participants) . Empirical (finite-sample) violations of positivity can arise due to chance, particularly when the trial sample size is small or when the mean transportability assumption requires adjustment for continuous covariates or a large number of discrete covariates. Empirical violations of positivity increase bias and variance in a way that depends on the particular estimator being used, model specification, and the underlying data generating mechanism.
It is always a good idea to examine the distribution of the estimated probabilities of trial participation (even if using an outcome model-based estimator), because values near zero are warning signs for possible positivity violations. Inspection of the estimated probabilities of trial participation can be combined with diagnostics for positivity violations. It is also useful to inspect the distribution of the weights that are used for the odds weighting and doubly robust estimators. By inspecting the distribution of the odds weights, investigators can identify extreme values and visually assess the spread of the weight distribution. A sometimes useful diagnostic is that the sample proportion of non-participants, divided by the sample average of the estimated inverse odds of trial participation among trial participants, should be approximately equal to one; or, in symbols,
$$\frac{n^{-1} \sum_{i=1}^{n} (1 - S_i)}{n^{-1} \sum_{i=1}^{n} S_i \, \dfrac{1 - p(X_i; \hat{\gamma})}{p(X_i; \hat{\gamma})}} \approx 1.$$
Values different from 1 suggest positivity violations or model misspecification. The rationale for the diagnostic is provided by the identity
$$\mathrm{E}\left[ S \, \frac{1 - \Pr[S = 1 \mid X]}{\Pr[S = 1 \mid X]} \right] = \mathrm{E}\big[ 1 - \Pr[S = 1 \mid X] \big] = \Pr[S = 0],$$
which holds when the probability of trial participation is positive for all covariate values.
In applied analyses, we have found that problems with extreme weights can often be addressed by making sensible modeling choices  and ensuring that the sample of non-participants is properly selected to avoid violations of the positivity of trial participation assumption. Trimming or truncation of extreme weights may also help, but these strategies shift the causal estimand, which is often undesirable.
In our CASS re-analyses, the estimated probabilities of trial participation were far from zero. Their distribution was similar among trial participants and non-participants (as shown in Figure 1), reflecting the fairly similar observed covariate distribution in trial participants and non-participants and the absence of strong selection into the trial (at least based on available covariates, as shown in Table 4). As noted, the sample proportion of non-participants divided by the sample average of the inverse odds of trial participation among trial participants was approximately 1, providing some reassurance that gross violations of positivity were absent.
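The weight diagnostic discussed in this section is simple to compute. The Python sketch below is an illustration under our own assumptions: the rows are hypothetical, the participation model is saturated in a single discrete covariate (stratum proportions rather than a logistic regression), and the function name `weight_diagnostic` is ours.

```python
def weight_diagnostic(rows):
    """Proportion of non-participants divided by the average estimated inverse odds
    of participation among participants; rows are (x, s) pairs."""
    n_all, n_trial = {}, {}
    for x, s in rows:
        n_all[x] = n_all.get(x, 0) + 1
        n_trial[x] = n_trial.get(x, 0) + s
    # Saturated working model for Pr[S = 1 | X = x]
    p = {x: n_trial[x] / n_all[x] for x in n_all}
    n = len(rows)
    prop_nonparticipants = sum(1 - s for _, s in rows) / n
    # Only participants contribute, so p[x] > 0 for every term in the sum
    avg_inverse_odds = sum((1.0 - p[x]) / p[x] for x, s in rows if s == 1) / n
    return prop_nonparticipants / avg_inverse_odds


# Hypothetical data in which every stratum of the target sample is represented in the trial
overlap = [(0, 1)] * 4 + [(1, 1)] * 2 + [(0, 0)] + [(1, 0)] * 3

# Same trial, but the target sample now includes a stratum (x = 2) absent from the trial
no_overlap = overlap + [(2, 0)] * 2

print("with overlap:", weight_diagnostic(overlap))
print("positivity violated:", weight_diagnostic(no_overlap))
```

In the first dataset the ratio is 1; in the second it is 1.5, because the non-participants with `x = 2` have no counterparts in the trial and therefore receive no weight. A ratio far from 1 does not say which problem is present, only that positivity or the participation model deserves a closer look.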
8 Discussion
In this tutorial, we reviewed methods for transporting inferences about the average effect of a time-fixed treatment from a randomized clinical trial to a new target population using baseline covariate data from randomized participants and a sample from the target population, but treatment and outcome data only from the randomized participants. We considered estimation approaches that rely on modeling the probability of trial participation, the expectation of the outcome, or both, and can be implemented easily in all popular statistical software packages.
A major challenge in applying any of the methods discussed in this tutorial is the need to collect adequate covariate information, both from trial participants and non-participants, for the mean transportability assumption to hold. Because the transportability assumption is not testable using the observed data, one has to rely on background substantive knowledge to assess its plausibility. Reasoning about the assumption can be facilitated using directed acyclic graphs, including recent graphical identification algorithms for assessing transportability [50, 51, 52]. Because background knowledge is often incomplete, it is often necessary to conduct sensitivity analyses, to examine how violations of the transportability assumption influence study results [53, 54].
Methods related to those discussed in this tutorial have been discussed in a number of recent publications [6, 7, 8, 9, 10, 11, 12, 13, 14, 15] addressing trial transportability (or the related but distinct concept of generalizability ). With few exceptions – such as the careful asymptotic study of an estimator closely related to DR1 in , or the targeted maximum likelihood estimators in  – prior work has focused on weighting [6, 11, 15] or stratification-based methods [8, 9, 10] that only rely on the probability of trial participation. Theoretical arguments, our simulation results, and practical experience suggest that methods that combine modeling the probability of trial participation with modeling the expectation of the outcome are most promising for applied work for two reasons: first, the double robustness property in effect gives investigators two opportunities for approximately correct inference ; second, doubly robust estimators often produce estimates that are more precise than those from methods that exclusively rely on modeling the probability of trial participation, even when the outcome model is misspecified [29, 26, 56, 32]. In our simulation studies, which used correctly specified parametric working models with few covariates, all estimators performed reasonably well in terms of bias. Interestingly, the two inverse odds weighting estimators had very different finite-sample performance in the presence of strong selection. Based on this observation, we recommend avoiding weighted estimators that do not normalize the weights to sum to 1.
When data are available on numerous baseline covariates, many of which are continuous, correct specification of parametric models for the probability of trial participation or expectation of the outcome will be impossible. Future research should address estimation using more flexible models (e.g., non-parametric or semi-parametric regression) to mitigate model misspecification. Flexible models are particularly appealing when using doubly robust estimators, because the estimators remain $\sqrt{n}$-consistent even when estimating the conditional expectation of the outcome or the probability of participation non-parametrically [57, 58]. Further research is also needed to study the behavior of different estimators under misspecification and to develop alternatives that are more robust to misspecification of the outcome model (e.g., along the lines suggested in [59, 60]). Lastly, throughout this tutorial, we have assumed perfect adherence to treatment in the randomized trial, no missing outcome data, and no measurement error. In practice, adherence is often imperfect, outcomes are missing (e.g., due to right censoring in failure-time analyses), and measurement error is a concern (e.g., differential measurement error in effect modifiers when using composite datasets). Established methods to address these issues in the trial data can be combined with the methods described in this tutorial in a modular fashion. For example, adjustment for imperfect adherence via inverse probability of treatment weighting can be combined with inverse odds weighting for transportability. Future work should assess the properties of such combined procedures and evaluate them in practical applications.
The authors thank Dr. Nina Joyce (Brown University) and Dr. John Wong (Tufts Medical Center) for helpful comments on earlier versions of the manuscript.
This work was supported in part through Patient-Centered Outcomes Research Institute (PCORI) Methods Research Awards ME-1306-03758 and ME-1502-27794 to I.J. Dahabreh, and ME-1503-28119 to M.A. Hernán. All statements in this paper, including its findings and conclusions, are solely those of the authors and do not necessarily represent the views of the PCORI, its Board of Governors, or the Methodology Committee.
-  Peter M Rothwell. External validity of randomised controlled trials: “to whom do the results of this trial apply?”. The Lancet, 365(9453):82–93, 2005.
-  Andrew Evans and Lalit Kalra. Are the results of randomized controlled trials on anticoagulation in patients with atrial fibrillation generalizable to clinical practice? Archives of Internal Medicine, 161(11):1443–1447, 2001.
-  Linda S Elting, Catherine Cooksley, B Nebiyou Bekele, Michael Frumovitz, Elenir BC Avritscher, Charlotte Sun, and Diane C Bodurka. Generalizability of cancer clinical trial results. Cancer, 106(11):2452–2458, 2006.
-  Philippe Gabriel Steg, José López-Sendón, Esteban Lopez de Sa, Shaun G Goodman, Joel M Gore, Frederick A Anderson, Dominique Himbert, Jeanna Allegrone, and Frans Van de Werf. External validity of clinical trials in acute myocardial infarction. Archives of Internal Medicine, 167(1):68–73, 2007.
-  Antonio L Dans, Leonila F Dans, Gordon H Guyatt, Scott Richardson, Evidence-Based Medicine Working Group, et al. Users’ guides to the medical literature: XIV. how to decide on the applicability of clinical trial results to your patient. JAMA, 279(7):545–549, 1998.
-  Stephen R Cole and Elizabeth A Stuart. Generalizing evidence from randomized clinical trials to target populations: the ACTG 320 trial. American Journal of Epidemiology, 172(1):107–115, 2010.
-  Eloise E Kaizar. Estimating treatment effect via simple cross design synthesis. Statistics in Medicine, 30(25):2986–3009, 2011.
-  Colm O’Muircheartaigh and Larry V Hedges. Generalizing from unrepresentative experiments: a stratified propensity score approach. Journal of the Royal Statistical Society: Series C (Applied Statistics), 63(2):195–210, 2014.
-  Elizabeth Tipton. Improving generalizations from experiments using propensity score subclassification assumptions, properties, and contexts. Journal of Educational and Behavioral Statistics, 38(3):239–266, 2012.
-  Elizabeth Tipton, Larry Hedges, Michael Vaden-Kiernan, Geoffrey Borman, Kate Sullivan, and Sarah Caverly. Sample selection in randomized experiments: A new method using propensity score stratified sampling. Journal of Research on Educational Effectiveness, 7(1):114–135, 2014.
-  Erin Hartman, Richard Grieve, Roland Ramsahai, and Jasjeet S Sekhon. From SATE to PATT: combining experimental with observational studies to estimate population treatment effects. Journal of the Royal Statistical Society Series A (Statistics in Society), 10:1111, 2013.
-  Zhiwei Zhang, Lei Nie, Guoxing Soon, and Zonghui Hu. New methods for treatment effect calibration, with applications to non-inferiority trials. Biometrics, 72(1):20–29, 2016.
-  Ashley L Buchanan, Michael G Hudgens, Stephen R Cole, Katie R Mollan, Paul E Sax, Eric S Daar, Adaora A Adimora, Joseph J Eron, and Michael J Mugavero. Generalizing evidence from randomized trials using inverse probability of sampling weights. Journal of the Royal Statistical Society: Series A (Statistics in Society), 2016.
-  Kara E Rudolph and Mark J van der Laan. Robust estimation of encouragement design intervention effects transported across sites. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(5):1509–1525, 2017.
-  Daniel Westreich, Jessie K Edwards, Catherine R Lesko, Elizabeth Stuart, and Stephen R Cole. Transportability of trial results using inverse odds of sampling weights. American Journal of Epidemiology, 2017.
-  M Olschewski, H Scheurlen, et al. Comprehensive cohort study: an alternative to randomized consent design in a breast preservation trial. Methods Archive, 24:131–134, 1985.
-  Louis D Fiore and Philip W Lavori. Integrating randomized comparative effectiveness research with patient care. New England Journal of Medicine, 374(22):2152–2158, 2016.
-  Jerzy Splawa-Neyman, D M Dabrowska, and T P Speed. On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statistical Science, 5(4):465–472, 1990.
-  Donald B Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66(5):688, 1974.
-  Issa J Dahabreh, Rodney Hayward, and David M Kent. Using group data to treat individuals: understanding heterogeneous treatment effects in the age of precision medicine and patient-centred evidence. International Journal of Epidemiology, 45(6):2184–2193, 2016.
-  Issa J Dahabreh, Thomas A Trikalinos, David M Kent, and Christopher H Schmid. Heterogeneity of treatment effects. Methods in Comparative Effectiveness Research, page 227, 2017.
-  Issa J Dahabreh, Sarah E Robertson, Elizabeth A Stuart, and Miguel A Hernán. Extending inferences from randomized participants to all eligible individuals using trials nested within cohort studies. arXiv preprint arXiv:1709.04589, 2017.
-  James M Robins. A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Mathematical Modelling, 7(9):1393–1512, 1986.
-  Daniel G Horvitz and Donovan J Thompson. A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association, 47(260):663–685, 1952.
-  J Hájek. Comment on “An essay on the logical foundations of survey sampling by D. Basu”. In V P Godambe and D A Sprott, editors, Foundations of Statistical Inference. 1971.
-  James M Robins, Mariela Sued, Quanhong Lei-Gomez, and Andrea Rotnitzky. Comment: Performance of double-robust estimators when “inverse probability” weights are highly variable. Statistical Science, pages 544–559, 2007.
-  James M Robins, Andrea Rotnitzky, and Lue Ping Zhao. Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 89(427):846–866, 1994.
-  James M Robins and Andrea Rotnitzky. Semiparametric efficiency in multivariate regression models with missing data. Journal of the American Statistical Association, 90(429):122–129, 1995.
-  James M Robins and Andrea Rotnitzky. Comments. Statistica Sinica, 11(4):920–936, 2001.
-  Christian Gourieroux, Alain Monfort, and Alain Trognon. Pseudo maximum likelihood methods: Theory. Econometrica: Journal of the Econometric Society, pages 681–700, 1984.
-  Jeffrey M Wooldridge. Inverse probability weighted estimation for general missing data problems. Journal of Econometrics, 141(2):1281–1301, 2007.
-  Anastasios Tsiatis. Semiparametric theory and missing data. Springer Science & Business Media, 2007.
-  Joseph DY Kang and Joseph L Schafer. Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science, pages 523–539, 2007.
-  Leonard A Stefanski and Dennis D Boos. The calculus of m-estimation. The American Statistician, 56(1):29–38, 2002.
-  Jared K Lunceford and Marie Davidian. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statistics in Medicine, 23(19):2937–2960, 2004.
-  Ziyue Chen and Eloise Kaizar. On variance estimation for generalizing from a trial to a target population. arXiv preprint arXiv:1704.07789, 2017.
-  Bradley Efron and Robert J Tibshirani. An introduction to the bootstrap. CRC press, 1994.
-  Bradley Efron. The efficiency of logistic regression compared to normal discriminant analysis. Journal of the American Statistical Association, 70(352):892–898, 1975.
-  R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2015.
-  Bradley C Saul and Michael G Hudgens. The calculus of M-estimation in R with geex. arXiv preprint arXiv:1709.01413, 2017.
-  Bernard R Chaitman, Thomas J Ryan, Richard A Kronmal, Eric D Foster, Peter L Frommer, Thomas Killip, CASS Investigators, et al. Coronary artery surgery study (CASS): comparability of 10 year survival in randomized and randomizable patients. Journal of the American College of Cardiology, 16(5):1071–1078, 1990.
-  Manfred Olschewski, Martin Schumacher, and Kathryn B Davis. Analysis of randomized and nonrandomized patients in clinical trials using the comprehensive cohort follow-up study design. Controlled Clinical Trials, 13(3):226–239, 1992.
-  J William, R Russell, T Nicholas, et al. Coronary artery surgery study (CASS): a randomized trial of coronary artery bypass surgery. Circulation, 68(5):939–950, 1983.
-  CASS Principal Investigators. Coronary artery surgery study (CASS): a randomized trial of coronary artery bypass surgery: comparability of entry characteristics and survival in randomized patients and nonrandomized patients meeting randomization criteria. Journal of the American College of Cardiology, 3(1):114–128, 1984.
-  Frank Harrell. Regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis. Springer, 2015.
-  Stephen R Cole and Miguel A Hernán. Constructing inverse probability weights for marginal structural models. American journal of epidemiology, 168(6):656–664, 2008.
-  M Alan Brookhart and Mark J Van Der Laan. A semiparametric model selection criterion with applications to the marginal structural model. Computational statistics & data analysis, 50(2):475–498, 2006.
-  Stijn Vansteelandt, Maarten Bekaert, and Gerda Claeskens. On model selection and model misspecification in causal inference. Statistical methods in medical research, 21(1):7–30, 2012.
-  Maya L Petersen, Kristin E Porter, Susan Gruber, Yue Wang, and Mark J van der Laan. Diagnosing and responding to violations in the positivity assumption. Statistical methods in medical research, 21(1):31–54, 2012.
-  Elias Bareinboim and Judea Pearl. Meta-transportability of causal effects: A formal approach. In Proceedings of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS), pages 135–143, 2013.
-  Judea Pearl, Elias Bareinboim, et al. External validity: from do-calculus to transportability across populations. Statistical Science, 29(4):579–595, 2014.
-  Elias Bareinboim and Judea Pearl. Transportability of causal effects: Completeness results. In AAAI, pages 698–704, 2012.
-  Andrea Rotnitzky, James M Robins, and Daniel O Scharfstein. Semiparametric regression for repeated outcomes with nonignorable nonresponse. Journal of the American Statistical Association, 93(444):1321–1339, 1998.
-  James M Robins, Andrea Rotnitzky, and Daniel O Scharfstein. Sensitivity analysis for selection bias and unmeasured confounding in missing data and causal inference models. In Statistical models in epidemiology, the environment, and clinical trials, pages 1–94. Springer, 2000.
-  MA Hernán. Discussion of “Perils and potentials of self-selected entry to epidemiological studies and surveys”. Journal of the Royal Statistical Society: Series A (Statistics in Society), 179(2):346–347, 2016.
-  Heejung Bang and James M Robins. Doubly robust estimation in missing data and causal inference models. Biometrics, 61(4):962–973, 2005.
-  James M Robins and Ya’acov Ritov. Toward a curse of dimensionality appropriate (CODA) asymptotic theory for semi-parametric models. Statistics in Medicine, 16(3):285–319, 1997.
-  Ashley I Naimi and Edward H Kennedy. Nonparametric double robustness. arXiv preprint arXiv:1711.07137, 2017.
-  Weihua Cao, Anastasios A Tsiatis, and Marie Davidian. Improving efficiency and robustness of the doubly robust estimator for a population mean with incomplete data. Biometrika, 96(3):723–734, 2009.
-  Karel Vermeulen and Stijn Vansteelandt. Bias-reduced doubly robust estimation. Journal of the American Statistical Association, 110(511):1024–1036, 2015.
| Age, years | | 50.9 (7.7) | 51.4 (7.2) | 50.9 (7.4) |
| Angina | None | 195 (20.4) | 83 (22.6) | 81 (22.3) |
| | Present | 760 (79.6) | 285 (77.4) | 282 (77.7) |
| History of MI | No | 406 (42.5) | 159 (43.2) | 135 (37.2) |
| | Yes | 549 (57.5) | 209 (56.8) | 228 (62.8) |
| LAD % obstruction | | 39.1 (38.7) | 36.4 (38.0) | 34.9 (37.0) |
| Left ventricular score | | 7.1 (2.7) | 7.4 (2.9) | 7.3 (2.8) |
| Diseased vessels | 0 | 347 (36.3) | 146 (39.7) | 133 (36.6) |
| | | 608 (63.7) | 222 (60.3) | 230 (63.4) |
| Ejection fraction, % | | 60.2 (12.3) | 60.9 (13.1) | 59.8 (12.8) |
CASS = Coronary Artery Surgery Study; LAD = left anterior descending coronary artery; MI = myocardial infarction; SD = standard deviation.
| | | | Risk difference | Risk ratio |
| Trial-only | 17.4% (13.6%, 21.4%) | 20.4% (16.3%, 24.6%) | -3.0% (-8.7%, 2.7%) | 0.85 (0.62, 1.15) |
| OR | 18.9% (13.9%, 22.7%) | 20.1% (15.9%, 24.5%) | -1.3% (-7.9%, 4.2%) | 0.94 (0.65, 1.24) |
| IOW1 | 18.2% (13.9%, 22.7%) | 20.1% (15.9%, 24.4%) | -1.9% (-7.8%, 4.2%) | 0.91 (0.66, 1.24) |
| IOW2 | 18.2% (14.6%, 23.5%) | 20.1% (16.0%, 24.4%) | -1.9% (-7.2%, 4.8%) | 0.91 (0.69, 1.28) |
| DR1 | 18.7% (14.5%, 23.3%) | 20.1% (16.0%, 24.4%) | -1.4% (-7.3%, 4.7%) | 0.93 (0.68, 1.27) |
| DR2 | 18.7% (14.5%, 23.3%) | 20.1% (16.0%, 24.4%) | -1.4% (-7.3%, 4.7%) | 0.93 (0.68, 1.27) |
| DR3 | 18.7% (14.4%, 23.2%) | 20.0% (15.9%, 24.3%) | -1.4% (-7.3%, 4.6%) | 0.93 (0.68, 1.27) |
In this Appendix we collect results regarding the consistency and double robustness properties of estimators discussed in the tutorial.
Under the assumptions in Section 3, the potential outcome mean among non-participants can be identified by the observed data:
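As a sketch of this identification result, assume the following notation (an assumption, since the symbols are not displayed in this section): $X$ for baseline covariates, $S$ for the trial participation indicator, $A$ for treatment, $Y$ for the observed outcome, and $Y^a$ for the potential outcome under treatment $a$. Assume also that the conditions of Section 3 are the usual ones of consistency of potential outcomes, randomization within the trial, conditional mean transportability given $X$, and positivity. Under those assumptions the argument would run as follows:

```latex
\begin{align*}
\mathrm{E}[Y^a \mid S = 0]
  &= \mathrm{E}\big[\, \mathrm{E}[Y^a \mid X, S = 0] \,\big|\, S = 0 \big]
     && \text{(iterated expectation)} \\
  &= \mathrm{E}\big[\, \mathrm{E}[Y^a \mid X, S = 1] \,\big|\, S = 0 \big]
     && \text{(mean transportability given } X\text{)} \\
  &= \mathrm{E}\big[\, \mathrm{E}[Y^a \mid X, S = 1, A = a] \,\big|\, S = 0 \big]
     && \text{(randomization in the trial)} \\
  &= \mathrm{E}\big[\, \mathrm{E}[Y \mid X, S = 1, A = a] \,\big|\, S = 0 \big]
     && \text{(consistency)}.
\end{align*}
```

The last line involves only quantities that are observable under the missingness pattern of the designs considered here: the inner expectation uses trial data, and the outer expectation uses covariate data from non-participants.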
Consistency and double robustness
In what follows, we assume that the model for the probability of treatment among trial participants is correctly specified (in fact, the “true” probability can be used instead).
When the model for the expectation of the outcome is correctly specified,
Consistency of the OM estimator follows because the numerator of the last expression above can be re-written as
where the last equality follows from the identification result above.
When the model for the probability of trial participation is correctly specified,
Consistency of IOW1 follows because the numerator of the last expression above can be re-written as
When the model for the probability of trial participation is correctly specified, IOW2 converges to the same limit as IOW1 because
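One way to sketch this step, assuming notation that is not displayed in this section ($p(X) = \Pr[S = 1 \mid X]$, $e_a(X) = \Pr[A = a \mid X, S = 1]$, and weights $w_a(X, S, A) = S \, I(A = a) \, \{1 - p(X)\} / \{p(X) \, e_a(X)\}$), is to compare the two normalizing factors. With correctly specified models, the average of the weights has the same limit as the proportion of non-participants:

```latex
\begin{align*}
\mathrm{E}\big[ w_a(X, S, A) \big]
  &= \mathrm{E}\left[ \frac{1 - p(X)}{p(X)\, e_a(X)} \,
       \Pr[S = 1, A = a \mid X] \right] \\
  &= \mathrm{E}\left[ \frac{1 - p(X)}{p(X)\, e_a(X)} \, p(X)\, e_a(X) \right]
   = \mathrm{E}\big[ 1 - p(X) \big]
   = \Pr[S = 0].
\end{align*}
```

Since $n^{-1} \sum_{i=1}^{n} (1 - S_i)$ also converges to $\Pr[S = 0]$, the ratio of the two normalizing factors converges to 1, so dividing by the sum of the weights rather than by the number of non-participants does not change the limit.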
Influence function for the observed data functional:
Write the observed data functional as , where is the observed data law. The first order influence function for is
where the 0 subscript indicates the “true” law. The above influence function implies the following in-sample one-step estimator for :
where is the estimated proportion of non-participants, and , , and are generic estimators for , , and , respectively. Of note, this result suggests that, unless one is willing to make assumptions beyond those in Section 3, estimators of should ignore treatment and outcome data in the sample of trial-eligible non-participants from the target population (i.e., individuals with ), even if available (see  for a similar observation). Estimator DR1 in the main text of the tutorial is obtained from , using parametric models to estimate conditional probabilities and expectations.
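For concreteness, one form of such a one-step estimator can be sketched under assumed notation (not displayed in this section): $\hat{\pi} = n^{-1} \sum_{i=1}^{n} (1 - S_i)$ for the estimated proportion of non-participants, $\hat{p}(X)$ for the estimated probability of trial participation, $\hat{e}_a(X)$ for the estimated probability of treatment $a$ among participants, and $\hat{g}_a(X)$ for the estimated conditional expectation of the outcome among participants assigned to $a$.

```latex
\[
\hat{\psi}(a) = \frac{1}{n \hat{\pi}} \sum_{i=1}^{n}
  \left\{
    S_i \, I(A_i = a) \,
    \frac{1 - \hat{p}(X_i)}{\hat{p}(X_i)\, \hat{e}_a(X_i)} \,
    \big\{ Y_i - \hat{g}_a(X_i) \big\}
    + (1 - S_i) \, \hat{g}_a(X_i)
  \right\}.
\]
```

In this form, outcome and treatment data enter only through terms multiplied by $S_i$, so observations with $S_i = 0$ contribute covariate information alone, in line with the remark above about ignoring treatment and outcome data from non-participants.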
Double robustness of DR1:
Provided the limiting values for the working models exist (even if the models are misspecified), DR1 converges to
We now consider two cases with respect to potential misspecification of the working models for the probability of trial participation and the expectation of the outcome.
- Model for the probability of trial participation correctly specified; model for the expectation of the outcome incorrectly specified:
following the argument for estimator IOW1, the first term in the bracketed part of (A.2) converges to the potential outcome mean among non-participants; by iterated expectation, the sum of the other two terms converges to 0.
- Model for the expectation of the outcome correctly specified; model for the probability of trial participation incorrectly specified:
following the argument for the OM estimator, the last term in the bracketed part of (A.2) converges to the potential outcome mean among non-participants; by iterated expectation, the sum of the other two terms converges to 0.
When the model for the probability of trial participation is correctly specified, DR2 converges to the same limit as DR1 because of (A.1). When the model for the expectation of the outcome is correctly specified, DR2 converges to the same limit as DR1 because, by iterated expectation, the first term of the estimator in equation (5) of the main text converges to 0.
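The double robustness property can also be checked numerically. The sketch below is illustrative only: the simulated data, the variable names, and the exact estimator forms are assumptions (an augmented inverse odds weighting estimator with three terms, and a variant that normalizes the weights in the residual term), not code or results from the tutorial. The outcome model is deliberately misspecified as intercept-only, while the model for trial participation is correctly specified and the known randomization probability of 0.5 is used.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(design, y, n_iter=25):
    """Logistic regression by Newton-Raphson; `design` must include an intercept column."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = expit(design @ beta)
        grad = design.T @ (y - p)
        hess = design.T @ (design * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Hypothetical simulated data (not from the tutorial).
rng = np.random.default_rng(7)
n = 20000
x = rng.normal(size=n)
s = rng.binomial(1, expit(-1.0 + x))        # participation depends on x
a = rng.binomial(1, 0.5, size=n)            # randomized with known probability 0.5
y_a1 = 2.0 + x + rng.normal(size=n)         # potential outcome under a = 1
y = np.where(a == 1, y_a1, y_a1 - 1.0)
truth = y_a1[s == 0].mean()                 # target: mean of Y^1 among non-participants

in_arm = (s == 1) & (a == 1)
n0 = np.sum(s == 0)

# Correctly specified participation model.
design = np.column_stack([np.ones(n), x])
p_hat = expit(design @ fit_logistic(design, s))
w = in_arm * (1.0 - p_hat) / (p_hat * 0.5)

# Deliberately misspecified outcome model: intercept only, ignoring x.
g_hat = np.full(n, y[in_arm].mean())
resid = np.where(in_arm, y - g_hat, 0.0)    # residuals are used only for the trial arm

om = np.sum((1 - s) * g_hat) / n0                        # outcome-model-only estimate
dr_unnormalized = np.sum(w * resid + (1 - s) * g_hat) / n0
dr_normalized = np.sum(w * resid) / np.sum(w) + om

print(f"truth={truth:.3f}  OM={om:.3f}  "
      f"DR(unnormalized)={dr_unnormalized:.3f}  DR(normalized)={dr_normalized:.3f}")
```

In this setup the outcome-model-only estimate inherits the bias of the misspecified model, whereas both augmented estimators should land close to the target because the weighted residual term corrects the outcome model's error. Swapping which working model is misspecified (for example, an intercept-only participation model with a linear outcome model) gives the complementary check.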
Double robustness of DR3:
Complete simulation results
transportability_odds, Date: August 24, 2019 Revision: 15.0