Variable selection for transportability

12/10/2019 ∙ by Megha L. Mehrotra, et al.

Transportability provides a principled framework to address the problem of applying study results to new populations. Here, we consider the problem of selecting variables to include in transport estimators. We provide a brief overview of the transportability framework and illustrate that while selection diagrams are a vital first step in variable selection, these graphs alone identify a sufficient but not strictly necessary set of variables for generating an unbiased transport estimate. Next, we conduct a simulation experiment assessing the impact of including unnecessary variables on the performance of the parametric g-computation transport estimator. Our results highlight that the types of variables included can affect the bias, variance, and mean squared error of the estimates. We find that addition of variables that are not causes of the outcome but whose distributions differ between the source and target populations can increase the variance and mean squared error of the transported estimates. On the other hand, inclusion of variables that are causes of the outcome (regardless of whether they modify the causal contrast of interest or differ in distribution between the populations) reduces the variance of the estimates without increasing the bias. Finally, exclusion of variables that cause the outcome but do not modify the causal contrast of interest does not increase bias. These findings suggest that variable selection approaches for transport should prioritize identifying and including all causes of the outcome in the study population rather than focusing on variables whose distribution may differ between the study sample and target population.
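
To make the g-computation transport estimator mentioned above concrete, here is a minimal sketch under stated assumptions. It is not the authors' simulation: the data-generating process, the coefficients, the sample sizes, and the function names (`simulate`, `fit_outcome_model`, `transport_ate`) are all hypothetical and chosen only for illustration. The sketch assumes a single binary effect modifier `W` whose prevalence differs between the source and target populations, a randomized binary treatment `A`, and a saturated outcome model fit by cell means in the source sample.

```python
import random
from statistics import fmean


def simulate(n, p_w, rng):
    """Draw n units from a hypothetical population.

    W ~ Bernoulli(p_w) is an effect modifier, A is randomized with
    probability 0.5, and Y depends on A, W, and their interaction.
    """
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < p_w else 0
        a = 1 if rng.random() < 0.5 else 0
        # Assumed outcome mechanism: the effect of A is 1 when W=0 and 2 when W=1.
        y = 1.0 + 1.0 * a + 0.5 * w + 1.0 * a * w + rng.gauss(0.0, 1.0)
        rows.append((w, a, y))
    return rows


def fit_outcome_model(rows):
    """Saturated outcome model: estimate E[Y | A=a, W=w] by the cell mean."""
    return {
        (a, w): fmean(y for (wi, ai, y) in rows if ai == a and wi == w)
        for a in (0, 1)
        for w in (0, 1)
    }


def transport_ate(model, w_values):
    """Average the predicted contrast E[Y|A=1,W] - E[Y|A=0,W] over w_values."""
    return fmean(model[(1, w)] - model[(0, w)] for w in w_values)


rng = random.Random(0)

# Source (study) sample: W is rare. Target population: W is common.
source = simulate(20000, 0.2, rng)
target_w = [w for (w, _, _) in simulate(20000, 0.8, rng)]

# Fit the outcome model in the source sample only.
model = fit_outcome_model(source)

# Standardize to the source W distribution, then to the target W distribution.
source_ate = transport_ate(model, [w for (w, _, _) in source])
target_ate = transport_ate(model, target_w)

print(f"source ATE estimate: {source_ate:.2f}")
print(f"transported ATE estimate: {target_ate:.2f}")
```

Under this assumed mechanism the true average effect is 1.2 in the source and 1.8 in the target, so the two estimates should land near those values. The gap between them arises only because `W` both modifies the effect and differs in distribution across populations, which is the situation that a selection diagram is meant to flag.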


