Variable selection for transportability

by Megha L. Mehrotra, et al.

Transportability provides a principled framework for applying study results to new populations. Here, we consider the problem of selecting variables to include in transport estimators. We give a brief overview of the transportability framework and illustrate that, while selection diagrams are a vital first step in variable selection, these graphs alone identify a set of variables that is sufficient but not strictly necessary for generating an unbiased transport estimate. Next, we conduct a simulation experiment assessing how including unnecessary variables affects the performance of the parametric g-computation transport estimator. Our results show that the types of variables included can affect the bias, variance, and mean squared error of the estimates. Adding variables that are not causes of the outcome but whose distributions differ between the source and target populations can increase the variance and mean squared error of the transported estimates. In contrast, including variables that are causes of the outcome (regardless of whether they modify the causal contrast of interest or differ in distribution between the populations) reduces the variance of the estimates without increasing the bias. Finally, excluding variables that cause the outcome but do not modify the causal contrast of interest does not increase bias. These findings suggest that variable selection approaches for transport should prioritize identifying and including all causes of the outcome in the study population, rather than focusing on variables whose distributions may differ between the study sample and the target population.
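To make the idea of a g-computation transport estimator concrete, here is a minimal sketch in Python. It is not the authors' simulation: the data-generating process, the single binary effect modifier W, the coefficients, and all function names are assumptions chosen for illustration. The outcome model is saturated in treatment A and W (one mean per cell), which is the simplest parametric case. The model is fit in the source sample and its predictions are then averaged over the covariate distribution of the target population.

```python
import random

def simulate_source(n, p_w, rng):
    """Hypothetical randomized source study: W modifies the effect of A on Y."""
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < p_w else 0
        a = 1 if rng.random() < 0.5 else 0  # randomized treatment
        # Assumed outcome model: effect of A is 1 when W=0 and 3 when W=1.
        y = 1.0 + 1.0 * a + 0.5 * w + 2.0 * a * w + rng.gauss(0.0, 1.0)
        rows.append((w, a, y))
    return rows

def fit_outcome_model(rows):
    """Saturated outcome model: estimate E[Y | A=a, W=w] by the cell mean."""
    sums, counts = {}, {}
    for w, a, y in rows:
        key = (a, w)
        sums[key] = sums.get(key, 0.0) + y
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}

def g_computation_ate(model, w_values):
    """Average the predicted contrast E[Y|A=1,W] - E[Y|A=0,W] over w_values."""
    contrasts = [model[(1, w)] - model[(0, w)] for w in w_values]
    return sum(contrasts) / len(contrasts)

rng = random.Random(0)
source = simulate_source(20000, p_w=0.2, rng=rng)  # W is rare in the source
target_w = [1 if rng.random() < 0.8 else 0 for _ in range(20000)]  # common in target

model = fit_outcome_model(source)
source_ate = g_computation_ate(model, [w for w, _, _ in source])
transported_ate = g_computation_ate(model, target_w)

print(f"source ATE estimate:      {source_ate:.2f}")
print(f"transported ATE estimate: {transported_ate:.2f}")
```

Under these assumed coefficients the true average effect is 1 + 2·P(W=1), so roughly 1.4 in the source and 2.6 in the target. The gap between the two estimates arises because W both modifies the effect and differs in distribution across populations, which is the situation where standardizing to the target covariate distribution matters.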




