Optimal-Design Domain-Adaptation for Exposure Prediction in Two-Stage Epidemiological Studies

by Ron Sarafian, et al.

In the first stage of a two-stage study, the researcher uses a statistical model to impute the unobserved exposures. In the second stage, the imputed exposures serve as covariates in epidemiological models. Imputation errors in the first stage operate as measurement errors in the second stage, and thus bias exposure effect estimates. This study aims to improve the estimation of exposure effects by sharing information between the first and second stages. At the heart of our estimator is the observation that not all second-stage observations are equally important to impute. We therefore borrow ideas from optimal experimental design theory to identify individuals of higher importance. We then improve the imputation for these individuals using ideas from the machine-learning literature on domain adaptation. Our simulations confirm that the resulting exposure effect estimates are more accurate than those of the current best practice. An empirical demonstration yields smaller estimates of the effect of particulate matter (PM) on hyperglycemia risk, with tighter confidence bands. Sharing information between environmental scientists and epidemiologists improves health effect estimates. Our estimator is a principled approach for harnessing this information exchange, and may be applied to any two-stage study.
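The claim that first-stage imputation error biases second-stage effect estimates can be illustrated with a small simulation. The sketch below is not the estimator described above. It assumes, as a simplification not taken from the paper, that the imputation error behaves like classical additive noise that is independent of the true exposure, and that the health model is a simple linear regression. All names and parameter values (`simulate_attenuation`, `beta`, `noise_sd`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_attenuation(n=20000, beta=2.0, noise_sd=1.0, seed=0):
    """Compare the second-stage slope under true vs. noisy (imputed) exposure.

    Assumption: imputation error is classical additive noise, independent
    of the true exposure. Real imputation errors may behave differently.
    """
    rng = np.random.default_rng(seed)

    # True exposure (unit variance) and a linear health outcome.
    x_true = rng.normal(0.0, 1.0, n)
    y = beta * x_true + rng.normal(0.0, 1.0, n)

    # Stand-in for the first stage: imputed exposure = truth + error.
    x_imputed = x_true + rng.normal(0.0, noise_sd, n)

    # Second stage: ordinary least squares slope of y on the exposure.
    # np.polyfit returns coefficients from highest degree to lowest.
    slope_true = np.polyfit(x_true, y, 1)[0]
    slope_imputed = np.polyfit(x_imputed, y, 1)[0]

    # Under classical error the slope shrinks by the reliability ratio
    # var(x) / (var(x) + var(error)); var(x) is 1 in this setup.
    expected_attenuated = beta / (1.0 + noise_sd**2)
    return slope_true, slope_imputed, expected_attenuated


slope_true, slope_imputed, expected_attenuated = simulate_attenuation()
print(f"slope with true exposure:    {slope_true:.3f}")
print(f"slope with imputed exposure: {slope_imputed:.3f}")
print(f"slope predicted by theory:   {expected_attenuated:.3f}")
```

With these illustrative settings the error variance equals the exposure variance, so the slope estimated from the noisy exposure should be close to half the true effect. This shows why reducing imputation error matters for the second stage, but it does not model the optimal-design weighting or the domain-adaptation step.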




