Variable Selection for Multiply-imputed Data: A Bayesian Framework

10/31/2022
by Jungang Zou, et al.

Multiple imputation is a widely used technique for handling missing data in large observational studies. For variable selection on multiply-imputed data, however, conducting selection on each imputed dataset separately may yield a different set of important variables for each dataset. MI-LASSO, one of the most popular solutions to this problem, treats the same variable across all imputed datasets as a group and applies the group LASSO to obtain a consistent variable selection across all the multiply-imputed datasets. In this paper, we extend the MI-LASSO model into a Bayesian framework and use five different Bayesian MI-LASSO models to perform variable selection on multiply-imputed data. Three of these models are based on shrinkage priors and two on discrete mixture priors. We conduct a simulation study investigating the practical characteristics of each model across various settings. We further demonstrate these methods through a case study using multiply-imputed data from the University of Michigan Dioxin Exposure Study. The Python package BMIselect is hosted on GitHub under an Apache-2.0 license: https://github.com/zjg540066169/Bmiselect.
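The grouping idea behind MI-LASSO can be illustrated with a small sketch. Row j of a coefficient matrix holds variable j's coefficient in each of the D imputed datasets, and a group penalty on each row either keeps that variable in every dataset or drops it from all of them. This is a minimal frequentist illustration only. It is not one of the paper's Bayesian models and not the BMIselect API. It assumes the objective sum_d ||y_d - X_d b_d||^2 / (2n) + lam * sum_j ||(b_1j, ..., b_Dj)||_2 with a hand-picked lam, and it uses proximal gradient descent as a swapped-in solver. The function names and the perturbed copies of X standing in for real imputations are hypothetical.

```python
import numpy as np


def group_soft_threshold(B, t):
    """Row-wise group soft-thresholding (proximal operator of t * sum_j ||B[j]||_2).

    B has shape (p, D): row j holds variable j's coefficient in each of the
    D imputed datasets, so each row is one group.
    """
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    # Shrink each row's norm by t; rows with norm <= t become exactly zero.
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return B * scale


def mi_group_lasso(Xs, ys, lam, n_iter=2000):
    """Hypothetical proximal-gradient fit of a group LASSO across D imputed datasets.

    Assumed objective:
        sum_d ||y_d - X_d b_d||^2 / (2 n) + lam * sum_j ||(b_1j, ..., b_Dj)||_2
    """
    D = len(Xs)
    n, p = Xs[0].shape
    B = np.zeros((p, D))
    # The smooth part's Hessian is block diagonal (X_d^T X_d / n), so its
    # Lipschitz constant is the largest squared singular value over d, divided by n.
    L = max(np.linalg.norm(X, 2) ** 2 for X in Xs) / n
    step = 1.0 / L
    for _ in range(n_iter):
        # Column d is the gradient of the squared loss for imputed dataset d.
        grad = np.column_stack(
            [Xs[d].T @ (Xs[d] @ B[:, d] - ys[d]) / n for d in range(D)]
        )
        B = group_soft_threshold(B - step * grad, step * lam)
    return B


# Usage on simulated data (hypothetical setup, not from the paper).
rng = np.random.default_rng(0)
n, p, D = 200, 6, 5
X_true = rng.normal(size=(n, p))
beta = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0])
y = X_true @ beta + rng.normal(scale=0.5, size=n)
# Stand-in for D imputations: slightly perturbed copies of the covariates.
Xs = [X_true + rng.normal(scale=0.1, size=(n, p)) for _ in range(D)]
ys = [y] * D
B = mi_group_lasso(Xs, ys, lam=0.5)
# A variable is selected when its whole row (all D datasets) is nonzero.
selected = np.flatnonzero(np.linalg.norm(B, axis=1) > 0)
print(selected)
```

Because the penalty acts on whole rows, the selected set is the same for every imputed dataset by construction. The per-dataset LASSO fits the abstract warns about would penalize each entry separately and could disagree across datasets.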
