A comparison of strategies for selecting auxiliary variables for multiple imputation

03/30/2022
by Rheanna M. Mainzer, et al.

Multiple imputation (MI) is a popular method for handling missing data. Auxiliary variables can be added to the imputation model(s) to improve MI estimates. However, the choice of which auxiliary variables to include in the imputation model is not always straightforward: including too few may discard important information, but including too many can cause convergence problems in the procedures used to estimate the imputation models. Several data-driven auxiliary variable selection strategies have been proposed. This paper uses a simulation study and a case study to provide a comprehensive comparison of the performance of eight auxiliary variable selection strategies, with the aim of providing practical advice to users of MI. A complete case analysis and an MI analysis with all auxiliary variables included in the imputation model (the full model) were also performed for comparison. Our simulation study results suggest that the full model outperforms all auxiliary variable selection strategies, providing further support for adopting an inclusive auxiliary variable strategy where possible. Auxiliary variable selection using the Least Absolute Shrinkage and Selection Operator (LASSO) was the best-performing auxiliary variable selection strategy overall and is a promising alternative when the full model fails. All MI analysis strategies that we were able to apply to the case study led to similar estimates.
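The abstract does not say exactly how the LASSO selection step was carried out. As a rough illustration only, the sketch below assumes one common form: regress the incomplete variable on the candidate auxiliary variables using the cases where it is observed, and keep the auxiliaries whose LASSO coefficients are nonzero for use in the imputation model. The helper names `lasso_cd` and `select_auxiliaries`, the fixed penalty `lam` (rather than one chosen by cross-validation), and the toy data are all hypothetical. The fit uses plain coordinate descent so the example has no dependencies.

```python
import math


def standardize(col):
    """Centre a column and scale it to unit variance (population SD)."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    if sd == 0.0:
        # A constant column carries no information; all-zero values give
        # rho = 0 below, so its coefficient stays at exactly zero.
        return [0.0] * n
    return [(v - mean) / sd for v in col]


def soft_threshold(z, lam):
    """Shrink z towards zero by lam, returning exactly 0.0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X_cols, y, lam, n_iter=200, tol=1e-8):
    """Coordinate descent for (1/(2n))*||yc - X b||^2 + lam*||b||_1,
    where yc is the centred outcome and X has standardized columns."""
    n, p = len(y), len(X_cols)
    ybar = sum(y) / n
    resid = [v - ybar for v in y]  # residual yc - X b, starting from b = 0
    beta = [0.0] * p
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            xj = X_cols[j]
            # Correlation of x_j with the partial residual that excludes x_j.
            rho = sum(xj[i] * (resid[i] + xj[i] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= xj[i] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def select_auxiliaries(data, target, candidates, lam):
    """Hypothetical helper: keep candidates with a nonzero LASSO coefficient
    when the target is regressed on them among rows where it is observed."""
    rows = [r for r in data if r[target] is not None]
    y = [r[target] for r in rows]
    X_cols = [standardize([r[c] for r in rows]) for c in candidates]
    beta = lasso_cd(X_cols, y, lam)
    return [c for c, b in zip(candidates, beta) if b != 0.0]


# Toy data (made up): y = 2 * a1 exactly, a2 is orthogonal to a1 among the
# observed rows, and the last two rows have y missing.
signs = [1, -1, -1, 1, 1, -1, -1, 1]
data = [{"y": 2 * a, "a1": a, "a2": s} for a, s in zip(range(1, 9), signs)]
data += [{"y": None, "a1": 3, "a2": 1}, {"y": None, "a1": 6, "a2": -1}]

print(select_auxiliaries(data, "y", ["a1", "a2"], lam=0.5))  # prints ['a1']
```

In practice one would more likely use an established LASSO implementation (for example glmnet in R or scikit-learn in Python) and choose the penalty by cross-validation; a larger penalty selects fewer auxiliaries, and a large enough one selects none.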


