High-dimensional imputation for the social sciences: a comparison of state-of-the-art methods

08/29/2022
by Edoardo Costantini, et al.

Including a large number of predictors in the imputation model underlying a multiple imputation (MI) procedure is one of the most challenging tasks imputers face. A variety of high-dimensional MI techniques can help, but there has been limited research on their relative performance. In this study, we investigated a wide range of extant high-dimensional MI techniques that can handle a large number of predictors in the imputation models and general missing data patterns. We assessed the relative performance of seven high-dimensional MI methods with a Monte Carlo simulation study and a resampling study based on real survey data. The performance of the methods was defined by the degree to which they facilitate unbiased and confidence-valid estimates of the parameters of complete-data analysis models. We found that using regularized regression to select the predictors used in the MI model and using principal component analysis to reduce the dimensionality of auxiliary data produce the best results.
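To make the PCA-based idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It reduces many auxiliary variables to a single principal component score, then uses that score as the only predictor in a stochastic regression imputation. The function names (`first_pc_scores`, `impute_once`, `demo`), the simulated data, and the choice of a single component are all assumptions made for illustration. The sketch also keeps the regression parameters fixed across imputations, so it omits the parameter-uncertainty step that a proper MI procedure would include.

```python
import random
import statistics


def first_pc_scores(rows, iters=200):
    """Return each row's score on the first principal component.

    Uses power iteration on the sample covariance matrix, so no
    third-party linear algebra library is needed.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p  # starting vector; assumed not orthogonal to the first PC
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(p)) for c in centered]


def impute_once(y, scores, rng):
    """Fill missing entries of y (None) by regressing y on the PC score.

    Each imputed value is the fitted value plus a random residual draw.
    Regression parameters are NOT redrawn, which simplifies proper MI.
    """
    obs = [(s, v) for s, v in zip(scores, y) if v is not None]
    mx = statistics.fmean(s for s, _ in obs)
    my = statistics.fmean(v for _, v in obs)
    sxx = sum((s - mx) ** 2 for s, _ in obs)
    slope = sum((s - mx) * (v - my) for s, v in obs) / sxx
    intercept = my - slope * mx
    sse = sum((v - intercept - slope * s) ** 2 for s, v in obs)
    sigma = (sse / (len(obs) - 2)) ** 0.5
    return [v if v is not None else intercept + slope * s + rng.gauss(0, sigma)
            for s, v in zip(scores, y)]


def demo(seed=1, n=200, p=20, m=5):
    """Hypothetical data: p auxiliary variables driven by one latent factor."""
    rng = random.Random(seed)
    latent = [rng.gauss(0, 1) for _ in range(n)]
    aux = [[f + rng.gauss(0, 0.5) for _ in range(p)] for f in latent]
    y = [1.0 + 0.8 * f + rng.gauss(0, 0.3) for f in latent]
    y_miss = [None if rng.random() < 0.3 else v for v in y]
    scores = first_pc_scores(aux)
    # Build m imputed datasets and pool the mean of y by averaging.
    means = [statistics.fmean(impute_once(y_miss, scores, rng))
             for _ in range(m)]
    return statistics.fmean(means)
```

Calling `demo()` returns the pooled estimate of the mean of the incomplete variable across the `m` imputed datasets. In practice one would retain more than one component and rely on an established MI package rather than hand-rolled code.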

research
05/13/2019

Multiple imputation using dimension reduction techniques for high-dimensional data

Missing data present challenges in data analysis. Naive analyses such as...
research
06/30/2022

Solving the "many variables" problem in MICE with principal component regression

Multiple Imputation (MI) is one of the most popular approaches to addres...
research
11/17/2021

A Graph-based Imputation Method for Sparse Medical Records

Electronic Medical Records (EHR) are extremely sparse. Only a small prop...
research
09/04/2023

Supervised dimensionality reduction for multiple imputation by chained equations

Multivariate imputation by chained equations (MICE) is one of the most p...
research
07/13/2020

Imputation procedures in surveys using nonparametric and machine learning methods: an empirical comparison

Nonparametric and machine learning methods are flexible methods for obta...
research
07/21/2022

Missing Values and the Dimensionality of Expected Returns

Combining 100+ cross-sectional predictors requires either dropping 90 da...
