Choosing Imputation Models

07/12/2021
by Moritz Marbach, et al.

Imputing missing values is an important preprocessing step in data analysis, but the literature offers little guidance on how to choose between different imputation models. This letter suggests adopting the imputation model that, for an incomplete variable, generates a density of imputed values most similar to that of the observed values after balancing all other covariates. We recommend stable balancing weights as a practical approach to balancing covariates, whose distributions are expected to differ between observed and missing cases if the values are not missing completely at random. After balancing, discrepancy statistics can be used to compare the densities of imputed and observed values. We illustrate the suggested approach using simulated data and real-world survey data from the American National Election Study, comparing popular imputation approaches including random forests, hot-deck, predictive mean matching, and multivariate normal imputation. An R package implementing the suggested approach accompanies this letter.
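The comparison step described above can be illustrated with a minimal sketch. It is written in Python for illustration and is not the accompanying R package. It assumes the balancing weights for the observed cases have already been computed in a separate step (here they are simply made-up numbers), and it uses a weighted Kolmogorov-Smirnov distance as one possible discrepancy statistic. The function names, the toy data, and the labels `model_a` and `model_b` are all hypothetical.

```python
def weighted_ecdf(values, weights):
    """Return a function x -> weighted share of values that are <= x."""
    total = sum(weights)
    pairs = list(zip(values, weights))

    def cdf(x):
        return sum(w for v, w in pairs if v <= x) / total

    return cdf


def weighted_ks(observed, obs_weights, imputed):
    """Largest gap between the weighted ECDF of the observed values and
    the unweighted ECDF of the imputed values."""
    cdf_obs = weighted_ecdf(observed, obs_weights)
    cdf_imp = weighted_ecdf(imputed, [1.0] * len(imputed))
    # Both ECDFs are step functions, so the largest gap occurs at a data point.
    grid = sorted(set(observed) | set(imputed))
    return max(abs(cdf_obs(x) - cdf_imp(x)) for x in grid)


def choose_model(observed, obs_weights, imputations):
    """Score each candidate imputation model and return the name of the one
    with the smallest discrepancy, together with all scores."""
    scores = {
        name: weighted_ks(observed, obs_weights, values)
        for name, values in imputations.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Toy data (assumed): observed values with balancing weights taken as given.
observed = [1, 2, 3, 4, 5]
obs_weights = [0.1, 0.1, 0.2, 0.3, 0.3]

# Imputed values produced by two hypothetical imputation models.
imputations = {
    "model_a": [3, 4, 4, 5, 5],
    "model_b": [1, 1, 2, 2, 3],
}

best, scores = choose_model(observed, obs_weights, imputations)
print(best, scores)
```

In this sketch the model whose imputed values track the reweighted observed distribution most closely is selected. Other discrepancy statistics could be substituted for the Kolmogorov-Smirnov distance without changing the overall structure.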


