Inference with Imputed Data: The Allure of Making Stuff Up

05/15/2022
by Charles F. Manski, et al.

Incomplete observability of data generates an identification problem. There is no panacea for missing data. What one can learn about a population parameter depends on the assumptions one finds credible to maintain, and the credibility of assumptions varies with the empirical setting. No specific assumptions can provide a realistic general solution to the problem of inference with missing data. Yet Rubin has promoted random multiple imputation (RMI) as a general way to deal with missing values in public-use data. This recommendation has been influential among empirical researchers who seek a simple fix to the nuisance of missing data. This paper adds to my earlier critiques of imputation. It provides a transparent assessment of the mix of Bayesian and frequentist thinking used by Rubin to argue for RMI. It evaluates random imputation to replace missing outcome or covariate data when the objective is to learn a conditional expectation. It considers steps that might help combat the allure of making stuff up.
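
The identification problem described above can be illustrated with a small sketch. It is not taken from the paper: the data, the function names, and the assumption that the outcome is bounded in [0, 1] are all hypothetical. It contrasts the range of means consistent with the observed data when nothing is assumed about the missing values with the single number produced by one round of random imputation. That imputation step draws from the observed values, which implicitly assumes the missing values resemble the observed ones, and it is far simpler than Rubin's full RMI procedure.

```python
import random


def worst_case_bounds(outcomes, y_min=0.0, y_max=1.0):
    """Bounds on the mean of an outcome known to lie in [y_min, y_max].

    Missing values are marked with None. No assumption is made about
    them: each one could be as low as y_min or as high as y_max.
    """
    n = len(outcomes)
    observed = [y for y in outcomes if y is not None]
    n_missing = n - len(observed)
    observed_sum = sum(observed)
    lower = (observed_sum + n_missing * y_min) / n
    upper = (observed_sum + n_missing * y_max) / n
    return lower, upper


def random_imputation_mean(outcomes, rng):
    """Mean after filling each missing value with a random observed value.

    This is a toy, single-draw imputation (an assumption for this
    sketch), not the multiple-imputation procedure the abstract discusses.
    """
    observed = [y for y in outcomes if y is not None]
    filled = [y if y is not None else rng.choice(observed) for y in outcomes]
    return sum(filled) / len(filled)


# Hypothetical binary outcomes; None marks a missing response.
outcomes = [1, 0, 1, 1, None, None, 0, 1, None, None]

lower, upper = worst_case_bounds(outcomes)
estimate = random_imputation_mean(outcomes, random.Random(0))

print(f"bounds without assumptions: [{lower}, {upper}]")
print(f"mean after random imputation: {estimate}")
```

The width of the interval equals the fraction of missing values times the range of the outcome. The imputed mean always falls somewhere inside that interval, but where it falls is determined by the imputation assumption rather than by the data.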

research
02/22/2021

Misguided Use of Observed Covariates to Impute Missing Covariates in Conditional Prediction: A Shrinkage Problem

Researchers regularly perform conditional prediction using imputed value...
research
10/18/2018

Determining the Number of Components in PLS Regression on Incomplete Data

Partial least squares regression---or PLS---is a multivariate method in ...
research
05/04/2018

Population-calibrated multiple imputation for a binary/categorical covariate in categorical regression models

Multiple imputation (MI) has become popular for analyses with missing da...
research
11/27/2020

Clustering with missing data: which equivalent for Rubin's rules?

Multiple imputation (MI) is a popular method for dealing with missing va...
research
04/04/2023

Learning from data with structured missingness

Missing data are an unavoidable complication in many machine learning ta...
research
02/06/2019

Weak consistency of the 1-nearest neighbor measure with applications to missing data and covariate shift

When data is partially missing at random, imputation and importance weig...
research
05/10/2023

Correlation visualization under missing values: a comparison between imputation and direct parameter estimation methods

Correlation matrix visualization is essential for understanding the rela...
