Inference with Imputed Data: The Allure of Making Stuff Up
Incomplete observability of data generates an identification problem. There is no panacea for missing data. What one can learn about a population parameter depends on the assumptions one finds credible to maintain. The credibility of assumptions varies with the empirical setting. No specific assumptions can provide a realistic general solution to the problem of inference with missing data. Yet Rubin has promoted random multiple imputation (RMI) as a general way to deal with missing values in public-use data. This recommendation has been influential to empirical researchers who seek a simple fix to the nuisance of missing data. This paper adds to my earlier critiques of imputation. It provides a transparent assessment of the mix of Bayesian and frequentist thinking used by Rubin to argue for RMI. It evaluates random imputation to replace missing outcome or covariate data when the objective is to learn a conditional expectation. It considers steps that might help combat the allure of making stuff up.
READ FULL TEXT