Misguided Use of Observed Covariates to Impute Missing Covariates in Conditional Prediction: A Shrinkage Problem

02/22/2021
by Charles F. Manski, et al.

Researchers regularly perform conditional prediction using imputed values of missing data. However, applications of imputation often lack a firm foundation in statistical theory. This paper originated when we were unable to find analysis substantiating claims that imputation of missing data has good frequentist properties when data are missing at random (MAR). We focused on the use of observed covariates to impute missing covariates when estimating conditional means of the form E(y|x, w). Here y is an outcome whose realizations are always observed, x is a covariate whose realizations are always observed, and w is a covariate whose realizations are sometimes unobserved. We examine the probability limit of simple imputation estimates of E(y|x, w) as sample size goes to infinity. We find that these estimates are not consistent when covariate data are MAR. To the contrary, the estimates suffer from a shrinkage problem. They converge to points intermediate between the conditional mean of interest, E(y|x, w), and the mean E(y|x) that conditions only on x. We use a type of genotype imputation to illustrate.
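The shrinkage described above can be illustrated with a small simulation. This is a minimal sketch under assumptions that are ours, not the paper's: x and w are binary, y is linear in both with Gaussian noise, w is missing completely at random (a special case of MAR), and each missing w is imputed by the modal observed value of w given x. The function name `simulate_shrinkage` and all parameter values are hypothetical. Under these assumptions, the imputation-based estimate of E(y|x=1, w=1) should settle near (0.35 × 4 + 0.5 × 3.4) / 0.85 ≈ 3.65, strictly between E(y|x=1) = 3.4 and E(y|x=1, w=1) = 4.

```python
import numpy as np


def simulate_shrinkage(n=200_000, miss_rate=0.5, seed=0):
    """Compare an imputation-based estimate of E(y|x=1, w=1) with its targets.

    All distributional choices here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # x is always observed; w depends on x, with P(w=1|x=1)=0.7, P(w=1|x=0)=0.3.
    x = rng.integers(0, 2, size=n)
    p_w = np.where(x == 1, 0.7, 0.3)
    w = (rng.random(n) < p_w).astype(int)

    # Outcome: E(y|x, w) = 1 + 2w + x, so E(y|x=1, w=1) = 4 and E(y|x=1) = 3.4.
    y = 1.0 + 2.0 * w + 1.0 * x + rng.normal(size=n)

    # w is missing independently of (y, x, w): MCAR, hence also MAR.
    missing = rng.random(n) < miss_rate

    # Impute each missing w with the modal observed w among cases sharing x.
    w_imputed = w.copy()
    for x_value in (0, 1):
        observed = (~missing) & (x == x_value)
        modal_w = int(w[observed].mean() > 0.5)
        w_imputed[missing & (x == x_value)] = modal_w

    # Cell mean of y using observed-or-imputed w.
    cell = (x == 1) & (w_imputed == 1)
    imputed_estimate = y[cell].mean()

    # Benchmarks: complete-case cell mean and the mean conditioning only on x.
    complete_case = y[(~missing) & (x == 1) & (w == 1)].mean()
    mean_given_x = y[x == 1].mean()

    return {
        "imputed": imputed_estimate,
        "complete_case": complete_case,
        "mean_given_x": mean_given_x,
    }


if __name__ == "__main__":
    for name, value in simulate_shrinkage().items():
        print(f"{name}: {value:.3f}")
```

In this sketch the complete-case cell mean stays close to E(y|x=1, w=1), while the imputed estimate is pulled toward E(y|x=1) because every case with missing w and x=1 is assigned to the w=1 cell regardless of its true w.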

research
08/04/2019

Leveraging Random Assignment in Multiple Imputation of Missing Covariates in Causal Studies

Baseline covariates in randomized experiments are often used in the esti...
research
10/14/2019

Measurement error as a missing data problem

This article focuses on measurement error in covariates in regression an...
research
09/24/2021

Correcting Conditional Mean Imputation for Censored Covariates and Improving Usability

Analysts are often confronted with censoring, wherein some variables are...
research
05/15/2022

Inference with Imputed Data: The Allure of Making Stuff Up

Incomplete observability of data generates an identification problem. Th...
research
01/19/2022

Bayesian Prediction with Covariates Subject to Detection Limits

Missing values in covariates due to censoring by signal interference or ...
research
05/15/2022

Imputations for High Missing Rate Data in Covariates via Semi-supervised Learning Approach

Advancements in data collection techniques and the heterogeneity of data...
research
10/16/2018

Statistical classification for partially observed functional data via filtering

This article deals with the problem of functional classification for L2-...
