Correcting Conditional Mean Imputation for Censored Covariates and Improving Usability

09/24/2021
by   Sarah C. Lotspeich, et al.

Analysts are often confronted with censoring, wherein some variables are not observed at their true value but rather at a value that is known to fall above or below that truth. While much attention has been given to the analysis of censored outcomes, contemporary focus has shifted to censored covariates as well. Missing data are often handled using multiple imputation, which leverages the entire dataset by replacing missing values with informed placeholders, and this method can be modified for censored data by also incorporating the partial information carried by censored values. One such modification replaces censored covariates with their conditional means given other fully observed information, such as the censored value or additional covariates. So-called conditional mean imputation approaches were proposed for censored covariates in Atem et al. [2017], Atem et al. [2019a], and Atem et al. [2019b]. These methods are appealing because they are robust to additional parametric assumptions on the censored covariate and utilize all available data. As we worked to implement these methods, however, we discovered that these three manuscripts provide nonequivalent formulas and that, in fact, none of them is the correct formula for the conditional mean. Herein, we derive the correct form of the conditional mean and demonstrate the impact of the incorrect formulas on the imputed values and on statistical inference. Under several of the settings considered, using an incorrect formula seriously biases parameter estimation in simple linear regression. Lastly, we provide user-friendly R software, the imputeCensoRd package, to enable future researchers to tackle censored covariates in their data.
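
To make the idea of a conditional mean for a censored value concrete, the following is a minimal sketch and not the estimator from the paper or the imputeCensoRd package. It assumes a right-censored covariate X observed only as X > w, a fully known survival function S(x) = P(X > x) with no additional covariates, and the standard identity E[X | X > w] = w + (integral of S(x) from w to infinity) / S(w). The function name, the exponential example, and the finite upper integration limit are all illustrative assumptions.

```python
import math


def conditional_mean_right_censored(w, survival, upper, steps=100_000):
    """Approximate E[X | X > w] = w + (integral_w^upper S(x) dx) / S(w).

    `survival` is an assumed, fully known survival function S(x) = P(X > x).
    `upper` truncates the infinite integral; it must be large enough that
    S(upper) is negligible, otherwise the result is biased downward.
    """
    h = (upper - w) / steps
    # Midpoint rule over [w, upper].
    area = sum(survival(w + (i + 0.5) * h) for i in range(steps)) * h
    return w + area / survival(w)


# Hypothetical example: X ~ Exponential(rate), censored at w.
rate = 0.5
w = 3.0


def exp_survival(x):
    return math.exp(-rate * x)


imputed = conditional_mean_right_censored(w, exp_survival, upper=w + 100.0)

# By memorylessness, the exact value for the exponential is w + 1 / rate.
exact = w + 1.0 / rate
print(imputed, exact)
```

The exponential case is chosen only because its exact answer is known, which gives a simple check on the numerical integral. In practice the survival function would have to be estimated from data and would typically depend on other covariates.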

