Imputation-Free Learning from Incomplete Observations

07/05/2021
by Qitong Gao, et al.

Although recent work has produced methods that estimate (impute) the missing entries in a dataset to facilitate downstream analysis, most rely on assumptions that may not hold in real-world applications, and they can perform poorly in subsequent tasks, particularly when the missingness rate is high or the population is small. More importantly, imputation error can propagate into the prediction step that follows, biasing the gradients used to train the prediction model. In this work, we introduce the importance guided stochastic gradient descent (IGSGD) method, which trains multilayer perceptrons (MLPs) and long short-term memory networks (LSTMs) to perform inference directly from inputs containing missing values, without imputation. Specifically, we employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation. This not only reduces bias but also allows the model to exploit the information underlying missingness patterns. We test the proposed approach on real-world time-series data (MIMIC-III), tabular data obtained from an eye clinic, and a standard dataset (MNIST), where our imputation-free predictions outperform traditional two-step imputation-based predictions that use state-of-the-art imputation methods.
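To make the idea of training on incomplete inputs with adjusted gradients more concrete, the following is a minimal sketch and not the authors' IGSGD implementation. It assumes (the abstract does not say) that missing entries are encoded as a zero placeholder plus a binary observed-mask, and that the gradient adjustment takes the form of per-sample importance weights. The names `encode_with_mask`, `weighted_sgd_step`, and the fixed `importance` vector are hypothetical; in the paper the adjustment is learned with RL, which is not reproduced here. A logistic-regression model stands in for the MLP to keep the example short.

```python
import numpy as np

def encode_with_mask(x):
    """Zero out NaNs and append a binary observed-mask (assumed encoding).

    The zero is a placeholder, not an estimate of the missing value;
    the mask columns tell the model which entries were observed.
    """
    mask = ~np.isnan(x)
    filled = np.where(mask, x, 0.0)
    return np.concatenate([filled, mask.astype(float)], axis=1)

def weighted_sgd_step(w, feats, y, importance, lr=0.1):
    """One gradient step for logistic regression with per-sample weights.

    `importance` is a hypothetical stand-in for whatever signal adjusts
    the gradients; here it is simply supplied by the caller.
    """
    probs = 1.0 / (1.0 + np.exp(-(feats @ w)))
    per_sample_grad = (probs - y)[:, None] * feats  # shape (n, d)
    grad = (importance[:, None] * per_sample_grad).sum(axis=0) / importance.sum()
    return w - lr * grad

# Toy data: three samples, two features, with missing entries as NaN.
x = np.array([[1.0, np.nan],
              [np.nan, 2.0],
              [0.5, 1.5]])
y = np.array([1.0, 0.0, 1.0])

feats = encode_with_mask(x)             # shape (3, 4): values + mask
w = np.zeros(feats.shape[1])
importance = np.array([0.5, 1.0, 2.0])  # hypothetical fixed weights
w_new = weighted_sgd_step(w, feats, y, importance)
```

In this sketch the model never sees an imputed value, and samples with larger weights contribute more to the update. How those weights would be chosen is exactly the part the paper delegates to RL.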


research
11/18/2019

Bayesian Recurrent Framework for Missing Data Imputation and Prediction with Clinical Time Series

Real-world clinical time series data sets exhibit a high prevalence of m...
research
09/18/2020

Time-series Imputation and Prediction with Bi-Directional Generative Adversarial Networks

Multivariate time-series data are used in many classification and regres...
research
03/15/2022

Reconstructing Missing EHRs Using Time-Aware Within- and Cross-Visit Information for Septic Shock Early Prediction

Real-world Electronic Health Records (EHRs) are often plagued by a high ...
research
03/02/2020

Uncertainty-Aware Variational-Recurrent Imputation Network for Clinical Time Series

Electronic health records (EHR) consist of longitudinal clinical observa...
research
08/13/2022

GEDI: A Graph-based End-to-end Data Imputation Framework

Data imputation is an effective way to handle missing data, which is com...
research
10/17/2022

Efficient surrogate-assisted inference for patient-reported outcome measures with complex missing mechanism

Patient-reported outcome (PRO) measures are increasingly collected as a ...
research
10/01/2020

When to Impute? Imputation before and during cross-validation

Cross-validation (CV) is a technique used to estimate generalization err...
