Leveraging variational autoencoders for multiple data imputation

09/30/2022
by Breeshey Roskams-Hieter, et al.

Missing data persists as a major barrier to data analysis across numerous applications. Recently, deep generative models have been used for imputation of missing data, motivated by their ability to capture highly non-linear and complex relationships in the data. In this work, we investigate the ability of deep models, namely variational autoencoders (VAEs), to account for uncertainty in missing data through multiple imputation strategies. We find that VAEs provide poor empirical coverage of missing data, with underestimation and overconfident imputations, particularly for more extreme missing data values. To overcome this, we employ β-VAEs, which, viewed from a generalized Bayes framework, provide robustness to model misspecification. Assigning a good value of β is critical for uncertainty calibration, and we demonstrate how this can be achieved using cross-validation. In downstream tasks, we show how multiple imputation with β-VAEs can avoid false discoveries that arise as artefacts of imputation.
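To make "empirical coverage" concrete, here is a minimal sketch of how it might be measured for any multiple-imputation method. It is an illustration and not the authors' evaluation code: the function name `empirical_coverage`, the array shapes, and the use of central quantile intervals are all assumptions. The synthetic draws stand in for imputations sampled from a fitted model.

```python
import numpy as np


def empirical_coverage(imputations, truth, level=0.95):
    """Fraction of held-out true values that fall inside the central
    `level` interval formed from the imputed draws.

    imputations: shape (M, n), M imputed draws for each of n masked entries
    truth:       shape (n,), the held-out true values of those entries
    (Hypothetical helper; names and shapes are assumptions.)
    """
    imputations = np.asarray(imputations, dtype=float)
    truth = np.asarray(truth, dtype=float)
    alpha = 1.0 - level
    # Per-entry interval bounds taken across the M imputations.
    lower = np.quantile(imputations, alpha / 2, axis=0)
    upper = np.quantile(imputations, 1 - alpha / 2, axis=0)
    inside = (truth >= lower) & (truth <= upper)
    return float(inside.mean())


# Synthetic stand-ins for model output (not real results).
rng = np.random.default_rng(0)
truth = rng.normal(size=2000)
# Draws with the same spread as the truth: roughly calibrated.
calibrated = rng.normal(size=(200, 2000))
# Draws with too little spread: overconfident imputations.
overconfident = 0.3 * rng.normal(size=(200, 2000))

cov_calibrated = empirical_coverage(calibrated, truth)
cov_overconfident = empirical_coverage(overconfident, truth)
print(cov_calibrated, cov_overconfident)
```

With overconfident draws the intervals are too narrow, so coverage falls well below the nominal level. One plausible way to use such a metric for choosing β, again an assumption rather than the paper's stated procedure, is to mask a subset of observed entries, compute coverage for each candidate β, and prefer the value whose coverage is closest to nominal.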

