VAEs in the Presence of Missing Data

06/09/2020
by   Mark Collier, et al.

Real-world datasets often contain entries with missing elements; for example, in a medical dataset a patient is unlikely to have taken every possible diagnostic test. Variational Autoencoders (VAEs) are popular generative models often used for unsupervised learning. Despite their widespread use, it is unclear how best to apply VAEs to datasets with missing data. We develop a novel latent variable model of a corruption process which generates missing data, and derive a corresponding tractable evidence lower bound (ELBO). Our model is straightforward to implement, can handle both missing completely at random (MCAR) and missing not at random (MNAR) data, scales to high-dimensional inputs, and gives both the VAE encoder and decoder principled access to indicator variables that record whether each data element is missing. On the MNIST and SVHN datasets we demonstrate improved marginal log-likelihood of observed data and better missing data imputation compared to existing approaches.
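To make the ingredients named in the abstract concrete, the following is a minimal sketch of a generic masked-likelihood setup. It is not the authors' corruption-process model or their ELBO. It assumes binary data, an MCAR mask, zero-imputation of missing entries, and a Bernoulli likelihood evaluated only on observed elements. All function names (`mcar_mask`, `encoder_input`, `masked_bernoulli_log_lik`) are hypothetical and chosen for illustration.

```python
import math
import random

def mcar_mask(n, missing_rate, rng):
    """Binary mask (1 = observed, 0 = missing); each element is dropped
    independently with probability missing_rate, i.e. MCAR."""
    return [0 if rng.random() < missing_rate else 1 for _ in range(n)]

def encoder_input(x, mask):
    """Zero-impute missing entries, then append the mask so a downstream
    encoder can see which elements were observed (an assumed design)."""
    return [xi * mi for xi, mi in zip(x, mask)] + list(mask)

def masked_bernoulli_log_lik(x, probs, mask, eps=1e-7):
    """Bernoulli log-likelihood summed over observed elements only."""
    total = 0.0
    for xi, pi, mi in zip(x, probs, mask):
        if mi:
            pi = min(max(pi, eps), 1.0 - eps)  # guard against log(0)
            total += xi * math.log(pi) + (1 - xi) * math.log(1.0 - pi)
    return total

rng = random.Random(0)
x = [1, 0, 1, 1, 0, 1]                  # toy binary data point
mask = mcar_mask(len(x), missing_rate=0.3, rng=rng)
probs = [0.5] * len(x)                  # stand-in for decoder outputs
enc_in = encoder_input(x, mask)
log_lik = masked_bernoulli_log_lik(x, probs, mask)
```

Here `probs` stands in for the output of a decoder network. In a full VAE, the masked log-likelihood would be combined with a KL term to form a lower bound on the likelihood of the observed elements.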


