Deep Generative Pattern-Set Mixture Models for Nonignorable Missingness

03/05/2021
by Sahra Ghalebikesabi, et al.

We propose a variational autoencoder architecture to model both ignorable and nonignorable missing data using pattern-set mixtures as proposed by Little (1993). Our model explicitly learns to cluster the missing data into missingness pattern sets based on the observed data and missingness masks. Underpinning our approach is the assumption that the data distribution under missingness is probabilistically semi-supervised by samples from the observed data distribution. Our setup trades off the characteristics of ignorable and nonignorable missingness and can thus be applied to data of both types. We evaluate our method on a wide range of data sets with different types of missingness and achieve state-of-the-art imputation performance. Our model outperforms many common imputation algorithms, especially when the amount of missing data is high and the missingness mechanism is nonignorable.
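As a rough sketch of how a pattern-set mixture sits between the two classical factorizations of the joint distribution of data and missingness mask, the block below writes all three side by side. The symbols are assumptions made for illustration and are not taken from the paper: x is the full data vector, m the binary missingness mask, and r a discrete index of the pattern set.

```latex
% Selection model: the mask depends on the data.
p(x, m) = p(x)\, p(m \mid x)

% Pattern-mixture model: the data distribution depends on the mask.
p(x, m) = p(m)\, p(x \mid m)

% Pattern-set mixture (illustrative form, r = assumed pattern-set index):
p(x, m) = \sum_{r} p(r)\, p(x \mid r)\, p(m \mid x, r)
```

Under this notation, a single pattern set (r takes one value) recovers the selection model, while giving every distinct mask its own set makes p(m | x, r) degenerate and recovers the pattern-mixture model. This is one way to read the abstract's claim that the setup trades off the two regimes.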

research
04/24/2019

Nonparametric Pattern-Mixture Models for Inference with Missing Data

Pattern-mixture models provide a transparent approach for handling missi...
research
12/06/2018

MIWAE: Deep Generative Modelling and Imputation of Incomplete Data

We consider the problem of handling missing data with deep latent variab...
research
09/18/2023

Towards Better Modeling with Missing Data: A Contrastive Learning-based Visual Analytics Perspective

Missing data can pose a challenge for machine learning (ML) modeling. To...
research
03/08/2019

Unsupervised Data Imputation via Variational Inference of Deep Subspaces

A wide range of systems exhibit high dimensional incomplete data. Accura...
research
04/03/2021

Training Deep Normalizing Flow Models in Highly Incomplete Data Scenarios with Prior Regularization

Deep generative frameworks including GANs and normalizing flow models ha...
research
12/20/2021

Model-based Clustering with Missing Not At Random Data

In recent decades, technological advances have made it possible to colle...
research
07/10/2019

Time series cluster kernels to exploit informative missingness and incomplete label information

The time series cluster kernel (TCK) provides a powerful tool for analys...
