Improving Missing Data Imputation with Deep Generative Models

02/27/2019
by Ramiro D. Camino, et al.

Datasets with missing values are very common in industry applications, and they can have a negative impact on machine learning models. Recent studies introduced solutions to the problem of imputing missing values based on deep generative models. Previous experiments with Generative Adversarial Networks and Variational Autoencoders showed interesting results in this domain, but it is not clear which method is preferable for different use cases. The goal of this work is twofold: we present a comparison between missing data imputation solutions based on deep generative models, and we propose improvements over those methodologies. We run our experiments using well-known real-life datasets with different characteristics, removing values at random and reconstructing them with several imputation techniques. Our results show that the presence or absence of categorical variables can alter the selection of the best model, and that some models are more stable than others across similar runs with different random number generator seeds.
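The evaluation protocol described in the abstract (hide values at random, reconstruct them, repeat with different seeds) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names are hypothetical, the data is synthetic, column-mean imputation stands in for the deep generative models the paper actually compares, and RMSE on the hidden entries is an assumed metric that the abstract does not specify.

```python
import numpy as np


def mask_completely_at_random(data, missing_rate, rng):
    """Return a boolean mask; True marks an entry to hide (assumed MCAR scheme)."""
    return rng.random(data.shape) < missing_rate


def mean_impute(data, missing):
    """Fill hidden entries with the column mean of the observed entries.

    This is a simple baseline used as a stand-in, not the paper's method.
    """
    observed = np.where(missing, np.nan, data)
    column_means = np.nanmean(observed, axis=0)
    return np.where(missing, column_means, data)


def rmse_on_missing(original, imputed, missing):
    """Root mean squared error computed only over the hidden entries."""
    diff = (original - imputed)[missing]
    return float(np.sqrt(np.mean(diff ** 2)))


def run_trials(data, missing_rate, seeds):
    """Repeat the hide-and-reconstruct experiment once per seed."""
    scores = []
    for seed in seeds:
        rng = np.random.default_rng(seed)
        missing = mask_completely_at_random(data, missing_rate, rng)
        imputed = mean_impute(data, missing)
        scores.append(rmse_on_missing(data, imputed, missing))
    return scores


# Synthetic continuous data, purely for demonstration.
data = np.random.default_rng(0).normal(size=(200, 4))
scores = run_trials(data, missing_rate=0.2, seeds=[0, 1, 2])
print("RMSE per seed:", scores)
print("spread across seeds (std):", float(np.std(scores)))
```

The spread of the scores across seeds is a rough proxy for the stability the abstract refers to; categorical variables would need a different imputation rule and error measure than the ones assumed here.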

research
09/30/2022

Leveraging variational autoencoders for multiple data imputation

Missing data persists as a major barrier to data analysis across numerou...
research
10/31/2022

Diffusion models for missing value imputation in tabular data

Missing value imputation in machine learning is the task of estimating t...
research
08/05/2018

Missing Value Imputation Based on Deep Generative Models

Missing values widely exist in many real-world datasets, which hinders t...
research
02/13/2023

Variational Mixture of HyperGenerators for Learning Distributions Over Functions

Recent approaches build on implicit neural representations (INRs) to pro...
research
06/16/2020

Reconstruction of turbulent data with deep generative models for semantic inpainting from TURB-Rot database

We study the applicability of tools developed by the computer vision com...
research
11/12/2017

Medical Diagnosis From Laboratory Tests by Combining Generative and Discriminative Learning

A primary goal of computational phenotype research is to conduct medical...
research
06/21/2020

Missing Features Reconstruction Using a Wasserstein Generative Adversarial Imputation Network

Missing data is one of the most common preprocessing problems. In this p...
