EGG-GAE: scalable graph neural networks for tabular data imputation

10/19/2022
by Lev Telyatnikov, et al.

Missing data imputation (MDI) is crucial when dealing with tabular datasets across various domains. Autoencoders can be trained to reconstruct missing values, and graph autoencoders (GAEs) can additionally consider similar patterns in the dataset when imputing new values for a given instance. However, previously proposed GAEs suffer from scalability issues and require the user to define a similarity metric among patterns beforehand in order to build the graph connectivity. In this paper, we leverage recent progress in latent graph imputation to propose a novel EdGe Generation Graph AutoEncoder (EGG-GAE) for missing data imputation that overcomes these two drawbacks. EGG-GAE works on randomly sampled mini-batches of the input data (hence scaling to larger datasets), and it automatically infers the best connectivity across the mini-batch for each layer of the architecture. We also experiment with several extensions, including an ensemble strategy for inference and the inclusion of what we call prototype nodes. These yield significant improvements in both imputation error and downstream accuracy across multiple benchmarks and baselines.
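
To make the mini-batch idea concrete, here is a minimal sketch and not the authors' implementation. EGG-GAE learns the connectivity for each layer inside a graph autoencoder. This sketch substitutes a fixed k-nearest-neighbour graph built on zero-filled inputs and imputes each missing entry with the mean of its neighbours' observed values. The function names `knn_edges` and `impute_batch`, the batch size of 64, and the choice of `k=3` are all assumptions made for illustration.

```python
import numpy as np

def knn_edges(z, k):
    """Return an (n, k) array holding the k nearest-neighbour indices of each row of z."""
    # Pairwise squared Euclidean distances between rows.
    d = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude self-loops
    return np.argsort(d, axis=1)[:, :k]

def impute_batch(x, mask, k=3):
    """Fill entries where mask is False using observed values of graph neighbours."""
    # Zero-fill missing entries so that distances are well defined (an assumption).
    x0 = np.where(mask, x, 0.0)
    nbrs = knn_edges(x0, k)
    # Per entry: sum of observed neighbour values and number of observing neighbours.
    num = (x0[nbrs] * mask[nbrs]).sum(axis=1)
    cnt = mask[nbrs].sum(axis=1)
    # Fallback when no neighbour observes a feature: batch mean of observed values.
    col_mean = x0.sum(axis=0) / np.maximum(mask.sum(axis=0), 1)
    est = np.where(cnt > 0, num / np.maximum(cnt, 1), col_mean)
    # Observed entries are kept untouched; only missing ones are replaced.
    return np.where(mask, x, est)

# Synthetic data: 1000 rows, 5 features, roughly 20% of entries marked missing.
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 5))
observed = rng.random(data.shape) > 0.2

# Work on one randomly sampled mini-batch rather than the full table.
batch_idx = rng.choice(len(data), size=64, replace=False)
imputed = impute_batch(data[batch_idx], observed[batch_idx])
```

Because the graph is rebuilt for each sampled batch, the pairwise-distance cost depends on the batch size and not on the full dataset size, which is the scaling argument made above. A learned model would replace both the distance computation and the neighbour averaging.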


