Categorical EHR Imputation with Generative Adversarial Nets

08/03/2021
by   Yinchong Yang, et al.

Electronic Health Records (EHRs) often suffer from missing data, which poses a major problem in clinical practice and clinical studies. A novel approach for dealing with missing data is Generative Adversarial Nets (GANs), which have generated huge research interest in image generation and transformation. Recently, researchers have attempted to apply GANs to missing data generation and imputation for EHR data; a major challenge here is the categorical nature of the data. State-of-the-art solutions to the GAN-based generation of categorical data involve either reinforcement learning or learning a bidirectional mapping between the categorical and the real latent feature space, so that the GANs only need to generate real-valued features. However, these methods are designed to generate complete feature vectors instead of imputing only the subsets of missing features. In this paper we propose a simple yet effective approach that is based on previous work on GANs for data imputation. We first motivate our solution by discussing why adversarial training often fails in the case of categorical features. Then we derive a novel way to re-code the categorical features to stabilize the adversarial training. Based on experiments on two real-world EHR datasets with multiple settings, we show that our imputation approach substantially improves the prediction accuracy compared to more traditional data imputation approaches.
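To make concrete the difference between generating a complete feature vector and imputing only the missing subset, here is a minimal sketch of the masking step that mask-based GAN imputers commonly use. It is not the authors' method: the abstract does not specify the re-coding scheme, so plain one-hot encoding is assumed, and the feature names, categories, and the `generated` vector (a stand-in for a generator's output) are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence

def one_hot(value: Optional[str], categories: Sequence[str]) -> List[float]:
    """One-hot encode a category; a missing value (None) becomes all zeros."""
    return [1.0 if value == c else 0.0 for c in categories]

def impute_missing(x: List[float], mask: List[float], generated: List[float]) -> List[float]:
    """Keep observed entries (mask == 1) and use generated ones where mask == 0."""
    return [m * xi + (1.0 - m) * gi for xi, m, gi in zip(x, mask, generated)]

# Hypothetical categorical features and one patient record with a missing value.
features: Dict[str, List[str]] = {"blood_type": ["A", "B", "O"], "smoker": ["no", "yes"]}
record: Dict[str, Optional[str]] = {"blood_type": None, "smoker": "yes"}

# Build the encoded vector and its mask (1 = observed, 0 = missing).
x: List[float] = []
mask: List[float] = []
for name, cats in features.items():
    value = record[name]
    x += one_hot(value, cats)
    mask += [0.0 if value is None else 1.0] * len(cats)

# Stand-in for a generator's output over the full vector (assumed numbers).
generated = [0.2, 0.7, 0.1, 0.6, 0.4]
imputed = impute_missing(x, mask, generated)

# Decode each feature by taking the highest-scoring category in its slice.
decoded: Dict[str, str] = {}
start = 0
for name, cats in features.items():
    scores = imputed[start:start + len(cats)]
    decoded[name] = cats[scores.index(max(scores))]
    start += len(cats)
```

Only the masked `blood_type` slice is taken from the generator, while the observed `smoker` value passes through unchanged regardless of what the generator proposes for it.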

research
02/06/2023

ClueGAIN: Application of Transfer Learning On Generative Adversarial Imputation Nets (GAIN)

Many studies have attempted to solve the problem of missing data using v...
research
06/07/2018

GAIN: Missing Data Imputation using Generative Adversarial Nets

We propose a novel method for imputing missing data by adapting the well...
research
11/04/2020

Learning to Rank with Missing Data via Generative Adversarial Networks

We explore the role of Conditional Generative Adversarial Networks (GAN)...
research
02/08/2023

IRTCI: Item Response Theory for Categorical Imputation

Most datasets suffer from partial or complete missing values, which has ...
research
01/26/2022

Generative Trees: Adversarial and Copycat

While Generative Adversarial Networks (GANs) achieve spectacular results...
research
09/18/2023

Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees

Tabular data is hard to acquire and is subject to missing values. This p...
research
05/05/2020

Deep convolutional generative adversarial networks for traffic data imputation encoding time series as images

Sufficient high-quality traffic data are a crucial component of various ...
