Generating Multi-label Discrete Patient Records using Generative Adversarial Networks

03/19/2017
by Edward Choi, et al.
Access to electronic health record (EHR) data has motivated computational advances in medical research. However, various concerns, particularly over privacy, can limit access to and collaborative use of EHR data. Sharing synthetic EHR data instead could mitigate this risk. In this paper, we propose a new approach, medical Generative Adversarial Network (medGAN), to generate realistic synthetic patient records. Given real patient records as input, medGAN can generate high-dimensional discrete variables (e.g., binary and count features) via a combination of an autoencoder and generative adversarial networks. We also propose minibatch averaging to efficiently avoid mode collapse, and we increase learning efficiency with batch normalization and shortcut connections. To demonstrate feasibility, we show that medGAN generates synthetic patient records that achieve performance comparable to real data in many experiments, including distribution statistics, predictive modeling tasks, and a medical expert review. We also empirically observe a limited privacy risk in both identity and attribute disclosure using medGAN.
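The abstract names minibatch averaging without describing its mechanics. One plausible reading, assumed here rather than stated in the text above, is that each sample shown to the discriminator is concatenated with the feature-wise mean of its minibatch, so a batch that has collapsed onto a few modes is easier to detect. A minimal NumPy sketch under that assumption follows; the function name `minibatch_average` and the toy records are hypothetical and not taken from the paper.

```python
import numpy as np


def minibatch_average(batch: np.ndarray) -> np.ndarray:
    """Append the batch-wide feature mean to every row of the batch.

    Assumption: this is one way a discriminator could be given
    minibatch-level statistics; it is an illustration only.
    """
    mean = batch.mean(axis=0, keepdims=True)         # shape (1, d)
    tiled = np.repeat(mean, batch.shape[0], axis=0)  # shape (n, d)
    return np.concatenate([batch, tiled], axis=1)    # shape (n, 2d)


# Toy binary records: 4 patients, 3 medical codes (made-up values).
records = np.array(
    [[1, 0, 0],
     [1, 1, 0],
     [0, 0, 1],
     [1, 0, 1]],
    dtype=float,
)

augmented = minibatch_average(records)
print(augmented.shape)  # each row now carries its own codes plus the batch mean
```

In this sketch the discriminator's input width doubles, since every record is paired with the same batch-level summary.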


research
02/08/2023

MedDiff: Generating Electronic Health Records using Accelerated Denoising Diffusion Model

Due to patient privacy protection concerns, machine learning research in...
research
07/04/2018

Generating Synthetic but Plausible Healthcare Record Datasets

Generating datasets that "look like" given real ones is an interesting t...
research
11/11/2017

Disease Prediction from Electronic Health Records Using Generative Adversarial Networks

Electronic health records (EHRs) have contributed to the computerization...
research
09/06/2017

Boosting Deep Learning Risk Prediction with Generative Adversarial Networks for Electronic Health Records

The rapid growth of Electronic Health Records (EHRs), as well as the acc...
research
12/06/2018

Generation of Synthetic Electronic Medical Record Text

Machine learning (ML) and Natural Language Processing (NLP) have achieve...
research
10/02/2019

Ward2ICU: A Vital Signs Dataset of Inpatients from the General Ward

We present a proxy dataset of vital signs with class labels indicating p...
research
04/04/2023

Synthesize Extremely High-dimensional Longitudinal Electronic Health Records via Hierarchical Autoregressive Language Model

Synthetic electronic health records (EHRs) that are both realistic and p...
