Generating Synthetic but Plausible Healthcare Record Datasets

07/04/2018
by   Laura Aviñó, et al.
0

Generating datasets that "look like" given real ones is an interesting tasks for healthcare applications of ML and many other fields of science and engineering. In this paper we propose a new method of general application to binary datasets based on a method for learning the parameters of a latent variable moment that we have previously used for clustering patient datasets. We compare our method with a recent proposal (MedGan) based on generative adversarial methods and find that the synthetic datasets we generate are globally more realistic in at least two senses: real and synthetic instances are harder to tell apart by Random Forests, and the MMD statistic. The most likely explanation is that our method does not suffer from the "mode collapse" which is an admitted problem of GANs. Additionally, the generative models we generate are easy to interpret, unlike the rather obscure GANs. Our experiments are performed on two patient datasets containing ICD-9 diagnostic codes: the publicly available MIMIC-III dataset and a dataset containing admissions for congestive heart failure during 7 years at Hospital de Sant Pau in Barcelona.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/19/2017

Generating Multi-label Discrete Patient Records using Generative Adversarial Networks

Access to electronic health record (EHR) data has motivated computationa...
research
07/05/2019

Evaluating the distribution learning capabilities of GANs

We evaluate the distribution learning capabilities of generative adversa...
research
07/17/2018

Airline Passenger Name Record Generation using Generative Adversarial Networks

Passenger Name Records (PNRs) are at the heart of the travel industry. C...
research
12/07/2021

Synthetic Acute Hypotension and Sepsis Datasets Based on MIMIC-III and Published as Part of the Health Gym Project

These two synthetic datasets comprise vital signs, laboratory test resul...
research
05/31/2018

On GANs and GMMs

A longstanding problem in machine learning is to find unsupervised metho...
research
11/30/2022

Generating Realistic Synthetic Relational Data through Graph Variational Autoencoders

Synthetic data generation has recently gained widespread attention as a ...

Please sign up or login with your details

Forgot password? Click here to reset