Synthesising Electronic Health Records: Cystic Fibrosis Patient Group

01/14/2022
by Emily Muller, et al.

Class imbalance can often degrade the predictive performance of supervised learning algorithms. Balanced classes can be obtained by oversampling the minority class in several ways:

- with exact copies;
- with noisy copies;
- by interpolating between nearest neighbours (as in traditional SMOTE methods).

As is typical in computer vision tasks, tabular data can also be oversampled through augmentation, which deep generative models make possible. Deep generative models are effective data synthesisers because they can capture complex underlying distributions. In healthcare, synthetic data can enhance interoperability between healthcare providers while safeguarding patient privacy. Large synthetic datasets that adequately represent small patient groups can help machine learning in healthcare address the current challenges of bias and generalisability. This paper evaluates the ability of synthetic data generators to synthesise patient electronic health records. We test the utility of synthetic data for patient outcome classification and observe increased predictive performance when imbalanced datasets are augmented with synthetic data.
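The three classical oversampling strategies mentioned above differ only in how each new minority row is produced. The sketch below illustrates them in plain Python. It is not the authors' code: the paper evaluates deep generative models, which this does not implement.

It rests on several assumptions:

- All features are already numeric.
- Distances are Euclidean.
- Each synthetic row is built from a randomly chosen minority row. This is a simplified SMOTE-style variant, where SMOTE stands for Synthetic Minority Oversampling Technique.
- The helper names (`oversample_copies`, `oversample_noise`, `oversample_smote`, `balance`) are hypothetical.

```python
import math
import random
from typing import List, Tuple

Row = List[float]


def oversample_copies(minority: List[Row], n_new: int, rng: random.Random) -> List[Row]:
    """Random oversampling: draw exact copies of existing minority rows."""
    return [list(rng.choice(minority)) for _ in range(n_new)]


def oversample_noise(minority: List[Row], n_new: int, sigma: float,
                     rng: random.Random) -> List[Row]:
    """Copies of minority rows perturbed by Gaussian noise (std sigma per feature)."""
    return [[x + rng.gauss(0.0, sigma) for x in rng.choice(minority)]
            for _ in range(n_new)]


def k_nearest(minority: List[Row], i: int, k: int) -> List[int]:
    """Indices of the k nearest minority rows to row i (Euclidean), excluding i."""
    dists = sorted((math.dist(minority[i], minority[j]), j)
                   for j in range(len(minority)) if j != i)
    return [j for _, j in dists[:k]]


def oversample_smote(minority: List[Row], n_new: int, k: int,
                     rng: random.Random) -> List[Row]:
    """SMOTE-style interpolation: x_new = x_i + u * (x_nn - x_i), u ~ U(0, 1)."""
    if len(minority) < 2:
        raise ValueError("interpolation needs at least two minority rows")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest(minority, i, k))  # pick one of the k neighbours
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


def balance(majority: List[Row], minority: List[Row], method: str = "smote",
            k: int = 3, sigma: float = 0.05, seed: int = 0) -> Tuple[List[Row], List[int]]:
    """Oversample the minority class (label 1) until it matches the majority (label 0)."""
    if not minority:
        raise ValueError("minority class is empty")
    rng = random.Random(seed)
    n_new = max(0, len(majority) - len(minority))
    if method == "copy":
        extra = oversample_copies(minority, n_new, rng)
    elif method == "noise":
        extra = oversample_noise(minority, n_new, sigma, rng)
    elif method == "smote":
        extra = oversample_smote(minority, n_new, k, rng)
    else:
        raise ValueError(f"unknown method: {method}")
    X = [list(r) for r in majority] + [list(r) for r in minority] + extra
    y = [0] * len(majority) + [1] * (len(minority) + len(extra))
    return X, y


# Toy imbalanced dataset: 20 majority rows vs 4 minority rows.
majority = [[float(i), float(i % 3)] for i in range(20)]
minority = [[0.0, 0.0], [1.0, 0.5], [0.5, 1.0], [1.0, 1.0]]
X, y = balance(majority, minority, method="smote")
print(y.count(0), y.count(1))  # 20 20
```

Interpolated rows always lie on a segment between two real minority rows, so they stay inside the minority region. Exact copies add no new information. Noisy copies trade fidelity for variety through `sigma`. Deep generative models, the subject of the paper, instead learn the joint distribution of the records.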

research
05/09/2023

Leveraging Generative AI Models for Synthetic Data Generation in Healthcare: Balancing Research and Privacy

The widespread adoption of electronic health records and digital healthc...
research
10/16/2022

Evaluation of the Synthetic Electronic Health Records

Generative models have been found effective for data synthesis due to th...
research
05/25/2023

Ensemble Synthetic EHR Generation for Increasing Subpopulation Model's Performance

Electronic health records (EHR) often contain different rates of represe...
research
06/20/2022

Convex space learning improves deep-generative oversampling for tabular imbalanced classification on smaller datasets

Data is commonly stored in tabular format. Several fields of research (e...
research
05/31/2022

A Kernelised Stein Statistic for Assessing Implicit Generative Models

Synthetic data generation has become a key ingredient for training machi...
research
01/03/2021

Synthetic Embedding-based Data Generation Methods for Student Performance

Given the inherent class imbalance issue within student performance data...
research
03/11/2020

Deep generative models in DataSHIELD

The best way to calculate statistics from medical data is to use the dat...