Generation of Synthetic Electronic Health Records Using a Federated GAN

09/06/2021
by John Weldon, et al.

Sensitive medical data is often subject to strict usage constraints. In this paper, we trained a generative adversarial network (GAN) on real-world electronic health records (EHR) and then used it to create a data-set of "fake" patients through synthetic data generation (SDG), circumventing those usage constraints. The real-world data was tabular, binary, intensive care unit (ICU) patient diagnosis data. The entire data-set was split into separate data silos to mimic real-world scenarios in which ICUs across different hospitals hold similarly structured data-sets within their own organisations but cannot access each other's. We implemented federated learning (FL) to train a separate GAN locally at each organisation on its own data silo, and then combined the GANs into a single central GAN without any siloed data ever being exposed. This global, central GAN was then used to generate the synthetic patient data-set. We evaluated the synthetic patients with statistical measures and through a structured review by a group of medical professionals. Moving from training a single central model to training individual models on separate data silos and combining them into a central model produced no significant reduction in the quality of the synthetic EHR. This held for both the statistical evaluation (root mean square error (RMSE) of 0.0154 for single-source vs. 0.0169 for dual-source federated) and the medical professionals' evaluation (no quality difference between EHR generated from a single source and EHR generated from multiple sources).
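
The abstract does not say how the locally trained GANs are combined or exactly which quantities the RMSE compares, so the following is only a minimal sketch under two stated assumptions: that the models are merged by a FedAvg-style weighted average of their parameters, and that the RMSE is taken between the per-column frequencies of the real and synthetic binary tables. The function names `federated_average` and `dimension_wise_rmse` are hypothetical and do not come from the paper.

```python
import numpy as np


def federated_average(client_weights, client_sizes):
    """Combine per-client parameter lists into one central parameter list.

    ASSUMPTION: FedAvg-style averaging, weighted by each silo's record count.
    client_weights: one list of numpy arrays per client, same shapes across clients.
    client_sizes:   number of training records held by each client.
    """
    total = float(sum(client_sizes))
    averaged = []
    for layer in range(len(client_weights[0])):
        acc = np.zeros_like(client_weights[0][layer], dtype=float)
        for weights, size in zip(client_weights, client_sizes):
            # Only parameters are shared here; raw patient rows never leave a silo.
            acc += (size / total) * weights[layer]
        averaged.append(acc)
    return averaged


def dimension_wise_rmse(real, synthetic):
    """RMSE between per-column frequencies of two binary tables.

    ASSUMPTION: each column is a binary diagnosis flag, so the column mean
    is the fraction of patients with that diagnosis.
    """
    real_freq = np.asarray(real, dtype=float).mean(axis=0)
    synth_freq = np.asarray(synthetic, dtype=float).mean(axis=0)
    return float(np.sqrt(np.mean((real_freq - synth_freq) ** 2)))
```

In a sketch like this, the same averaging step would be applied to both generator and discriminator parameters after each round of local training, and the RMSE would be computed once on data sampled from the final central generator.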

research
12/22/2021

Generating Synthetic Mixed-type Longitudinal Electronic Health Records for Artificial Intelligent Applications

The recent availability of electronic health records (EHRs) have provide...
research
05/03/2021

Synthesizing time-series wound prognosis factors from electronic medical records using generative adversarial networks

Wound prognostic models not only provide an estimate of wound healing ti...
research
07/02/2022

Backdoor Attack is A Devil in Federated GAN-based Medical Image Synthesis

Deep Learning-based image synthesis techniques have been applied in heal...
research
05/22/2023

Federated Learning of Medical Concepts Embedding using BEHRT

Electronic Health Records (EHR) data contains medical records such as di...
research
01/20/2022

Conditional Generation of Medical Time Series for Extrapolation to Underrepresented Populations

The widespread adoption of electronic health records (EHRs) and subseque...
research
05/25/2023

Ensemble Synthetic EHR Generation for Increasing Subpopulation Model's Performance

Electronic health records (EHR) often contain different rates of represe...
research
03/11/2020

Deep generative models in DataSHIELD

The best way to calculate statistics from medical data is to use the dat...
