Synthetic Health-related Longitudinal Data with Mixed-type Variables Generated using Diffusion Models

03/22/2023
by Nicholas I-Hsien Kuo, et al.
This paper presents a novel approach to simulating electronic health records (EHRs) using diffusion probabilistic models (DPMs). Specifically, we demonstrate the effectiveness of DPMs in synthesising longitudinal EHRs that capture mixed-type variables, including numeric, binary, and categorical variables. To our knowledge, this represents the first use of DPMs for this purpose. We compared our DPM-simulated datasets to previous state-of-the-art results based on generative adversarial networks (GANs) for two clinical applications: acute hypotension and antiretroviral therapy (ART) for human immunodeficiency virus (HIV). Given the lack of similar previous studies on DPMs, a core component of our work involves exploring the advantages and caveats of employing DPMs across a wide range of aspects. In addition to assessing the realism of the synthetic datasets, we also trained reinforcement learning (RL) agents on the synthetic data to evaluate its utility for supporting the development of downstream machine learning models. Finally, we estimated that our DPM-simulated datasets are secure and pose a low risk of patient exposure if released for public access.

research
02/28/2023

Synthesizing Mixed-type Electronic Health Records using Diffusion Models

Electronic Health Records (EHRs) contain sensitive patient information, ...
research
03/12/2022

The Health Gym: Synthetic Health-Related Datasets for the Development of Reinforcement Learning Algorithms

In recent years, the machine learning research community has benefited t...
research
12/07/2021

Synthetic Acute Hypotension and Sepsis Datasets Based on MIMIC-III and Published as Part of the Health Gym Project

These two synthetic datasets comprise vital signs, laboratory test resul...
research
03/14/2022

A review of Generative Adversarial Networks for Electronic Health Records: applications, evaluation measures and data sources

Electronic Health Records (EHRs) are a valuable asset to facilitate clin...
research
12/22/2021

Generating Synthetic Mixed-type Longitudinal Electronic Health Records for Artificial Intelligent Applications

The recent availability of electronic health records (EHRs) have provide...
research
03/07/2022

DATGAN: Integrating expert knowledge into deep learning for synthetic tabular data

Synthetic data can be used in various applications, such as correcting b...
research
05/25/2023

Ensemble Synthetic EHR Generation for Increasing Subpopulation Model's Performance

Electronic health records (EHR) often contain different rates of represe...
