The Health Gym: Synthetic Health-Related Datasets for the Development of Reinforcement Learning Algorithms

03/12/2022
by Nicholas I-Hsien Kuo, et al.

In recent years, the machine learning research community has benefited tremendously from the availability of openly accessible benchmark datasets. Clinical data, however, are usually not openly available because of their highly confidential nature, and this has hampered the development of reproducible and generalisable machine learning applications in health care. Here we introduce the Health Gym, a growing collection of highly realistic synthetic medical datasets that can be freely accessed to prototype, evaluate, and compare machine learning algorithms, with a specific focus on reinforcement learning. The three synthetic datasets described in this paper present patient cohorts with acute hypotension and sepsis in the intensive care unit, and people with human immunodeficiency virus (HIV) receiving antiretroviral therapy in ambulatory care. The datasets were created using a novel generative adversarial network (GAN). The distributions of variables, the correlations between variables, and the trends over time in the synthetic datasets mirror those in the real datasets. Furthermore, the risk of sensitive information disclosure associated with the public distribution of the synthetic datasets is estimated to be very low.

research
12/07/2021

Synthetic Acute Hypotension and Sepsis Datasets Based on MIMIC-III and Published as Part of the Health Gym Project

These two synthetic datasets comprise vital signs, laboratory test resul...
research
05/27/2020

Generative Adversarial Networks Applied to Observational Health Data

Having been collected for its primary purpose in patient care, Observati...
research
03/22/2023

Synthetic Health-related Longitudinal Data with Mixed-type Variables Generated using Diffusion Models

This paper presents a novel approach to simulating electronic health rec...
research
01/30/2021

Synthetic Dataset Generation of Driver Telematics

This article describes techniques employed in the production of a synthe...
research
09/14/2020

Synbols: Probing Learning Algorithms with Synthetic Datasets

Progress in the field of machine learning has been fueled by the introdu...
research
10/02/2019

Ward2ICU: A Vital Signs Dataset of Inpatients from the General Ward

We present a proxy dataset of vital signs with class labels indicating p...
research
08/02/2021

ricu: R's Interface to Intensive Care Data

Providing computational infrastructure for handling diverse intensive ca...
