Synthetic Data in Healthcare

04/06/2023
by Daniel McDuff, et al.

Synthetic data are becoming a critical tool for building artificially intelligent systems. Simulators provide a way of generating data systematically and at scale. These data can then be used either exclusively or in conjunction with real data for training and testing systems. Synthetic data are particularly attractive in cases where the availability of “real” training examples might be a bottleneck. While the volume of data in healthcare is growing exponentially, creating datasets for novel tasks, or datasets that reflect a diverse set of conditions and causal relationships, is not trivial. Furthermore, these data are highly sensitive and often patient specific. Recent research has begun to illustrate the potential for synthetic data in many areas of medicine, but no systematic review of the literature exists. In this paper, we present the cases for physical and statistical simulations for creating data and the proposed applications in healthcare and medicine. We discuss how synthetic data can promote privacy, equity, safety, and continual and causal learning, but also run the risk of introducing flaws and blind spots and of propagating or exaggerating biases.

research
05/09/2023

Leveraging Generative AI Models for Synthetic Data Generation in Healthcare: Balancing Research and Privacy

The widespread adoption of electronic health records and digital healthc...
research
12/08/2020

Synthetic Data: Opening the data floodgates to enable faster, more directed development of machine learning methods

Many ground-breaking advancements in machine learning can be attributed ...
research
03/09/2022

Downstream Fairness Caveats with Synthetic Healthcare Data

This paper evaluates synthetically generated healthcare data for biases ...
research
02/05/2021

Measuring Utility and Privacy of Synthetic Genomic Data

Genomic data provides researchers with an invaluable source of informati...
research
07/21/2023

Using simulation to calibrate real data acquisition in veterinary medicine

This paper explores the innovative use of simulation environments to enh...
research
04/04/2023

30 Years of Synthetic Data

The idea to generate synthetic data as a tool for broadening access to s...
research
01/24/2023

Generating Multidimensional Clusters With Support Lines

Synthetic data is essential for assessing clustering techniques, complem...
