Beyond Privacy: Navigating the Opportunities and Challenges of Synthetic Data

04/07/2023
by Boris van Breugel, et al.

Generating synthetic data through generative models is gaining interest in the ML community and beyond. In the past, synthetic data was often regarded as a means of private data release, but a surge of recent papers explores how its potential reaches much further: from creating fairer data to data augmentation, and from simulation to text generated by ChatGPT. In this perspective, we explore whether, and how, synthetic data may become a dominant force in the machine learning world, promising a future where datasets can be tailored to individual needs. Just as importantly, we discuss the fundamental challenges the community needs to overcome for wider relevance and application of synthetic data, the most important of which is quantifying how much we can trust any finding or prediction drawn from synthetic data.
