Machine Learning for Synthetic Data Generation: a Review

02/08/2023
by   Yingzhou Lu, et al.

Data plays a crucial role in machine learning. However, real-world applications face several problems with data: the data may be of low quality; a limited number of data points leads to under-fitting of the machine learning model; and the data may be hard to access due to privacy, safety, and regulatory concerns. Synthetic data generation offers a promising new avenue, as synthetic data can be shared and used in ways that real-world data cannot. This paper systematically reviews existing works that leverage machine learning models for synthetic data generation. Specifically, we discuss these works from several perspectives: (i) applications, including computer vision, speech, natural language, healthcare, and business; (ii) machine learning methods, particularly neural network architectures and deep generative models; (iii) privacy and fairness issues. In addition, we identify the challenges and opportunities in this emerging field and suggest future research directions.

research
05/09/2023

Leveraging Generative AI Models for Synthetic Data Generation in Healthcare: Balancing Research and Privacy

The widespread adoption of electronic health records and digital healthc...
research
01/11/2022

Fighting Money Laundering with Statistics and Machine Learning: An Introduction and Review

Money laundering is a profound, global problem. Nonetheless, there is li...
research
05/24/2023

Generating Faithful Synthetic Data with Large Language Models: A Case Study in Computational Social Science

Large Language Models (LLMs) have democratized synthetic data generation...
research
04/13/2022

Enabling Synthetic Data adoption in regulated domains

The switch from a Model-Centric to a Data-Centric mindset is putting emp...
research
07/28/2022

Sequential Models in the Synthetic Data Vault

The goal of this paper is to describe a system for generating synthetic ...
research
12/06/2017

Separating Reflection and Transmission Images in the Wild

The reflections caused by common semi-reflectors, such as glass windows,...
research
12/08/2020

Synthetic Data: Opening the data floodgates to enable faster, more directed development of machine learning methods

Many ground-breaking advancements in machine learning can be attributed ...