Synthetic Data for Model Selection

05/03/2021
by Matan Fintz, et al.

Recent improvements in synthetic data generation make it possible to produce images that are highly photorealistic and indistinguishable from real ones. Furthermore, synthetic generation pipelines have the potential to generate an unlimited number of images. The combination of high photorealism and scale makes synthetic data a promising candidate for improving various machine learning (ML) pipelines. Thus far, a large body of research in this field has focused on using synthetic images for training, by augmenting and enlarging the training data. In contrast, in this work we explore whether synthetic data can be beneficial for model selection. Considering the task of image classification, we demonstrate that when data is scarce, synthetic data can be used to replace the held-out validation set, so that the real data otherwise reserved for validation can be used for training.
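The idea of selecting a model on a synthetic validation set can be illustrated with a minimal sketch. It is not the authors' pipeline: the helper names `accuracy` and `select_model`, the toy two-feature data standing in for synthetic images, and the two hand-written candidate classifiers are all assumptions made for illustration. The candidates are assumed to have been trained already on all of the available real data.

```python
import numpy as np


def accuracy(predict, X, y):
    """Fraction of samples for which predict(X) matches the labels y."""
    return float(np.mean(predict(X) == y))


def select_model(candidates, X_val, y_val):
    """Score every candidate on the validation set and return the best one.

    candidates: dict mapping a name to a function X -> predicted labels.
    Returns (best_name, scores), where scores maps each name to its accuracy.
    """
    scores = {name: accuracy(fn, X_val, y_val) for name, fn in candidates.items()}
    best_name = max(scores, key=scores.get)
    return best_name, scores


# Toy stand-in for a synthetic validation set (hypothetical data, not images).
rng = np.random.default_rng(0)
X_synth = rng.normal(size=(200, 2))
y_synth = (X_synth[:, 0] > 0).astype(int)

# Hypothetical candidates, assumed to be trained on all of the real data.
candidates = {
    "threshold_on_feature_0": lambda X: (X[:, 0] > 0).astype(int),
    "always_zero": lambda X: np.zeros(len(X), dtype=int),
}

best_name, scores = select_model(candidates, X_synth, y_synth)
print(best_name, scores)
```

Because no real samples are consumed by the selection step, every real sample remains available for training, which is the trade-off the abstract describes.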


research
06/27/2023

On the Usefulness of Synthetic Tabular Data Generation

Despite recent advances in synthetic data generation, the scientific com...
research
12/04/2018

Learning Vine Copula Models For Synthetic Data Generation

A vine copula model is a flexible high-dimensional dependence model whic...
research
12/09/2022

Synthetic Data for Object Classification in Industrial Applications

One of the biggest challenges in machine learning is data collection. Tr...
research
02/19/2022

Improving the Level of Autism Discrimination through GraphRNN Link Prediction

Dataset is the key of deep learning in Autism disease research. However,...
research
04/01/2020

Objects of violence: synthetic data for practical ML in human rights investigations

We introduce a machine learning workflow to search for, identify, and me...
research
02/20/2020

Cluster Aware Mobility Encounter Dataset Enlargement

The recent emerging fields in data processing and manipulation has facil...
research
08/16/2020

AutoSimulate: (Quickly) Learning Synthetic Data Generation

Simulation is increasingly being used for generating large labelled data...
