Bridging the Gap: Enhancing the Utility of Synthetic Data via Post-Processing Techniques

by Andrea Lampis, et al.

Acquiring and annotating suitable datasets for training deep learning models is challenging, and the tedious, time-consuming effort involved can hinder research progress. Generative models have emerged as a promising way to produce synthetic datasets that replace or augment real-world data. However, the effectiveness of synthetic data is limited by its inability to fully capture the complexity and diversity of real-world data. To address this issue, we explore the use of Generative Adversarial Networks to generate synthetic datasets for training classifiers that are subsequently evaluated on real-world images. To improve the quality and diversity of the synthetic dataset, we propose three novel post-processing techniques: Dynamic Sample Filtering, Dynamic Dataset Recycle, and Expansion Trick. In addition, we introduce a pipeline called Gap Filler (GaFi), which applies these techniques in an optimal and coordinated manner to maximise classification accuracy on real-world data. Our experiments show that GaFi effectively reduces the gap with real-accuracy scores to errors of 2.03 and 1.78, respectively. These results represent a new state of the art in Classification Accuracy Score and highlight the effectiveness of post-processing techniques in improving the quality of synthetic datasets.
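The evaluation protocol described in the abstract (train a classifier on synthetic data only, then score it on real data) can be illustrated with a small toy sketch. This is not the authors' pipeline: the nearest-centroid classifier, the Gaussian "real" and "synthetic" distributions, and every function name below are assumptions chosen only to keep the example self-contained. The abstract speaks of GAN-generated images and deep classifiers, and it does not specify how Dynamic Sample Filtering, Dynamic Dataset Recycle, or Expansion Trick work, so none of them is modelled here.

```python
import random


def fit_centroids(samples, labels):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Return the label of the centroid closest to x (squared Euclidean)."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )


def accuracy(centroids, samples, labels):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(centroids, x) == y for x, y in zip(samples, labels))
    return hits / len(labels)


def sample_blobs(rng, means, n_per_class, scale=1.0):
    """Draw n_per_class Gaussian points around each class mean."""
    samples, labels = [], []
    for label, mean in enumerate(means):
        for _ in range(n_per_class):
            samples.append([rng.gauss(m, scale) for m in mean])
            labels.append(label)
    return samples, labels


rng = random.Random(0)

# Assumption: the "real" data are two well-separated 2-D Gaussian classes.
real_means = [[0.0, 0.0], [4.0, 4.0]]
# Assumption: the hypothetical generator reproduces the classes imperfectly,
# modelled here as slightly shifted class means.
synth_means = [[0.5, -0.5], [3.5, 4.5]]

real_train_x, real_train_y = sample_blobs(rng, real_means, 200)
real_test_x, real_test_y = sample_blobs(rng, real_means, 200)
synth_train_x, synth_train_y = sample_blobs(rng, synth_means, 200)

# Baseline: train on real data, test on real data.
real_acc = accuracy(fit_centroids(real_train_x, real_train_y),
                    real_test_x, real_test_y)
# Classification Accuracy Score: train on synthetic data, test on real data.
cas = accuracy(fit_centroids(synth_train_x, synth_train_y),
               real_test_x, real_test_y)
# The gap that post-processing of the synthetic set would aim to shrink.
gap = real_acc - cas

print(f"real accuracy: {real_acc:.3f}  CAS: {cas:.3f}  gap: {gap:.3f}")
```

In this toy setting the two training sets differ only in where the class means lie, so the gap reflects nothing more than that shift. With real images the gap would also reflect missing modes and low-quality samples, which is the kind of deficiency the post-processing techniques named in the abstract are meant to address.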



