Exploiting Synthetic Data for Data Imbalance Problems: Baselines from a Data Perspective

08/02/2023
by Moon Ye-Bin, et al.

Deep neural networks are trained on a vast ocean of data, and this data is inherently imbalanced. The imbalance can lead deep neural networks to produce biased predictions, with potentially severe ethical and social consequences. To address these challenges, we believe that generative models offer a promising approach, given the remarkable advances of recent diffusion models in generating high-quality images. In this work, we propose a simple yet effective baseline, SYNAuG, that uses synthetic data as a preliminary step before applying task-specific algorithms to data imbalance problems. This straightforward approach yields impressive performance on datasets such as CIFAR100-LT, ImageNet100-LT, UTKFace, and Waterbird, surpassing existing task-specific methods. While we do not claim that our approach is a complete solution to data imbalance, we argue that supplementing the existing data with synthetic data is an effective and crucial preliminary step in addressing it.
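As a rough illustration of what "supplementing the existing data with synthetic data" could involve, the sketch below computes how many synthetic samples each class would need to match the largest class. The function name `synthetic_budget`, the `target` parameter, and the choice to equalize class counts are assumptions made for this example; the abstract does not specify how SYNAuG decides the amount of synthetic data, and the generation step itself (for example, sampling from a diffusion model) is not shown.

```python
from collections import Counter


def synthetic_budget(labels, target=None):
    """Return, per class, how many synthetic samples are needed to reach `target`.

    Hypothetical helper: if `target` is None, every class is filled up to
    the size of the largest class (an assumed policy, not the paper's).
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    # Classes already at or above the target need no synthetic samples.
    return {cls: max(target - n, 0) for cls, n in counts.items()}


# Toy long-tailed label set: 5 cats, 2 dogs, 1 owl.
labels = ["cat"] * 5 + ["dog"] * 2 + ["owl"] * 1
budget = synthetic_budget(labels)
print(budget)  # {'cat': 0, 'dog': 3, 'owl': 4}
```

Under this assumed policy, the resulting counts would be passed to a generative model to produce the missing samples, after which any task-specific algorithm can be trained on the combined real and synthetic set.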

research
03/02/2023

Analyzing Effects of Fake Training Data on the Performance of Deep Learning Systems

Deep learning models frequently suffer from various problems such as cla...
research
01/03/2021

Synthetic Embedding-based Data Generation Methods for Student Performance

Given the inherent class imbalance issue within student performance data...
research
07/23/2020

SeismoGlow – Data augmentation for the class imbalance problem

In several application areas, such as medical diagnosis, spam filtering,...
research
09/26/2021

Synthetic Data Generation for Fraud Detection using GANs

Detecting money laundering in gambling is becoming increasingly challeng...
research
05/26/2023

TADA: Task-Agnostic Dialect Adapters for English

Large Language Models, the dominant starting point for Natural Language ...
research
05/30/2021

How effective are Graph Neural Networks in Fraud Detection for Network Data?

Graph-based Neural Networks (GNNs) are recent models created for learnin...
