Evaluating the Utility of GAN Generated Synthetic Tabular Data for Class Balancing and Low Resource Settings

06/24/2023
by Nagarjuna Chereddy, et al.

This study addresses class imbalance in classification tasks and evaluates how well SMOTE, ADASYN, and GAN techniques generate synthetic data, both to balance classes and to improve classifier performance in low-resource settings. A Generalised Linear Model (GLM) was used for the class balancing experiments, and a Random Forest (RF) was used for the low-resource experiments, which assessed model performance under varying amounts of training data. Recall was the primary evaluation metric for all classification models. In the class balancing experiments, the GLM trained on GAN-balanced data achieved the highest recall. Similarly, in the low-resource experiments, models trained on data augmented with GAN-synthesized samples achieved higher recall than models trained on the original data alone. These findings indicate that GAN-generated synthetic data can help address class imbalance in classification tasks and improve model performance in low-resource settings.
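As a rough illustration of the two ideas the abstract relies on, the sketch below computes recall and balances a binary dataset by SMOTE-style interpolation between minority-class rows. It is not the authors' implementation: the function names (`recall`, `smote_like_oversample`, `balance`) are hypothetical, the interpolation is a simplified stand-in for SMOTE, it assumes at least two minority rows, and neither ADASYN nor a GAN is shown. It uses only the Python standard library.

```python
import random


def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for the chosen positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows, each on the line segment between a
    minority row and one of its k nearest minority neighbours.
    Simplified SMOTE-style interpolation (an assumption of this sketch)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to base.
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


def balance(X, y, minority_label=1, seed=0):
    """Return new lists in which the minority class matches the majority count."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = (len(y) - len(minority)) - len(minority)
    new_rows = smote_like_oversample(minority, n_new, seed=seed) if n_new > 0 else []
    return X + new_rows, y + [minority_label] * len(new_rows)
```

In a setup like the one described, `balance` would be applied to the training split only, a classifier would be fitted on the result, and `recall` would be computed on an untouched test split so that synthetic rows never leak into evaluation.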

