How to train your draGAN: A task oriented solution to imbalanced classification

11/18/2022
by   Leon O. Guertler, et al.

The long-standing challenge of building effective classification models for small and imbalanced datasets has seen little improvement since the creation of the Synthetic Minority Over-sampling Technique (SMOTE) over 20 years ago. Although GAN-based models seem promising, purpose-built architectures for this problem have been lacking, as most previous studies focus on applying existing models. This paper proposes a performance-oriented data-generating strategy that uses a new architecture, coined draGAN, to generate both minority and majority samples. The samples are generated with the objective of optimizing the classification model's performance rather than their similarity to the real data. We benchmark our approach against state-of-the-art methods from the SMOTE family and competitive GAN-based approaches on 94 tabular datasets with varying degrees of imbalance and linearity. Empirically, we show the superiority of draGAN, but also highlight some of its shortcomings. All code is available at: https://github.com/LeonGuertler/draGAN.
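
For context, SMOTE, the baseline named in the abstract, creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The following is a minimal NumPy sketch of that interpolation idea only. The function name `smote_like_oversample` and its parameters are illustrative assumptions, and this is not the draGAN method, whose implementation lives in the linked repository.

```python
import numpy as np


def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Return n_new synthetic minority points (hypothetical helper).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances between minority samples.
    dist = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]  # shape (n, k)

    # Pick a base sample and one of its neighbours for every new point.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate with a random gap in [0, 1).
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


if __name__ == "__main__":
    minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    synthetic = smote_like_oversample(minority, n_new=6, k=2)
    print(synthetic.shape)
```

Because every synthetic point is a convex combination of two real minority points, this kind of oversampling optimizes for similarity to the observed data, which is the design choice the abstract contrasts with generating samples to maximize classifier performance.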

research
08/12/2020

Mitigating Dataset Imbalance via Joint Generation and Classification

Supervised deep learning methods are enjoying enormous success in many p...
research
08/06/2021

SMOTified-GAN for class imbalanced pattern classification problems

Class imbalance in a dataset is a major problem for classifiers that res...
research
07/19/2023

Improved Distribution Matching for Dataset Condensation

Dataset Condensation aims to condense a large dataset into a smaller one...
research
07/06/2021

GCN-Based Linkage Prediction for Face Clustering on Imbalanced Datasets: An Empirical Study

In recent years, benefiting from the expressive power of Graph Convoluti...
research
08/21/2020

Counterfactual-based minority oversampling for imbalanced classification

A key challenge of oversampling in imbalanced classification is that the...
research
06/24/2023

Evaluating the Utility of GAN Generated Synthetic Tabular Data for Class Balancing and Low Resource Settings

The present study aimed to address the issue of imbalanced data in class...
research
06/18/2021

RSG: A Simple but Effective Module for Learning Imbalanced Datasets

Imbalanced datasets widely exist in practice and are a great challenge fo...
