BSGAN: A Novel Oversampling Technique for Imbalanced Pattern Recognitions

05/16/2023
by   Md Manjurul Ahsan, et al.

Class imbalance problems (CIP) are one of the challenges in developing unbiased Machine Learning (ML) models for prediction. CIP occurs when data samples are not equally distributed between two or more classes. The Borderline Synthetic Minority Oversampling Technique (Borderline-SMOTE) is one approach that has been used to balance imbalanced data by oversampling the minority (limited) samples. One potential drawback of existing Borderline-SMOTE is that it focuses on the data samples that lie at the class border and gives more attention to extreme observations, which ultimately limits the diversity of the data created by oversampling; this is the case for most Borderline-SMOTE-based oversampling strategies. As a result, marginalization occurs after oversampling. To address these issues, we propose a hybrid oversampling technique that combines Borderline-SMOTE with a Generative Adversarial Network (GAN) to generate more diverse data that follow a Gaussian distribution. We name it BSGAN and test it on four highly imbalanced datasets: Ecoli, Wine quality, Yeast, and Abalone. Our preliminary computational results show that BSGAN outperformed existing Borderline-SMOTE and GAN-based oversampling techniques and created a more diverse dataset that follows a normal distribution after oversampling.
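
To make the Borderline-SMOTE step mentioned in the abstract concrete, the following is a minimal sketch of the general idea only: find minority samples whose nearest neighbours are mostly majority samples, then interpolate between them and other minority samples. It is not the BSGAN method, and the GAN stage is omitted entirely. The function names (`nearest`, `borderline_oversample`), the toy data, and the choice `k=3` are assumptions made for illustration and do not come from the paper.

```python
import math
import random


def nearest(idx, X, pool, k):
    """Indices from pool of the k points closest to X[idx], excluding idx itself."""
    others = [j for j in pool if j != idx]
    return sorted(others, key=lambda j: math.dist(X[idx], X[j]))[:k]


def borderline_oversample(X, y, minority, k=3, seed=0):
    """Return synthetic minority points, enough to balance the two classes."""
    rng = random.Random(seed)
    all_idx = range(len(X))
    min_idx = [i for i in all_idx if y[i] == minority]
    n_needed = (len(X) - len(min_idx)) - len(min_idx)

    # Borderline ("danger") samples: at least half, but not all,
    # of the k nearest neighbours belong to the majority class.
    danger = []
    for i in min_idx:
        n_major = sum(y[j] != minority for j in nearest(i, X, all_idx, k))
        if k / 2 <= n_major < k:
            danger.append(i)
    if not danger or n_needed <= 0 or len(min_idx) < 2:
        return []

    synthetic = []
    for _ in range(n_needed):
        i = rng.choice(danger)                      # a borderline minority sample
        j = rng.choice(nearest(i, X, min_idx, k))   # one of its minority neighbours
        gap = rng.random()                          # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
    return synthetic


# Toy data (hypothetical): six majority points and three minority points on a line.
X = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (5.4, 0), (6, 0), (7, 0)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
new_points = borderline_oversample(X, y, minority=1)
print(len(new_points), "synthetic minority points:", new_points)
```

Because every synthetic point is a convex combination of a borderline sample and one of its minority neighbours, the new data stay on line segments near the class border. This is the limited diversity that the abstract describes and that the GAN stage is meant to address.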

research
10/23/2022

Imbalanced Class Data Performance Evaluation and Improvement using Novel Generative Adversarial Network-based Approach: SSG and GBO

Class imbalance in a dataset is one of the major challenges that can sig...
research
08/06/2021

SMOTified-GAN for class imbalanced pattern classification problems

Class imbalance in a dataset is a major problem for classifiers that res...
research
08/07/2020

Oversampling Adversarial Network for Class-Imbalanced Fault Diagnosis

The collected data from industrial machines are often imbalanced, which ...
research
03/24/2021

A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios

Imbalance in the proportion of training samples belonging to different c...
research
03/27/2023

Evaluating XGBoost for Balanced and Imbalanced Data: Application to Fraud Detection

This paper evaluates XGboost's performance given different dataset sizes...
research
08/06/2023

Prototypes-oriented Transductive Few-shot Learning with Conditional Transport

Transductive Few-Shot Learning (TFSL) has recently attracted increasing ...
research
09/28/2020

Balancing thermal comfort datasets: We GAN, but should we?

Thermal comfort assessment for the built environment has become more ava...
