GenSample: A Genetic Algorithm for Oversampling in Imbalanced Datasets

10/23/2019
by Vishwa Karia, et al.

Imbalanced datasets are ubiquitous. Classification performance on imbalanced datasets is generally poor for the minority class as the classifier cannot learn decision boundaries well. However, in sensitive applications like fraud detection, medical diagnosis, and spam identification, it is extremely important to classify the minority instances correctly. In this paper, we present a novel technique based on genetic algorithms, GenSample, for oversampling the minority class in imbalanced datasets. GenSample decides the rate of oversampling a minority example by taking into account the difficulty in learning that example, along with the performance improvement achieved by oversampling it. This technique terminates the oversampling process when the performance of the classifier begins to deteriorate. Consequently, it produces synthetic data only as long as a performance boost is obtained. The algorithm was tested on 9 real-world imbalanced datasets of varying sizes and imbalance ratios. It achieved the highest F-Score on 8 out of 9 datasets, confirming its ability to better handle imbalanced data compared to other existing methodologies.
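The loop described above (generate synthetic minority examples, favor hard-to-learn ones, stop once the classifier no longer improves) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give GenSample's operators or fitness function, so everything here is an assumption made for illustration. That includes the 3-nearest-neighbor classifier, the blend crossover with Gaussian mutation, the neighbor-based `difficulty` weight, the `patience` stopping rule, and all function and parameter names.

```python
import random

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def knn_predict(X, y, query, k=3):
    # Majority vote among the k nearest training points (stand-in classifier).
    nearest = sorted(range(len(X)), key=lambda i: sq_dist(X[i], query))[:k]
    return 1 if 2 * sum(y[i] for i in nearest) > len(nearest) else 0

def f_score(y_true, y_pred):
    # F1 for the minority class (label 1).
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def fitness(X, y, val_X, val_y):
    return f_score(val_y, [knn_predict(X, y, q) for q in val_X])

def difficulty(X, y, i, k=3):
    # Assumed proxy for "hard to learn": share of majority points among neighbors.
    others = sorted((j for j in range(len(X)) if j != i),
                    key=lambda j: sq_dist(X[j], X[i]))[:k]
    return sum(y[j] == 0 for j in others) / len(others)

def oversample(train_X, train_y, val_X, val_y, rng,
               max_rounds=100, patience=10, noise=0.05):
    X, y = list(train_X), list(train_y)
    minority = [i for i, label in enumerate(train_y) if label == 1]
    # Harder minority examples are picked as parents more often.
    weights = [difficulty(train_X, train_y, i) + 0.1 for i in minority]
    best, stale = fitness(X, y, val_X, val_y), 0
    for _ in range(max_rounds):
        a, b = rng.choices(minority, weights=weights, k=2)
        mix = rng.random()
        # Blend crossover of two parents plus a small Gaussian mutation.
        child = tuple(mix * u + (1 - mix) * v + rng.gauss(0, noise)
                      for u, v in zip(train_X[a], train_X[b]))
        score = fitness(X + [child], y + [1], val_X, val_y)
        if score > best:
            # Keep the synthetic example only if validation F-score improves.
            X.append(child)
            y.append(1)
            best, stale = score, 0
        else:
            stale += 1
            if stale >= patience:  # assumed stopping rule: no recent gain
                break
    return X, y, best

def toy_data(rng, n_major, n_minor):
    # Hypothetical two-cluster data: majority near (0, 0), minority near (1.5, 1.5).
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_major)]
    X += [(rng.gauss(1.5, 1), rng.gauss(1.5, 1)) for _ in range(n_minor)]
    return X, [0] * n_major + [1] * n_minor
```

A typical call would be `oversample(train_X, train_y, val_X, val_y, random.Random(0))` on data from `toy_data`. Because a candidate is kept only when it raises the validation F-score, the returned score can never fall below the baseline, which mirrors the idea of producing synthetic data only while it helps.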


