LoRAS: An oversampling approach for imbalanced datasets

08/22/2019
by Saptarshi Bej, et al.

The Synthetic Minority Oversampling TEchnique (SMOTE) is widely used for the analysis of imbalanced datasets. It is known that SMOTE frequently over-generalizes the minority class, leading to misclassifications for the majority class and affecting the overall balance of the model. In this article, we present an approach that overcomes this limitation of SMOTE, employing Localized Random Affine Shadowsampling (LoRAS) to oversample from an approximated data manifold of the minority class. We benchmarked our LoRAS algorithm with 28 publicly available datasets and show that drawing samples from an approximated data manifold of the minority class is the key to successful oversampling. We compared the performance of LoRAS, SMOTE, and several SMOTE extensions and observed that, for imbalanced datasets, LoRAS on average generates better Machine Learning (ML) models in terms of F1-score and Balanced Accuracy. Moreover, to explain the success of the algorithm, we have constructed a mathematical framework to prove that LoRAS is a more effective oversampling technique, since it provides a better estimate of the mean of the underlying local data distribution of the minority class data space.
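
The abstract names the mechanism but does not spell out the steps, so the following is only a rough sketch of what "localized random affine shadowsampling" could look like, not the authors' implementation. It assumes three steps: take a small neighbourhood around each minority point, perturb those points with Gaussian noise to get "shadowsamples", and form random convex combinations of a few shadowsamples. The function name `loras_sketch` and all parameters (`k`, `n_shadow`, `sigma`, `n_affine`, `n_per_point`) are hypothetical choices made for illustration.

```python
import numpy as np


def loras_sketch(minority, k=5, n_shadow=10, sigma=0.005,
                 n_affine=3, n_per_point=4, seed=0):
    """Illustrative oversampler; names and defaults are assumptions."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n, d = minority.shape
    k = min(k, n)  # the neighbourhood cannot exceed the minority class size
    synthetic = []
    for i in range(n):
        # Neighbourhood: the point itself plus its nearest minority points.
        dists = np.linalg.norm(minority - minority[i], axis=1)
        hood = minority[np.argsort(dists)[:k]]
        # Shadowsamples: every neighbourhood point plus small Gaussian noise.
        noise = rng.normal(0.0, sigma, size=(k, n_shadow, d))
        shadows = (hood[:, None, :] + noise).reshape(-1, d)
        for _ in range(n_per_point):
            # Pick a few shadowsamples and combine them with random
            # non-negative weights that sum to 1 (a convex combination).
            idx = rng.choice(len(shadows), size=n_affine, replace=False)
            weights = rng.dirichlet(np.ones(n_affine))
            synthetic.append(weights @ shadows[idx])
    return np.array(synthetic)


# Toy minority class: six points in two dimensions.
X_min = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [0.1, 0.1], [0.05, 0.05], [0.2, 0.1]])
X_new = loras_sketch(X_min)
```

Because each synthetic point is a convex combination of noisy copies of nearby minority points, it stays close to the local neighbourhood rather than being spread along a line between two possibly distant points, which is the intuition the abstract attributes to sampling from an approximated manifold.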


