LoRAS: An oversampling approach for imbalanced datasets

by Saptarshi Bej, et al.
University of Rostock

The Synthetic Minority Oversampling TEchnique (SMOTE) is widely used for the analysis of imbalanced datasets. It is known that SMOTE frequently over-generalizes the minority class, leading to misclassifications for the majority class and affecting the overall balance of the model. In this article, we present an approach that overcomes this limitation of SMOTE, employing Localized Random Affine Shadowsampling (LoRAS) to oversample from an approximated data manifold of the minority class. We benchmarked our LoRAS algorithm on 28 publicly available datasets and show that drawing samples from an approximated data manifold of the minority class is the key to successful oversampling. We compared the performance of LoRAS, SMOTE, and several SMOTE extensions and observed that, for imbalanced datasets, LoRAS on average generates better Machine Learning (ML) models in terms of F1-score and Balanced Accuracy. Moreover, to explain the success of the algorithm, we have constructed a mathematical framework to prove that LoRAS is a more effective oversampling technique, since it provides a better estimate of the mean of the underlying local data distribution of the minority class data space.
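The abstract does not spell out the procedure, so the following is only a rough sketch of what "localized random affine shadowsampling" could look like, read off the name alone: pick a local neighbourhood of minority points, create noisy copies ("shadowsamples"), and return a random convex (hence affine) combination of a few of them. The function name `loras_like_oversample` and the parameters `k`, `n_shadow`, `n_combine`, and `sigma`, as well as the use of Gaussian noise and Dirichlet weights, are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def loras_like_oversample(X_min, n_new, k=5, n_shadow=10, n_combine=3,
                          sigma=0.01, seed=0):
    """Illustrative LoRAS-style oversampler (hypothetical, not the authors' code).

    X_min     : (n, d) array of minority-class points
    n_new     : number of synthetic points to generate
    k         : neighbours per local neighbourhood (assumed parameter)
    n_shadow  : noisy copies made of each neighbourhood point (assumed)
    n_combine : shadowsamples mixed into one synthetic point (assumed)
    sigma     : standard deviation of the Gaussian noise (assumed)
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n, d = X_min.shape
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances between minority points.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    # Row i holds the k + 1 closest points to point i (normally i itself first).
    neigh = np.argsort(d2, axis=1)[:, : k + 1]

    out = np.empty((n_new, d))
    for i in range(n_new):
        centre = rng.integers(n)
        local = X_min[neigh[centre]]                      # (k + 1, d)
        # Shadowsamples: noisy copies of every point in the neighbourhood.
        shadows = np.repeat(local, n_shadow, axis=0)
        shadows = shadows + rng.normal(0.0, sigma, size=shadows.shape)
        # Random convex combination of a few shadowsamples.
        idx = rng.choice(len(shadows), size=n_combine, replace=False)
        weights = rng.dirichlet(np.ones(n_combine))       # non-negative, sum to 1
        out[i] = weights @ shadows[idx]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    minority = rng.normal(size=(20, 2))
    synthetic = loras_like_oversample(minority, n_new=50)
    print(synthetic.shape)
```

Under these assumptions, the contrast with SMOTE is that SMOTE interpolates along the line segment between two original minority points, whereas this sketch averages several noisy points drawn from one neighbourhood, which is one plausible reading of sampling from a locally approximated manifold.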



