A multi-schematic classifier-independent oversampling approach for imbalanced datasets

07/15/2021
by Saptarshi Bej, et al.

Over 85 oversampling algorithms, mostly extensions of the SMOTE algorithm, have been built over the past two decades to solve the problem of imbalanced datasets. However, previous studies have shown that different oversampling algorithms have different degrees of efficiency with different classifiers. With numerous algorithms available, it is difficult to decide on an oversampling algorithm for a chosen classifier. Here, we overcome this problem with a multi-schematic and classifier-independent oversampling approach: ProWRAS (Proximity Weighted Random Affine Shadowsampling). ProWRAS integrates the Localized Random Affine Shadowsampling (LoRAS) algorithm and the Proximity Weighted Synthetic oversampling (ProWSyn) algorithm. By controlling the variance of the synthetic samples and using a proximity-weighted clustering system for the minority class data, the ProWRAS algorithm improves performance compared to algorithms that generate synthetic samples by modelling high-dimensional convex spaces of the minority class. ProWRAS has four oversampling schemes, each of which has its unique way to model the variance of the generated data. Most importantly, with a proper choice of oversampling scheme, the performance of ProWRAS is independent of the classifier used. We have benchmarked our newly developed ProWRAS algorithm against five state-of-the-art oversampling models and four different classifiers on 20 publicly available datasets. ProWRAS outperforms other oversampling algorithms in a statistically significant way, in terms of both F1-score and Kappa-score. Moreover, we have introduced a novel measure of classifier independence, the I-score, and show quantitatively that ProWRAS performs better independent of the classifier used. In practice, ProWRAS customizes synthetic sample generation according to a classifier of choice and thereby reduces benchmarking efforts.
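The core idea behind shadowsampling, as described in the abstract, is to generate synthetic minority points as random convex (affine, weights summing to one) combinations of noisy "shadow" copies of original minority points, so that the variance of the synthetic data can be tuned via the noise level and neighborhood size. The sketch below is illustrative only; the function and parameter names are assumptions for this example, not the authors' reference implementation or API.

```python
import numpy as np

def affine_shadowsampling(minority, n_samples, n_neighbors=3,
                          n_shadow=3, sigma=0.05, seed=0):
    """Illustrative sketch (not the authors' implementation):
    draw shadowsamples by adding Gaussian noise to minority points,
    then take a random convex combination of a small neighborhood."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_samples):
        # pick a small random neighborhood of minority points
        k = min(n_neighbors, len(minority))
        idx = rng.choice(len(minority), size=k, replace=False)
        neighborhood = minority[idx]
        # create noisy shadowsamples around each neighborhood point;
        # sigma controls the variance of the generated data
        shadows = np.repeat(neighborhood, n_shadow, axis=0)
        shadows = shadows + rng.normal(0.0, sigma, shadows.shape)
        # convex combination: non-negative weights summing to 1
        w = rng.dirichlet(np.ones(len(shadows)))
        synthetic.append(w @ shadows)
    return np.array(synthetic)

minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
new_points = affine_shadowsampling(minority, n_samples=4)
print(new_points.shape)  # (4, 2)
```

Because the combination weights are convex and the noise is small, each synthetic point stays close to the convex hull of its neighborhood, which is the variance-control property the abstract contrasts with methods that model the full high-dimensional convex space of the minority class.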


Related research

08/22/2019
LoRAS: An oversampling approach for imbalanced datasets
The Synthetic Minority Oversampling TEchnique (SMOTE) is widely-used for...

11/02/2018
Clustering and Learning from Imbalanced Data
A learning classifier must outperform a trivial solution, in case of imb...

11/24/2020
Minimum Variance Embedded Auto-associative Kernel Extreme Learning Machine for One-class Classification
One-class classification (OCC) needs samples from only a single class to...

09/11/2021
A Novel Intrinsic Measure of Data Separability
In machine learning, the performance of a classifier depends on both the...

06/09/2020
Towards an Intrinsic Definition of Robustness for a Classifier
The robustness of classifiers has become a question of paramount importa...

04/20/2022
Neurochaos Feature Transformation and Classification for Imbalanced Learning
Learning from limited and imbalanced data is a challenging problem in th...
