Clustering and Learning from Imbalanced Data

11/02/2018
by Naman D. Singh, et al.

A learning classifier must outperform a trivial solution, but with imbalanced data this condition often does not hold. To overcome this problem, we propose a novel data-level resampling method, Clustering Based Oversampling, for improved learning from class-imbalanced datasets. The essential idea behind the proposed method is to use the distance between a minority class sample and its respective cluster centroid to infer the number of new sample points to be generated for that minority class sample. The proposed algorithm depends very little on the technique used for finding cluster centroids and does not affect majority class learning in any way. It also improves learning from imbalanced data by incorporating the distribution structure of the minority class samples into the generation of new data samples. The newly generated minority class data is handled in a way that prevents outlier production and overfitting. Implementation analysis on different datasets, using deep neural networks as the learning classifier, shows the effectiveness of this method compared to other synthetic data resampling techniques across several evaluation metrics.
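The central idea, using each minority sample's distance to its cluster centroid to decide how many new points it receives, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say whether a larger distance means more or fewer new points, nor how the points are generated, so the sketch assumes (1) a single cluster whose centroid is the mean, (2) counts proportional to distance, and (3) new points drawn on the segment between a sample and its centroid, which keeps them inside the cluster. The names `centroid`, `allocate_counts`, and `oversample` are hypothetical.

```python
import math
import random


def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def allocate_counts(points, center, n_new):
    """Split n_new synthetic samples across points by distance to center.

    Assumption: samples farther from the centroid receive more new points.
    The abstract only states that the distance determines the count.
    """
    dists = [math.dist(p, center) for p in points]
    total = sum(dists)
    if total == 0:
        # Every point coincides with the centroid: spread evenly.
        shares = [n_new / len(points)] * len(points)
    else:
        shares = [n_new * d / total for d in dists]
    counts = [int(s) for s in shares]
    # Hand out the remainder by largest fractional part so counts sum to n_new.
    order = sorted(range(len(points)),
                   key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[: n_new - sum(counts)]:
        counts[i] += 1
    return counts


def oversample(points, n_new, rng):
    """Generate n_new points, each between a minority sample and the centroid."""
    center = centroid(points)
    counts = allocate_counts(points, center, n_new)
    synthetic = []
    for p, k in zip(points, counts):
        for _ in range(k):
            t = rng.random()  # interpolation weight in [0, 1)
            synthetic.append([a + t * (c - a) for a, c in zip(p, center)])
    return synthetic


# Toy minority class with three 2-D samples (made-up values).
minority = [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]]
new_points = oversample(minority, 6, random.Random(0))
```

Because every synthetic point lies between an existing sample and the centroid, none can fall outside the range already spanned by the minority samples, which is one simple way to avoid producing outliers. With several clusters, the same allocation would be applied per cluster using whatever centroids the chosen clustering method returns.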


