SMOTE: Synthetic Minority Over-sampling Technique

06/09/2011
by   N. V. Chawla, et al.

An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
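The abstract says only that the over-sampling method creates synthetic minority class examples. As a rough illustration, the sketch below follows the procedure commonly associated with SMOTE: each synthetic point is placed at a random position on the line segment between a minority example and one of its k nearest minority-class neighbours. The function name `smote_sketch`, its parameters, and the toy data are hypothetical and chosen for this example only; this is a minimal sketch under those assumptions, not the authors' implementation.

```python
import math
import random


def smote_sketch(minority, n_per_sample=1, k=5, seed=0):
    """Return synthetic points interpolated between minority examples.

    Hypothetical helper for illustration; `minority` is a list of
    equal-length numeric feature vectors from the minority class.
    """
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        # Rank the other minority points by Euclidean distance to x
        # and keep the k closest as candidate neighbours.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        if not neighbours:
            continue  # a single minority point has no neighbour to use
        for _ in range(n_per_sample):
            nb = rng.choice(neighbours)
            gap = rng.random()  # uniform in [0, 1)
            # Move from x toward the chosen neighbour by a fraction `gap`.
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy minority class with four 2-D examples (made-up values).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_per_sample=2, k=3)
print(len(new_points))  # 8 (two synthetic points per original example)
```

Because every synthetic point lies between two existing minority examples, the new data stays inside the region the minority class already occupies instead of duplicating points exactly, as over-sampling with replacement would. Combining this with under-sampling of the majority class, as the abstract describes, would be a separate step applied to the majority examples.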


research
10/17/2019

WOTBoost: Weighted Oversampling Technique in Boosting for imbalanced learning

Machine learning classifiers often stumble over imbalanced datasets wher...
research
02/21/2022

Imbalanced Classification via Explicit Gradient Learning From Augmented Data

Learning from imbalanced data is one of the most significant challenges ...
research
02/10/2020

UGRWO-Sampling: A modified random walk under-sampling approach based on graphs to imbalanced data classification

In this paper, we propose a new RWO-Sampling (Random Walk Over-Sampling)...
research
11/19/2017

Lung nodule classification by the combination of Fusion classifier and Cascaded Convolutional Neural Networks

Lung nodule classification is a class imbalanced problem, as nodules are...
research
12/30/2021

The SAMME.C2 algorithm for severely imbalanced multi-class classification

Classification predictive modeling involves the accurate assignment of o...
research
07/08/2020

Remix: Rebalanced Mixup

Deep image classifiers often perform poorly when training data are heavi...
research
06/16/2016

ACDC: α-Carving Decision Chain for Risk Stratification

In many healthcare settings, intuitive decision rules for risk stratific...
