To SMOTE, or not to SMOTE?

01/21/2022
by Yotam Elor, et al.

In imbalanced binary classification problems, the objective metric is often non-symmetric and associates a higher penalty with the minority samples. The loss function used for training, on the other hand, is usually symmetric, penalizing majority and minority samples equally. Balancing schemes that augment the data to be more balanced before training the model were proposed to address this discrepancy and were shown empirically to improve prediction performance on tabular data. However, recent studies of consistent classifiers suggest that the metric discrepancy might not hinder prediction performance. In light of these recent theoretical results, we carefully revisit the empirical study of balancing tabular data. Our extensive experiments on 73 datasets show that, in general and in accordance with theory, the best prediction is achieved by using a strong consistent classifier, and balancing is not beneficial. We further identify several scenarios in which balancing is effective and observe that prior studies mainly focus on these settings.
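To make "balancing schemes that augment the data" concrete, here is a minimal pure-Python sketch of SMOTE-style oversampling, the technique named in the title: each synthetic minority point is an interpolation between a minority sample and one of its nearest minority neighbors. This is an illustration only, not the experimental setup of the paper; the function name `smote_like_oversample`, its parameters, and the toy data are assumptions made for this example.

```python
import math
import random


def smote_like_oversample(X, y, minority_label=1, k=3, seed=0):
    """Add synthetic minority points until both classes have equal size.

    Hypothetical helper for illustration; assumes binary labels and
    numeric feature vectors stored as lists of floats.
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_needed = n_majority - len(minority)
    # Nothing to do if already balanced, or if interpolation is impossible.
    if n_needed <= 0 or len(minority) < 2:
        return list(X), list(y)

    k = min(k, len(minority) - 1)
    X_new, y_new = list(X), list(y)
    for _ in range(n_needed):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbors of `base`.
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbors)]
        # Random point on the segment between `base` and its neighbor.
        t = rng.random()
        X_new.append([b + t * (o - b) for b, o in zip(base, other)])
        y_new.append(minority_label)
    return X_new, y_new


# Toy data (made up): six majority points, two minority points.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
     [0.1, 0.0], [0.2, 0.3], [2.0, 2.0], [2.2, 2.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_bal, y_bal = smote_like_oversample(X, y)
```

The abstract's finding is that a preprocessing step of this kind is generally not beneficial when a strong consistent classifier is used, so the sketch shows what is being evaluated rather than a recommended practice.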

research
10/06/2021

Influence-Balanced Loss for Imbalanced Visual Classification

In this paper, we propose a balancing training method to address problem...
research
06/30/2023

The Effect of Balancing Methods on Model Behavior in Imbalanced Classification Problems

Imbalanced data poses a significant challenge in classification as model...
research
06/24/2023

Evaluating the Utility of GAN Generated Synthetic Tabular Data for Class Balancing and Low Resource Settings

The present study aimed to address the issue of imbalanced data in class...
research
04/30/2023

Class-Balancing Diffusion Models

Diffusion-based models have shown the merits of generating high-quality ...
research
03/24/2018

Balanced Random Survival Forests for Extremely Unbalanced, Right Censored Data

Accuracies of survival models for life expectancy prediction as well as ...
research
02/02/2022

Normalise for Fairness: A Simple Normalisation Technique for Fairness in Regression Machine Learning Problems

Algorithms and Machine Learning (ML) are increasingly affecting everyday...
research
04/06/2021

Balancing Predictive Relevance of Ligand Biochemical Activities

In this paper, we present a technique for balancing predictive relevance...
