The Effect of Balancing Methods on Model Behavior in Imbalanced Classification Problems

06/30/2023
by Adrian Stando, et al.

Imbalanced data poses a significant challenge in classification, because model performance suffers when the model learns too little from the minority classes. Balancing methods are often used to address this problem, but they can introduce problems of their own, such as overfitting or loss of information. This study addresses a less explored aspect of balancing methods: their impact on model behavior. To capture these changes, Explainable Artificial Intelligence tools are used to compare models trained on datasets before and after balancing. In addition to variable importance, this study uses partial dependence profiles and accumulated local effects. Both real and simulated datasets are tested, and an open-source Python package, edgaro, is developed to facilitate this analysis. The results show that balancing methods significantly change model behavior, which can bias models toward a balanced distribution. These findings confirm that the analysis of balancing should go beyond comparing model performance if machine learning models are to be more reliable. Therefore, we propose a new method, the performance gain plot, to support an informed data balancing strategy: it helps select a balancing method by weighing the measured change in model behavior against the performance gain.
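The comparison described in the abstract can be illustrated with a minimal sketch. It does not use the edgaro API and does not reproduce the paper's own measure of change. It assumes scikit-learn, a simulated dataset, plain random oversampling as the balancing method, a random forest as the model, and the mean absolute difference between two partial dependence profiles as an illustrative distance. The helper names `random_oversample` and `pdp_distance` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import partial_dependence
from sklearn.utils import resample

# Simulated imbalanced binary data (roughly 5% minority class).
X, y = make_classification(
    n_samples=2000, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.95, 0.05], random_state=0,
)


def random_oversample(X, y, random_state=0):
    """Hypothetical helper: duplicate minority rows at random until the
    two classes of a binary problem have the same number of rows."""
    minority = int(np.argmin(np.bincount(y)))
    X_min, X_maj = X[y == minority], X[y != minority]
    X_up = resample(X_min, replace=True, n_samples=len(X_maj),
                    random_state=random_state)
    X_bal = np.vstack([X_maj, X_up])
    y_bal = np.concatenate([
        np.full(len(X_maj), 1 - minority),
        np.full(len(X_up), minority),
    ])
    return X_bal, y_bal


X_bal, y_bal = random_oversample(X, y)

# One model trained before balancing, one after.
model_orig = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
model_bal = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_bal, y_bal)


def pdp_distance(model_a, model_b, X_ref, feature, grid_resolution=20):
    """Hypothetical helper: mean absolute difference between the partial
    dependence profiles of two models for one feature. Both profiles are
    evaluated on the same reference data, so they share the same grid."""
    pd_a = partial_dependence(model_a, X_ref, [feature], kind="average",
                              grid_resolution=grid_resolution)["average"][0]
    pd_b = partial_dependence(model_b, X_ref, [feature], kind="average",
                              grid_resolution=grid_resolution)["average"][0]
    return float(np.mean(np.abs(pd_a - pd_b)))


# Per-feature change in model behavior, measured on the original data.
distances = [pdp_distance(model_orig, model_bal, X, j) for j in range(X.shape[1])]
for j, d in enumerate(distances):
    print(f"feature {j}: PDP distance = {d:.4f}")
```

A larger distance for a feature suggests that balancing changed how the model responds to that feature. Setting such a distance against the change in a performance metric is the kind of trade-off the proposed performance gain plot is meant to show.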


