Partial Resampling of Imbalanced Data

07/11/2022
by Firuz Kamalov, et al.

Imbalanced data is a frequently encountered problem in machine learning. Despite a vast literature on sampling techniques for imbalanced data, few studies address the question of the optimal sampling ratio. In this paper, we attempt to fill this gap by conducting a large-scale study of the effects of sampling ratio on classification accuracy. We consider 10 popular sampling methods and evaluate their performance over a range of ratios on 20 datasets. The results of the numerical experiments suggest that the optimal sampling ratio is between 0.7 and 0.8, although the exact ratio varies depending on the dataset. Furthermore, we find that while factors such as the original imbalance ratio or the number of features do not play a discernible role in determining the optimal ratio, the number of samples in the dataset may have a tangible effect.
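To make the notion of a sampling ratio concrete, here is a minimal sketch of partial random oversampling for a binary problem. It assumes the ratio is defined as the minority count divided by the majority count after resampling; the abstract does not state the definition, so this is an assumption. The function name `partial_oversample` and its parameters are hypothetical, and random oversampling is only one illustrative method, not necessarily one of the 10 evaluated in the paper.

```python
import random
from collections import Counter


def partial_oversample(X, y, ratio=0.75, seed=0):
    """Duplicate random minority samples until minority/majority is about `ratio`.

    Assumption: `ratio` = (minority count) / (majority count) after resampling.
    Only binary labels are handled in this sketch.
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("this sketch handles exactly two classes")

    # most_common orders labels from most to least frequent
    (_, n_maj), (min_label, n_min) = counts.most_common(2)

    # Number of minority samples wanted; never remove existing samples
    target_min = round(ratio * n_maj)
    n_extra = max(0, target_min - n_min)

    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == min_label]
    extra = rng.choices(min_idx, k=n_extra)  # sampling with replacement

    X_res = list(X) + [X[i] for i in extra]
    y_res = list(y) + [y[i] for i in extra]
    return X_res, y_res


# Toy data: 100 majority samples (label 0) and 10 minority samples (label 1)
X = [[float(i)] for i in range(110)]
y = [0] * 100 + [1] * 10
X_res, y_res = partial_oversample(X, y, ratio=0.75)
print(Counter(y_res))
```

Setting `ratio=1.0` in this sketch corresponds to fully balancing the classes, while values in the 0.7 to 0.8 range reported above leave the minority class somewhat smaller than the majority.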


