Rethinking Class Imbalance in Machine Learning

05/06/2023
by Ou Wu, et al.

Imbalance learning is a subfield of machine learning that focuses on learning tasks in the presence of class imbalance. Nearly all existing studies treat class imbalance as proportion imbalance, where the classes contribute unequal shares of the training samples. Ignoring proportion imbalance results in unfairness between classes and poor generalization. Previous literature offers numerous theoretical and empirical analyses, as well as new methods, for imbalance learning. This study presents a new taxonomy of class imbalance in machine learning with a broader scope. It summarizes four other types of imbalance that may exist between classes in machine learning tasks: variance, distance, neighborhood, and quality imbalance. It also presents two levels of imbalance, global and local. Theoretical analysis is used to illustrate the significant impact of the new imbalance types on learning fairness. Moreover, our taxonomy and theoretical conclusions are used to analyze the shortcomings of several classical methods. As an example, we propose a new logit perturbation-based imbalance learning loss for the case where proportion, variance, and distance imbalances exist simultaneously. Several classical losses are special cases of our proposed method. Meta learning is utilized to infer the hyper-parameters related to the three types of imbalance. Experimental results on several benchmark corpora validate the effectiveness of the proposed method.
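
To make the idea of a logit perturbation-based loss concrete, the sketch below implements one classical loss of this kind, the logit-adjusted cross-entropy, which shifts each class logit by the log of its training proportion. It handles proportion imbalance only and is not the loss proposed in this paper; the function name, the `tau` scaling parameter, and the NumPy implementation are assumptions chosen for illustration.

```python
import numpy as np


def logit_adjusted_cross_entropy(logits, labels, class_counts, tau=1.0):
    """Mean cross-entropy after shifting logits by tau * log(class prior).

    Illustrative sketch of a classical logit perturbation loss for
    proportion imbalance; not the method proposed in the paper.
    """
    logits = np.asarray(logits, dtype=float)   # shape (n_samples, n_classes)
    labels = np.asarray(labels, dtype=int)     # shape (n_samples,)
    priors = np.asarray(class_counts, dtype=float)
    priors = priors / priors.sum()             # empirical class proportions

    # Perturb every class logit by its scaled log prior.
    adjusted = logits + tau * np.log(priors)

    # Numerically stable log-softmax over the class axis.
    shifted = adjusted - adjusted.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Negative log-likelihood of the true class, averaged over samples.
    return -log_probs[np.arange(len(labels)), labels].mean()


if __name__ == "__main__":
    # Two classes with a 90/10 split and an uninformative model (all-zero logits).
    z = np.zeros((1, 2))
    print(logit_adjusted_cross_entropy(z, [0], [90, 10]))  # majority-class sample
    print(logit_adjusted_cross_entropy(z, [1], [90, 10]))  # minority-class sample
```

With `tau=0` the function reduces to plain cross-entropy. With `tau=1`, the same uninformative logits incur a larger loss on the minority-class sample than on the majority-class sample, so training is pushed toward a larger margin for the rare class.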

research
09/09/2021

An Experimental Study of Class Imbalance in Federated Learning

Federated learning is a distributed machine learning paradigm that train...
research
07/01/2022

Characterizing the Effect of Class Imbalance on the Learning Dynamics

Data imbalance is a common problem in the machine learning literature th...
research
07/26/2021

Compensation Learning

Weighting strategy prevails in machine learning. For example, a common a...
research
12/22/2020

A Survey of Methods for Managing the Classification and Solution of Data Imbalance Problem

The problem of class imbalance is extensive for focusing on numerous app...
research
08/31/2019

Imbalance Problems in Object Detection: A Review

In this paper, we present a comprehensive review of the imbalance proble...
research
06/02/2019

Radial-Based Undersampling for Imbalanced Data Classification

Data imbalance remains one of the most widespread problems affecting con...
