Foundations of data imbalance and solutions for a data democracy

07/30/2021
by Ajay Kulkarni, et al.

Imbalanced data is a prevalent problem in classification. It often contributes to bias in decision-making and policy implementation, so it is vital to understand the factors that cause imbalance in the data (class imbalance). Such hidden biases and imbalances can lead to data tyranny and pose a major challenge to a data democracy. This chapter addresses two essential statistical elements: the degree of class imbalance and the complexity of the concept. Addressing these issues helps build the foundations of a data democracy. The chapter also discusses statistical measures appropriate to these scenarios and applies them to a real-life dataset (car insurance claims). Finally, popular data-level methods, including random oversampling, random undersampling, the synthetic minority oversampling technique (SMOTE), and Tomek links, are implemented in Python and their performance is compared.
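
To make the two simplest data-level methods named in the abstract concrete, here is a minimal sketch of random oversampling and random undersampling, together with a simple imbalance ratio (majority count divided by minority count). It is not the chapter's implementation: the function names, the toy data, and the choice of the plain-Python standard library are all assumptions made for illustration.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (illustrative measure)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]  # sampling with replacement
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        kept = rng.sample(pool, target)  # sampling without replacement
        out_rows.extend(kept)
        out_labels.extend([cls] * target)
    return out_rows, out_labels


# Hypothetical toy dataset: 10 positive examples and 90 negative examples.
rows = [[i] for i in range(100)]
labels = [1 if i < 10 else 0 for i in range(100)]

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)

print(imbalance_ratio(labels))        # 9.0
print(imbalance_ratio(over_labels))   # 1.0
print(imbalance_ratio(under_labels))  # 1.0
```

Oversampling keeps every original row but repeats minority rows, which can encourage overfitting. Undersampling discards majority rows, which can lose information. SMOTE and Tomek links refine these two ideas and are not shown here.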

research
08/25/2022

An Empirical Analysis of the Efficacy of Different Sampling Techniques for Imbalanced Classification

Learning from imbalanced data is a challenging task. Standard classifica...
research
01/29/2019

Bayes Imbalance Impact Index: A Measure of Class Imbalanced Dataset for Classification Problem

Recent studies have shown that imbalance ratio is not the only cause of ...
research
11/27/2022

ReGrAt: Regularization in Graphs using Attention to handle class imbalance

Node classification is an important task to solve in graph-based learnin...
research
06/23/2020

Classification Performance Metric for Imbalance Data Based on Recall and Selectivity Normalized in Class Labels

In the classification of a class imbalance dataset, the performance meas...
research
04/08/2021

On Telecommunication Service Imbalance and Infrastructure Resource Deployment

The digital divide restricting the access of people living in developing...
research
08/05/2019

Imbalance-XGBoost: Leveraging Weighted and Focal Losses for Binary Label-Imbalanced Classification with XGBoost

The paper presents Imbalance-XGBoost, a Python package that combines the...
research
05/13/2021

Addressing Fairness, Bias and Class Imbalance in Machine Learning: the FBI-loss

Resilience to class imbalance and confounding biases, together with the ...
