Fair Oversampling Technique using Heterogeneous Clusters

05/23/2023
by Ryosuke Sonoda, et al.

Class imbalance and group imbalance (e.g., across race, gender, or age) are two recognized properties of training data that degrade the trade-off between the fairness and utility of machine learning classifiers. Existing fair oversampling techniques address both kinds of imbalance jointly. Unlike common oversampling techniques, which address only class imbalance, fair oversampling techniques significantly improve this trade-off because they also address group imbalance. However, if the original clusters are too small, these techniques may cause the classifier to overfit. To address this problem, we develop a fair oversampling technique that uses data from heterogeneous clusters. The proposed technique generates synthetic data with class-mix or group-mix features to make classifiers robust to overfitting. Moreover, we develop an interpolation method that enhances the validity of the generated synthetic data by considering the original cluster distribution and data noise. Finally, we conduct experiments on five realistic datasets and three classifiers, and the results demonstrate the effectiveness of the proposed technique in terms of fairness and utility.
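
To make the idea of oversampling from heterogeneous clusters concrete, here is a minimal Python sketch. It is not the authors' algorithm, which also accounts for the cluster distribution and data noise. It assumes that a cluster is the set of samples sharing one (class, group) pair, that a heterogeneous cluster differs from it in exactly one of the two, and that keeping the interpolation weight at or below 0.5 lets the synthetic point inherit the base sample's labels. All function and parameter names (`oversample_with_heterogeneous_clusters`, `max_lam`, and so on) are hypothetical.

```python
import random
from collections import defaultdict


def cluster_by_class_and_group(rows):
    """Collect feature vectors into clusters keyed by (label, group)."""
    clusters = defaultdict(list)
    for features, label, group in rows:
        clusters[(label, group)].append(features)
    return clusters


def interpolate(base, partner, lam):
    """Move a fraction `lam` of the way from `base` toward `partner`."""
    return [b + lam * (p - b) for b, p in zip(base, partner)]


def oversample_with_heterogeneous_clusters(rows, max_lam=0.5, seed=0):
    """Return synthetic rows that bring every cluster up to the largest size.

    Assumption (not from the paper): each synthetic point interpolates a
    base sample with a partner from a cluster that shares exactly one of
    {label, group}, giving it class-mix or group-mix features.
    """
    rng = random.Random(seed)
    clusters = cluster_by_class_and_group(rows)
    target = max(len(members) for members in clusters.values())
    synthetic = []
    for (label, group), members in clusters.items():
        # Heterogeneous clusters: same class/other group, or other class/same group.
        partners = [
            key for key in clusters
            if (key[0] == label) != (key[1] == group)
        ]
        if not partners:
            # Fall back to within-cluster interpolation if no such cluster exists.
            partners = [(label, group)]
        for _ in range(target - len(members)):
            base = rng.choice(members)
            partner = rng.choice(clusters[rng.choice(partners)])
            lam = rng.uniform(0.0, max_lam)  # stay closer to the base sample
            synthetic.append((interpolate(base, partner, lam), label, group))
    return synthetic


if __name__ == "__main__":
    toy_rows = [
        ([0.0, 0.0], 0, "a"), ([1.0, 0.0], 0, "a"), ([0.0, 1.0], 0, "a"),
        ([5.0, 5.0], 1, "a"),
        ([0.0, 4.0], 0, "b"),
        ([6.0, 6.0], 1, "b"), ([7.0, 6.0], 1, "b"),
    ]
    for features, label, group in oversample_with_heterogeneous_clusters(toy_rows):
        print(label, group, [round(x, 2) for x in features])
```

Because the partner comes from a different cluster, the synthetic points spread beyond the few originals in a small cluster, which is the intuition behind the robustness to overfitting described above. The cap on `max_lam` is a design choice of this sketch, not something the abstract specifies.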

research
07/13/2022

Towards A Holistic View of Bias in Machine Learning: Bridging Algorithmic Fairness and Imbalanced Learning

Machine learning (ML) is playing an increasingly important role in rende...
research
07/12/2020

Ensuring Fairness Beyond the Training Data

We initiate the study of fair classifiers that are robust to perturbatio...
research
09/03/2019

Avoiding Resentment Via Monotonic Fairness

Classifiers that achieve demographic balance by explicitly using protect...
research
03/05/2019

Copying Machine Learning Classifiers

We study model-agnostic copies of machine learning classifiers. We devel...
research
10/24/2022

FairGen: Fair Synthetic Data Generation

With the rising adoption of Machine Learning across the domains like ban...
research
06/23/2021

Fairness in Cardiac MR Image Analysis: An Investigation of Bias Due to Data Imbalance in Deep Learning Based Segmentation

The subject of "fairness" in artificial intelligence (AI) refers to asse...
research
05/13/2021

Addressing Fairness, Bias and Class Imbalance in Machine Learning: the FBI-loss

Resilience to class imbalance and confounding biases, together with the ...
