Can Active Learning Preemptively Mitigate Fairness Issues?

Dataset bias is one of the prevailing causes of unfairness in machine learning. Addressing fairness at the data collection and dataset preparation stages is therefore an essential part of training fairer algorithms. In particular, active learning (AL) algorithms show promise for this task because they prioritize the most informative training samples. However, the interaction between existing AL algorithms and algorithmic fairness remains under-explored. In this paper, we study whether models trained with uncertainty-based AL heuristics such as BALD are fairer in their decisions with respect to a protected class than those trained with independent and identically distributed (i.i.d.) sampling. We find that BALD significantly improves predictive parity while also improving accuracy compared to i.i.d. sampling. We also explore the interaction between BALD and algorithmic fairness methods such as gradient reversal (GRAD). We find that, although the two address different fairness issues, combining them further improves the results on most of the benchmarks and metrics we explored.
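To make the two quantities named in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes BALD is estimated in the usual way, as the mutual information between the prediction and the model parameters, approximated from several stochastic forward passes (for example, Monte Carlo dropout). It also assumes a binary protected attribute and measures predictive parity as the absolute difference in positive predictive value between the two groups. The function names `bald_score` and `predictive_parity_gap` are hypothetical and chosen for illustration only.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def bald_score(mc_probs):
    """BALD score for a single input.

    mc_probs: list of T class-probability vectors, one per stochastic
    forward pass (assumed here; the abstract does not fix the estimator).
    Returns H[mean prediction] - mean over passes of H[prediction].
    """
    t = len(mc_probs)
    n_classes = len(mc_probs[0])
    mean_p = [sum(p[c] for p in mc_probs) / t for c in range(n_classes)]
    return entropy(mean_p) - sum(entropy(p) for p in mc_probs) / t


def predictive_parity_gap(y_true, y_pred, group):
    """Absolute PPV difference between groups 0 and 1.

    PPV for a group is the fraction of its positive predictions
    (y_pred == 1) whose true label is 1. A gap of 0 means predictive
    parity holds exactly for this binary protected attribute.
    """
    ppv = {}
    for g in (0, 1):
        labels = [yt for yt, yp, a in zip(y_true, y_pred, group)
                  if a == g and yp == 1]
        if not labels:
            raise ValueError(f"group {g} has no positive predictions")
        ppv[g] = sum(labels) / len(labels)
    return abs(ppv[0] - ppv[1])
```

Under these assumptions, an AL loop would label the pool samples with the highest `bald_score`, whereas i.i.d. sampling picks them uniformly at random; `predictive_parity_gap` can then be compared across the two resulting models on a held-out set.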


research
07/15/2022

More Data Can Lead Us Astray: Active Data Acquisition in the Presence of Label Bias

An increased awareness concerning risks of algorithmic bias has driven a...
research
01/06/2020

Fair Active Learning

Bias in training data and proxy attributes are probably the main reasons...
research
02/22/2021

Coping with Mistreatment in Fair Algorithms

Machine learning actively impacts our everyday life in almost all endeav...
research
06/14/2022

ABCinML: Anticipatory Bias Correction in Machine Learning Applications

The idealization of a static machine-learned model, trained once and dep...
research
12/13/2021

Addressing Bias in Active Learning with Depth Uncertainty Networks... or Not

Farquhar et al. [2021] show that correcting for active learning bias wit...
research
12/07/2020

Active Learning Methods for Efficient Hybrid Biophysical Variable Retrieval

Kernel-based machine learning regression algorithms (MLRAs) are potentia...
research
02/11/2023

Pushing the Accuracy-Group Robustness Frontier with Introspective Self-play

Standard empirical risk minimization (ERM) training can produce deep neu...
