Deep Learning on a Healthy Data Diet: Finding Important Examples for Fairness

11/20/2022 · by Abdelrahman Zayed, et al.

Data-driven predictive solutions prevalent in commercial applications tend to suffer from biases and stereotypes, which raises equity concerns. Prediction models may discover, use, or amplify spurious correlations based on gender or other protected personal characteristics, thus discriminating against marginalized groups. Mitigating gender bias has become an important research focus in natural language processing (NLP) and is an area where annotated corpora are available. Data augmentation reduces gender bias by adding counterfactual examples to the training dataset. In this work, we show that some of the examples in the augmented dataset can be unimportant or even harmful for fairness. We therefore propose a general method for pruning both the factual and counterfactual examples to maximize the model's fairness as measured by demographic parity, equality of opportunity, and equality of odds. The fairness achieved by our method surpasses that of data augmentation on three text classification datasets, while using no more than half of the examples in the augmented dataset. Our experiments are conducted using models of varying sizes and pre-training settings.
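The abstract names two building blocks that are easy to make concrete: counterfactual augmentation (swapping gendered words to create a mirrored training example) and the three group-fairness metrics used to score a pruned dataset. The sketch below is ours, not the authors' released code; the word-pair list, function names, and toy inputs are illustrative assumptions, and labels and predictions are assumed binary.

```python
import numpy as np

# Illustrative gender-word pairs; the paper's actual swap lists are
# an assumption here.
SWAP = {"he": "she", "she": "he", "him": "her", "her": "him",
        "his": "her", "man": "woman", "woman": "man"}

def counterfactual(text: str) -> str:
    """Create the counterfactual example by swapping gendered tokens."""
    return " ".join(SWAP.get(tok, tok) for tok in text.lower().split())

def fairness_gaps(y_true, y_pred, group):
    """Gaps between two demographic groups for the three metrics the
    abstract names; `group` is a boolean array marking one group."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rate = lambda mask: y_pred[mask].mean()
    # Demographic parity: gap in positive-prediction rates.
    dp = abs(rate(group) - rate(~group))
    # Equality of opportunity: gap in true-positive rates.
    eo = abs(rate(group & (y_true == 1)) - rate(~group & (y_true == 1)))
    # Equality of odds: worst gap over true- and false-positive rates.
    fpr_gap = abs(rate(group & (y_true == 0)) - rate(~group & (y_true == 0)))
    return dp, eo, max(eo, fpr_gap)

# Example: augment one sentence, then score a candidate model's predictions.
print(counterfactual("She said he liked his new job"))
print(fairness_gaps(y_true=[1, 0, 1, 0], y_pred=[1, 0, 0, 1],
                    group=[True, True, False, False]))
```

A pruning method in the paper's spirit would then keep only the subset of factual and counterfactual examples whose resulting model minimizes these gaps, rather than adding every counterfactual indiscriminately.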


Related research

06/21/2021 · Does Robustness Improve Fairness? Approaching Fairness with Word Substitution Robustness Methods for Text Classification
Existing bias mitigation methods to reduce disparities in model outcomes...

05/22/2023 · Should We Attend More or Less? Modulating Attention for Fairness
The abundance of annotated data in natural language processing (NLP) pos...

09/07/2022 · Decoding Demographic un-fairness from Indian Names
Demographic classification is essential in fairness assessment in recomm...

06/08/2023 · Are fairness metric scores enough to assess discrimination biases in machine learning?
This paper presents novel experiments shedding light on the shortcomings...

01/22/2018 · Mitigating Unwanted Biases with Adversarial Learning
Machine learning is a tool for building models that accurately represent...

04/26/2020 · Is Your Classifier Actually Biased? Measuring Fairness under Uncertainty with Bernstein Bounds
Most NLP datasets are not annotated with protected attributes such as ge...

09/16/2021 · Balancing out Bias: Achieving Fairness Through Training Reweighting
Bias in natural language processing arises primarily from models learnin...
