Counterfactual Fairness in Text Classification through Robustness

09/27/2018
by Sahaj Garg, et al.

In this paper, we study counterfactual fairness in text classification, which asks the question: How would the prediction change if the sensitive attribute discussed in the example were something else? We offer a heuristic for measuring this particular form of fairness in text classifiers by substituting individual tokens pertaining to attributes (e.g., sexual orientation, race, and religion), and describe the relationship with other notions, including individual and group fairness. Further, we offer methods, including hard ablation, blindness, and counterfactual logit pairing, for optimizing this counterfactual fairness metric during model training, bridging the robustness literature and the fairness literature. Empirically, counterfactual logit pairing performs as well as hard ablation and blindness to sensitive tokens, but generalizes better to unseen tokens. Interestingly, we find that in practice, the methods do not significantly harm classifier performance, and have varying trade-offs with group fairness. These approaches, both for measurement and optimization, provide a new path forward for addressing counterfactual fairness issues.
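To make the abstract's two ideas concrete, here is a minimal, hypothetical sketch rather than the authors' implementation: a token-substitution counterfactual generator, an evaluation gap in the spirit of the counterfactual fairness measurement, and a counterfactual-logit-pairing style training penalty. It assumes a binary classifier `model` that maps an encoded token sequence to a single logit and an unspecified `encode` helper; the identity-token list and all function names are illustrative placeholders, not taken from the paper.

```python
import torch

# Illustrative identity-term vocabulary; the paper's actual term lists differ.
IDENTITY_TOKENS = ["gay", "straight", "muslim", "christian", "black", "white"]

def generate_counterfactuals(tokens):
    """Yield copies of `tokens` with each identity token replaced by every other identity token."""
    for i, tok in enumerate(tokens):
        if tok.lower() in IDENTITY_TOKENS:
            for sub in IDENTITY_TOKENS:
                if sub != tok.lower():
                    yield tokens[:i] + [sub] + tokens[i + 1:]

def ctf_gap(model, encode, tokens):
    """Evaluation-time gap: largest change in predicted probability under any
    single identity-token substitution (0.0 if the example has no identity tokens)."""
    with torch.no_grad():
        p = torch.sigmoid(model(encode(tokens)))
        gaps = [torch.abs(p - torch.sigmoid(model(encode(cf)))).max()
                for cf in generate_counterfactuals(tokens)]
    return torch.stack(gaps).max().item() if gaps else 0.0

def clp_penalty(model, encode, batch_tokens, lam=1.0):
    """Counterfactual-logit-pairing style penalty: mean absolute difference between
    the logits of each example and the logits of its token-substitution counterfactuals."""
    gaps = []
    for tokens in batch_tokens:
        logits = model(encode(tokens))
        for cf in generate_counterfactuals(tokens):
            gaps.append(torch.abs(logits - model(encode(cf))).mean())
    return lam * torch.stack(gaps).mean() if gaps else torch.tensor(0.0)
```

In this reading, a penalty like `clp_penalty` would be added to the standard classification loss during training, while a gap like `ctf_gap`, computed on held-out examples, serves as the evaluation metric; as the abstract notes, hard ablation and blindness instead operate on the input tokens themselves (dropping or masking the sensitive terms).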


Related research

02/08/2022 · Counterfactual Multi-Token Fairness in Text Classification
The counterfactual token generation has been limited to perturbing only ...

06/28/2022 · Flexible text generation for counterfactual fairness probing
A common approach for testing fairness issues in text-based classifiers ...

01/10/2022 · Learning Fair Node Representations with Graph Counterfactual Fairness
Fair machine learning aims to mitigate the biases of model predictions a...

01/29/2019 · Repairing without Retraining: Avoiding Disparate Impact with Counterfactual Distributions
When the average performance of a prediction model varies significantly ...

10/13/2022 · Walk a Mile in Their Shoes: a New Fairness Criterion for Machine Learning
The old empathetic adage, “Walk a mile in their shoes,” asks that one im...

03/01/2023 · Fairness Evaluation in Text Classification: Machine Learning Practitioner Perspectives of Individual and Group Fairness
Mitigating algorithmic bias is a critical task in the development and de...

10/24/2020 · Fair Hate Speech Detection through Evaluation of Social Group Counterfactuals
Approaches for mitigating bias in supervised models are designed to redu...
