InterFair: Debiasing with Natural Language Feedback for Fair Interpretable Predictions

Debiasing methods for NLP models traditionally focus on isolating and removing information related to a sensitive attribute (such as gender or race). We instead argue that a favorable debiasing method should use sensitive information 'fairly,' with explanations, rather than blindly eliminating it. This fair balance is often subjective and can be challenging to achieve algorithmically. We show that an interactive setup in which users provide natural language feedback achieves a better and fairer balance between task performance and bias mitigation, supported by faithful explanations.
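To make the interactive setup concrete, the following is a minimal sketch (not the paper's implementation) of a feedback loop: the model exposes token-level importances as an explanation, the user flags tokens whose influence reflects a sensitive attribute, and the model re-predicts with that influence removed. All names, weights, and the toy linear scorer below are illustrative assumptions.

```python
# Hypothetical interactive debiasing loop: predict, explain, take user
# feedback on bias-carrying tokens, and re-predict without their influence.
from dataclasses import dataclass

@dataclass
class Prediction:
    label: str
    score: float
    importances: dict  # token -> contribution to the score (the explanation)

# Toy per-token weights toward the positive class (purely illustrative).
WEIGHTS = {"brilliant": 1.2, "engineer": 0.4, "she": 0.9, "resume": 0.1}

def predict(tokens, blocked=frozenset()):
    """Score tokens, ignoring any the user marked as bias carriers."""
    contribs = {t: WEIGHTS.get(t, 0.0) for t in tokens if t not in blocked}
    score = sum(contribs.values())
    return Prediction("hire" if score > 1.0 else "no-hire", score, contribs)

def interactive_debias(tokens):
    """Loop until the user accepts the prediction and its explanation."""
    blocked = set()
    while True:
        pred = predict(tokens, blocked)
        print(f"prediction={pred.label} score={pred.score:.2f}")
        print("explanation:", pred.importances)
        feedback = input("tokens to down-weight (blank to accept): ").split()
        if not feedback:
            return pred
        blocked.update(feedback)  # incorporate the user's feedback

if __name__ == "__main__":
    interactive_debias(["she", "brilliant", "engineer", "resume"])
```

In this sketch, flagging "she" removes its contribution while task-relevant tokens like "brilliant" keep driving the prediction, illustrating the intended trade-off between bias mitigation and task performance.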


