How Gender Debiasing Affects Internal Model Representations, and Why It Matters

04/14/2022
by Hadas Orgad, et al.

Common studies of gender bias in NLP focus either on extrinsic bias measured by model performance on a downstream task or on intrinsic bias found in models' internal representations. However, the relationship between extrinsic and intrinsic bias is relatively unknown. In this work, we illuminate this relationship by measuring both quantities together: we debias a model during downstream fine-tuning, which reduces extrinsic bias, and measure the effect on intrinsic bias, which is operationalized as bias extractability with information-theoretic probing. Through experiments on two tasks and multiple bias metrics, we show that our intrinsic bias metric is a better indicator of debiasing than (a contextual adaptation of) the standard WEAT metric, and can also expose cases of superficial debiasing. Our framework provides a comprehensive perspective on bias in NLP models, which can be applied to deploy NLP systems in a more informed manner. Our code will be made publicly available.
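
For readers unfamiliar with the WEAT baseline mentioned above, the following is a minimal sketch of the standard (static-embedding) WEAT effect size, not the contextual adaptation used in this work, whose details are not given in the abstract. The function names and the toy two-dimensional vectors in the usage note are hypothetical and chosen only for illustration; the sample standard deviation is an assumption, since implementations differ on that point.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """s(w, A, B): mean similarity of w to attribute set A minus that to B."""
    sim_a = mean(cosine(w, a) for a in attrs_a)
    sim_b = mean(cosine(w, b) for b in attrs_b)
    return sim_a - sim_b


def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Standard WEAT effect size for target sets X, Y and attribute sets A, B.

    Assumption: the denominator uses the sample standard deviation over
    the association scores of all target words in X and Y.
    """
    s_x = [association(x, attrs_a, attrs_b) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b) for y in targets_y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)
```

As a usage illustration, calling `weat_effect_size([[1.0, 0.1], [0.9, 0.2]], [[0.1, 1.0], [0.2, 0.9]], [[1.0, 0.0]], [[0.0, 1.0]])` on these made-up vectors gives a positive score, because the first target set lies closer to the first attribute set. A score near zero would indicate no measured association under this metric.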


