Choose Your Lenses: Flaws in Gender Bias Evaluation

10/20/2022
by Hadas Orgad, et al.

Considerable efforts to measure and mitigate gender bias in recent years have led to the introduction of an abundance of tasks, datasets, and metrics used in this vein. In this position paper, we assess the current paradigm of gender bias evaluation and identify several flaws in it. First, we highlight the importance of extrinsic bias metrics that measure how a model's performance on some task is affected by gender, as opposed to intrinsic evaluations of model representations, which are less strongly connected to specific harms to people interacting with systems. We find that only a few extrinsic metrics are measured in most studies, although more can be measured. Second, we find that datasets and metrics are often coupled, and discuss how their coupling hinders the ability to obtain reliable conclusions, and how one may decouple them. We then investigate how the choice of the dataset and its composition, as well as the choice of the metric, affect bias measurement, finding significant variations across each of them. Finally, we propose several guidelines for more reliable gender bias evaluation.
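
An extrinsic metric, as described above, compares a model's task performance across genders instead of probing its representations. The following is a minimal sketch of one such measure, a true-positive-rate (TPR) gap between two gender groups. It is not the authors' code. It assumes binary task labels, binary predictions, and one gender attribute per example. The function names `true_positive_rate` and `tpr_gap` and the group labels "F" and "M" are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple, List


def true_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of gold-positive examples (label 1) the model also predicts as 1."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_on_positives:
        raise ValueError("no positive examples in this group")
    return sum(preds_on_positives) / len(preds_on_positives)


def tpr_gap(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    gender: Sequence[str],
    group_a: str = "F",  # hypothetical group label
    group_b: str = "M",  # hypothetical group label
) -> float:
    """Illustrative extrinsic metric: TPR(group_a) - TPR(group_b).

    0 means positives are recovered equally often for both groups.
    """

    def subset(group: str) -> Tuple[List[int], List[int]]:
        # Keep only the examples whose gender attribute matches `group`.
        idx = [i for i, g in enumerate(gender) if g == group]
        return [y_true[i] for i in idx], [y_pred[i] for i in idx]

    return true_positive_rate(*subset(group_a)) - true_positive_rate(*subset(group_b))


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    y_true = [1, 1, 1, 0, 1, 1, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 1, 1, 1]
    gender = ["F", "F", "F", "F", "M", "M", "M", "M"]
    print(round(tpr_gap(y_true, y_pred, gender), 3))  # -0.333
```

On this toy data the model recovers 2 of 3 positives for "F" and 3 of 3 for "M", so the gap is negative. The value depends entirely on which examples are in the evaluation set. Applying the same function to datasets of different composition can therefore yield different numbers, which is one reason the choice of dataset matters when measuring bias.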
