Debiasing isn't enough! – On the Effectiveness of Debiasing MLMs and their Social Biases in Downstream Tasks

10/06/2022
by Masahiro Kaneko, et al.

We study the relationship between task-agnostic intrinsic and task-specific extrinsic social bias evaluation measures for Masked Language Models (MLMs), and find only a weak correlation between these two types of measures. Moreover, we find that MLMs debiased using different methods still re-learn social biases during fine-tuning on downstream tasks. We identify social biases in both the training instances and their assigned labels as reasons for the discrepancy between intrinsic and extrinsic bias evaluation measurements. Overall, our findings highlight the limitations of existing MLM bias evaluation measures and raise concerns about deploying MLMs in downstream applications on the basis of those measures.
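
To make the central comparison concrete, the following minimal sketch shows one way an intrinsic bias score and an extrinsic bias score, each measured once per model, could be correlated. It rests on assumptions: the abstract does not name the correlation statistic, so Pearson's r is used here for illustration, and the names `pearson`, `intrinsic_scores`, and `extrinsic_scores` as well as the numbers are hypothetical placeholders, not results from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder values, one entry per model (NOT data from the paper).
intrinsic_scores = [0.10, 0.40, 0.20, 0.50]  # task-agnostic bias score per model
extrinsic_scores = [0.30, 0.20, 0.50, 0.40]  # downstream-task bias score per model

r = pearson(intrinsic_scores, extrinsic_scores)
print(f"correlation between intrinsic and extrinsic scores: {r:.2f}")
```

A coefficient near zero would mean that ranking models by the intrinsic measure says little about how biased they are on the downstream task, which is the kind of weak relationship the abstract describes.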

research
08/20/2023

A Survey on Fairness in Large Language Models

Large language models (LLMs) have shown powerful performance and develop...
research
01/28/2023

Comparing Intrinsic Gender Bias Evaluation Measures without using Human Annotated Examples

Numerous types of social biases have been identified in pre-trained lang...
research
10/26/2022

MABEL: Attenuating Gender Bias using Textual Entailment Data

Pre-trained language models encode undesirable social biases, which are ...
research
09/16/2023

The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated

Pre-trained language models trained on large-scale data have learned ser...
research
05/23/2022

Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements

The growing capability and availability of generative language models ha...
research
04/15/2021

Unmasking the Mask – Evaluating Social Biases in Masked Language Models

Masked Language Models (MLMs) have shown superior performances in numero...
research
06/08/2023

Bias Against 93 Stigmatized Groups in Masked Language Models and Downstream Sentiment Classification Tasks

The rapid deployment of artificial intelligence (AI) models demands a th...
