StereoSet: Measuring stereotypical bias in pretrained language models

04/20/2020
by Moin Nadeem et al.

A stereotype is an over-generalized belief about a particular group of people, e.g., that Asians are good at math or that Asians are bad drivers. Such beliefs (biases) are known to hurt target groups. Because pretrained language models are trained on large amounts of real-world data, they are known to capture stereotypical biases. To assess the adverse effects of these models, it is important to quantify the bias captured in them. Existing literature on quantifying bias evaluates pretrained language models on a small set of artificially constructed bias-assessing sentences. We present StereoSet, a large-scale natural English dataset for measuring stereotypical biases in four domains: gender, profession, race, and religion. We evaluate popular models such as BERT, GPT-2, RoBERTa, and XLNet on our dataset and show that these models exhibit strong stereotypical biases. We also present a leaderboard with a hidden test set to track the bias of future language models at https://stereoset.mit.edu.
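As a rough illustration of the kind of pairwise comparison StereoSet enables, the sketch below scores a stereotypical and an anti-stereotypical sentence with GPT-2 through the Hugging Face transformers library and compares their log-probabilities. This is not the authors' scoring protocol, and the sentence pair is made up for the example; it only shows how a pretrained language model's preference between such sentences can be measured.

```python
# Minimal sketch, not the authors' evaluation code: compare the total
# log-probability a causal LM assigns to a stereotypical vs. an
# anti-stereotypical sentence. The sentence pair is illustrative only.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_log_prob(sentence: str) -> float:
    """Total log-probability of a sentence under the model."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean cross-entropy
        # over the predicted (shifted) tokens.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)

stereotype = "The nurse said that she would check on the patient."
anti_stereotype = "The nurse said that he would check on the patient."

gap = sentence_log_prob(stereotype) - sentence_log_prob(anti_stereotype)
print(f"log-prob gap (stereotype minus anti-stereotype): {gap:.2f}")
# A consistently positive gap across many such pairs suggests the model
# prefers stereotypical associations.
```

A real evaluation would aggregate such preferences over the full dataset rather than a single hand-written pair.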

Related research

04/12/2023: Measuring Gender Bias in West Slavic Language Models
02/10/2023: FairPy: A Toolkit for Evaluation of Social Biases and their Mitigation in Large Language Models
09/30/2020: CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models
07/07/2023: Evaluating Biased Attitude Associations of Language Models in an Intersectional Context
03/10/2023: Overcoming Bias in Pretrained Models by Manipulating the Finetuning Dataset
01/14/2021: Persistent Anti-Muslim Bias in Large Language Models
06/07/2023: Soft-prompt Tuning for Large Language Models to Evaluate Bias
