Fewer Errors, but More Stereotypes? The Effect of Model Size on Gender Bias

06/20/2022
by Yarden Tal, et al.

The size of pretrained models is increasing, and so is their performance on a variety of NLP tasks. However, as their memorization capacity grows, they might pick up more social biases. In this work, we examine the connection between model size and gender bias (specifically, occupational gender bias). We measure bias in three masked language model families (RoBERTa, DeBERTa, and T5) in two setups: directly, using a prompt-based method, and indirectly, using a downstream task (Winogender). We find that larger models receive higher bias scores in the prompt-based setup, yet make fewer gender errors when evaluated on Winogender. To examine these potentially conflicting results, we carefully investigate the behavior of the different models on Winogender. We find that while larger models outperform smaller ones, the probability that their mistakes are caused by gender bias is higher. Moreover, we find that the proportion of stereotypical errors relative to anti-stereotypical ones grows with model size. Our findings highlight the potential risks that can arise from increasing model size.
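
To make the last two findings concrete, here is a minimal Python sketch of how overall accuracy and the share of stereotypical errors can move in opposite directions. It is not the authors' evaluation code: the `Prediction` record, the `error_breakdown` helper, and the two toy prediction lists are hypothetical, and the numbers are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One model answer on a Winogender-style example (hypothetical record)."""
    correct: bool        # did the model resolve the pronoun correctly?
    stereotypical: bool  # does the model's answer align with the occupational stereotype?


def error_breakdown(preds):
    """Return accuracy plus counts of stereotypical vs. anti-stereotypical errors."""
    total = len(preds)
    errors = [p for p in preds if not p.correct]
    stereo = sum(1 for p in errors if p.stereotypical)
    anti = len(errors) - stereo
    return {
        "accuracy": (total - len(errors)) / total,
        "stereotypical_errors": stereo,
        "anti_stereotypical_errors": anti,
        # Share of all errors that follow the stereotype (0.0 if there are no errors).
        "stereotypical_share": stereo / len(errors) if errors else 0.0,
    }


# Invented toy data, NOT results from the paper.
# "Small" model: 6 correct, 2 stereotypical errors, 2 anti-stereotypical errors.
small_model = (
    [Prediction(True, True)] * 6
    + [Prediction(False, True)] * 2
    + [Prediction(False, False)] * 2
)
# "Large" model: 8 correct, 2 stereotypical errors, 0 anti-stereotypical errors.
large_model = [Prediction(True, True)] * 8 + [Prediction(False, True)] * 2

for name, preds in [("small", small_model), ("large", large_model)]:
    print(name, error_breakdown(preds))
```

In this toy setup the larger model is more accurate overall, but a larger share of its remaining errors follow the stereotype, which is the pattern the abstract describes.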

research
06/18/2019

Measuring Bias in Contextualized Word Representations

Contextual word embeddings such as BERT have achieved state of the art p...
research
05/12/2021

Evaluating Gender Bias in Natural Language Inference

Gender-bias stereotypes have recently raised significant ethical concern...
research
07/21/2019

Using Word Embeddings to Examine Gender Bias in Dutch Newspapers, 1950-1990

Contemporary debates on filter bubbles and polarization in public and so...
research
11/22/2021

Investigating Cross-Linguistic Gender Bias in Hindi-English Across Domains

Measuring, evaluating and reducing Gender Bias has come to the forefront...
research
10/16/2018

Gender Bias in Nobel Prizes

Strikingly few Nobel laureates within medicine, natural and social scien...
research
06/01/2022

Assessing Group-level Gender Bias in Professional Evaluations: The Case of Medical Student End-of-Shift Feedback

Although approximately 50 female physicians tend to be underrepresented ...
research
09/24/2020

Type B Reflexivization as an Unambiguous Testbed for Multilingual Multi-Task Gender Bias

The one-sided focus on English in previous studies of gender bias in NLP...
