Measuring Gender Bias in West Slavic Language Models

04/12/2023
by Sandra Martinková et al.

Pre-trained language models are known to propagate biases present in their training data to downstream tasks. However, these findings are predominantly based on monolingual language models for English, and few studies have investigated the biases encoded in language models for other languages. In this paper, we fill this gap by analysing gender bias in West Slavic language models. We introduce the first template-based dataset in Czech, Polish, and Slovak for measuring gender bias towards male, female, and non-binary subjects. We complete the sentences using both mono- and multilingual language models and assess their suitability for the masked language modelling objective. Next, we measure the gender bias encoded in West Slavic language models by quantifying the toxicity and genderness of the generated words. We find that these language models produce hurtful completions that depend on the subject's gender. Perhaps surprisingly, Czech, Slovak, and Polish language models produce more hurtful completions when men are the subjects. On inspection, we find that this is because these completions relate to violence, death, and sickness.
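The measurement pipeline described above can be sketched in Python. It has two steps: fill gendered templates with a masked language model, then score the completions per gender. This is a minimal illustration under stated assumptions, not the authors' implementation. The template, the `instantiate` and `hurtful_rate` helpers, the toy predictions and the three-word lexicon are all hypothetical. Toxicity is approximated here, as an assumption, by the share of top-k completions that appear in a lexicon of hurtful words. Only male and female subjects are shown; a non-binary subject would be one more entry in the subjects mapping.

```python
from collections.abc import Callable, Iterable


def instantiate(templates: Iterable[str], subjects: dict[str, str]) -> dict[str, list[str]]:
    """Fill each template's {subject} slot, grouping the prompts by gender label."""
    templates = list(templates)
    return {gender: [t.format(subject=s) for t in templates] for gender, s in subjects.items()}


def hurtful_rate(
    prompts_by_gender: dict[str, list[str]],
    fill_mask: Callable[[str], list[str]],
    lexicon: set[str],
    k: int = 5,
) -> dict[str, float]:
    """Share of top-k completions found in a (hypothetical) hurtful-word lexicon, per gender."""
    rates = {}
    for gender, prompts in prompts_by_gender.items():
        total = hurtful = 0
        for prompt in prompts:
            predictions = fill_mask(prompt)[:k]  # keep only the k highest-ranked words
            total += len(predictions)
            hurtful += sum(1 for word in predictions if word.lower() in lexicon)
        rates[gender] = hurtful / total if total else 0.0
    return rates


# Toy stand-in for a masked language model: these predictions are invented
# for illustration and are NOT outputs of any real model.
TOY_PREDICTIONS = {
    "Muž je [MASK].": ["mrtvý", "nemocný", "doma", "tady", "unavený"],
    "Žena je [MASK].": ["doma", "krásná", "tady", "nemocná", "šťastná"],
}
# Hypothetical hurtful-word lexicon ("dead", "sick" masc., "sick" fem.).
LEXICON = {"mrtvý", "nemocný", "nemocná"}


def toy_fill_mask(prompt: str) -> list[str]:
    return TOY_PREDICTIONS.get(prompt, [])


if __name__ == "__main__":
    prompts = instantiate(["{subject} je [MASK]."], {"male": "Muž", "female": "Žena"})
    print(hurtful_rate(prompts, toy_fill_mask, LEXICON, k=5))
```

With a real model, `fill_mask` could wrap a Hugging Face `fill-mask` pipeline and return each prediction's `token_str`. Note that the mask token differs between models (for example, XLM-R uses `<mask>`), so the templates would need adjusting. Keeping the model behind a plain callable lets mono- and multilingual models be compared with the same scoring code.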
