Under the Morphosyntactic Lens: A Multifaceted Evaluation of Gender Bias in Speech Translation

03/18/2022
by Beatrice Savoldi, et al.

Gender bias is widely recognized as a problematic phenomenon affecting language technologies, with recent studies underscoring that it might surface differently across languages. However, most current evaluation practices adopt a word-level focus on a narrow set of occupational nouns under synthetic conditions. Such protocols overlook key features of grammatical gender languages, which are characterized by morphosyntactic chains of gender agreement marked on a variety of lexical items and parts of speech (POS). To overcome this limitation, we enrich the natural, gender-sensitive MuST-SHE corpus (Bentivogli et al., 2020) with two new linguistic annotation layers (POS and agreement chains), and explore to what extent different lexical categories and agreement phenomena are impacted by gender skews. Focusing on speech translation, we conduct a multifaceted evaluation on three language directions (English-French/Italian/Spanish), with models trained on varying amounts of data and with different word segmentation techniques. By shedding light on model behaviours, gender bias, and its detection at several levels of granularity, our findings emphasize the value of dedicated analyses beyond aggregated overall results.


