Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer

05/02/2020
by Jieyu Zhao, et al.

Multilingual representations embed words from many languages into a single semantic space such that words with similar meanings are close to each other regardless of language. These embeddings have been widely used in various settings, such as cross-lingual transfer, where a natural language processing (NLP) model trained on one language is deployed to another. While cross-lingual transfer techniques are powerful, they carry gender bias from the source to the target languages. In this paper, we study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications. We create a multilingual dataset for bias analysis and propose several methods for quantifying bias in multilingual representations from both intrinsic and extrinsic perspectives. Experimental results show that the magnitude of bias in the multilingual representations changes depending on the target space to which the embeddings are aligned, and that the alignment direction also influences the bias observed in transfer learning. We further provide recommendations for using multilingual word representations in downstream tasks.
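As a concrete illustration of the two ingredients the abstract refers to, the sketch below computes a simple intrinsic bias score (the mean absolute cosine between occupation words and a he/she gender direction) and an orthogonal Procrustes alignment between two embedding spaces, after which the score can be re-measured. This is a minimal sketch under assumed details, not the paper's actual metric or dataset: the word lists, toy random vectors, and helper names (gender_direction, intrinsic_bias, procrustes_align) are illustrative, and real experiments would load pretrained multilingual embeddings (e.g., fastText vectors).

```python
# Minimal sketch, assuming toy data: an intrinsic bias score plus an
# orthogonal Procrustes alignment between two embedding spaces. The word
# lists and random vectors are illustrative stand-ins; real experiments
# would load pretrained multilingual embeddings.
import numpy as np

rng = np.random.default_rng(0)
dim = 50

# Hypothetical embedding lookup: word -> vector.
emb = {w: rng.normal(size=dim) for w in
       ["he", "she", "man", "woman", "doctor", "nurse", "engineer", "teacher"]}

def gender_direction(emb):
    """Unit vector averaging male-female definitional pair differences."""
    pairs = [("he", "she"), ("man", "woman")]
    d = np.mean([emb[m] - emb[f] for m, f in pairs], axis=0)
    return d / np.linalg.norm(d)

def intrinsic_bias(emb, words):
    """Mean |cosine| between each word and the gender direction."""
    d = gender_direction(emb)
    return float(np.mean([abs(emb[w] @ d) / np.linalg.norm(emb[w]) for w in words]))

def procrustes_align(X, Y):
    """Orthogonal W minimizing ||XW - Y||_F (maps source rows X onto target rows Y)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

occupations = ["doctor", "nurse", "engineer", "teacher"]
print("bias before alignment:", intrinsic_bias(emb, occupations))

# Stand-in "target space": a randomly rotated copy of the source space.
words = list(emb)
X = np.stack([emb[w] for w in words])
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
Y = X @ Q

W = procrustes_align(X, Y)
aligned = dict(zip(words, X @ W))
print("bias after alignment: ", intrinsic_bias(aligned, occupations))
```

In this toy setup the target space is an exact rotation of the source, so the bias score is unchanged after alignment; with real cross-lingual embedding spaces the orthogonal fit is only approximate, which is why, as the abstract reports, the bias magnitude can shift depending on the target space and the alignment direction.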


Related research

07/12/2022 · How Do Multilingual Encoders Learn Cross-lingual Representation?
NLP systems typically require support for more than one language. As dif...

04/30/2021 · Cross-lingual hate speech detection based on multilingual domain-specific word embeddings
Automatic hate speech detection in online social networks is an importan...

11/19/2015 · Transfer Learning for Speech and Language Processing
Transfer learning is a vital technique that generalizes models trained f...

07/02/2018 · Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing
Addressing the cross-lingual variation of grammatical structures and mea...

09/10/2021 · A Simple and Effective Method To Eliminate the Self Language Bias in Multilingual Representations
Language agnostic and semantic-language information isolation is an emer...

10/06/2021 · Using Optimal Transport as Alignment Objective for fine-tuning Multilingual Contextualized Embeddings
Recent studies have proposed different methods to improve multilingual w...

05/23/2023 · Pixel Representations for Multilingual Translation and Data-efficient Cross-lingual Transfer
We introduce and demonstrate how to effectively train multilingual machi...
