Debiasing Multilingual Word Embeddings: A Case Study of Three Indian Languages

07/21/2021 · by Srijan Bansal, et al.

In this paper, we advance the current state-of-the-art method for debiasing monolingual word embeddings so that it generalizes well in a multilingual setting. We consider different methods for quantifying bias and different debiasing approaches in both monolingual and multilingual settings, and we demonstrate the significance of our bias-mitigation approach on downstream NLP applications. Our proposed methods establish state-of-the-art performance for debiasing multilingual embeddings for three Indian languages, Hindi, Bengali, and Telugu, in addition to English. We believe that our work will open up new opportunities for building unbiased downstream NLP applications, which are inherently dependent on the quality of the word embeddings they use.
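To make the idea of bias mitigation in embeddings concrete, the sketch below illustrates one widely used monolingual baseline: estimating a bias direction from definitional word pairs and projecting it out of word vectors (hard debiasing in the style of Bolukbasi et al.). This is an illustrative example, not the multilingual method proposed in the paper; the word pairs and toy vectors are hypothetical.

```python
import numpy as np

def bias_direction(pairs, embeddings):
    """Estimate a bias direction as the top singular vector of the
    matrix of difference vectors between definitional pairs
    (e.g. he/she). `embeddings` maps words to numpy vectors."""
    diffs = np.array([embeddings[a] - embeddings[b] for a, b in pairs])
    # The first right-singular vector captures the dominant
    # direction of variation across the pair differences.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[0]

def debias(vec, direction):
    """Remove the component of `vec` that lies along the bias
    direction, leaving the orthogonal remainder."""
    direction = direction / np.linalg.norm(direction)
    return vec - np.dot(vec, direction) * direction

# Toy 2-D example with hypothetical vectors.
emb = {
    "he": np.array([1.0, 0.0]),
    "she": np.array([0.0, 1.0]),
    "doctor": np.array([0.8, 0.3]),
}
d = bias_direction([("he", "she")], emb)
clean = debias(emb["doctor"], d)
```

After projection, `clean` has (numerically) zero component along the estimated bias direction, which is exactly the property that bias-quantification metrics such as projection-based scores measure.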


Related research

06/11/2020 · A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages
We use the multilingual OSCAR corpus, extracted from Common Crawl via la...

06/25/2017 · Beyond Bilingual: Multi-sense Word Embeddings using Multilingual Context
Word embeddings, which represent a word as a point in a vector space, ha...

02/05/2016 · Massively Multilingual Word Embeddings
We introduce new methods for estimating and evaluating embeddings of wor...

12/05/2019 · Massive vs. Curated Word Embeddings for Low-Resourced Languages. The Case of Yorùbá and Twi
The success of several architectures to learn semantic representations f...

10/21/2022 · Spectral Probing
Linguistic information is encoded at varying timescales (subwords, phras...

10/09/2017 · Deep Learning Paradigm with Transformed Monolingual Word Embeddings for Multilingual Sentiment Analysis
The surge of social media use brings huge demand of multilingual sentime...
