
Investigating Cross-Linguistic Gender Bias in Hindi-English Across Domains

by Somya Khosla, et al.

Measuring, evaluating, and reducing gender bias has come to the forefront as new and improved language embeddings are released every few months. But could this bias vary from domain to domain? Much work has studied these biases in various embedding models, but limited work has been done to debias Indic languages. We aim to measure and study this bias in Hindi, which is a higher-order (gendered) language with reference to English, a lower-order language. To achieve this, we study the variations across domains to determine whether domain-specific embeddings give us insight into gender bias for this Hindi-English pair. We will generate embeddings from four different corpora and compare the results using different metrics with a pre-trained state-of-the-art Indic-English translation model, which has performed better at many NLP tasks than existing models.




Efficient Gender Debiasing of Pre-trained Indic Language Models

The gender bias present in the data on which language models are pre-tra...

A diachronic evaluation of gender asymmetry in euphemism

The use of euphemisms is a known driver of language change. It has been ...

Evaluating Gender Bias in Hindi-English Machine Translation

With language models being deployed increasingly in the real world, it i...

Type B Reflexivization as an Unambiguous Testbed for Multilingual Multi-Task Gender Bias

The one-sided focus on English in previous studies of gender bias in NLP...

Professional Presentation and Projected Power: A Case Study of Implicit Gender Information in English CVs

Gender discrimination in hiring is a pertinent and persistent bias in so...

The Birth of Bias: A case study on the evolution of gender bias in an English language model

Detecting and mitigating harmful biases in modern language models are wi...

Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints

Language is increasingly being used to define rich visual recognition pr...