Efficient Gender Debiasing of Pre-trained Indic Language Models

09/08/2022
by Neeraja Kirtane, et al.

The gender bias present in the data on which language models are pre-trained is reflected in the systems that use these models. A model's intrinsic gender bias conveys an outdated and unequal view of women in our culture and encourages discrimination. Therefore, in order to build more equitable systems and increase fairness, it is crucial to identify and mitigate the bias existing in these models. While there is a significant amount of work in this area in English, there is a dearth of research in other gendered and low-resource languages, particularly the Indian languages. English is a non-gendered language with genderless nouns, so methodologies for bias detection in English cannot be directly deployed in gendered languages, where the syntax and semantics differ. In this paper, we measure gender bias associated with occupations in Hindi language models. Our major contributions are the construction of a novel corpus to evaluate occupational gender bias in Hindi, the quantification of the existing bias in these systems using a well-defined metric, and its mitigation by efficiently fine-tuning our model. Our results show that the bias is reduced after applying our proposed mitigation techniques. Our codebase is publicly available.
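The paper's specific metric is defined in the full text; as a generic illustration of how occupational gender bias in a masked language model is often quantified, one can compare the model's probabilities for male- versus female-gendered fill-ins of an occupation template and take a log-ratio. The probabilities and occupations below are hypothetical placeholders, not results from the paper:

```python
import math

def occupation_bias(p_male: float, p_female: float) -> float:
    """Log-ratio bias score: positive means the model prefers the
    male-gendered completion, negative the female-gendered one,
    and zero indicates no preference."""
    return math.log(p_male / p_female)

# Hypothetical fill-in probabilities for a template such as
# "The <occupation> said that <MASK> ..." (illustrative numbers only).
probs = {
    "doctor": (0.08, 0.02),   # (p_male, p_female)
    "nurse":  (0.01, 0.09),
}

scores = {occ: occupation_bias(pm, pf) for occ, (pm, pf) in probs.items()}

# The mean absolute score summarizes model-level skew across occupations,
# regardless of the direction of the bias for each one.
avg_abs_bias = sum(abs(s) for s in scores.values()) / len(scores)
```

In this sketch, a fine-tuning-based mitigation would aim to drive each occupation's score, and hence the mean absolute bias, toward zero.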


