Debiasing Gender Bias in Information Retrieval Models

08/02/2022
by   Dhanasekar Sundararaman, et al.
0

Biases in culture, gender, ethnicity, etc. have existed for decades and have affected many areas of human social interaction. These biases have been shown to impact machine learning (ML) models, and for natural language processing (NLP), this can have severe consequences for downstream tasks. Mitigating gender bias in information retrieval (IR) is important to avoid propagating stereotypes. In this work, we employ a dataset consisting of two components: (1) relevance of a document to a query and (2) "gender" of a document, in which pronouns are replaced by male, female, and neutral conjugations. We definitively show that pre-trained models for IR do not perform well in zero-shot retrieval tasks when full fine-tuning of a large pre-trained BERT encoder is performed and that lightweight fine-tuning performed with adapter networks improves zero-shot retrieval performance almost by 20 We also illustrate that pre-trained models have gender biases that result in retrieved articles tending to be more often male than female. We overcome this by introducing a debiasing technique that penalizes the model when it prefers males over females, resulting in an effective model that retrieves articles in a balanced fashion across genders.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/07/2023

Language Models Get a Gender Makeover: Mitigating Gender Bias with Few-Shot Data Interventions

Societal biases present in pre-trained large language models are a criti...
research
01/19/2022

Grep-BiasIR: A Dataset for Investigating Gender Representation-Bias in Information Retrieval Results

The provided contents by information retrieval (IR) systems can reflect ...
research
06/06/2023

An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models

The increasingly large size of modern pretrained language models not onl...
research
09/14/2021

YES SIR!Optimizing Semantic Space of Negatives with Self-Involvement Ranker

Pre-trained model such as BERT has been proved to be an effective tool f...
research
01/30/2023

CSDR-BERT: a pre-trained scientific dataset match model for Chinese Scientific Dataset Retrieval

As the number of open and shared scientific datasets on the Internet inc...
research
05/26/2023

Leveraging Domain Knowledge for Inclusive and Bias-aware Humanitarian Response Entry Classification

Accurate and rapid situation analysis during humanitarian crises is crit...
research
05/14/2022

Naturalistic Causal Probing for Morpho-Syntax

Probing has become a go-to methodology for interpreting and analyzing de...

Please sign up or login with your details

Forgot password? Click here to reset