Detecting Natural Language Biases with Prompt-based Learning

09/11/2023
by   Md Abdul Aowal, et al.
0

In this project, we want to explore the newly emerging field of prompt engineering and apply it to the downstream task of detecting LM biases. More concretely, we explore how to design prompts that can indicate 4 different types of biases: (1) gender, (2) race, (3) sexual orientation, and (4) religion-based. Within our project, we experiment with different manually crafted prompts that can draw out the subtle biases that may be present in the language model. We apply these prompts to multiple variations of popular and well-recognized models: BERT, RoBERTa, and T5 to evaluate their biases. We provide a comparative analysis of these models and assess them using a two-fold method: use human judgment to decide whether model predictions are biased and utilize model-level judgment (through further prompts) to understand if a model can self-diagnose the biases of its own prediction.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/28/2020

Defining and Evaluating Fair Natural Language Generation

Our work focuses on the biases that emerge in the natural language gener...
research
07/21/2022

The Birth of Bias: A case study on the evolution of gender bias in an English language model

Detecting and mitigating harmful biases in modern language models are wi...
research
08/03/2022

Large scale analysis of gender bias and sexism in song lyrics

We employ Natural Language Processing techniques to analyse 377808 Engli...
research
04/11/2019

Reducing Lateral Visual Biases in Displays

The human visual system is composed of multiple physiological components...
research
07/02/2019

Quantifying Algorithmic Biases over Time

Algorithms now permeate multiple aspects of human lives and multiple rec...
research
02/08/2021

How True is GPT-2? An Empirical Analysis of Intersectional Occupational Biases

The capabilities of natural language models trained on large-scale data ...
research
07/14/2023

Mitigating Bias in Conversations: A Hate Speech Classifier and Debiaser with Prompts

Discriminatory language and biases are often present in hate speech duri...

Please sign up or login with your details

Forgot password? Click here to reset