DeepAI
Log In Sign Up

Debiased Large Language Models Still Associate Muslims with Uniquely Violent Acts

08/08/2022
by   Babak Hemmatian, et al.
0

Recent work demonstrates a bias in the GPT-3 model towards generating violent text completions when prompted about Muslims, compared with Christians and Hindus. Two pre-registered replication attempts, one exact and one approximate, found only the weakest bias in the more recent Instruct Series version of GPT-3, fine-tuned to eliminate biased and toxic outputs. Few violent completions were observed. Additional pre-registered experiments, however, showed that using common names associated with the religions in prompts yields a highly significant increase in violent completions, also revealing a stronger second-order bias against Muslims. Names of Muslim celebrities from non-violent domains resulted in relatively fewer violent completions, suggesting that access to individualized information can steer the model away from using stereotypes. Nonetheless, content analysis revealed religion-specific violent themes containing highly offensive ideas regardless of prompt format. Our results show the need for additional debiasing of large language models to address higher-order schemas and associations.

READ FULL TEXT
04/15/2021

Quantifying Gender Bias Towards Politicians in Cross-Lingual Language Models

While the prevalence of large pre-trained language models has led to sig...
02/05/2023

FineDeb: A Debiasing Framework for Language Models

As language models are increasingly included in human-facing machine lea...
01/24/2021

Stereotype and Skew: Quantifying Gender Bias in Pre-trained and Fine-tuned Language Models

This paper proposes two intuitive metrics, skew and stereotype, that qua...
10/01/2021

Low Frequency Names Exhibit Bias and Overfitting in Contextualizing Language Models

We use a dataset of U.S. first names with labels based on predominant ge...
07/09/2022

Towards Multimodal Vision-Language Models Generating Non-Generic Text

Vision-language models can assess visual context in an image and generat...
04/06/2020

"You are grounded!": Latent Name Artifacts in Pre-trained Language Models

Pre-trained language models (LMs) may perpetuate biases originating in t...
01/17/2022

Unintended Bias in Language Model-driven Conversational Recommendation

Conversational Recommendation Systems (CRSs) have recently started to le...