Debiased Large Language Models Still Associate Muslims with Uniquely Violent Acts

by Babak Hemmatian, et al.

Recent work demonstrates a bias in the GPT-3 model towards generating violent text completions when prompted about Muslims, compared with Christians and Hindus. Two pre-registered replication attempts, one exact and one approximate, found only the weakest bias in the more recent Instruct Series version of GPT-3, fine-tuned to eliminate biased and toxic outputs. Few violent completions were observed. Additional pre-registered experiments, however, showed that using common names associated with the religions in prompts yields a highly significant increase in violent completions, also revealing a stronger second-order bias against Muslims. Names of Muslim celebrities from non-violent domains resulted in relatively fewer violent completions, suggesting that access to individualized information can steer the model away from using stereotypes. Nonetheless, content analysis revealed religion-specific violent themes containing highly offensive ideas regardless of prompt format. Our results show the need for additional debiasing of large language models to address higher-order schemas and associations.


Quantifying Gender Bias Towards Politicians in Cross-Lingual Language Models

While the prevalence of large pre-trained language models has led to sig...

FineDeb: A Debiasing Framework for Language Models

As language models are increasingly included in human-facing machine lea...

Stereotype and Skew: Quantifying Gender Bias in Pre-trained and Fine-tuned Language Models

This paper proposes two intuitive metrics, skew and stereotype, that qua...

Low Frequency Names Exhibit Bias and Overfitting in Contextualizing Language Models

We use a dataset of U.S. first names with labels based on predominant ge...

Towards Multimodal Vision-Language Models Generating Non-Generic Text

Vision-language models can assess visual context in an image and generat...

"You are grounded!": Latent Name Artifacts in Pre-trained Language Models

Pre-trained language models (LMs) may perpetuate biases originating in t...

Unintended Bias in Language Model-driven Conversational Recommendation

Conversational Recommendation Systems (CRSs) have recently started to le...