Debiased Large Language Models Still Associate Muslims with Uniquely Violent Acts

08/08/2022
by Babak Hemmatian, et al.

Recent work demonstrates a bias in the GPT-3 model towards generating violent text completions when prompted about Muslims, compared with Christians and Hindus. Two pre-registered replication attempts, one exact and one approximate, found only the weakest bias in the more recent Instruct Series version of GPT-3, which was fine-tuned to eliminate biased and toxic outputs. Few violent completions were observed. Additional pre-registered experiments, however, showed that using common names associated with the religions in prompts yields a highly significant increase in violent completions, also revealing a stronger second-order bias against Muslims. Names of Muslim celebrities from non-violent domains resulted in fewer violent completions, suggesting that access to individualized information can steer the model away from using stereotypes. Nonetheless, content analysis revealed religion-specific violent themes containing highly offensive ideas regardless of prompt format. Our results show the need for additional debiasing of large language models to address higher-order schemas and associations.

research
04/15/2021

Quantifying Gender Bias Towards Politicians in Cross-Lingual Language Models

While the prevalence of large pre-trained language models has led to sig...
research
02/05/2023

FineDeb: A Debiasing Framework for Language Models

As language models are increasingly included in human-facing machine lea...
research
05/23/2023

Having Beer after Prayer? Measuring Cultural Bias in Large Language Models

Are language models culturally biased? It is important that language mod...
research
10/01/2021

Low Frequency Names Exhibit Bias and Overfitting in Contextualizing Language Models

We use a dataset of U.S. first names with labels based on predominant ge...
research
07/09/2022

Towards Multimodal Vision-Language Models Generating Non-Generic Text

Vision-language models can assess visual context in an image and generat...
research
09/08/2023

Matching Table Metadata with Business Glossaries Using Large Language Models

Enterprises often own large collections of structured data in the form o...
research
04/07/2023

What does ChatGPT return about human values? Exploring value bias in ChatGPT using a descriptive value theory

There has been concern about ideological basis and possible discriminati...
