Toxicity in ChatGPT: Analyzing Persona-assigned Language Models

04/11/2023
by   Ameet Deshpande, et al.
10

Large language models (LLMs) have shown incredible capabilities and transcended the natural language processing (NLP) community, with adoption throughout many services like healthcare, therapy, education, and customer service. Since users include people with critical information needs like students or patients engaging with chatbots, the safety of these systems is of prime importance. Therefore, a clear understanding of the capabilities and limitations of LLMs is necessary. To this end, we systematically evaluate toxicity in over half a million generations of ChatGPT, a popular dialogue-based LLM. We find that setting the system parameter of ChatGPT by assigning it a persona, say that of the boxer Muhammad Ali, significantly increases the toxicity of generations. Depending on the persona assigned to ChatGPT, its toxicity can increase up to 6x, with outputs engaging in incorrect stereotypes, harmful dialogue, and hurtful opinions. This may be potentially defamatory to the persona and harmful to an unsuspecting user. Furthermore, we find concerning patterns where specific entities (e.g., certain races) are targeted more than others (3x more) irrespective of the assigned persona, that reflect inherent discriminatory biases in the model. We hope that our findings inspire the broader AI community to rethink the efficacy of current safety guardrails and develop better techniques that lead to robust, safe, and trustworthy AI systems.

READ FULL TEXT

page 4

page 8

research
04/07/2023

Should ChatGPT be Biased? Challenges and Risks of Bias in Large Language Models

As the capabilities of generative language models continue to advance, t...
research
06/13/2023

Assigning AI: Seven Approaches for Students, with Prompts

This paper examines the transformative role of Large Language Models (LL...
research
03/25/2023

Can Large Language Models assist in Hazard Analysis?

Large Language Models (LLMs), such as GPT-3, have demonstrated remarkabl...
research
05/29/2023

The Utility of Large Language Models and Generative AI for Education Research

The use of natural language processing (NLP) techniques in engineering e...
research
05/13/2021

NLP is Not enough – Contextualization of User Input in Chatbots

AI chatbots have made vast strides in technology improvement in recent y...
research
02/18/2023

Recent Advances towards Safe, Responsible, and Moral Dialogue Systems: A Survey

With the development of artificial intelligence, dialogue systems have b...
research
08/05/2021

Evaluating CLIP: Towards Characterization of Broader Capabilities and Downstream Implications

Recently, there have been breakthroughs in computer vision ("CV") models...

Please sign up or login with your details

Forgot password? Click here to reset