Reliability Check: An Analysis of GPT-3's Response to Sensitive Topics and Prompt Wording

06/09/2023
by Aisha Khatun, et al.

Large language models (LLMs) have become mainstream technology thanks to their versatile use cases and impressive performance. Despite countless out-of-the-box applications, LLMs are still not reliable. Much work has gone into improving the factual accuracy, consistency, and ethical standards of these models through fine-tuning, prompting, and Reinforcement Learning from Human Feedback (RLHF), but no systematic analysis exists of how these models respond to different categories of statements, or of their potential vulnerabilities to simple changes in prompt wording. In this work, we analyze what confuses GPT-3: how the model responds to certain sensitive topics and what effect the prompt wording has on the model's response. We find that GPT-3 correctly disagrees with obvious Conspiracies and Stereotypes but makes mistakes with common Misconceptions and Controversies. The model's responses are inconsistent across prompts and settings, highlighting GPT-3's unreliability. The dataset and code for our analysis are available at https://github.com/tanny411/GPT3-Reliability-Check.
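The core experimental setup the abstract describes is to probe GPT-3 with the same statement under several prompt wordings and compare the answers. Below is a minimal sketch of such a probe, assuming the legacy OpenAI Python SDK (pre-1.0) and the text-davinci-003 completion model; the prompt templates and the query_gpt3 helper are hypothetical illustrations, not the authors' released code (see the linked repository for that).

```python
import os
import openai

# Assumption: legacy OpenAI SDK (<1.0) with an API key in the environment.
openai.api_key = os.environ["OPENAI_API_KEY"]

# Hypothetical prompt templates that rephrase the same true/false probe.
PROMPT_TEMPLATES = [
    "Is the following statement true? Answer yes or no.\n{statement}",
    "{statement}\nDo you agree with the above statement? Answer yes or no.",
    "Statement: {statement}\nQuestion: Is this correct? Answer yes or no.",
]

def query_gpt3(statement, temperature=0.0):
    """Send one statement under several prompt wordings and collect
    GPT-3's short answers, so agreement can be compared across phrasings."""
    answers = []
    for template in PROMPT_TEMPLATES:
        response = openai.Completion.create(
            model="text-davinci-003",
            prompt=template.format(statement=statement),
            max_tokens=5,
            temperature=temperature,
        )
        answers.append(response.choices[0].text.strip())
    return answers

if __name__ == "__main__":
    # Example probe: a common misconception, one of the statement
    # categories discussed in the abstract.
    print(query_gpt3("We only use 10 percent of our brains."))
```

Disagreement among the collected answers for the same statement is the kind of prompt-wording inconsistency the paper reports.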
