Despite "super-human" performance, current LLMs are unsuited for decisions about ethics and safety

12/13/2022
by   Joshua Albrecht, et al.
0

Large language models (LLMs) have exploded in popularity in the past few years and have achieved undeniably impressive results on benchmarks as varied as question answering and text summarization. We provide a simple new prompting strategy that leads to yet another supposedly "super-human" result, this time outperforming humans at common sense ethical reasoning (as measured by accuracy on a subset of the ETHICS dataset). Unfortunately, we find that relying on average performance to judge capabilities can be highly misleading. LLM errors differ systematically from human errors in ways that make it easy to craft adversarial examples, or even perturb existing examples to flip the output label. We also observe signs of inverse scaling with model size on some examples, and show that prompting models to "explain their reasoning" often leads to alarming justifications of unethical actions. Our results highlight how human-like performance does not necessarily imply human-like understanding or reasoning.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/18/2019

Conversational AI : Open Domain Question Answering and Commonsense Reasoning

Our research is focused on making a human-like question answering system...
research
01/27/2023

Understanding the Effectiveness of Very Large Language Models on Dialog Evaluation

Language models have steadily increased in size over the past few years....
research
09/07/2018

Trick Me If You Can: Adversarial Writing of Trivia Challenge Questions

Modern natural language processing systems have been touted as approachi...
research
10/06/2022

ReAct: Synergizing Reasoning and Acting in Language Models

While large language models (LLMs) have demonstrated impressive capabili...
research
09/11/2023

Evaluating the Deductive Competence of Large Language Models

The development of highly fluent large language models (LLMs) has prompt...
research
12/07/2022

Discovering Latent Knowledge in Language Models Without Supervision

Existing techniques for training language models can be misaligned with ...
research
02/23/2018

Semantic Vector Spaces for Broadening Consideration of Consequences

Reasoning systems with too simple a model of the world and human intent ...

Please sign up or login with your details

Forgot password? Click here to reset