Are Language Models Worse than Humans at Following Prompts? It's Complicated

01/17/2023
by   Albert Webson, et al.

Prompts have been the center of progress in advancing language models' zero-shot and few-shot performance. However, recent work finds that models can perform surprisingly well when given intentionally irrelevant or misleading prompts. Such results may be interpreted as evidence that model behavior is not "human like". In this study, we challenge a central assumption in such work: that humans would perform badly when given pathological instructions. We find that humans are able to reliably ignore irrelevant instructions and thus, like models, perform well on the underlying task despite an apparent lack of signal regarding the task they are being asked to do. However, when given deliberately misleading instructions, humans follow the instructions faithfully, whereas models do not. Thus, our conclusion is mixed with respect to prior work. We argue against the earlier claim that high performance with irrelevant prompts constitutes evidence against models' instruction understanding, but we reinforce the claim that models' failure to follow misleading instructions raises concerns. More broadly, we caution that future research should not idealize human behaviors as a monolith and should not train or evaluate models to mimic assumptions about these behaviors without first validating humans' behaviors empirically.


Related research

- The Turking Test: Can Language Models Understand Instructions? (10/22/2020)
  Supervised machine learning provides the learner with a set of input-out...

- BB_Evac: Fast Location-Sensitive Behavior-Based Building Evacuation (02/19/2020)
  Past work on evacuation planning assumes that evacuees will follow instr...

- Can language models handle recursively nested grammatical structures? A case study on comparing models and humans (10/27/2022)
  How should we compare the capabilities of language models and humans? He...

- Large Language Models Are Human-Level Prompt Engineers (11/03/2022)
  By conditioning on natural language instructions, large language models ...

- GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models (03/14/2022)
  Providing natural language instructions in prompts is a useful new parad...

- Do Prompt-Based Models Really Understand the Meaning of their Prompts? (09/02/2021)
  Recently, a boom of papers have shown extraordinary progress in few-shot...

- Less is More: Summary of Long Instructions is Better for Program Synthesis (03/16/2022)
  Despite the success of large pre-trained language models (LMs) such as C...
