Can Large Language Models Truly Understand Prompts? A Case Study with Negated Prompts

09/26/2022
by   Joel Jang, et al.

Previous work has shown that a scaling law exists between the size of Language Models (LMs) and their zero-shot performance on various downstream NLP tasks. In this work, we show that this phenomenon does not hold when evaluating large LMs on tasks with negated prompts; instead, they exhibit an inverse scaling law. We evaluate 9 different tasks with negated prompts on (1) pretrained LMs (OPT & GPT-3) of varying sizes (125M - 175B), (2) LMs further pretrained to generalize to novel prompts (InstructGPT), (3) LMs provided with few-shot examples, and (4) LMs fine-tuned specifically on negated prompts. All LM types perform worse on negated prompts as they scale, and all show a huge gap to human performance when comparing the average score on the original and negated prompts. By highlighting this critical limitation of existing LMs and methods, we urge the community to develop new approaches for building LMs that actually follow the given instructions. We provide the code and datasets for exploring negated prompts at https://github.com/joeljang/negated-prompts-for-llms
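The evaluation setup described above can be illustrated with a minimal sketch: score each answer option under the original prompt and under its negated counterpart, and pick the highest-scoring option. The `toy_score` function below is a hypothetical stand-in for an LM's per-option likelihood (a real run would query OPT or GPT-3); it deliberately mimics the failure mode the paper reports, where the model keeps choosing the factually correct answer even when instructed not to.

```python
# Minimal sketch of a negated-prompt evaluation, assuming a likelihood-style
# scorer. `toy_score` is a hypothetical stand-in for an actual LM.

def negate(prompt: str) -> str:
    """Turn an instruction into its negated counterpart (naive string edit)."""
    return prompt.replace("Answer with", "Do NOT answer with")

def pick_answer(score, prompt, question, options):
    """Choose the option the scorer ranks highest given the prompt."""
    return max(options, key=lambda opt: score(f"{prompt}\n{question}", opt))

def toy_score(context: str, option: str) -> float:
    # Stand-in scorer: always prefers the factually correct option,
    # regardless of negation -- the inverse-scaling failure mode.
    return 1.0 if option == "Paris" else 0.0

question = "What is the capital of France?"
options = ["Paris", "London"]
original = "Answer with the correct option."
negated = negate(original)  # "Do NOT answer with the correct option."

print(pick_answer(toy_score, original, question, options))  # Paris (correct)
print(pick_answer(toy_score, negated, question, options))   # Paris (ignores negation)
```

Under this framing, accuracy on negated prompts is simply the fraction of questions where the top-scoring option changes appropriately; the toy scorer above scores 0% on negated prompts, matching the trend the paper observes for large models.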


Related research

- Z-LaVI: Zero-Shot Language Solver Fueled by Visual Imagination (10/21/2022)
- Probing Quantifier Comprehension in Large Language Models (06/12/2023)
- Broken Neural Scaling Laws (10/26/2022)
- Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark (05/24/2023)
- Systematic Evaluation of GPT-3 for Zero-Shot Personality Estimation (06/01/2023)
- Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity (04/18/2021)
- Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agent (04/19/2023)
