Larger language models do in-context learning differently

03/07/2023
by Jerry Wei, et al.

We study how in-context learning (ICL) in language models is affected by semantic priors versus input-label mappings. We investigate two setups, ICL with flipped labels and ICL with semantically-unrelated labels, across several model families (GPT-3, InstructGPT, Codex, PaLM, and Flan-PaLM). First, experiments on ICL with flipped labels show that overriding semantic priors is an emergent ability of model scale. While small language models ignore flipped labels presented in-context and thus rely primarily on semantic priors from pretraining, large models can override semantic priors when presented with in-context exemplars that contradict those priors, despite the stronger semantic priors that larger models may hold. We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e.g., foo/bar instead of negative/positive), thereby forcing language models to learn the input-label mappings shown in in-context exemplars in order to perform the task. The ability to do SUL-ICL also emerges primarily with scale, and large-enough language models can even perform linear classification in a SUL-ICL setting. Finally, we evaluate instruction-tuned models and find that instruction tuning strengthens both the use of semantic priors and the capacity to learn input-label mappings, though it strengthens the former more.
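To make the two evaluation setups concrete, here is a minimal sketch (not the authors' code) of how regular, flipped-label, and SUL-ICL prompts could be assembled for a binary sentiment task; the example sentences and the foo/bar label words are illustrative placeholders.

```python
# Sketch of the three prompt variants described in the abstract:
# regular ICL, flipped-label ICL, and semantically-unrelated label ICL (SUL-ICL).

def build_icl_prompt(exemplars, query, label_map):
    """Format in-context exemplars followed by an unlabeled query."""
    lines = []
    for text, label in exemplars:
        lines.append(f"Input: {text}\nLabel: {label_map[label]}")
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)

exemplars = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I regret wasting two hours on this film.", "negative"),
]
query = "An unexpectedly moving story."

# Regular ICL: labels keep their natural meaning, so semantic priors help.
regular = build_icl_prompt(
    exemplars, query, {"positive": "positive", "negative": "negative"})

# Flipped-label ICL: in-context labels contradict semantic priors; a model
# must override its priors to follow the demonstrated mapping.
flipped = build_icl_prompt(
    exemplars, query, {"positive": "negative", "negative": "positive"})

# SUL-ICL: labels are semantically unrelated words (e.g., foo/bar), so the
# model can only succeed by learning the input-label mapping in context.
sul = build_icl_prompt(
    exemplars, query, {"positive": "foo", "negative": "bar"})

print(sul)
```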


Related research

05/15/2023 - Symbol tuning improves in-context learning in language models
We present symbol tuning - finetuning language models on in-context inpu...

07/23/2023 - In-Context Learning in Large Language Models Learns Label Relationships but Is Not Conventional Learning
The performance of Large Language Models (LLMs) on downstream tasks ofte...

01/31/2023 - Large Language Models Can Be Easily Distracted by Irrelevant Context
Large language models have achieved impressive performance on various na...

05/23/2023 - Probing in Context: Toward Building Robust Classifiers via Probing Large Language Models
Large language models are able to learn new tasks in context, where they...

11/05/2022 - Small Language Models for Tabular Data
Supervised deep learning is most commonly applied to difficult problems ...

05/14/2019 - Deep Residual Output Layers for Neural Language Generation
Many tasks, including language generation, benefit from learning the str...

07/28/2023 - Investigating the Learning Behaviour of In-context Learning: A Comparison with Supervised Learning
Large language models (LLMs) have shown remarkable capacity for in-conte...
