Larger language models do in-context learning differently

by Jerry Wei, et al.

We study how in-context learning (ICL) in language models is affected by semantic priors versus input-label mappings. We investigate two setups, ICL with flipped labels and ICL with semantically-unrelated labels, across various model families (GPT-3, InstructGPT, Codex, PaLM, and Flan-PaLM). First, experiments on ICL with flipped labels show that overriding semantic priors is an emergent ability of model scale. While small language models ignore flipped labels presented in-context and thus rely primarily on semantic priors from pretraining, large models can override semantic priors when presented with in-context exemplars that contradict priors, despite the stronger semantic priors that larger models may hold. We next study semantically-unrelated label ICL (SUL-ICL), in which labels are semantically unrelated to their inputs (e.g., foo/bar instead of negative/positive), thereby forcing language models to learn the input-label mappings shown in in-context exemplars in order to perform the task. The ability to do SUL-ICL also emerges primarily with scale, and large-enough language models can even perform linear classification in a SUL-ICL setting. Finally, we evaluate instruction-tuned models and find that instruction tuning strengthens both the use of semantic priors and the capacity to learn input-label mappings, but strengthens the former more.
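To make the two setups concrete, the following is a minimal sketch of how flipped-label and SUL-ICL prompts could be constructed from a set of labeled exemplars. It is an illustration only, not the authors' code: the function names (`flip_labels`, `remap_labels`, `build_prompt`), the `Input:`/`Label:` prompt template, and the example sentences are all assumptions made for this example.

```python
import random


def flip_labels(exemplars, flip_fraction, label_pair=("negative", "positive"), seed=0):
    """Return a copy of `exemplars` with a fraction of the binary labels swapped.

    Hypothetical helper: `exemplars` is a list of (text, label) pairs.
    """
    rng = random.Random(seed)
    n_flip = round(flip_fraction * len(exemplars))
    flip_idx = set(rng.sample(range(len(exemplars)), n_flip))
    a, b = label_pair
    swap = {a: b, b: a}
    return [
        (text, swap[label] if i in flip_idx else label)
        for i, (text, label) in enumerate(exemplars)
    ]


def remap_labels(exemplars, mapping):
    """Replace natural-language labels with semantically unrelated ones (SUL-ICL)."""
    return [(text, mapping[label]) for text, label in exemplars]


def build_prompt(exemplars, query):
    """Format exemplars and a query with an assumed Input/Label template."""
    blocks = [f"Input: {text}\nLabel: {label}\n" for text, label in exemplars]
    blocks.append(f"Input: {query}\nLabel:")
    return "\n".join(blocks)


# Made-up exemplars, used only to show the shape of the data.
exemplars = [
    ("A delightful film.", "positive"),
    ("A tedious mess.", "negative"),
    ("Warm and funny.", "positive"),
    ("Flat and lifeless.", "negative"),
]

# Flipped-label ICL: every exemplar label contradicts its usual meaning.
flipped_prompt = build_prompt(flip_labels(exemplars, 1.0), "An instant classic.")

# SUL-ICL: labels carry no semantic hint about the task.
sul_prompt = build_prompt(
    remap_labels(exemplars, {"negative": "foo", "positive": "bar"}),
    "An instant classic.",
)
```

In the flipped-label prompt, a model that follows the exemplars should answer with the label opposite to the one its semantic prior suggests. In the SUL-ICL prompt, the only way to answer correctly is to infer the mapping from the exemplars.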




