A Theory of Emergent In-Context Learning as Implicit Structure Induction

03/14/2023
by Michael Hahn, et al.

Scaling large language models (LLMs) leads to an emergent capacity to learn in-context from example demonstrations. Despite progress, theoretical understanding of this phenomenon remains limited. We argue that in-context learning relies on recombination of compositional operations found in natural language data. We derive an information-theoretic bound showing how in-context learning abilities arise from generic next-token prediction when the pretraining distribution has sufficient amounts of compositional structure, under linguistically motivated assumptions. A second bound provides a theoretical justification for the empirical success of prompting LLMs to output intermediate steps towards an answer. To validate theoretical predictions, we introduce a controlled setup for inducing in-context learning; unlike previous approaches, it accounts for the compositional nature of language. Trained transformers can perform in-context learning for a range of tasks, in a manner consistent with the theoretical results. Mirroring real-world LLMs in a miniature setup, in-context learning emerges when scaling parameters and data, and models perform better when prompted to output intermediate steps. Probing shows that in-context learning is supported by a representation of the input's compositional structure. Taken together, these results provide a step towards theoretical understanding of emergent behavior in large language models.

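To make the kind of task concrete, below is a minimal toy sketch, not the paper's actual controlled setup: the string operations, the arrow-style prompt format, and the helper names (`apply_chain`, `make_prompt`) are illustrative assumptions. It builds few-shot prompts for a task composed from primitive operations, once with direct answers only and once with intermediate steps spelled out, the latter mirroring the "output intermediate steps" condition described in the abstract.

```python
# Toy illustration of compositional in-context prompts.
# The operations, task, and prompt format are illustrative assumptions,
# not the controlled setup used in the paper.

OPS = {
    "reverse": lambda s: s[::-1],
    "upper": lambda s: s.upper(),
    "double": lambda s: s + s,
}

def apply_chain(chain, x):
    """Apply a composition of primitive operations left to right,
    keeping the input and every intermediate result."""
    steps = [x]
    for name in chain:
        steps.append(OPS[name](steps[-1]))
    return steps

def make_prompt(chain, examples, query, show_steps=False):
    """Format few-shot demonstrations for the composed task.

    show_steps=True spells out intermediate results, analogous to
    prompting a model to output intermediate steps."""
    lines = []
    for x in examples:
        steps = apply_chain(chain, x)
        if show_steps:
            lines.append(f"{x} -> " + " -> ".join(steps[1:]))
        else:
            lines.append(f"{x} -> {steps[-1]}")
    lines.append(f"{query} ->")
    return "\n".join(lines)

if __name__ == "__main__":
    chain = ["reverse", "upper"]   # composed task: reverse, then uppercase
    demos = ["cab", "tree"]
    print(make_prompt(chain, demos, "moon"))                    # direct answers only
    print(make_prompt(chain, demos, "moon", show_steps=True))   # with intermediate steps
```

Running this prints one prompt per condition; per the abstract, models in the paper's miniature setup perform better when prompted in the second style, i.e., when asked to output intermediate steps.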