In-context Learning and Induction Heads

09/24/2022
by Catherine Olsson, et al.

"Induction heads" are attention heads that implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]. In this work, we present preliminary and indirect evidence for a hypothesis that induction heads might constitute the mechanism for the majority of all "in-context learning" in large transformer models (i.e. decreasing loss at increasing token indices). We find that induction heads develop at precisely the same point as a sudden sharp increase in in-context learning ability, visible as a bump in the training loss. We present six complementary lines of evidence, arguing that induction heads may be the mechanistic source of general in-context learning in transformer models of any size. For small attention-only models, we present strong, causal evidence; for larger models with MLPs, we present correlational evidence.
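The completion rule [A][B] ... [A] -> [B] named in the abstract can be illustrated with a toy sketch. This shows only the input-output behavior of the pattern, not how an attention head computes it inside a transformer. The function name `induction_completion` is hypothetical, and choosing the most recent earlier match is an assumption made here for simplicity.

```python
from typing import Hashable, Optional, Sequence


def induction_completion(tokens: Sequence[Hashable]) -> Optional[Hashable]:
    """Toy version of the [A][B] ... [A] -> [B] rule (illustrative only).

    Looks for an earlier occurrence of the final token and returns the
    token that followed it. Returns None when there is no earlier match.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan earlier positions from most recent to oldest (an assumption:
    # the abstract does not say which earlier match should win).
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None


if __name__ == "__main__":
    print(induction_completion(["A", "B", "C", "A"]))
```

With the context `["A", "B", "C", "A"]`, the final `A` matches the first token, so the sketch proposes `B` as the next token. A context with no repeated token yields `None`, since the rule has nothing to copy from.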

research
12/18/2022

Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale

Language models have been shown to perform better with an increase in sc...
research
05/17/2021

Induction and Skolemization in saturation theorem proving

We consider a typical integration of induction in saturation-based theor...
research
02/14/2022

CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Naming Sequences

Code completion is an essential feature of IDEs, yet current autocomplet...
research
08/14/2020

Induction Models on ℕ

Mathematical induction is a fundamental tool in computer science and mat...
research
06/09/2023

Positivity certificates for linear recurrences

We show that for solutions of linear recurrences with polynomial coeffic...
research
05/13/2022

A Study of the Attention Abnormality in Trojaned BERTs

Trojan attacks raise serious security concerns. In this paper, we invest...
research
05/19/1999

Inducing a Semantically Annotated Lexicon via EM-Based Clustering

We present a technique for automatic induction of slot annotations for s...
