
In-context Learning and Induction Heads

09/24/2022
by Catherine Olsson, et al.

"Induction heads" are attention heads that implement a simple algorithm to complete token sequences like [A][B] ... [A] -> [B]. In this work, we present preliminary and indirect evidence for a hypothesis that induction heads might constitute the mechanism for the majority of all "in-context learning" in large transformer models (i.e. decreasing loss at increasing token indices). We find that induction heads develop at precisely the same point as a sudden sharp increase in in-context learning ability, visible as a bump in the training loss. We present six complementary lines of evidence, arguing that induction heads may be the mechanistic source of general in-context learning in transformer models of any size. For small attention-only models, we present strong, causal evidence; for larger models with MLPs, we present correlational evidence.

