Transformers generalize differently from information stored in context vs in weights

10/11/2022
by Stephanie C. Y. Chan, et al.

Transformer models can use two fundamentally different kinds of information: information stored in weights during training, and information provided "in-context" at inference time. In this work, we show that transformers exhibit different inductive biases in how they represent and generalize from the information in these two sources. In particular, we characterize whether they generalize via parsimonious rules (rule-based generalization) or via direct comparison with observed examples (exemplar-based generalization). This has important practical consequences, as it informs whether to encode information in weights or in context, depending on how we want models to use that information. In transformers trained on controlled stimuli, we find that generalization from weights is more rule-based, whereas generalization from context is largely exemplar-based. In contrast, we find that in transformers pretrained on natural language, in-context learning is significantly rule-based, with larger models showing more rule-basedness. We hypothesise that rule-based generalization from in-context information might be an emergent consequence of large-scale training on language, which has sparse rule-like structure. Using controlled stimuli, we verify that transformers pretrained on data containing sparse rule-like structure exhibit more rule-based generalization.
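
The distinction between rule-based and exemplar-based generalization can be made concrete with a small toy example. The sketch below is an illustration only and is not the paper's experimental setup: the data, the function names `rule_based` and `exemplar_based`, and the exponential similarity measure are all assumptions chosen for brevity. Each item has two binary features, the label depends only on the first feature, and one feature combination is held out. A learner that extracts the minimal rule and a learner that compares the query against stored examples disagree on the held-out item.

```python
import math
from collections import defaultdict

# Hypothetical toy data: the label depends only on feature 0 (0 -> "A", 1 -> "B").
# The combination (1, 0) is never shown during "training".
TRAIN = [((0, 0), "A"), ((0, 1), "A"), ((1, 1), "B")]
HELD_OUT = (1, 0)


def rule_based(train, query):
    """Find the first single feature that perfectly predicts the labels, then apply it."""
    for i in range(len(query)):
        mapping = {}
        consistent = True
        for features, label in train:
            # setdefault records the first label seen for this feature value.
            if mapping.setdefault(features[i], label) != label:
                consistent = False
                break
        if consistent and query[i] in mapping:
            return mapping[query[i]]
    return None  # no single-feature rule explains the training data


def exemplar_based(train, query):
    """Vote by similarity to every stored example (similarity = exp(-Hamming distance))."""
    scores = defaultdict(float)
    for features, label in train:
        distance = sum(a != b for a, b in zip(features, query))
        scores[label] += math.exp(-distance)
    return max(scores, key=scores.get)


print("rule-based:    ", rule_based(TRAIN, HELD_OUT))
print("exemplar-based:", exemplar_based(TRAIN, HELD_OUT))
```

In this toy setting the rule-based learner follows feature 0 and labels the held-out item "B", while the exemplar-based learner is pulled toward "A" because two stored "A" examples lie within reach of the query. A disagreement of this kind on held-out combinations is what makes it possible to tell the two styles of generalization apart.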

research
08/07/2017

When rule-based models need to count

Rule-based modelers dislike direct enumeration of cases when more effici...
research
06/20/2023

Blackbird language matrices (BLM), a new task for rule-like generalization in neural networks: Motivations and Formal Specifications

We motivate and formally define a new task for fine-tuning rule-like gen...
research
12/01/2021

Systematic Generalization with Edge Transformers

Recent research suggests that systematic generalization in natural langu...
research
08/14/2023

Development and Evaluation of Three Chatbots for Postpartum Mood and Anxiety Disorders

In collaboration with Postpartum Support International (PSI), a non-prof...
research
07/18/2023

J. B. S. Haldane's Rule of Succession

After Bayes, the oldest Bayesian account of enumerative induction is giv...
research
10/08/2021

Distinguishing rule- and exemplar-based generalization in learning systems

Despite the increasing scale of datasets in machine learning, generaliza...
research
01/17/2023

Transformers as Algorithms: Generalization and Implicit Model Selection in In-context Learning

In-context learning (ICL) is a type of prompting where a transformer mod...
