What Can Transformers Learn In-Context? A Case Study of Simple Function Classes

08/01/2022
by Shivam Garg, et al.

In-context learning refers to the ability of a model to condition on a prompt sequence consisting of in-context examples (input-output pairs corresponding to some task) along with a new query input, and generate the corresponding output. Crucially, in-context learning happens only at inference time without any parameter updates to the model. While large language models such as GPT-3 exhibit some ability to perform in-context learning, it is unclear what the relationship is between tasks on which this succeeds and what is present in the training data. To make progress towards understanding in-context learning, we consider the well-defined problem of training a model to in-context learn a function class (e.g., linear functions): that is, given data derived from some functions in the class, can we train a model to in-context learn "most" functions from this class? We show empirically that standard Transformers can be trained from scratch to perform in-context learning of linear functions – that is, the trained model is able to learn unseen linear functions from in-context examples with performance comparable to the optimal least squares estimator. In fact, in-context learning is possible even under two forms of distribution shift: (i) between the training data of the model and inference-time prompts, and (ii) between the in-context examples and the query input during inference. We also show that we can train Transformers to in-context learn more complex function classes – namely sparse linear functions, two-layer neural networks, and decision trees – with performance that matches or exceeds task-specific learning algorithms. Our code and models are available at https://github.com/dtsip/in-context-learning .
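The linear-regression setup is easy to state in code. Below is a minimal sketch (ours, not the authors'; see their repository above for the actual implementation) of a prompt distribution of the kind described, together with the least-squares baseline against which the trained Transformer is measured. The isotropic Gaussian draws for the task vector w and the inputs x_i, and names like `sample_prompt`, are illustrative assumptions, not taken from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_linear_task(d):
    """Sample a random linear function f(x) = w . x (assumed w ~ N(0, I_d))."""
    return rng.standard_normal(d)

def sample_prompt(w, k):
    """Sample k in-context examples (x_i, f(x_i)) plus a query input x_q."""
    d = w.shape[0]
    xs = rng.standard_normal((k, d))   # in-context inputs (assumed x_i ~ N(0, I_d))
    ys = xs @ w                        # in-context outputs f(x_i)
    x_q = rng.standard_normal(d)       # query input
    return xs, ys, x_q, x_q @ w        # last entry: ground-truth query output

def least_squares_predict(xs, ys, x_q):
    """Baseline: fit w_hat by least squares on the in-context examples and
    predict w_hat . x_q (minimum-norm solution when k < d)."""
    w_hat, *_ = np.linalg.lstsq(xs, ys, rcond=None)
    return x_q @ w_hat

d = 20                                 # input dimension
for k in (5, 10, 20, 40):              # number of in-context examples
    errs = []
    for _ in range(1000):              # average over random tasks and prompts
        w = sample_linear_task(d)
        xs, ys, x_q, y_q = sample_prompt(w, k)
        errs.append((least_squares_predict(xs, ys, x_q) - y_q) ** 2)
    print(f"k={k:2d}  mean squared error: {np.mean(errs):.3f}")
```

In the paper's experiments, a Transformer trained from scratch on prompts of this form (reading x_1, f(x_1), ..., x_k, f(x_k), x_q and predicting f(x_q)) reaches squared error on unseen linear functions comparable to this least-squares estimator.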


