In-Context Learning through the Bayesian Prism

06/08/2023
by Kabir Ahuja et al.

In-context learning is one of the most surprising and useful capabilities of large language models, and how it works is an active area of research. Recently, stylized meta-learning-like setups have been devised that train these models with the language-modeling loss on sequences of input-output pairs (x, f(x)) drawn from a function class, and then test generalization to unseen functions from the same class. One of the main discoveries in this line of research is that, for several problems such as linear regression, trained transformers learn algorithms for learning functions in context. However, the inductive biases of these models that give rise to this behavior are not well understood. A model with unlimited training data and compute is a Bayesian predictor: it learns the pretraining distribution. It has been shown that high-capacity transformers mimic this Bayesian predictor for linear regression. In this paper, we present empirical evidence that transformers exhibit the behavior of this ideal learner across a range of linear and non-linear function classes. We also extend the previous setups to the multitask setting, verify that transformers can do in-context learning there as well, and show that the Bayesian perspective sheds light on this setting too. Finally, via the example of learning Fourier series, we study the inductive bias of in-context learning and find that it may or may not exhibit a simplicity bias, depending on the pretraining data distribution.
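To make the setup concrete, below is a minimal sketch of the meta-learning-style data generation and of the ideal Bayesian baseline for noisy linear regression. All names, dimensions, and the noise level here are illustrative assumptions rather than the paper's exact experimental configuration; the one fact it relies on is standard: under a Gaussian prior over weights and Gaussian observation noise, the Bayesian posterior predictive mean is exactly ridge regression.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_points, noise_std = 8, 16, 0.1   # illustrative sizes, not the paper's

def sample_task():
    # Draw a linear function f(x) = w . x with w ~ N(0, I_d),
    # the Gaussian prior over the function class.
    return rng.normal(size=d)

def sample_prompt(w):
    # Build the (x_1, f(x_1), ..., x_n, f(x_n)) sequence a transformer
    # would be trained on with the language-modeling loss.
    X = rng.normal(size=(n_points, d))
    y = X @ w + noise_std * rng.normal(size=n_points)
    return X, y

def bayes_predict(X, y, x_query):
    # Posterior predictive mean under the N(0, I_d) prior with Gaussian
    # noise: ridge regression with regularizer lambda = noise_std**2.
    # This is the ideal Bayesian predictor the trained transformer is
    # compared against.
    lam = noise_std ** 2
    w_post = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return x_query @ w_post

# In-context evaluation on a fresh task: condition on the first k
# examples of the prompt and predict the output for the (k+1)-th input.
w = sample_task()
X, y = sample_prompt(w)
for k in range(1, n_points):
    err = (bayes_predict(X[:k], y[:k], X[k]) - X[k] @ w) ** 2
    print(f"k={k:2d}  squared error={err:.4f}")
```

As the number of conditioning examples k grows, the posterior mean's error falls toward zero; the paper's empirical claim is that a sufficiently high-capacity trained transformer tracks this ideal curve.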

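For the Fourier-series study, the pretraining distribution itself is what encodes, or fails to encode, a simplicity bias. The sketch below uses a hypothetical parameterization chosen purely for illustration: with decay > 0 the sampled functions are mostly low-frequency ("simple"), while decay = 0 weights all frequencies equally, and a Bayesian learner of each distribution would inherit the corresponding bias.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_fourier_task(n_freq=10, decay=0.0):
    # Draw f(x) = sum_k a_k cos(kx) + b_k sin(kx) with coefficient
    # scale k**(-decay).  decay > 0 concentrates mass on low-frequency
    # (simple) functions; decay = 0 is frequency-uniform.  Hypothetical
    # parameterization, not the paper's exact one.
    k = np.arange(1, n_freq + 1, dtype=float)
    scales = k ** (-decay)
    a = rng.normal(size=n_freq) * scales
    b = rng.normal(size=n_freq) * scales

    def f(x):
        phases = np.outer(np.atleast_1d(x), k)   # shape (len(x), n_freq)
        return (a * np.cos(phases) + b * np.sin(phases)).sum(axis=-1)

    return f

# Two pretraining distributions over the same function class:
simple_biased = sample_fourier_task(decay=2.0)  # dominated by low frequencies
uniform = sample_fourier_task(decay=0.0)        # all frequencies equally likely
x = np.linspace(0, 2 * np.pi, 5)
print(simple_biased(x))
print(uniform(x))
```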