Transformers as Algorithms: Generalization and Implicit Model Selection in In-context Learning

01/17/2023
by Yingcong Li et al.

In-context learning (ICL) is a type of prompting in which a transformer model operates on a sequence of (input, output) examples and performs inference on the fly. This implicit training stands in contrast to explicitly tuning the model weights on those examples. In this work, we formalize in-context learning as an algorithm learning problem, treating the transformer model as a learning algorithm that can be specialized via training to implement, at inference time, another target algorithm. We first explore the statistical aspects of this abstraction through the lens of multitask learning: we obtain generalization bounds for ICL when the input prompt is (1) a sequence of i.i.d. (input, label) pairs or (2) a trajectory arising from a dynamical system. The crux of our analysis is relating the excess risk to the stability of the algorithm implemented by the transformer, which holds under mild assumptions. Second, we use our abstraction to show that transformers can act as an adaptive learning algorithm and perform model selection across different hypothesis classes. We provide numerical evaluations that (1) demonstrate transformers can indeed implement near-optimal algorithms on classical regression problems with i.i.d. and dynamic data, (2) identify an inductive-bias phenomenon where the transfer risk on unseen tasks is independent of the transformer complexity, and (3) empirically verify our theoretical predictions.
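To make the abstraction concrete, here is a minimal sketch (not the paper's code) of an ICL-style prediction map on an i.i.d. linear-regression prompt. The function `icl_predict` plays the role a trained transformer is hypothesized to play, mapping a prompt of (input, label) pairs plus a query input to a prediction with no weight updates; the ridge-regularized least-squares solver inside it is an assumed stand-in for the target algorithm, of the kind the paper's experiments use as a near-optimal baseline.

```python
import numpy as np

def icl_predict(prompt_xs, prompt_ys, query_x, ridge=1e-6):
    """Map a prompt ((x_1, y_1), ..., (x_n, y_n)) plus a query input to a
    prediction, with no weight updates -- the role a trained transformer is
    hypothesized to play under the ICL-as-algorithm abstraction.

    The stand-in target algorithm here is ridge-regularized least squares,
    a near-optimal choice for i.i.d. linear-regression prompts.
    """
    X = np.asarray(prompt_xs)          # (n, d) prompt inputs
    y = np.asarray(prompt_ys)          # (n,)  prompt labels
    d = X.shape[1]
    # Solve (X^T X + ridge * I) w = X^T y for the in-context estimate of w.
    w = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)
    return np.asarray(query_x) @ w

# Usage: draw a fresh task w_star (as in multitask ICL), build a prompt of
# i.i.d. (x, y) pairs, and predict the label of a held-out query input.
rng = np.random.default_rng(0)
d, n = 5, 20
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)
x_query = rng.normal(size=d)
print(icl_predict(X, y, x_query), "vs. noiseless target", x_query @ w_star)
```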
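For intuition on the stability argument, the classical single-task bound below (Bousquet and Elisseeff, 2002) shows the template being invoked; the paper's results are multitask ICL analogues of this idea, not this exact statement.

```latex
% Illustrative template only: the classical uniform-stability bound of
% Bousquet & Elisseeff (2002), not the paper's multitask ICL statement.
% If a learning algorithm A is \beta-uniformly stable, meaning that
% replacing any one of the n training examples in S changes its loss at
% any point by at most \beta, then its expected generalization gap obeys
\[
  \mathbb{E}_{S}\!\left[ R\bigl(A(S)\bigr) - \widehat{R}_{S}\bigl(A(S)\bigr) \right]
  \;\le\; \beta ,
\]
% where R is the population risk, \widehat{R}_S the empirical risk on S,
% and for many regularized algorithms \beta = O(1/n).
```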

Related research

06/07/2023
Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection
Neural sequence models based on the transformer architecture have demons...

06/08/2023
In-Context Learning through the Bayesian Prism
In-context learning is one of the surprising and useful features of larg...

08/01/2022
What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
In-context learning refers to the ability of a model to condition on a p...

08/16/2023
Can Transformers Learn Optimal Filtering for Unknown Systems?
Transformers have demonstrated remarkable success in natural language pr...

10/11/2022
Transformers generalize differently from information stored in context vs in weights
Transformer models can use two fundamentally different kinds of informat...

11/28/2022
What learning algorithm is in-context learning? Investigations with linear models
Neural sequence models, especially transformers, exhibit a remarkable ca...

10/08/2021
Iterative Decoding for Compositional Generalization in Transformers
Deep learning models do well at generalizing to in-distribution data but...
