Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression

06/26/2023
by   Allan Raventos, et al.

Pretrained transformers exhibit the remarkable ability of in-context learning (ICL): they can learn tasks from just a few examples provided in the prompt without updating any weights. This raises a foundational question: can ICL solve fundamentally new tasks that are very different from those seen during pretraining? To probe this question, we examine ICL's performance on linear regression while varying the diversity of tasks in the pretraining dataset. We empirically demonstrate a task diversity threshold for the emergence of ICL. Below this threshold, the pretrained transformer cannot solve unseen regression tasks: it behaves like a Bayesian estimator whose prior is the non-diverse pretraining task distribution. Beyond this threshold, the transformer significantly outperforms this estimator; its behavior aligns with that of ridge regression, corresponding to a Gaussian prior over all tasks, including those not seen during pretraining. These results highlight that, when pretrained on data with task diversity greater than the threshold, transformers can solve fundamentally new tasks in-context. Importantly, this capability hinges on the transformer deviating from the Bayes-optimal estimator with the pretraining distribution as the prior. This study underscores, in a concrete example, the critical role of task diversity, alongside data and model scale, in the emergence of ICL. Code is available at https://github.com/mansheej/icl-task-diversity.
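To make the two reference estimators in the abstract concrete, here is a minimal NumPy sketch (not taken from the paper's codebase; the noise and prior variances and the task-pool size are illustrative assumptions). It contrasts the below-threshold baseline, a Bayes-optimal predictor under a uniform prior over a finite pool of pretraining tasks, with the above-threshold baseline, ridge regression, which is Bayes-optimal under a Gaussian prior over all tasks.

```python
import numpy as np

def ridge_predict(X, y, x_query, noise_var=0.25, prior_var=1.0):
    """Bayes-optimal prediction under a Gaussian prior over all tasks:
    w ~ N(0, prior_var * I), y = X @ w + noise. This reduces to ridge
    regression with lambda = noise_var / prior_var."""
    d = X.shape[1]
    lam = noise_var / prior_var
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return x_query @ w_hat

def discrete_prior_predict(X, y, x_query, task_pool, noise_var=0.25):
    """Bayes-optimal prediction when the prior is uniform over a finite
    pretraining task pool (rows of task_pool): the posterior-mean
    prediction is a likelihood-weighted average over the pool's tasks."""
    residuals = y[None, :] - task_pool @ X.T           # (num_tasks, k)
    log_lik = -0.5 * np.sum(residuals ** 2, axis=1) / noise_var
    weights = np.exp(log_lik - log_lik.max())
    weights /= weights.sum()                           # posterior over tasks
    return weights @ (task_pool @ x_query)

# Toy prompt drawn from a task *outside* the pretraining pool.
rng = np.random.default_rng(0)
d, k, num_tasks = 8, 16, 32
task_pool = rng.normal(size=(num_tasks, d))   # hypothetical "pretraining" tasks
w_new = rng.normal(size=d)                    # unseen task
X = rng.normal(size=(k, d))
y = X @ w_new + 0.5 * rng.normal(size=k)      # noisy in-context targets
x_query = rng.normal(size=d)

print("Gaussian prior (ridge):     ", ridge_predict(X, y, x_query))
print("discrete pretraining prior: ", discrete_prior_predict(X, y, x_query, task_pool))
```

On an unseen task, the discrete-prior predictor can only interpolate among the pool's tasks, while the ridge predictor adapts to the new task from the prompt alone; the paper's finding is that a transformer's ICL behavior shifts from the former to the latter once pretraining task diversity crosses a threshold.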
