A Mechanism for Sample-Efficient In-Context Learning for Sparse Retrieval Tasks

05/26/2023
by Jacob Abernethy et al.

We study the phenomenon of in-context learning (ICL) exhibited by large language models, where they can adapt to a new learning task, given a handful of labeled examples, without any explicit parameter optimization. Our goal is to explain how a pre-trained transformer model is able to perform ICL under reasonable assumptions on the pre-training process and the downstream tasks. We posit a mechanism whereby a transformer can achieve the following: (a) receive an i.i.d. sequence of examples which have been converted into a prompt using potentially ambiguous delimiters, (b) correctly segment the prompt into examples and labels, (c) infer from the data a sparse linear regressor hypothesis, and finally (d) apply this hypothesis on the given test example and return a predicted label. We establish that this entire procedure is implementable using the transformer mechanism, and we give sample complexity guarantees for this learning framework. Our empirical findings confirm that segmentation is a genuine challenge, and we show a correspondence between our posited mechanisms and observed attention maps for step (c).
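The abstract outlines a four-step pipeline. The minimal sketch below, written in plain Python rather than as a transformer construction, is only meant to illustrate what steps (a)-(d) compute: the delimiter scheme, the `segment` helper, and the use of scikit-learn's Lasso as a stand-in for the in-context sparse linear estimator are assumptions for illustration and are not taken from the paper.

```python
# Illustrative sketch of the posited four steps (a)-(d), written as ordinary
# Python; Lasso stands in for the in-context sparse linear regression of (c).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# (a) i.i.d. examples from a sparse linear model, serialized into a prompt
#     with a simple (and potentially ambiguous) delimiter scheme.
d, k, n = 20, 3, 40                      # dimension, sparsity, number of examples
w = np.zeros(d)
w[rng.choice(d, k, replace=False)] = rng.normal(size=k)
X = rng.normal(size=(n, d))
y = X @ w
prompt = " | ".join(
    " ".join(f"{v:.4f}" for v in x) + " -> " + f"{t:.4f}" for x, t in zip(X, y)
)

# (b) segment the prompt back into (example, label) pairs.
def segment(prompt: str):
    xs, ts = [], []
    for chunk in prompt.split(" | "):
        feats, label = chunk.split(" -> ")
        xs.append([float(v) for v in feats.split()])
        ts.append(float(label))
    return np.array(xs), np.array(ts)

X_seg, y_seg = segment(prompt)

# (c) infer a sparse linear regressor from the segmented data.
reg = Lasso(alpha=1e-3, fit_intercept=False).fit(X_seg, y_seg)

# (d) apply the inferred hypothesis to a fresh test example.
x_test = rng.normal(size=d)
print("prediction:  ", reg.predict(x_test.reshape(1, -1))[0])
print("ground truth:", x_test @ w)
```

The paper's claim is that each of these stages can be carried out inside the transformer's forward pass; the sketch only makes the statistical task concrete, not the mechanism.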
