Iterative Forward Tuning Boosts In-context Learning in Language Models

05/22/2023
by Jiaxi Yang, et al.

Large language models (LLMs) exhibit an emergent in-context learning (ICL) ability. However, ICL models that handle ordinary cases can hardly be extended to more complex tasks when they process the demonstration examples only once. This single-turn ICL is at odds with how humans make decisions by learning from analogy. In this paper, we propose an effective and efficient two-stage framework that boosts ICL in LLMs by exploiting a dual form between Transformer attention and gradient descent-based optimization. Concretely, we divide the ICL process into a "Deep-Thinking" stage and an inference stage. The "Deep-Thinking" stage performs iterative forward optimization over the demonstrations, which is expected to boost the reasoning abilities of LLMs at test time by "thinking" about the demonstrations multiple times. It produces accumulated meta-gradients by manipulating the Key-Value matrices in the self-attention modules of the Transformer. The inference stage then takes only the test query as input, without concatenating the demonstrations, and applies the learned meta-gradients through attention to predict the output. In this way, demonstrations are not required at inference time because they have already been learned and stored in the final meta-gradients, so LLMs can be effectively and efficiently adapted to downstream tasks. Extensive experiments on ten classification and multiple-choice datasets show that our method achieves substantially better performance than standard ICL in terms of both accuracy and efficiency.

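The abstract describes, but does not fully specify, how the "Deep-Thinking" stage manipulates the Key-Value matrices. Below is a minimal, single-head PyTorch sketch of the two-stage idea under simplifying assumptions: the tensor names (demo_h, query_h, W_q, W_k, W_v), the additive accumulation rule, the step size, and the number of iterations are illustrative choices, not the paper's exact update.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

d = 16                                   # toy hidden size
n_demo, n_query = 4, 1                   # demonstration tokens vs. one test query
W_q, W_k, W_v = (torch.randn(d, d) * d ** -0.5 for _ in range(3))

demo_h  = torch.randn(n_demo, d)         # hidden states of the demonstrations
query_h = torch.randn(n_query, d)        # hidden state of the test query

def attend(q, k, v):
    """Plain scaled dot-product attention."""
    scores = q @ k.T / d ** 0.5
    return F.softmax(scores, dim=-1) @ v

# Stage 1: "Deep-Thinking" -- iterative forward passes over the demonstrations.
# Each pass attends over the current Key/Value state and accumulates the result
# into the Values, standing in for the accumulated meta-gradients that the
# paper stores in the Key-Value matrices (assumed update rule, not the paper's).
K = demo_h @ W_k
V = demo_h @ W_v
step = 0.5                               # illustrative step size
for _ in range(3):                       # illustrative number of "thinking" iterations
    update = attend(demo_h @ W_q, K, V)
    V = V + step * update

# Stage 2: inference -- the test query is processed alone; the demonstrations
# are not concatenated because their effect is already stored in K and V.
output = attend(query_h @ W_q, K, V)
print(output.shape)                      # torch.Size([1, 16])
```

In this sketch only the Value state accumulates updates; whether the method updates Keys, Values, or both, and with what exact rule, would need to be checked against the full paper.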
Related research

- Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers (12/20/2022): Large pretrained language models have shown surprising In-Context Learni...
- Scaling In-Context Demonstrations with Structured Attention (07/05/2023): The recent surge of large language models (LLMs) highlights their abilit...
- Meta-in-context learning in large language models (05/22/2023): Large language models have shown tremendous performance in a variety of ...
- Efficient Prompting via Dynamic In-Context Learning (05/18/2023): The primary way of building AI applications is shifting from training sp...
- SINC: Self-Supervised In-Context Learning for Vision-Language Tasks (07/15/2023): Large Pre-trained Transformers exhibit an intriguing capacity for in-con...
- Ambiguity-Aware In-Context Learning with Large Language Models (09/14/2023): In-context learning (ICL) i.e. showing LLMs only a few task-specific dem...
- Meta-Learning Fast Weight Language Models (12/05/2022): Dynamic evaluation of language models (LMs) adapts model parameters at t...
