On the Role of Attention in Prompt-tuning

06/06/2023
by Samet Oymak et al.

Prompt-tuning is an emerging strategy for adapting large language models (LLMs) to downstream tasks by learning a (soft-)prompt parameter from data. Despite its success in LLMs, there is limited theoretical understanding of the power of prompt-tuning and of the role the attention mechanism plays in prompting. In this work, we explore prompt-tuning for one-layer attention architectures and study a contextual mixture-model in which each input token belongs to a context-relevant or context-irrelevant set. We isolate the role of prompt-tuning through a self-contained prompt-attention model. Our contributions are as follows: (1) We show that softmax-prompt-attention is provably more expressive than softmax-self-attention and linear-prompt-attention under our contextual data model. (2) We analyze the initial trajectory of gradient descent and show that it learns the prompt and the prediction head with near-optimal sample complexity, demonstrating how the prompt can provably attend to sparse context-relevant tokens. (3) Assuming a known prompt but an unknown prediction head, we characterize the exact finite-sample performance of prompt-attention, which reveals the fundamental performance limits and the precise benefit of the context information. We also provide experiments that verify our theoretical insights on real datasets and demonstrate how prompt-tuning enables the model to attend to context-relevant information.
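The prompt-attention model described above admits a compact description: a trainable soft prompt acts as the query that attends over the input tokens, and a linear head reads off the attention-weighted average. The NumPy sketch below illustrates this setup together with a toy instance of the contextual mixture data model; the variable names, dimensions, and exact parameterization are illustrative assumptions rather than the paper's verbatim formulation.

```python
# Minimal NumPy sketch of a single-layer prompt-attention head.
# Assumed simplified form: f(X) = u^T X^T softmax(X p / sqrt(d)),
# where p is a trainable soft prompt and u is a linear prediction head.
# All names, sizes, and the data model below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
T, d = 8, 16                      # tokens per sequence, embedding dimension

# Toy instance of a contextual mixture data model: a few context-relevant
# tokens carry the label signal y * mu, the remaining tokens are pure noise.
mu = rng.normal(size=d)           # shared context direction
y = rng.choice([-1.0, 1.0])       # binary label
X = rng.normal(scale=0.5, size=(T, d))
relevant = rng.choice(T, size=2, replace=False)
X[relevant] += y * mu

p = rng.normal(size=d)            # trainable soft-prompt vector (the query)
u = rng.normal(size=d)            # trainable prediction head

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def prompt_attention(X, p, u):
    scores = X @ p / np.sqrt(d)   # prompt-token similarity scores, shape (T,)
    weights = softmax(scores)     # attention distribution over tokens
    context = weights @ X         # attention-weighted token average, shape (d,)
    return u @ context            # scalar prediction (sign gives the class)

print(y, prompt_attention(X, p, u))
```

When the learned prompt p aligns with the context direction mu, the softmax concentrates its weight on the context-relevant tokens, which is the mechanism the abstract's gradient-descent analysis formalizes.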

Related research

04/26/2023
The Closeness of In-Context Learning and Weight Shifting for Softmax Regression
Large language models (LLMs) are known for their exceptional performance...

02/12/2023
A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity
Vision Transformers (ViTs) with self-attention modules have recently ach...

06/23/2023
Max-Margin Token Selection in Attention Mechanism
Attention mechanism is a central component of the transformer architectu...

04/18/2021
The Power of Scale for Parameter-Efficient Prompt Tuning
In this work, we explore "prompt tuning", a simple yet effective mechani...

10/11/2022
LARF: Two-level Attention-based Random Forests with a Mixture of Contamination Models
New models of the attention-based random forests called LARF (Leaf Atten...

02/17/2020
Low-Rank Bottleneck in Multi-head Attention Models
Attention based Transformer architecture has enabled significant advance...