Universality and Limitations of Prompt Tuning

05/30/2023
by Yihan Wang, et al.

Despite the demonstrated empirical efficacy of prompt tuning for adapting a pretrained language model to a new task, the theoretical underpinnings of the difference between tuning parameters prepended to the input and tuning the model weights themselves remain limited. We thus take one of the first steps toward understanding the role of soft-prompt tuning in transformer-based architectures. Considering a general-purpose architecture, we analyze prompt tuning through the lens of both universal approximation and its limitations with finite-depth, fixed-weight pretrained transformers for continuous-valued functions. Our universality result guarantees the existence of a strong transformer with a prompt that approximates any sequence-to-sequence function in the set of Lipschitz functions. We first establish the limitations of prompt tuning for limited-depth transformers by constructing a set of datasets that cannot be memorized by a prompt of any length for a given single encoder layer. We also provide a lower bound on the required number of tunable prompt parameters and compare it with the number of parameters required for a low-rank update (based on LoRA) in a single-layer setting. Finally, we extend our analysis to multi-layer settings by providing sufficient conditions under which the transformer can at best learn datasets from invertible functions only. Our theoretical claims are corroborated by empirical results.
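To make the object of study concrete, the sketch below shows soft-prompt tuning in its simplest form: a frozen single-layer transformer encoder whose only trainable parameters are the prompt_len x d_model entries of a prompt prepended to the input embeddings. All names, dimensions, and the placeholder objective (soft_prompt, prompt_len, d_model, the use of PyTorch's TransformerEncoder) are illustrative assumptions, not the construction analyzed in the paper.

```python
# Minimal sketch of soft-prompt tuning with a frozen transformer encoder.
# Names and dimensions are illustrative, not the paper's construction.
import torch
import torch.nn as nn

d_model, prompt_len, seq_len, batch = 32, 4, 10, 2

# Pretrained (here randomly initialized) encoder whose weights stay fixed.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=1,
)
for p in encoder.parameters():
    p.requires_grad = False

# The only tunable parameters: a soft prompt prepended to the input sequence.
soft_prompt = nn.Parameter(torch.randn(prompt_len, d_model))

x = torch.randn(batch, seq_len, d_model)              # input embeddings
prompt = soft_prompt.unsqueeze(0).expand(batch, -1, -1)
out = encoder(torch.cat([prompt, x], dim=1))          # run frozen model on [prompt; x]
out = out[:, prompt_len:, :]                          # keep positions of the original tokens

# Gradients flow only into the prompt; the encoder weights never change.
optimizer = torch.optim.Adam([soft_prompt], lr=1e-2)
loss = out.pow(2).mean()                              # placeholder objective
loss.backward()
optimizer.step()

print(soft_prompt.numel())                            # prompt_len * d_model = 128 parameters
```

Under these illustrative numbers, the prompt contributes prompt_len * d_model = 128 trainable parameters, whereas a rank-r LoRA update of a single d_model x d_model weight matrix would contribute 2 * r * d_model; the paper's lower bound and comparison concern how large the former must be in the single-layer setting.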
