DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning

09/11/2023
by Zhengxiang Shi, et al.

Prompt tuning (PT), in which a small number of trainable soft (continuous) prompt vectors are affixed to the input of language models (LMs), has shown promising results across various tasks and models for parameter-efficient fine-tuning (PEFT). PT stands out from other PEFT approaches because it maintains competitive performance with fewer trainable parameters and does not drastically scale up its parameters as the model size expands. However, PT introduces additional soft prompt tokens, leading to longer input sequences, which significantly impacts training and inference time and memory usage due to the Transformer's quadratic complexity. This is particularly concerning for Large Language Models (LLMs) that face heavy daily querying. To address this issue, we propose Decomposed Prompt Tuning (DePT), which decomposes the soft prompt into a shorter soft prompt and a pair of low-rank matrices that are then optimised with two different learning rates. This allows DePT to achieve better performance while saving over 20% in memory and time costs compared to vanilla PT and its variants, without changing the number of trainable parameters. Through extensive experiments on 23 natural language processing (NLP) and vision-language (VL) tasks, we demonstrate that DePT outperforms state-of-the-art PEFT approaches, and in some scenarios even the full fine-tuning baseline. Additionally, we empirically show that DePT becomes increasingly efficient as the model size grows. Our further study reveals that DePT integrates seamlessly with parameter-efficient transfer learning in the few-shot learning setting and highlights its adaptability to various model architectures and sizes.
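To illustrate the decomposition described in the abstract, below is a minimal PyTorch sketch: a shorter trainable soft prompt is prepended to the frozen input word embeddings, the embeddings themselves are offset by the product of a trainable low-rank pair, and the two parts are given separate learning rates via optimizer parameter groups. The class name DePTEmbedding, the helper make_optimizer, and all hyperparameter values (prompt length, rank, learning rates) are illustrative assumptions, not taken from the authors' released code.

```python
import torch
import torch.nn as nn


class DePTEmbedding(nn.Module):
    """Sketch of Decomposed Prompt Tuning: a short soft prompt plus a
    low-rank update of the frozen input word embeddings."""

    def __init__(self, frozen_embedding: nn.Embedding, max_seq_len: int,
                 prompt_len: int = 40, rank: int = 45):
        super().__init__()
        d = frozen_embedding.embedding_dim
        self.frozen_embedding = frozen_embedding
        self.frozen_embedding.weight.requires_grad_(False)  # backbone embeddings stay frozen
        # (i) shorter trainable soft prompt, prepended to the input
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, d) * 0.02)
        # (ii) trainable low-rank pair; their product (max_seq_len x d)
        #      is added to the word embeddings of the input tokens
        self.lora_a = nn.Parameter(torch.randn(max_seq_len, rank) * 0.02)
        self.lora_b = nn.Parameter(torch.zeros(rank, d))

    def forward(self, input_ids: torch.LongTensor) -> torch.Tensor:
        batch, seq_len = input_ids.shape
        word_emb = self.frozen_embedding(input_ids)              # (B, L, d)
        word_emb = word_emb + self.lora_a[:seq_len] @ self.lora_b
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, word_emb], dim=1)              # (B, prompt_len + L, d)


def make_optimizer(dept: DePTEmbedding, lr_prompt: float = 3e-1,
                   lr_low_rank: float = 5e-4) -> torch.optim.Optimizer:
    """Two learning rates: one for the soft prompt, one for the low-rank pair."""
    return torch.optim.AdamW([
        {"params": [dept.soft_prompt], "lr": lr_prompt},
        {"params": [dept.lora_a, dept.lora_b], "lr": lr_low_rank},
    ])
```

Because the soft prompt is shorter than in vanilla PT, the sequence fed to the Transformer is shorter, which is where the claimed time and memory savings come from, while the low-rank pair keeps the trainable-parameter budget unchanged.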


