Multitask Vision-Language Prompt Tuning

11/21/2022
by Sheng Shen, et al.

Prompt tuning, which conditions a model on task-specific learned prompt vectors, has emerged as a data-efficient and parameter-efficient method for adapting large pretrained vision-language models to multiple downstream tasks. However, existing approaches usually learn the prompt vectors for each task independently from scratch, thereby failing to exploit the rich shareable knowledge across different vision-language tasks. In this paper, we propose multitask vision-language prompt tuning (MVLPT), which incorporates cross-task knowledge into prompt tuning for vision-language models. Specifically, (i) we demonstrate the effectiveness of learning a single transferable prompt from multiple source tasks and using it to initialize the prompt for each target task; (ii) we show that many target tasks can benefit from sharing prompt vectors with one another and thus can be jointly learned via multitask prompt tuning. We benchmark the proposed MVLPT using three representative prompt tuning methods: text prompt tuning, visual prompt tuning, and unified vision-language prompt tuning. Results on 20 vision tasks demonstrate that the proposed approach outperforms all single-task baseline prompt tuning methods, setting a new state of the art on the few-shot ELEVATER benchmarks and cross-task generalization benchmarks. To understand where cross-task knowledge is most effective, we also conduct a large-scale study of task transferability across 20 vision tasks in 400 combinations for each prompt tuning method. It shows that the most performant MVLPT configuration differs across prompt tuning methods, and that many tasks can benefit each other depending on their visual similarity and label similarity. Code is available at https://github.com/sIncerass/MVLPT.
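The two-stage recipe the abstract describes (learn one shared prompt on several source tasks, then copy it to initialize each target task's prompt) can be sketched with a toy numpy example. This is a hypothetical illustration, not the paper's actual CLIP-based training: each source task is modeled by a simple quadratic loss around a per-task optimal prompt, and the multitask objective is the mean of those losses.

```python
import numpy as np

def learn_shared_prompt(source_task_optima, dim=4, lr=0.1, steps=200):
    """Stage 1: jointly fit a single prompt to all source tasks.

    Toy objective (an assumption for illustration): each source task i has
    loss ||prompt - optimum_i||^2, and we minimize the mean over tasks
    with plain gradient descent.
    """
    rng = np.random.default_rng(0)
    prompt = rng.normal(size=dim)
    for _ in range(steps):
        # gradient of the mean of the per-task quadratic losses
        grad = np.mean([2 * (prompt - opt) for opt in source_task_optima], axis=0)
        prompt -= lr * grad
    return prompt

def init_target_prompts(shared_prompt, target_tasks):
    """Stage 2: each target task starts from its own copy of the shared prompt,
    which would then be fine-tuned on that task's few-shot data."""
    return {task: shared_prompt.copy() for task in target_tasks}

# Three toy source tasks; the shared prompt converges toward their mean optimum.
source_optima = [np.ones(4), -np.ones(4), 2 * np.ones(4)]
shared = learn_shared_prompt(source_optima)
targets = init_target_prompts(shared, ["eurosat", "dtd"])
```

Under this quadratic toy objective the shared prompt converges to the mean of the source-task optima, which mirrors the intuition that a multitask-learned prompt sits in a region useful to all source tasks and is therefore a better starting point than a random initialization.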

Related research

- Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning (03/06/2023): Prompt tuning, in which a base pretrained model is adapted to each task ...
- Prompt Tuning with Soft Context Sharing for Vision-Language Models (08/29/2022): Vision-language models have recently shown great potential on many compu...
- Unified Vision and Language Prompt Learning (10/13/2022): Prompt tuning, a parameter- and data-efficient transfer learning paradig...
- Understanding and Improving Visual Prompting: A Label-Mapping Perspective (11/21/2022): We revisit and advance visual prompting (VP), an input prompting techniq...
- Data-Efficient Finetuning Using Cross-Task Nearest Neighbors (12/01/2022): Language models trained on massive prompted multitask datasets like T0 (...
- DePT: Decoupled Prompt Tuning (09/14/2023): This work breaks through the Base-New Tradeoff (BNT) dilemma in prompt tu...
- Exploring Effective Factors for Improving Visual In-Context Learning (04/10/2023): In-Context Learning (ICL) is to understand a new task via a few demo...
