Unified Vision and Language Prompt Learning

10/13/2022
by Yuhang Zang, et al.

Prompt tuning, a parameter- and data-efficient transfer learning paradigm that tunes only a small number of parameters in a model's input space, has become a trend in the vision community since the emergence of large vision-language models like CLIP. We present a systematic study on two representative prompt tuning methods, namely text prompt tuning and visual prompt tuning. A major finding is that none of the unimodal prompt tuning methods performs consistently well: text prompt tuning fails on data with high intra-class visual variances while visual prompt tuning cannot handle low inter-class variances. To combine the best from both worlds, we propose a simple approach called Unified Prompt Tuning (UPT), which essentially learns a tiny neural network to jointly optimize prompts across different modalities. Extensive experiments on over 11 vision datasets show that UPT achieves a better trade-off than the unimodal counterparts on few-shot learning benchmarks, as well as on domain generalization benchmarks. Code and models will be released to facilitate future research.
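The abstract describes UPT as a tiny neural network that jointly produces prompts for both modalities from a shared set of learnable tokens. The following is a minimal sketch of that idea, not the paper's implementation: all dimensions, the single shared ReLU layer, and the two linear heads are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): n shared prompt tokens of
# width d, projected into text-side and visual-side prompt tokens.
n_tokens, d = 4, 16
d_text, d_vis = 8, 12

# Shared learnable prompt matrix U -- the unified parameters being tuned.
U = rng.normal(scale=0.02, size=(n_tokens, d))

# A "tiny neural network": one hidden layer shared across modalities,
# then two linear heads emitting modality-specific prompts.
W_h = rng.normal(scale=0.02, size=(d, d))
W_t = rng.normal(scale=0.02, size=(d, d_text))   # text-prompt head
W_v = rng.normal(scale=0.02, size=(d, d_vis))    # visual-prompt head

def unified_prompts(U):
    """Map the shared prompt tokens to text and visual prompt tokens."""
    h = np.maximum(U @ W_h, 0.0)      # shared ReLU layer
    return h @ W_t, h @ W_v           # (n_tokens, d_text), (n_tokens, d_vis)

p_text, p_vis = unified_prompts(U)
print(p_text.shape, p_vis.shape)      # (4, 8) (4, 12)
```

In a real pipeline the text prompts would be prepended to the tokenized class names fed to CLIP's text encoder, and the visual prompts to the patch-token sequence of its image encoder; only U (and the small network) would receive gradients, keeping the method parameter-efficient.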


Related research:

- Multitask Vision-Language Prompt Tuning (11/21/2022)
- Learning Domain Invariant Prompt for Vision-Language Models (12/08/2022)
- Distribution-Aware Prompt Tuning for Vision-Language Models (09/06/2023)
- Towards a Unified View on Visual Parameter-Efficient Transfer Learning (10/03/2022)
- π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation (04/27/2023)
- LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark (06/11/2023)
- Does Vision Accelerate Hierarchical Generalization of Neural Language Learners? (02/01/2023)
