Variational prompt tuning improves generalization of vision-language models

Prompt tuning provides an efficient mechanism to adapt large vision-language models to downstream tasks by treating part of the input language prompts as learnable parameters while freezing the rest of the model. Existing prompt tuning methods are, however, prone to damaging the generalization capabilities of the foundation model, because the learned prompts lack the capacity to cover certain concepts within the language model. To avoid this limitation, we propose a probabilistic model of the underlying distribution of prompts, allowing prompts within the support of an associated concept to be derived through stochastic sampling. This results in a more complete and richer transfer of the information captured by the language model, providing better generalization capabilities for downstream tasks. The resulting algorithm relies on a simple yet powerful variational framework that can be directly integrated with other developments. We show our approach integrates seamlessly into both standard and conditional prompt learning frameworks, improving performance considerably in both cases, especially with regard to preserving the generalization capability of the original model. Our method provides the current state-of-the-art for prompt learning, surpassing CoCoOp by 1.6% on average, and it even surpasses the original CLIP model in terms of generalization to new classes. Implementation code will be released.
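The core mechanism lends itself to a compact illustration. Below is a minimal PyTorch sketch, not the authors' released code: it assumes a Gaussian posterior over the learnable context tokens with a standard-normal prior, samples prompts via the reparameterization trick, and adds a KL regularizer to the downstream loss. The class name `VariationalPromptLearner`, the dimensions, and the KL weight are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VariationalPromptLearner(nn.Module):
    """Sketch: model the context prompt tokens as a Gaussian
    distribution rather than a point estimate, so prompts within
    the support of a concept can be drawn by stochastic sampling."""

    def __init__(self, n_ctx: int = 4, dim: int = 512):
        super().__init__()
        # Learnable mean and log-variance of the prompt distribution.
        self.mu = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)
        self.log_var = nn.Parameter(torch.full((n_ctx, dim), -4.0))

    def forward(self) -> torch.Tensor:
        # Reparameterization: prompt = mu + sigma * eps with eps ~ N(0, I),
        # keeping the sampling step differentiable w.r.t. mu and log_var.
        sigma = torch.exp(0.5 * self.log_var)
        eps = torch.randn_like(sigma)
        return self.mu + sigma * eps

    def kl_divergence(self) -> torch.Tensor:
        # KL(N(mu, sigma^2) || N(0, I)) regularizes the learned
        # posterior toward the standard-normal prior.
        return -0.5 * torch.sum(
            1 + self.log_var - self.mu.pow(2) - self.log_var.exp()
        )

# Usage sketch: a sampled context is prepended to class-name embeddings
# and passed through the frozen CLIP text encoder; the training loss is
# the usual classification objective plus the weighted KL term.
prompt_learner = VariationalPromptLearner(n_ctx=4, dim=512)
ctx = prompt_learner()                           # one stochastic prompt sample
loss_kl = 1e-3 * prompt_learner.kl_divergence()  # KL weight is illustrative
```

Because the prompt is a distribution, multiple samples per concept can be drawn at inference and their predictions averaged, which is one way such a variational formulation can transfer a richer slice of the language model's knowledge than a single fixed prompt.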

