PRE: Vision-Language Prompt Learning with Reparameterization Encoder

09/14/2023
by Anh Pham Thi Minh, et al.

Large pre-trained vision-language models such as CLIP have demonstrated great potential for zero-shot transfer to downstream tasks. However, to attain optimal performance, prompts must be selected manually to improve the alignment between the downstream image distribution and the textual class descriptions. This manual prompt engineering is a major challenge for deploying such models in practice, since it requires domain expertise and is extremely time-consuming. To avoid non-trivial prompt engineering, the recent work Context Optimization (CoOp) introduced prompt learning to the vision domain using learnable textual tokens. While CoOp achieves substantial improvements over manual prompts, its learned context generalizes poorly to unseen classes within the same dataset. In this work, we present Prompt Learning with Reparameterization Encoder (PRE), a simple and efficient method that improves the generalization of the learnable prompt to unseen classes while maintaining its capacity to learn base classes. Instead of optimizing the prompts directly, PRE employs a prompt encoder to reparameterize the input prompt embeddings, enhancing the exploration of task-specific knowledge from few-shot samples. Experiments and extensive ablation studies on eight benchmarks demonstrate that our approach is an efficient method for prompt learning. Specifically, PRE achieves a notable improvement of 5.60 over CoOp in the 16-shot setting, while keeping training time modest.
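The difference between CoOp-style prompts and PRE-style reparameterized prompts can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the encoder architecture (a residual two-layer MLP, `e + W2 relu(W1 e)`), the dimensions `EMBED_DIM`, `N_CTX`, `HIDDEN`, and the helper names `ResidualPromptEncoder` and `build_prompt` are all hypothetical choices made for illustration. In CoOp the learnable context vectors are placed directly in front of the class-name token embeddings; in this sketch of PRE, each context vector is first passed through the prompt encoder, so training would update the encoder weights rather than only the raw vectors. The frozen CLIP text and image encoders, and the training loop, are omitted.

```python
import random

# Hypothetical sizes, chosen only for illustration (not from the paper).
EMBED_DIM = 8   # token embedding width
N_CTX = 4       # number of learnable context tokens
HIDDEN = 16     # hidden width of the assumed encoder MLP


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def rand_matrix(rows, cols, rng, scale=0.02):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class ResidualPromptEncoder:
    """Assumed prompt encoder: e' = e + W2 * relu(W1 * e).

    The residual form is an illustrative assumption, not the paper's spec.
    """

    def __init__(self, dim, hidden, rng):
        self.W1 = rand_matrix(hidden, dim, rng)
        self.W2 = rand_matrix(dim, hidden, rng)

    def __call__(self, e):
        delta = matvec(self.W2, relu(matvec(self.W1, e)))
        return [a + d for a, d in zip(e, delta)]


def build_prompt(context, encoder, class_token_embeds):
    """Assemble [ctx_1, ..., ctx_M, class tokens] for the (frozen) text encoder.

    encoder=None gives CoOp-style prompts (context used as-is);
    otherwise each context vector is reparameterized first (PRE-style sketch).
    """
    if encoder is None:
        ctx = [list(c) for c in context]
    else:
        ctx = [encoder(c) for c in context]
    return ctx + class_token_embeds


rng = random.Random(0)
context = [[rng.gauss(0.0, 0.02) for _ in range(EMBED_DIM)] for _ in range(N_CTX)]
class_tokens = [[0.1] * EMBED_DIM]  # hypothetical embedding of one class name

encoder = ResidualPromptEncoder(EMBED_DIM, HIDDEN, rng)
coop_prompt = build_prompt(context, None, class_tokens)
pre_prompt = build_prompt(context, encoder, class_tokens)
print(len(pre_prompt), len(pre_prompt[0]))  # 5 8
```

One consequence of the residual form assumed here: if `W2` is initialized to zeros, the encoder starts as the identity, so the PRE-style prompt initially equals the CoOp-style prompt and only departs from it as the encoder is trained.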
