Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement

04/03/2023
by Xiangyang Zhu, et al.

The popularity of Contrastive Language-Image Pre-training (CLIP) has propelled its application to diverse downstream vision tasks. To improve its capacity on downstream tasks, few-shot learning has become a widely adopted technique. However, existing methods either exhibit limited performance or suffer from excessive learnable parameters. In this paper, we propose APE, an Adaptive Prior rEfinement method for CLIP's pre-trained knowledge, which achieves superior accuracy with high computational efficiency. Via a prior refinement module, we analyze the inter-class disparity in the downstream data and decouple the domain-specific knowledge from the CLIP-extracted cache model. On top of that, we introduce two model variants, a training-free APE and a training-required APE-T. We explore the trilateral affinities between the test image, prior cache model, and textual representations, and only enable a lightweight category-residual module to be trained. For the average accuracy over 11 benchmarks, both APE and APE-T attain state-of-the-art results and respectively outperform the second-best methods by +1.59% and +1.99%, while using substantially fewer learnable parameters.
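To make the two ideas in the abstract concrete, below is a minimal, self-contained PyTorch sketch of how channel-level prior refinement and cache-plus-text classification could look. The inter-class variance criterion, the `refine_channels`/`trilateral_logits` names, and the `alpha`/`beta` hyperparameters are illustrative assumptions; the cache-logit form follows the Tip-Adapter-style formulation rather than the paper's exact equations.

```python
# Illustrative sketch only; criteria, names, and hyperparameters are assumptions.
import torch


def refine_channels(class_embeds: torch.Tensor, k: int) -> torch.Tensor:
    """Pick the k feature channels that vary most across class text embeddings.

    class_embeds: (C, D) L2-normalized CLIP text embeddings, one row per class.
    Returns indices of the k most class-discriminative channels.
    """
    # Channels with high variance across classes carry more domain-specific,
    # discriminative information; low-variance channels are largely shared.
    scores = class_embeds.var(dim=0)      # (D,) per-channel inter-class variance
    return scores.topk(k).indices         # (k,) selected channel indices


def trilateral_logits(test_feat, cache_keys, cache_vals, class_embeds, idx,
                      alpha=1.0, beta=5.5):
    """Blend zero-shot text logits with cache-model logits on refined channels.

    test_feat:    (D,)   CLIP image feature of the test sample.
    cache_keys:   (N, D) features of the few-shot training images (cache model).
    cache_vals:   (N, C) one-hot labels of the cached samples.
    idx:          (k,)   refined channel indices from refine_channels.
    alpha, beta:  assumed blending / sharpness hyperparameters.
    """
    zero_shot = test_feat @ class_embeds.t()                     # (C,) image-text affinity
    affinity = test_feat[idx] @ cache_keys[:, idx].t()           # (N,) image-cache affinity
    cache_logits = (beta * (affinity - 1.0)).exp() @ cache_vals  # (N,) -> (C,)
    return zero_shot + alpha * cache_logits


if __name__ == "__main__":
    # Toy usage: 100 classes, 512-D CLIP features, 16-shot cache, keep 400 channels.
    C, D, N, k = 100, 512, 1600, 400
    text = torch.nn.functional.normalize(torch.randn(C, D), dim=-1)
    keys = torch.nn.functional.normalize(torch.randn(N, D), dim=-1)
    vals = torch.eye(C).repeat_interleave(16, dim=0)   # (1600, 100) one-hot labels
    feat = torch.nn.functional.normalize(torch.randn(D), dim=-1)
    idx = refine_channels(text, k)
    print(trilateral_logits(feat, keys, vals, text, idx).shape)  # torch.Size([100])
```

In a training-free variant everything above stays frozen; a trained variant along the lines of APE-T would additionally learn a small per-category residual on top of these logits.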

Related Research

03/03/2023
Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners
Visual recognition in low-data regimes requires deep neural networks to ...

07/19/2022
Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification
Contrastive Vision-Language Pre-training, known as CLIP, has provided a ...

09/28/2022
CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention
Contrastive Language-Image Pre-training (CLIP) has been shown to learn v...

09/14/2023
PRE: Vision-Language Prompt Learning with Reparameterization Encoder
Large pre-trained vision-language models such as CLIP have demonstrated ...

11/06/2021
Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling
Contrastive Vision-Language Pre-training, known as CLIP, has provided a ...

07/29/2023
Instance-Wise Adaptive Tuning and Caching for Vision-Language Models
Large-scale vision-language models (LVLMs) pretrained on massive image-t...

09/06/2023
Image Aesthetics Assessment via Learnable Queries
Image aesthetics assessment (IAA) aims to estimate the aesthetics of ima...
