Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification

07/19/2022
by   Renrui Zhang, et al.
0

Contrastive Vision-Language Pre-training, known as CLIP, has provided a new paradigm for learning visual representations using large-scale image-text pairs. It shows impressive performance on downstream tasks by zero-shot knowledge transfer. To further enhance CLIP's adaption capability, existing methods proposed to fine-tune additional learnable modules, which significantly improves the few-shot performance but introduces extra training time and computational resources. In this paper, we propose a training-free adaption method for CLIP to conduct few-shot classification, termed as Tip-Adapter, which not only inherits the training-free advantage of zero-shot CLIP but also performs comparably to those training-required approaches. Tip-Adapter constructs the adapter via a key-value cache model from the few-shot training set, and updates the prior knowledge encoded in CLIP by feature retrieval. On top of that, the performance of Tip-Adapter can be further boosted to be state-of-the-art on ImageNet by fine-tuning the cache model for 10× fewer epochs than existing methods, which is both effective and efficient. We conduct extensive experiments of few-shot classification on 11 datasets to demonstrate the superiority of our proposed methods. Code is released at https://github.com/gaopengcuhk/Tip-Adapter.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/06/2021

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling

Contrastive Vision-Language Pre-training, known as CLIP, has provided a ...
research
11/28/2022

SuS-X: Training-Free Name-Only Transfer of Vision-Language Models

Contrastive Language-Image Pre-training (CLIP) has emerged as a simple y...
research
09/28/2022

CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention

Contrastive Language-Image Pre-training (CLIP) has been shown to learn v...
research
04/03/2023

Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement

The popularity of Contrastive Language-Image Pre-training (CLIP) has pro...
research
08/21/2023

Incorprating Prompt tuning for Commit classification with prior Knowledge

Commit Classification(CC) is an important task in software maintenance s...
research
07/29/2023

Instance-Wise Adaptive Tuning and Caching for Vision-Language Models

Large-scale vision-language models (LVLMs) pretrained on massive image-t...
research
08/18/2023

RLIPv2: Fast Scaling of Relational Language-Image Pre-training

Relational Language-Image Pre-training (RLIP) aims to align vision repre...

Please sign up or login with your details

Forgot password? Click here to reset