CLIP Itself is a Strong Fine-tuner: Achieving 85.7%, 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet

12/12/2022
by Xiaoyi Dong, et al.

Recent studies have shown that CLIP achieves remarkable success in zero-shot inference, while its fine-tuning performance is considered unsatisfactory. In this paper, we identify that fine-tuning performance is significantly impacted by hyper-parameter choices. We examine various key hyper-parameters and empirically evaluate their impact on fine-tuning CLIP for classification tasks through a comprehensive study. We find that the fine-tuning performance of CLIP is substantially underestimated. Equipped with hyper-parameter refinement, we demonstrate that CLIP itself is better than, or at least competitive with, large-scale supervised pre-training approaches or the latest works that use CLIP as the prediction target in Masked Image Modeling. Specifically, CLIP ViT-Base/16 and CLIP ViT-Large/14 can achieve 85.7% and 88.0% fine-tuning Top-1 accuracy on the ImageNet-1K dataset, respectively. These observations challenge the conventional conclusion that CLIP is not suitable for fine-tuning, and motivate us to rethink recently proposed improvements based on CLIP. We will release our code publicly at <https://github.com/LightDXY/FT-CLIP>.


