Finetune like you pretrain: Improved finetuning of zero-shot vision models

12/01/2022
by Sachin Goyal, et al.

Finetuning image-text models such as CLIP achieves state-of-the-art accuracies on a variety of benchmarks. However, recent works like WiseFT (Wortsman et al., 2021) and LP-FT (Kumar et al., 2022) have shown that even subtle differences in the finetuning process can lead to surprisingly large differences in the final performance, both for in-distribution (ID) and out-of-distribution (OOD) data. In this work, we show that a natural and simple approach of mimicking contrastive pretraining consistently outperforms alternative finetuning approaches. Specifically, we cast downstream class labels as text prompts and continue optimizing the contrastive loss between image embeddings and class-descriptive prompt embeddings (contrastive finetuning). Our method consistently outperforms baselines across 7 distribution shifts, 6 transfer learning, and 3 few-shot learning benchmarks. On WILDS-iWILDCam, our proposed approach FLYP outperforms the top of the leaderboard by 2.3% ID and 2.7% OOD, giving the highest reported accuracy. Averaged across 7 OOD datasets (2 WILDS and 5 ImageNet-associated shifts), FLYP gives gains of 4.2% OOD over standard finetuning and outperforms the current state of the art (LP-FT) by more than 1% both ID and OOD. Similarly, on 3 few-shot learning benchmarks, our approach gives gains up to 4.6% over standard finetuning and 4.4% over the state of the art. In total, these benchmarks establish contrastive finetuning as a simple, intuitive, and state-of-the-art approach for supervised finetuning of image-text models like CLIP. Code is available at https://github.com/locuslab/FLYP.
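Concretely, the recipe described in the abstract can be sketched in a few lines of PyTorch. The following is a minimal illustration assuming the open_clip package; the model name, prompt template, class names, and optimizer settings are placeholder choices for the sketch, not the authors' configuration (their code is in the linked repository).

```python
import torch
import torch.nn.functional as F
import open_clip

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a pretrained CLIP model; "ViT-B-32"/"openai" is an arbitrary choice here.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model = model.to(device).train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, weight_decay=0.1)

classnames = ["cat", "dog"]    # placeholder downstream label set
template = "a photo of a {}"   # class label cast as a text prompt

def flyp_step(images, labels):
    """One finetuning step using the same contrastive loss as pretraining,
    with class-descriptive prompts standing in for web captions."""
    texts = tokenizer(
        [template.format(classnames[y]) for y in labels]).to(device)
    image_features = F.normalize(model.encode_image(images.to(device)), dim=-1)
    text_features = F.normalize(model.encode_text(texts), dim=-1)
    logits = model.logit_scale.exp() * image_features @ text_features.t()
    targets = torch.arange(len(labels), device=device)
    # Symmetric InfoNCE: each image should match its own prompt and vice versa.
    loss = (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that when two images in a batch share a class, their prompts are identical and the one-hot contrastive targets become noisy; this sketch ignores that subtlety, so treat it as an illustration of the loss rather than a faithful reimplementation.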


Related research

- CyCLIP: Cyclic Contrastive Language-Image Pretraining (05/28/2022)
  Recent advances in contrastive representation learning over paired image...
- LoCoOp: Few-Shot Out-of-Distribution Detection via Prompt Learning (06/02/2023)
  We present a novel vision-language prompt learning approach for few-shot...
- Contrastive Adapters for Foundation Model Group Robustness (07/14/2022)
  While large pretrained foundation models (FMs) have shown remarkable zer...
- MedCLIP: Contrastive Learning from Unpaired Medical Images and Text (10/18/2022)
  Existing vision-text contrastive learning like CLIP aims to match the pa...
- Unified Contrastive Learning in Image-Text-Label Space (04/07/2022)
  Visual recognition is recently learned via either supervised learning on...
- Calibrated Out-of-Distribution Detection with a Generic Representation (03/23/2023)
  Out-of-distribution detection is a common issue in deploying vision mode...
- SupEuclid: Extremely Simple, High Quality OoD Detection with Supervised Contrastive Learning and Euclidean Distance (08/21/2023)
  Out-of-Distribution (OoD) detection has developed substantially in the p...
