Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation

05/27/2022
by   Yixuan Wei, et al.

Masked image modeling (MIM) learns representations with remarkably good fine-tuning performance, overshadowing previously prevalent pre-training approaches such as image classification, instance contrastive learning, and image-text alignment. In this paper, we show that the inferior fine-tuning performance of these pre-training approaches can be significantly improved by a simple post-processing step in the form of feature distillation (FD). Feature distillation converts the old representations into new ones that share several desirable properties with the representations produced by MIM. These properties, which we collectively refer to as optimization friendliness, are identified and analyzed with a set of attention- and optimization-related diagnostic tools. With these properties, the new representations show strong fine-tuning performance: contrastive self-supervised learning methods become as competitive in fine-tuning as state-of-the-art MIM algorithms, and the fine-tuning performance of CLIP models is also significantly improved, with a CLIP ViT-L model reaching 89.0% top-1 accuracy on ImageNet-1K classification. More importantly, our work provides a way for future research to focus on the generality and scalability of the learnt representations without being preoccupied with optimization friendliness, since it can be enhanced rather easily. The code will be available at https://github.com/SwinTransformer/Feature-Distillation.
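
To make the idea concrete, the sketch below shows one plausible form of the feature-distillation step: a frozen, already pre-trained teacher (e.g. a contrastive or CLIP encoder) provides whitened feature targets, and a freshly initialised student is trained to regress them. This is a minimal illustration written for this summary, not the authors' released implementation; the FeatureDistiller class, the stand-in encoders, and the exact choice of whitening (a non-affine LayerNorm) and loss (smooth L1) are simplifying assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistiller(nn.Module):
    # Student regresses whitened features of a frozen, pre-trained teacher.
    # Illustrative only: details are simplified relative to the paper.
    def __init__(self, teacher: nn.Module, student: nn.Module,
                 teacher_dim: int, student_dim: int):
        super().__init__()
        self.teacher = teacher.eval()
        for p in self.teacher.parameters():
            p.requires_grad_(False)                     # teacher stays frozen
        self.student = student
        # Non-parametric whitening of teacher targets (LayerNorm without affine parameters).
        self.whiten = nn.LayerNorm(teacher_dim, elementwise_affine=False)
        # Small head so student and teacher feature dimensions may differ.
        self.head = nn.Linear(student_dim, teacher_dim)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            target = self.whiten(self.teacher(images))  # (B, D_t) whitened targets
        pred = self.head(self.student(images))          # (B, D_t) student predictions
        return F.smooth_l1_loss(pred, target)           # feature regression loss

# Toy usage with stand-in MLP encoders; in practice the teacher would be a
# pre-trained contrastive or CLIP backbone and the student a same-sized ViT.
teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256))
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256))
fd = FeatureDistiller(teacher, student, teacher_dim=256, student_dim=256)
loss = fd(torch.randn(8, 3, 32, 32))
loss.backward()

After this post-processing step, the teacher is discarded and the distilled student representation is what gets fine-tuned on the downstream task.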

Related research

06/18/2020 · Recovering Petaflops in Contrastive Semi-Supervised Learning of Visual Representations
We investigate a strategy for improving the computational efficiency of ...

02/12/2021 · Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning
Contrastive self-supervised learning (CSL) leverages unlabeled data to t...

12/12/2022 · CLIP Itself is a Strong Fine-tuner: Achieving 85.7 Accuracy with ViT-B and ViT-L on ImageNet
Recent studies have shown that CLIP has achieved remarkable success in p...

02/27/2023 · Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Label-Efficient Representations
Recently, both Contrastive Learning (CL) and Mask Image Modeling (MIM) d...

03/15/2023 · Task-specific Fine-tuning via Variational Information Bottleneck for Weakly-supervised Pathology Whole Slide Image Classification
While Multiple Instance Learning (MIL) has shown promising results in di...

12/02/2021 · InsCLR: Improving Instance Retrieval with Self-Supervision
This work aims at improving instance retrieval with self-supervision. We...

05/26/2023 · On convex conceptual regions in deep network representations
The current study of human-machine alignment aims at understanding the g...
