Layer Grafted Pre-training: Bridging Contrastive Learning And Masked Image Modeling For Label-Efficient Representations

02/27/2023
by Ziyu Jiang, et al.

Recently, both Contrastive Learning (CL) and Masked Image Modeling (MIM) have demonstrated that self-supervision is powerful for learning good representations. However, naively combining them is far from successful. In this paper, we start from the empirical observation that naive joint optimization of the CL and MIM losses leads to conflicting gradient directions, and that the conflict grows more severe as the layers go deeper. This motivates us to shift the paradigm from combining the losses at the end to choosing the proper learning method for each network layer. Guided by experimental observations, we find that MIM and CL are suited to the lower and higher layers, respectively. We therefore propose to combine them in a surprisingly simple, "sequential cascade" fashion: the early layers are first trained under an MIM loss, on top of which the later layers continue to be trained under a CL loss. The proposed Layer Grafted Pre-training learns good visual representations with superior label efficiency in downstream applications, in particular yielding strong few-shot performance in addition to linear evaluation. For instance, on ImageNet-1k, Layer Grafted Pre-training yields 65.5% few-shot Top-1 accuracy, improving over the MIM and CL baselines (by 14.4% over the MIM baseline). The code is available at https://github.com/VITA-Group/layerGraftedPretraining_ICLR23.git.
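To make the "sequential cascade" concrete, below is a minimal PyTorch sketch of the two-stage recipe at toy scale: lower blocks are first trained with a masked-reconstruction (MIM-style) loss, then upper blocks are grafted on top and trained with a contrastive (CL-style) loss. The tiny encoder, the random tensors standing in for data, the 75% masking ratio, the graft point, and the loss heads are all illustrative assumptions, not the paper's actual training pipeline.

# Minimal sketch of the two-stage "sequential cascade" (illustrative only):
# stage 1 trains the lower blocks with a masked-reconstruction (MIM-style) loss,
# stage 2 grafts the upper blocks on top and trains them with a contrastive
# (InfoNCE-style) loss. Sizes, masking ratio, split point, and heads are toy choices.
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM, N_TOKENS, DEPTH, SPLIT = 64, 16, 4, 2   # toy model; the paper uses a ViT backbone

def make_blocks(n):
    return nn.ModuleList([
        nn.TransformerEncoderLayer(DIM, nhead=4, dim_feedforward=2 * DIM,
                                   batch_first=True)
        for _ in range(n)])

def run(blocks, x):
    for blk in blocks:
        x = blk(x)
    return x

lower, upper = make_blocks(SPLIT), make_blocks(DEPTH - SPLIT)  # graft point at SPLIT
decoder = nn.Linear(DIM, DIM)   # light reconstruction head for the MIM stage
proj = nn.Linear(DIM, DIM)      # projection head for the contrastive stage

# Stage 1: train the lower layers (plus decoder) under an MIM-style loss.
opt1 = torch.optim.AdamW([*lower.parameters(), *decoder.parameters()], lr=1e-3)
for _ in range(10):                                  # stand-in for real epochs/data
    tokens = torch.randn(8, N_TOKENS, DIM)           # pretend patch embeddings
    mask = torch.rand(8, N_TOKENS, 1) < 0.75         # mask 75% of the tokens
    recon = decoder(run(lower, tokens.masked_fill(mask, 0.0)))
    m = mask.expand_as(recon)
    loss = F.mse_loss(recon[m], tokens[m])           # reconstruct only masked tokens
    opt1.zero_grad(); loss.backward(); opt1.step()

# Stage 2: graft the upper layers on top and train them under a CL-style loss.
for p in lower.parameters():
    p.requires_grad = False   # frozen here for simplicity; a small lr is another option
opt2 = torch.optim.AdamW([*upper.parameters(), *proj.parameters()], lr=1e-3)
for _ in range(10):
    v1, v2 = torch.randn(8, N_TOKENS, DIM), torch.randn(8, N_TOKENS, DIM)  # "two views"
    with torch.no_grad():                            # lower layers act as a fixed stem
        h1, h2 = run(lower, v1), run(lower, v2)
    z1 = F.normalize(proj(run(upper, h1).mean(dim=1)), dim=-1)
    z2 = F.normalize(proj(run(upper, h2).mean(dim=1)), dim=-1)
    logits = z1 @ z2.t() / 0.2                       # temperature-scaled similarities
    loss = F.cross_entropy(logits, torch.arange(len(z1)))  # InfoNCE: match i-th views
    opt2.zero_grad(); loss.backward(); opt2.step()

Freezing the MIM-trained lower layers in stage 2 is simply the easiest choice for a sketch; the abstract only states that the later layers "continue to be trained" on top of the MIM-trained early layers, so keeping the lower layers trainable with a small learning rate is an equally reasonable variant.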

