A Closer Look at Self-supervised Lightweight Vision Transformers

05/28/2022
by Shaoru Wang, et al.

Self-supervised learning on large-scale Vision Transformers (ViTs) as a pre-training method has achieved promising downstream performance. Yet, how such pre-training paradigms promote the performance of lightweight ViTs is considerably less studied. In this work, we develop recipes for pre-training high-performance lightweight ViTs with the masked-image-modeling-based MAE, namely MAE-lite, which achieves 78.4% accuracy on ImageNet with ViT-Tiny (5.7M parameters). Furthermore, we develop and benchmark other fully-supervised and self-supervised pre-training counterparts, e.g., the contrastive-learning-based MoCo-v3, on both ImageNet and other classification tasks. We analyze and clearly show the effect of such pre-training, revealing that properly-learned lower layers of the pre-trained models matter more than higher ones in data-sufficient downstream tasks. Finally, by further comparing with the pre-trained representations of up-scaled models, we develop a distillation strategy during pre-training that improves the pre-trained representations, leading to further downstream performance gains. The code and models will be made publicly available.
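To make the described recipe concrete, below is a minimal sketch of MAE-style masked-image-modeling pre-training for a tiny ViT, with an optional feature-distillation term from a larger frozen teacher, in the spirit of the abstract. This is illustrative PyTorch, not the authors' released MAE-lite code; the names TinyViTEncoder, MAEDecoder, mask_ratio, and distill_weight are assumptions, and encoder positional embeddings are omitted for brevity.

```python
# Minimal sketch of MAE-style pre-training for a tiny ViT (illustrative only).
import torch
import torch.nn as nn


def patchify(imgs, patch=16):
    """(B, 3, H, W) -> (B, N, patch*patch*3) flattened non-overlapping patches."""
    B, C, H, W = imgs.shape
    h, w = H // patch, W // patch
    x = imgs.reshape(B, C, h, patch, w, patch)
    return x.permute(0, 2, 4, 3, 5, 1).reshape(B, h * w, patch * patch * C)


def random_masking(tokens, mask_ratio=0.75):
    """Keep a random subset of patch tokens; return visible tokens and indices."""
    B, N, D = tokens.shape
    keep = int(N * (1 - mask_ratio))
    ids_shuffle = torch.rand(B, N, device=tokens.device).argsort(dim=1)
    ids_keep, ids_mask = ids_shuffle[:, :keep], ids_shuffle[:, keep:]
    visible = torch.gather(tokens, 1, ids_keep.unsqueeze(-1).repeat(1, 1, D))
    return visible, ids_keep, ids_mask


class TinyViTEncoder(nn.Module):
    """Toy stand-in for ViT-Tiny: patch embedding + transformer blocks."""

    def __init__(self, patch=16, dim=192, depth=12, heads=3):
        super().__init__()
        self.embed = nn.Linear(patch * patch * 3, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)

    def forward(self, patches):          # patches: (B, N_visible, patch*patch*3)
        return self.blocks(self.embed(patches))


class MAEDecoder(nn.Module):
    """Lightweight decoder that fills masked positions with a learned token."""

    def __init__(self, dim=192, dec_dim=128, depth=2, heads=4, patch=16, num_patches=196):
        super().__init__()
        self.proj = nn.Linear(dim, dec_dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dec_dim))
        self.pos = nn.Parameter(torch.zeros(1, num_patches, dec_dim))
        layer = nn.TransformerEncoderLayer(dec_dim, heads, dec_dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dec_dim, patch * patch * 3)

    def forward(self, latents, ids_keep, ids_mask):
        B = latents.shape[0]
        x = self.proj(latents)
        mask_tokens = self.mask_token.expand(B, ids_mask.shape[1], -1)
        full = torch.cat([x, mask_tokens], dim=1)        # still in shuffled order
        ids_restore = torch.cat([ids_keep, ids_mask], dim=1).argsort(dim=1)
        full = torch.gather(full, 1, ids_restore.unsqueeze(-1).repeat(1, 1, full.shape[-1]))
        return self.head(self.blocks(full + self.pos))   # (B, N, patch*patch*3)


def mae_step(imgs, encoder, decoder, teacher=None, distill_weight=1.0, mask_ratio=0.75):
    patches = patchify(imgs)
    visible, ids_keep, ids_mask = random_masking(patches, mask_ratio)
    latents = encoder(visible)                           # encode visible tokens only
    pred = decoder(latents, ids_keep, ids_mask)
    idx = ids_mask.unsqueeze(-1).repeat(1, 1, patches.shape[-1])
    # Reconstruction loss is computed on the masked patches only, as in MAE.
    loss = (torch.gather(pred, 1, idx) - torch.gather(patches, 1, idx)).pow(2).mean()
    if teacher is not None:
        # Hypothetical distillation term: align student features with those of a
        # frozen, larger pre-trained teacher (assumed to output the same width).
        with torch.no_grad():
            teacher_feat = teacher(visible)
        loss = loss + distill_weight * (latents - teacher_feat).pow(2).mean()
    return loss


if __name__ == "__main__":
    enc, dec = TinyViTEncoder(), MAEDecoder()
    loss = mae_step(torch.randn(2, 3, 224, 224), enc, dec)
    loss.backward()
```

The sketch follows the general MAE design of encoding only visible patches and regressing masked-patch pixels; the optional teacher term is one plausible way to realize a distillation strategy during pre-training, under the stated assumptions.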

