Training data-efficient image transformers & distillation through attention

12/23/2020
by Hugo Touvron, et al.

Recently, neural networks purely based on attention were shown to address image understanding tasks such as image classification. However, these visual transformers are pre-trained with hundreds of millions of images using an expensive infrastructure, thereby limiting their adoption by the larger community. In this work, with an adequate training scheme, we produce a competitive convolution-free transformer by training on ImageNet only. We train it on a single computer in less than 3 days. Our reference vision transformer (86M parameters) achieves a top-1 accuracy of 83.1% (single-crop evaluation) on ImageNet with no external data. We share our code and models to accelerate community advances on this line of research. Additionally, we introduce a teacher-student strategy specific to transformers. It relies on a distillation token ensuring that the student learns from the teacher through attention. We show the interest of this token-based distillation, especially when using a convnet as a teacher. This leads us to report results competitive with convnets both on ImageNet (where we obtain up to 84.4% accuracy) and when transferring to other tasks.
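To make the distillation idea concrete, here is a minimal PyTorch sketch (not the authors' released code) of the hard-label variant of the objective: the class-token head is trained against the ground-truth labels, while the distillation-token head is trained against the teacher's hard predictions, with the two terms averaged. The student/teacher call signatures and variable names are assumptions for illustration only.

import torch
import torch.nn.functional as F

def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, targets):
    # Hard teacher targets: the teacher's most likely class per image.
    teacher_labels = teacher_logits.argmax(dim=-1)
    # Supervised term on the class-token head.
    loss_cls = F.cross_entropy(cls_logits, targets)
    # Distillation term on the distillation-token head.
    loss_dist = F.cross_entropy(dist_logits, teacher_labels)
    return 0.5 * loss_cls + 0.5 * loss_dist

# Usage sketch (assumed APIs): the teacher, e.g. a convnet, runs without gradients,
# and the transformer student returns one logit vector per token head.
# with torch.no_grad():
#     teacher_logits = teacher(images)
# cls_logits, dist_logits = student(images)
# loss = hard_distillation_loss(cls_logits, dist_logits, teacher_logits, labels)

Because the distillation token attends to the same patch embeddings as the class token, the teacher's signal is propagated to the student through the attention layers rather than only through the output logits.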


Related research

Co-advise: Cross Inductive Bias Distillation (06/23/2021)
Transformers recently are adapted from the community of natural language...

What to Hide from Your Students: Attention-Guided Masked Image Modeling (03/23/2022)
Transformers and masked language modeling are quickly being adopted and...

OVO: One-shot Vision Transformer Search with Online distillation (12/28/2022)
Pure transformers have shown great potential for vision tasks recently...

Co-training 2^L Submodels for Visual Recognition (12/09/2022)
We introduce submodel co-training, a regularization method related to co...

Hybrid Distillation: Connecting Masked Autoencoders with Contrastive Learners (06/28/2023)
Representation learning has been evolving from traditional supervised tr...

PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers (09/13/2022)
Data-free quantization can potentially address data privacy and security...

ResMLP: Feedforward networks for image classification with data-efficient training (05/07/2021)
We present ResMLP, an architecture built entirely upon multi-layer perce...
