Attention Distillation: self-supervised vision transformer students need more guidance

10/03/2022
by Kai Wang, et al.

Self-supervised learning has been widely applied to train high-quality vision transformers (ViTs). Unleashing their excellent performance on memory- and compute-constrained devices is therefore an important research topic. However, how to distill knowledge from one self-supervised ViT to another has not yet been explored. Moreover, existing self-supervised knowledge distillation (SSKD) methods focus on ConvNet-based architectures and are suboptimal for ViT knowledge distillation. In this paper, we study knowledge distillation of self-supervised vision transformers (ViT-SSKD). We show that directly distilling information from the crucial attention mechanism of the teacher to the student can significantly narrow the performance gap between the two. In experiments on ImageNet-Subset and ImageNet-1K, we show that our method AttnDistill outperforms existing SSKD methods and achieves state-of-the-art k-NN accuracy compared with self-supervised learning (SSL) methods trained from scratch (with the ViT-S model). We are also the first to apply the tiny ViT-T model to self-supervised learning. Moreover, AttnDistill is independent of the self-supervised learning algorithm; it can be adapted to ViT-based SSL methods to improve performance in future research. Code is available at: https://github.com/wangkai930418/attndistill
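To illustrate the attention-distillation idea the abstract describes, below is a minimal PyTorch sketch of matching a student ViT's attention maps to a frozen teacher's. This is not the authors' exact AttnDistill implementation: the KL-divergence loss form, the head-averaging step, and the choice of ViT-S teacher / ViT-T student head counts are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def attn_distill_loss(student_attn: torch.Tensor,
                      teacher_attn: torch.Tensor) -> torch.Tensor:
    """Hypothetical attention-distillation loss (illustrative, not AttnDistill).

    Both inputs are softmax attention maps of shape
    (batch, num_heads, num_tokens, num_tokens) from the last transformer
    block. Head counts may differ (e.g. a ViT-T student vs. a ViT-S
    teacher), so we average over heads before comparing.
    """
    s = student_attn.mean(dim=1)   # (B, N, N), still row-stochastic
    t = teacher_attn.mean(dim=1)   # (B, N, N)
    # Row-wise KL(teacher || student); each row is a distribution
    # over attended tokens.
    return F.kl_div(torch.log(s + 1e-8), t, reduction="batchmean")

# Usage sketch with random attention maps (assumed: 6-head ViT-S teacher,
# 3-head ViT-T student, 197 tokens for 224x224 images with patch size 16).
B, N = 4, 197
teacher_attn = torch.softmax(torch.randn(B, 6, N, N), dim=-1)
student_attn = torch.softmax(torch.randn(B, 3, N, N), dim=-1)
print(attn_distill_loss(student_attn, teacher_attn).item())
```

In practice such a term would presumably be added to the student's own SSL training objective while the teacher stays frozen; the abstract notes that AttnDistill itself is independent of the particular SSL algorithm used.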


Related research

01/23/2023 - A Simple Recipe for Competitive Low-compute Self supervised Vision Models
Self-supervised methods in vision have been mostly focused on large arch...

03/25/2023 - Supervised Masked Knowledge Distillation for Few-Shot Transformers
Vision Transformers (ViTs) emerge to achieve impressive performance on m...

09/03/2023 - COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers
We present COMEDIAN, a novel pipeline to initialize spatio-temporal tran...

04/29/2021 - Emerging Properties in Self-Supervised Vision Transformers
In this paper, we question if self-supervised learning provides new prop...

05/22/2023 - EnSiam: Self-Supervised Learning With Ensemble Representations
Recently, contrastive self-supervised learning, where the proximity of r...

06/21/2021 - Simple Distillation Baselines for Improving Small Self-supervised Models
While large self-supervised models have rivalled the performance of thei...

07/18/2018 - Self-supervised Knowledge Distillation Using Singular Value Decomposition
To solve deep neural network (DNN)'s huge training dataset and its high ...