Cumulative Spatial Knowledge Distillation for Vision Transformers

07/17/2023
by Borui Zhao et al.

Distilling knowledge from convolutional neural networks (CNNs) is a double-edged sword for vision transformers (ViTs). It boosts performance, since the image-friendly local inductive bias of CNNs helps ViTs learn faster and better, but it also leads to two problems: (1) The network designs of CNNs and ViTs are completely different, so their intermediate features lie at different semantic levels, making spatial-wise knowledge transfer methods (e.g., feature mimicking) inefficient. (2) Distilling knowledge from a CNN limits the network's convergence in the later training period, since the ViT's capability of integrating global information is suppressed by the CNN's local-inductive-bias supervision. To this end, we present Cumulative Spatial Knowledge Distillation (CSKD). CSKD distills spatial-wise knowledge to all patch tokens of the ViT from the corresponding spatial responses of the CNN, without introducing intermediate features. Furthermore, CSKD exploits a Cumulative Knowledge Fusion (CKF) module, which introduces the global response of the CNN and increasingly emphasizes its importance during training. Applying CKF leverages the CNN's local inductive bias in the early training period and gives full play to the ViT's global modeling capability in the later one. Extensive experiments and analysis on ImageNet-1k and downstream datasets demonstrate the superiority of our CSKD. Code will be made publicly available.
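To make the mechanism concrete, below is a minimal PyTorch-style sketch of how a CSKD-style loss with Cumulative Knowledge Fusion could be written. The function names (cskd_loss, ckf_targets), the tensor shapes, and the linear fusion schedule are illustrative assumptions based on the abstract, not the authors' released implementation.

```python
# Minimal sketch (not the authors' code) of a CSKD-style loss with a
# Cumulative Knowledge Fusion (CKF) schedule. Tensor shapes and the linear
# fusion schedule are assumptions for illustration.
import torch
import torch.nn.functional as F


def ckf_targets(cnn_spatial_logits, cnn_global_logits, alpha):
    """Fuse the CNN's spatial and global responses.

    cnn_spatial_logits: (B, N, C) dense predictions from the CNN feature map,
        one per spatial position, aligned with the ViT patch tokens.
    cnn_global_logits:  (B, C) the CNN's pooled, global prediction.
    alpha: fusion weight in [0, 1]; grows from 0 to 1 over training so that
        the global response is emphasized more and more.
    """
    global_expanded = cnn_global_logits.unsqueeze(1).expand_as(cnn_spatial_logits)
    return (1.0 - alpha) * cnn_spatial_logits + alpha * global_expanded


def cskd_loss(vit_patch_logits, cnn_spatial_logits, cnn_global_logits,
              epoch, total_epochs, tau=1.0):
    """KL distillation from fused CNN responses to every ViT patch token.

    vit_patch_logits: (B, N, C) per-patch-token predictions of the ViT student.
    """
    alpha = epoch / max(total_epochs - 1, 1)  # assumed linear schedule
    targets = ckf_targets(cnn_spatial_logits, cnn_global_logits, alpha)
    log_p_student = F.log_softmax(vit_patch_logits / tau, dim=-1).flatten(0, 1)
    p_teacher = F.softmax(targets / tau, dim=-1).flatten(0, 1)
    # Mean KL divergence over all patch tokens in the batch.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * tau ** 2
```

Under this assumed schedule, alpha = 0 at the start reproduces purely spatial (local) CNN supervision on each patch token, while alpha = 1 at the end supervises every patch token with the CNN's global prediction alone, reflecting the abstract's intuition of shifting from local inductive bias toward global integration as training proceeds.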

