Supervised Masked Knowledge Distillation for Few-Shot Transformers

03/25/2023
by   Han Lin, et al.

Vision Transformers (ViTs) have emerged as strong performers on many data-abundant computer vision tasks by capturing long-range dependencies among local features. However, under few-shot learning (FSL) settings on small datasets with only a few labeled examples, ViTs tend to overfit and suffer severe performance degradation due to the absence of CNN-like inductive biases. Previous works in FSL avoid this problem either with the help of self-supervised auxiliary losses or through the dexterous use of label information under supervised settings, but the gap between self-supervised and supervised few-shot Transformers remains unfilled. Inspired by recent advances in self-supervised knowledge distillation and masked image modeling (MIM), we propose a novel Supervised Masked Knowledge Distillation model (SMKD) for few-shot Transformers that incorporates label information into self-distillation frameworks. Compared with previous self-supervised methods, we allow intra-class knowledge distillation on both class and patch tokens, and introduce the challenging task of masked patch token reconstruction across intra-class images. Experimental results on four few-shot classification benchmarks show that our simply designed method outperforms previous methods by a large margin and achieves a new state-of-the-art. Detailed ablation studies confirm the effectiveness of each component of our model. Code for this paper is available here: https://github.com/HL-hanlin/SMKD.
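At its core, a self-distillation objective of this kind is a cross-entropy between a sharpened teacher distribution and a student distribution; the supervised twist described in the abstract is to apply it across different images that share a class label, rather than only across two views of the same image. The following pure-Python sketch illustrates that idea on class-token logits (the function names, temperature values, and pairing loop are illustrative assumptions, not the authors' implementation):

```python
import math

def softmax(logits, temp):
    # Temperature-scaled softmax over a list of raw scores.
    m = max(x / temp for x in logits)  # subtract max for numerical stability
    exps = [math.exp(x / temp - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(teacher_logits, student_logits, t_temp=0.04, s_temp=0.1):
    # Cross-entropy between the sharpened teacher distribution (low
    # temperature) and the student distribution, as in DINO-style
    # self-distillation.
    p_teacher = softmax(teacher_logits, t_temp)
    p_student = softmax(student_logits, s_temp)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

def supervised_class_token_loss(logits_batch, labels, t_temp=0.04, s_temp=0.1):
    # Supervised (intra-class) extension, sketched: pair every image with
    # every *other* image carrying the same label and distill between them,
    # so label information enters the self-distillation framework.
    total, count = 0.0, 0
    for i, (teacher, y_t) in enumerate(zip(logits_batch, labels)):
        for j, (student, y_s) in enumerate(zip(logits_batch, labels)):
            if i != j and y_t == y_s:
                total += distill_loss(teacher, student, t_temp, s_temp)
                count += 1
    return total / max(count, 1)
```

In the paper's setting the teacher is a momentum-updated copy of the student and the logits come from projection heads over class (and masked patch) tokens; the sketch above only shows the shape of the intra-class pairing that distinguishes the supervised objective from its self-supervised counterpart.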


Related research

- Attention Distillation: self-supervised vision transformer students need more guidance (10/03/2022)
- Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-shot Learning (03/17/2022)
- Self-Promoted Supervision for Few-Shot Transformer (03/14/2022)
- Cumulative Spatial Knowledge Distillation for Vision Transformers (07/17/2023)
- Self-Supervised Vision Transformers Learn Visual Concepts in Histopathology (03/01/2022)
- Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning (03/01/2021)
- Self-supervised Knowledge Distillation for Few-shot Learning (06/17/2020)
