Class Attention Transfer Based Knowledge Distillation

04/25/2023 · Ziyao Guo, et al.

Previous knowledge distillation methods have shown impressive performance on model compression tasks; however, it is hard to explain how the knowledge they transfer improves the performance of the student network. In this work, we focus on proposing a knowledge distillation method that has both high interpretability and competitive performance. We first revisit the structure of mainstream CNN models and reveal that the capacity to identify class-discriminative regions of the input is critical for a CNN to perform classification. Furthermore, we demonstrate that this capacity can be obtained and enhanced by transferring class activation maps. Based on these findings, we propose class attention transfer based knowledge distillation (CAT-KD). Unlike previous KD methods, we explore and present several properties of the knowledge transferred by our method, which not only improve the interpretability of CAT-KD but also contribute to a better understanding of CNNs. While remaining highly interpretable, CAT-KD achieves state-of-the-art performance on multiple benchmarks. Code is available at: https://github.com/GzyAftermath/CAT-KD.
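The core idea, transferring class activation maps (CAMs) from teacher to student, is simple to sketch. The snippet below is a minimal illustration under stated assumptions, not the authors' released implementation (see the linked repository for that): it assumes both networks expose their final convolutional feature maps, forms one activation map per class with a 1x1 convolution head, pools the maps to a coarse grid, L2-normalizes them, and matches them with an MSE loss. All names (compute_cams, cat_loss, cam_head) and the pooling size are hypothetical choices for this sketch.

```python
# Sketch of class attention transfer for distillation (assumed form, PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

def compute_cams(feats: torch.Tensor, cam_head: nn.Conv2d) -> torch.Tensor:
    """Map (B, C, H, W) features to (B, num_classes, H, W) class activation maps
    via a 1x1 convolution (one output channel per class)."""
    return cam_head(feats)

def cat_loss(student_cams: torch.Tensor, teacher_cams: torch.Tensor,
             pool_size: int = 2) -> torch.Tensor:
    # Pool to a small grid so only coarse class-discriminative regions
    # are matched (pool_size is an assumed hyperparameter here).
    s = F.adaptive_avg_pool2d(student_cams, pool_size)
    t = F.adaptive_avg_pool2d(teacher_cams, pool_size)
    # L2-normalize each class map so only the shape of the attention matters,
    # not its magnitude.
    s = F.normalize(s.flatten(2), dim=2)
    t = F.normalize(t.flatten(2), dim=2)
    return F.mse_loss(s, t)

if __name__ == "__main__":
    # Stand-in tensors in place of real teacher/student backbones.
    num_classes = 100
    teacher_head = nn.Conv2d(256, num_classes, kernel_size=1)
    student_head = nn.Conv2d(128, num_classes, kernel_size=1)
    teacher_feats = torch.randn(4, 256, 8, 8)
    student_feats = torch.randn(4, 128, 8, 8)
    with torch.no_grad():  # teacher is frozen during distillation
        t_cams = compute_cams(teacher_feats, teacher_head)
    s_cams = compute_cams(student_feats, student_head)
    print(cat_loss(s_cams, t_cams).item())
```

In a full training loop this term would plausibly be added to the usual cross-entropy on the student's logits, e.g. loss = cross_entropy + beta * cat_loss, with beta a weighting hyperparameter; the exact loss form and weighting used by CAT-KD are given in the paper and repository.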

Related research

Spot-adaptive Knowledge Distillation (05/05/2022)
Knowledge distillation (KD) has become a well established paradigm for c...

Understanding the Role of Mixup in Knowledge Distillation: An Empirical Study (11/08/2022)
Mixup is a popular data augmentation technique based on creating new sam...

Decoupled Knowledge Distillation (03/16/2022)
State-of-the-art distillation methods are mainly based on distilling dee...

Class-relation Knowledge Distillation for Novel Class Discovery (07/18/2023)
We tackle the problem of novel class discovery, which aims to learn nove...

KS-DETR: Knowledge Sharing in Attention Learning for Detection Transformer (02/22/2023)
Scaled dot-product attention applies a softmax function on the scaled do...

BearingPGA-Net: A Lightweight and Deployable Bearing Fault Diagnosis Network via Decoupled Knowledge Distillation and FPGA Acceleration (07/31/2023)
Deep learning has achieved remarkable success in the field of bearing fa...

Impact of a DCT-driven Loss in Attention-based Knowledge-Distillation for Scene Recognition (05/04/2022)
Knowledge Distillation (KD) is a strategy for the definition of a set of...
