Channel Distillation: Channel-Wise Attention for Knowledge Distillation

06/02/2020
by Zaida Zhou, et al.

Knowledge distillation transfers knowledge learned by a teacher network to a student network, so that the student requires fewer parameters and less computation while reaching accuracy close to the teacher's. In this paper, we propose a new distillation method consisting of two transfer strategies and a loss-decay strategy. The first transfer strategy, Channel Distillation (CD), is based on channel-wise attention and transfers channel information from the teacher to the student. The second is Guided Knowledge Distillation (GKD). Unlike Knowledge Distillation (KD), which lets the student mimic the teacher's prediction distribution on every sample, GKD lets the student mimic only the teacher's correct outputs. The last component is Early Decay Teacher (EDT): during training, we gradually decay the weight of the distillation loss so that the student, rather than the teacher, gradually takes control of the optimization. Our proposed method is evaluated on ImageNet and CIFAR100. On ImageNet, we achieve 27.68% top-1 error, which outperforms state-of-the-art methods. On CIFAR100, we obtain the surprising result that the student outperforms the teacher. Code is available at https://github.com/zhouzaida/channel-distillation.
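The abstract describes the three components (CD, GKD, EDT) only at a high level. Below is a minimal PyTorch-style sketch of how such losses could be written; the function names, the global-average-pooling channel attention, the temperature, the linear decay schedule, and the loss weights are illustrative assumptions, not the repository's actual implementation (see the linked code for the authors' version).

```python
import torch.nn.functional as F


def channel_attention(feature):
    # Per-channel attention vector of shape (B, C) via global average
    # pooling over the spatial dimensions (an assumption; the paper's
    # channel-wise attention may differ).
    return feature.mean(dim=(2, 3))


def cd_loss(student_feats, teacher_feats):
    # Channel Distillation: match channel-wise attention statistics of
    # paired student/teacher feature maps.
    loss = 0.0
    for s, t in zip(student_feats, teacher_feats):
        loss = loss + F.mse_loss(channel_attention(s), channel_attention(t))
    return loss


def gkd_loss(student_logits, teacher_logits, labels, temperature=4.0):
    # Guided Knowledge Distillation: distill only on samples the teacher
    # classifies correctly, instead of on every sample as in vanilla KD.
    correct = teacher_logits.argmax(dim=1).eq(labels)
    if correct.sum() == 0:
        return student_logits.new_zeros(())
    s = F.log_softmax(student_logits[correct] / temperature, dim=1)
    t = F.softmax(teacher_logits[correct] / temperature, dim=1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2


def edt_weight(base_weight, epoch, total_epochs):
    # Early Decay Teacher: shrink the distillation-loss weight over
    # training so the student gradually controls the optimization.
    # A linear schedule is just one plausible choice.
    return base_weight * max(0.0, 1.0 - epoch / total_epochs)


# Example combination for one batch (weights are illustrative):
# ce = F.cross_entropy(student_logits, labels)
# w = edt_weight(1.0, epoch, total_epochs)
# total = ce + w * (cd_loss(s_feats, t_feats)
#                   + gkd_loss(student_logits, teacher_logits, labels))
```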

Related research

06/08/2020
ResKD: Residual-Guided Knowledge Distillation
Knowledge distillation has emerged as a promising technique for compressi...

07/01/2021
Revisiting Knowledge Distillation: An Inheritance and Exploration Framework
Knowledge Distillation (KD) is a popular technique to transfer knowledge...

02/08/2022
Exploring Inter-Channel Correlation for Diversity-preserved Knowledge Distillation
Knowledge Distillation has shown very promising ability in transferring...

07/08/2020
Robust Re-Identification by Multiple Views Knowledge Distillation
To achieve robustness in Re-Identification, standard methods leverage tr...

02/05/2021
Show, Attend and Distill: Knowledge Distillation via Attention-based Feature Matching
Knowledge distillation extracts general knowledge from a pre-trained tea...

02/09/2023
Toward Extremely Lightweight Distracted Driver Recognition With Distillation-Based Neural Architecture Search and Knowledge Transfer
The number of traffic accidents has been continuously increasing in rece...

11/21/2022
Blind Knowledge Distillation for Robust Image Classification
Optimizing neural networks with noisy labels is a challenging task, espe...
