Distilling Knowledge via Intermediate Classifier Heads

02/28/2021
by Aryan Asadian, et al.

The crux of knowledge distillation, as a transfer-learning approach, is to effectively train a resource-limited student model under the guidance of a larger pre-trained teacher model. However, when there is a large difference between the model complexities of teacher and student (i.e., a capacity gap), knowledge distillation loses much of its power to transfer knowledge from the teacher to the student, and the resulting student is weaker. To mitigate the impact of the capacity gap, we introduce knowledge distillation via intermediate heads. By extending the intermediate layers of the teacher (at various depths) with classifier heads, we cheaply acquire a cohort of heterogeneous pre-trained teachers. The intermediate classifier heads can all be learned efficiently together while the backbone of the pre-trained teacher stays frozen. The cohort of teachers (including the original teacher) then co-teaches the student simultaneously. Our experiments on various teacher-student pairs and datasets demonstrate that the proposed approach outperforms the canonical knowledge distillation approach and its extensions.
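To make the mechanism concrete, below is a minimal PyTorch sketch of the two-stage recipe the abstract describes: classifier heads are attached to intermediate layers of a frozen teacher, and the resulting cohort (including the original teacher output) then co-teaches the student through an averaged distillation loss. The toy teacher, the split points, the 10-class setup, and the hyperparameters T and alpha are illustrative assumptions, not the paper's exact architecture or configuration.

```python
# Minimal sketch of KD via intermediate classifier heads (assumes PyTorch).
# ToyTeacher, the split points, T, and alpha are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntermediateHead(nn.Module):
    """Small classifier head attached to an intermediate teacher layer."""
    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, feature_map: torch.Tensor) -> torch.Tensor:
        return self.fc(self.pool(feature_map).flatten(1))

class ToyTeacher(nn.Module):
    """Stand-in convolutional teacher, split into stages for head attachment."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        logits = self.fc(self.pool(f2).flatten(1))
        return logits, [f1, f2]  # final logits plus intermediate feature maps

def cohort_kd_loss(student_logits, cohort_logits, labels, T=4.0, alpha=0.5):
    """Cross-entropy on labels plus the average KL divergence to each teacher."""
    ce = F.cross_entropy(student_logits, labels)
    kd = sum(
        F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(t.detach() / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        for t in cohort_logits
    ) / len(cohort_logits)
    return alpha * ce + (1 - alpha) * kd

# Stage 1: freeze the pre-trained teacher backbone; the intermediate heads
# would then be fit with ordinary cross-entropy while the backbone stays frozen.
teacher = ToyTeacher()
for p in teacher.parameters():
    p.requires_grad_(False)
heads = nn.ModuleList([IntermediateHead(16, 10), IntermediateHead(32, 10)])

# Stage 2: the cohort (original teacher output plus every intermediate head)
# co-teaches the student through the combined loss above.
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
teacher_logits, features = teacher(x)
cohort = [teacher_logits] + [h(f) for h, f in zip(heads, features)]
student_logits = torch.randn(8, 10, requires_grad=True)  # stands in for a student model's output
loss = cohort_kd_loss(student_logits, cohort, y, T=4.0, alpha=0.5)
loss.backward()
```

Averaging the per-teacher KL terms is one simple way to let the whole cohort supervise the student at once; weighting the heads differently (e.g., by depth or validation accuracy) would be an equally plausible variant.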


Related research

03/23/2021
Student Network Learning via Evolutionary Knowledge Distillation
Knowledge distillation provides an effective way to transfer knowledge v...

04/24/2023
Improving Knowledge Distillation Via Transferring Learning Ability
Existing knowledge distillation methods generally use a teacher-student ...

02/09/2019
Improved Knowledge Distillation via Teacher Assistant: Bridging the Gap Between Student and Teacher
Despite the fact that deep neural networks are powerful models and achie...

09/21/2021
RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation
Intermediate layer knowledge distillation (KD) can improve the standard ...

03/26/2022
Knowledge Distillation with the Reused Teacher Classifier
Knowledge distillation aims to compress a powerful yet cumbersome teache...

04/19/2023
Knowledge Distillation Under Ideal Joint Classifier Assumption
Knowledge distillation is a powerful technique to compress large neural ...

03/09/2020
Pacemaker: Intermediate Teacher Knowledge Distillation For On-The-Fly Convolutional Neural Network
There is a need for an on-the-fly computational process with very low pe...
