Heterogeneous Knowledge Distillation using Information Flow Modeling

05/02/2020
by   Nikolaos Passalis, et al.

Knowledge Distillation (KD) methods are capable of transferring the knowledge encoded in a large and complex teacher into a smaller and faster student. Early methods were usually limited to transferring knowledge only between the last layers of the networks, while later approaches were capable of performing multi-layer KD, further increasing the accuracy of the student. However, despite their improved performance, these methods still suffer from several limitations that restrict both their efficiency and flexibility. First, existing KD methods typically ignore that neural networks go through different learning phases during the training process, each of which often requires a different type of supervision. Furthermore, existing multi-layer KD methods are usually unable to effectively handle networks with significantly different architectures (heterogeneous KD). In this paper we propose a novel KD method that works by modeling the information flow through the various layers of the teacher model and then training a student model to mimic this information flow. The proposed method overcomes the aforementioned limitations by using an appropriate supervision scheme during the different phases of the training process, as well as by designing and training an appropriate auxiliary teacher model that acts as a proxy capable of "explaining" the way the teacher works to the student. The effectiveness of the proposed method is demonstrated using four image datasets and several different evaluation setups.
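To make the multi-layer KD setup concrete, the following is a minimal NumPy sketch of a generic multi-layer distillation loss: a temperature-softened KL term between teacher and student output logits, combined with a matching term over intermediate-layer representations. This is a simplified illustration of multi-layer KD in general, not the paper's information-flow modeling method; the function names, the temperature `T`, and the weighting `alpha` are all assumptions for the example.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multilayer_kd_loss(student_logits, teacher_logits,
                       student_feats, teacher_feats,
                       T=4.0, alpha=0.5):
    """Hypothetical multi-layer KD loss (illustrative, not the paper's method):
    KL divergence between softened output distributions, plus mean-squared
    matching of paired intermediate representations."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    # Soft-label distillation term: KL(teacher || student)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)))
    # Intermediate-layer matching term, summed over paired layers
    feat = sum(np.mean((np.asarray(fs) - np.asarray(ft)) ** 2)
               for fs, ft in zip(student_feats, teacher_feats))
    return alpha * kl + (1 - alpha) * feat
```

Note that this naive formulation assumes the paired intermediate representations have the same dimensionality, which is exactly what breaks down in the heterogeneous setting the paper targets; there, the auxiliary proxy teacher provides representations the student can actually match.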


research
02/01/2023

Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection

Knowledge distillation addresses the problem of transferring knowledge f...
research
05/20/2022

InDistill: Transferring Knowledge From Pruned Intermediate Layers

Deploying deep neural networks on hardware with limited resources, such ...
research
04/11/2019

Variational Information Distillation for Knowledge Transfer

Transferring knowledge from a teacher neural network pretrained on the s...
research
09/30/2020

Pea-KD: Parameter-efficient and Accurate Knowledge Distillation

How can we efficiently compress a model while maintaining its performanc...
research
12/31/2019

Modeling Teacher-Student Techniques in Deep Neural Networks for Knowledge Distillation

Knowledge distillation (KD) is a new method for transferring knowledge o...
research
04/28/2023

CORSD: Class-Oriented Relational Self Distillation

Knowledge distillation conducts an effective model compression method wh...
research
08/12/2021

Learning from Matured Dumb Teacher for Fine Generalization

The flexibility of decision boundaries in neural networks that are ungui...
