Revisiting Knowledge Distillation: An Inheritance and Exploration Framework

07/01/2021
by   Zhen Huang, et al.

Knowledge Distillation (KD) is a popular technique for transferring knowledge from a teacher model or ensemble to a student model. Its success is generally attributed to the privileged information on similarities/consistency between the class distributions or intermediate feature representations of the teacher model and the student model. However, directly pushing the student model to mimic the probabilities/features of the teacher model largely limits the student model from learning undiscovered knowledge/features. In this paper, we propose a novel inheritance and exploration knowledge distillation framework (IE-KD), in which a student model is split into two parts: inheritance and exploration. The inheritance part is learned with a similarity loss to transfer the existing learned knowledge from the teacher model to the student model, while the exploration part is encouraged to learn representations different from the inherited ones with a dis-similarity loss. Our IE-KD framework is generic and can be easily combined with existing distillation or mutual learning methods for training deep neural networks. Extensive experiments demonstrate that these two parts jointly push the student model to learn more diversified and effective representations, and that IE-KD is a general technique for improving student networks to achieve state-of-the-art performance. Furthermore, applying IE-KD to the joint training of two networks improves the performance of both compared with deep mutual learning. The code and models of IE-KD will be made publicly available at https://github.com/yellowtownhz/IE-KD.
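To make the split concrete, the sketch below shows one plausible way to implement an inheritance/exploration feature loss in PyTorch: the student's feature channels are divided in two, the inheritance half is pulled toward the teacher's features with a similarity loss, and the exploration half is pushed away with a dis-similarity loss. The class name, the 1x1 projections, and the specific loss forms (L2 and negative cosine) are illustrative assumptions, not the paper's exact formulation; the authoritative implementation is in the linked repository.

```python
import torch.nn as nn
import torch.nn.functional as F


class IEKDLoss(nn.Module):
    """Minimal sketch of an inheritance-and-exploration distillation loss.

    Hypothetical implementation: channel split, projection layers, and
    loss forms are placeholders chosen for illustration only.
    """

    def __init__(self, student_dim, teacher_dim, lambda_explore=1.0):
        super().__init__()
        self.half = student_dim // 2
        # 1x1 projections so both student halves match the teacher's channel dim
        self.proj_inherit = nn.Conv2d(self.half, teacher_dim, kernel_size=1)
        self.proj_explore = nn.Conv2d(student_dim - self.half, teacher_dim, kernel_size=1)
        self.lambda_explore = lambda_explore

    def forward(self, student_feat, teacher_feat):
        # Split student channels into an inheritance part and an exploration part
        f_inh = self.proj_inherit(student_feat[:, :self.half])
        f_exp = self.proj_explore(student_feat[:, self.half:])
        t = teacher_feat.detach()  # teacher is frozen during distillation

        # Inheritance: mimic the teacher's features (similarity loss)
        loss_inherit = F.mse_loss(f_inh, t)

        # Exploration: decorrelate from the teacher (dis-similarity loss);
        # absolute cosine similarity is used here as a simple stand-in
        cos = F.cosine_similarity(f_exp.flatten(1), t.flatten(1), dim=1)
        loss_explore = cos.abs().mean()

        return loss_inherit + self.lambda_explore * loss_explore
```

In practice this term would be added to the student's ordinary task loss (e.g., cross-entropy) at one or more intermediate layers, with the balance between inheritance and exploration controlled by the weighting factor.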

Related research

06/02/2020 | Channel Distillation: Channel-Wise Attention for Knowledge Distillation
Knowledge distillation is to transfer the knowledge from the data learne...

09/25/2019 | Revisit Knowledge Distillation: a Teacher-free Framework
Knowledge Distillation (KD) aims to distill the knowledge of a cumbersom...

01/21/2022 | Image-to-Video Re-Identification via Mutual Discriminative Knowledge Transfer
The gap in representations between image and video makes Image-to-Video ...

09/28/2019 | Training convolutional neural networks with cheap convolutions and online distillation
The large memory and computation consumption in convolutional neural net...

05/19/2021 | Comparing Kullback-Leibler Divergence and Mean Squared Error Loss in Knowledge Distillation
Knowledge distillation (KD), transferring knowledge from a cumbersome te...

12/05/2018 | Feature Matters: A Stage-by-Stage Approach for Knowledge Transfer
Convolutional Neural Networks (CNNs) become deeper and deeper in recent ...

07/23/2022 | Online Knowledge Distillation via Mutual Contrastive Learning for Visual Recognition
The teacher-free online Knowledge Distillation (KD) aims to train an ens...
