FEED: Feature-level Ensemble for Knowledge Distillation

09/24/2019
by Seonguk Park, et al.

Knowledge Distillation (KD) aims to transfer knowledge in a teacher-student framework by providing the predictions of the teacher network to the student network during training, helping the student network generalize better. It can use either a single high-capacity teacher or an ensemble of multiple teachers. However, the latter is inconvenient when one wants to use feature-map-based distillation methods. As a solution, this paper proposes a versatile and powerful training algorithm named FEature-level Ensemble for knowledge Distillation (FEED), which transfers ensemble knowledge using multiple teacher networks. We introduce a couple of training algorithms that transfer ensemble knowledge to the student at the feature-map level. Among feature-map-based distillation methods, using several non-linear transformations in parallel to transfer the knowledge of multiple teachers helps the student find more generalized solutions. We name this method parallel FEED, and experimental results on CIFAR-100 and ImageNet show clear performance enhancements, without introducing any additional parameters or computations at test time. We also report experimental results of sequentially feeding a teacher's information to the student, hence the name sequential FEED, and discuss the lessons learned. Additionally, empirical results on measuring the reconstruction errors at the feature-map level give hints about the source of these enhancements.
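The core idea of parallel FEED can be illustrated with a short sketch: the student's feature map is passed through one non-linear transformation per teacher and matched against that teacher's feature map, in addition to the usual task loss. The module below is a minimal sketch assuming PyTorch; the 1x1-convolution transform, channel sizes, and the loss weight beta are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of feature-level ensemble distillation with multiple teachers.
# The form of the per-teacher non-linear transform is an assumption for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelFeatureEnsembleLoss(nn.Module):
    """Matches the student's feature map to each teacher's feature map
    through a separate non-linear transformation per teacher."""
    def __init__(self, student_channels, teacher_channels_list):
        super().__init__()
        # One small non-linear transform per teacher (assumed architecture).
        self.transforms = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(student_channels, t_ch, kernel_size=1),
                nn.BatchNorm2d(t_ch),
                nn.ReLU(inplace=True),
                nn.Conv2d(t_ch, t_ch, kernel_size=1),
            )
            for t_ch in teacher_channels_list
        ])

    def forward(self, student_feat, teacher_feats):
        # Sum of per-teacher feature reconstruction (L2) losses; teachers are frozen.
        loss = student_feat.new_zeros(())
        for transform, t_feat in zip(self.transforms, teacher_feats):
            loss = loss + F.mse_loss(transform(student_feat), t_feat.detach())
        return loss

# Usage sketch (names and values are hypothetical):
#   feed_loss = ParallelFeatureEnsembleLoss(256, [512, 512, 512])
#   total = F.cross_entropy(student_logits, labels) \
#           + beta * feed_loss(student_feat, [t1_feat, t2_feat, t3_feat])
```

Because the transforms are used only during training, the student keeps its original architecture and cost at test time, which matches the claim of no additional parameters or computations at inference.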

Related research

11/29/2019 | Towards Oracle Knowledge Distillation with Neural Architecture Search
We present a novel framework of knowledge distillation that is capable o...

03/02/2023 | Distillation from Heterogeneous Models for Top-K Recommendation
Recent recommender systems have shown remarkable performance by using an...

10/23/2022 | Respecting Transfer Gap in Knowledge Distillation
Knowledge distillation (KD) is essentially a process of transferring a t...

10/20/2022 | Similarity of Neural Architectures Based on Input Gradient Transferability
In this paper, we aim to design a quantitative similarity function betwe...

11/15/2020 | Online Ensemble Model Compression using Knowledge Distillation
This paper presents a novel knowledge distillation based model compressi...

07/03/2023 | Review helps learn better: Temporal Supervised Knowledge Distillation
Reviewing plays an important role when learning knowledge. The knowledge...

12/01/2019 | Online Knowledge Distillation with Diverse Peers
Distillation is an effective knowledge-transfer technique that uses pred...
