Learn From the Past: Experience Ensemble Knowledge Distillation

02/25/2022
by Chaofei Wang, et al.

Traditional knowledge distillation transfers the "dark knowledge" of a pre-trained teacher network to a student network while ignoring the knowledge generated during the teacher's training process, which we call the teacher's experience. In realistic educational scenarios, however, the learning experience is often more important than the learning result. In this work, we propose a novel knowledge distillation method, experience ensemble knowledge distillation (EEKD), which integrates the teacher's experience into knowledge transfer. We uniformly save a moderate number of intermediate models from the teacher's training process and then integrate the knowledge of these intermediate models with an ensemble technique, using a self-attention module to adaptively assign weights to the different intermediate models during knowledge transfer. We explore three principles for constructing EEKD, concerning the quality, the weights, and the number of intermediate models, and reach a surprising conclusion: strong ensemble teachers do not necessarily produce strong students. Experimental results on CIFAR-100 and ImageNet show that EEKD outperforms mainstream knowledge distillation methods and achieves state-of-the-art performance. In particular, EEKD even surpasses standard ensemble distillation while requiring less training cost.
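The abstract describes the mechanism but not its implementation. Below is a minimal PyTorch sketch of the idea, assuming a list of frozen intermediate teacher checkpoints, a single nn.Linear layer as the self-attention scorer over checkpoint logits, and a standard Hinton-style KD loss; the module name ExperienceEnsembleKD and all hyperparameters are illustrative choices, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExperienceEnsembleKD(nn.Module):
    """Sketch of EEKD: distill from an attention-weighted ensemble of
    intermediate teacher checkpoints saved during teacher training."""

    def __init__(self, checkpoints, num_classes, temperature=4.0, alpha=0.9):
        super().__init__()
        # a plain list keeps the frozen checkpoints out of this module's parameters
        self.checkpoints = checkpoints
        self.T = temperature
        self.alpha = alpha
        # illustrative "self-attention" scorer: one score per checkpoint's logits
        self.attn = nn.Linear(num_classes, 1)

    @torch.no_grad()
    def _checkpoint_logits(self, x):
        # (B, K, C): logits of each of the K saved intermediate teachers
        return torch.stack([ckpt(x) for ckpt in self.checkpoints], dim=1)

    def forward(self, student_logits, x, labels):
        t_logits = self._checkpoint_logits(x)              # (B, K, C)
        scores = self.attn(t_logits).squeeze(-1)           # (B, K)
        weights = F.softmax(scores, dim=1).unsqueeze(-1)   # (B, K, 1)
        ensemble = (weights * t_logits).sum(dim=1)         # (B, C) soft targets
        # temperature-scaled KL divergence between student and ensemble teacher
        kd = F.kl_div(
            F.log_softmax(student_logits / self.T, dim=1),
            F.softmax(ensemble / self.T, dim=1),
            reduction="batchmean",
        ) * (self.T ** 2)
        ce = F.cross_entropy(student_logits, labels)
        return self.alpha * kd + (1.0 - self.alpha) * ce
```

In this sketch, the per-sample softmax over checkpoint scores lets the student weight different stages of the teacher's training trajectory differently for each input, which is the role the abstract assigns to the self-attention module.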

Related research

09/21/2021 · RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation
Intermediate layer knowledge distillation (KD) can improve the standard ...

10/12/2022 · Efficient Knowledge Distillation from Model Checkpoints
Knowledge distillation is an effective approach to learn compact models ...

04/01/2021 · Students are the Best Teacher: Exit-Ensemble Distillation with Multi-Exits
This paper proposes a novel knowledge distillation-based learning method...

11/11/2022 · PILE: Pairwise Iterative Logits Ensemble for Multi-Teacher Labeled Distillation
Pre-trained language models have become a crucial part of ranking system...

11/23/2022 · DGEKT: A Dual Graph Ensemble Learning Method for Knowledge Tracing
Knowledge tracing aims to trace students' evolving knowledge states by p...

10/05/2022 · Meta-Ensemble Parameter Learning
Ensemble of machine learning models yields improved performance as well ...

10/14/2021 · ClonalNet: Classifying Better by Focusing on Confusing Categories
Existing neural classification networks predominately adopt one-hot enco...
