Students are the Best Teacher: Exit-Ensemble Distillation with Multi-Exits

04/01/2021
by   Hojung Lee, et al.

This paper proposes a novel knowledge distillation-based learning method, called exit-ensemble distillation, that improves the classification performance of convolutional neural networks (CNNs) without a pre-trained teacher network. Our method exploits a multi-exit architecture that adds auxiliary classifiers (called exits) in the middle of a conventional CNN, through which early inference results can be obtained. The idea of our method is to train the network using the ensemble of the exits as the distillation target, which greatly improves the classification performance of the overall network. Our method suggests a new paradigm of knowledge distillation: unlike the conventional notion of distillation where teachers only teach students, we show that students can also help other students, and even the teacher, to learn better. Experimental results demonstrate that our method achieves significant improvements in classification performance on various popular CNN architectures (VGG, ResNet, ResNeXt, WideResNet, etc.). Furthermore, the proposed method expedites the convergence of training with improved stability. Our code will be available on GitHub.
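The sketch below illustrates the core idea described in the abstract: a backbone with auxiliary exit classifiers whose averaged (ensemble) prediction serves as a soft distillation target for every exit. This is a minimal illustration, not the authors' released code; the module structure, loss weighting, and names such as MultiExitNet and exit_ensemble_loss are assumptions for demonstration.

```python
# Hypothetical sketch of exit-ensemble distillation (not the authors' code).
# Assumes a backbone split into stages, with an auxiliary classifier ("exit")
# attached after each stage; the last exit is the network's final classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiExitNet(nn.Module):
    def __init__(self, stages, exits):
        super().__init__()
        self.stages = nn.ModuleList(stages)  # backbone blocks
        self.exits = nn.ModuleList(exits)    # one classifier head per stage

    def forward(self, x):
        logits = []
        for stage, exit_head in zip(self.stages, self.exits):
            x = stage(x)
            logits.append(exit_head(x))      # early/intermediate predictions
        return logits                        # last entry = final classifier output

def exit_ensemble_loss(logits_list, targets, T=3.0, alpha=0.5):
    """Cross-entropy at every exit plus distillation toward the exit ensemble."""
    # Ensemble target: average of all exit logits, detached so it acts as a
    # fixed "teacher" signal within the current iteration.
    ensemble = torch.stack(logits_list, dim=0).mean(dim=0).detach()
    ce = sum(F.cross_entropy(l, targets) for l in logits_list)
    kd = sum(
        F.kl_div(
            F.log_softmax(l / T, dim=1),
            F.softmax(ensemble / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        for l in logits_list
    )
    return (1 - alpha) * ce + alpha * kd
```

Detaching the ensemble target means each exit is pulled toward the collective prediction without gradients flowing back through the averaging, which is one common way such self-distillation targets are handled; the exact formulation in the paper may differ.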

