Spot-adaptive Knowledge Distillation

05/05/2022
by Jie Song, et al.

Knowledge distillation (KD) has become a well-established paradigm for compressing deep neural networks. The typical way of conducting knowledge distillation is to train the student network under the supervision of the teacher network, harnessing the knowledge at one or multiple spots (i.e., layers) in the teacher network. Once specified, the distillation spots remain fixed for all training samples throughout the whole distillation process. In this work, we argue that distillation spots should be adaptive to training samples and distillation epochs. We therefore propose a new distillation strategy, termed spot-adaptive KD (SAKD), which adaptively determines the distillation spots in the teacher network per sample, at every training iteration during the whole distillation period. Because SAKD focuses on "where to distill" rather than "what to distill", the question investigated by most existing works, it can be seamlessly integrated into existing distillation methods to further improve their performance. Extensive experiments with 10 state-of-the-art distillers demonstrate the effectiveness of SAKD in improving their distillation performance, under both homogeneous and heterogeneous distillation settings. Code is available at https://github.com/zju-vipa/spot-adaptive-pytorch
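To make the idea concrete, below is a minimal PyTorch sketch of what "spot-adaptive" selection could look like: a small gate scores each candidate distillation spot (teacher layer) per sample at every iteration, and the per-spot feature-distillation losses are weighted by those scores. This is an illustrative assumption of the mechanism, not the authors' implementation; names such as SpotGate and spot_adaptive_kd_loss are hypothetical, and the official code is at the repository above.

```python
# Illustrative sketch of spot-adaptive KD: NOT the authors' implementation.
# Assumptions: student features are already projected to match teacher channel
# dims (e.g. via 1x1 convs, omitted here); gating is soft rather than hard.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpotGate(nn.Module):
    """Per-sample gate that scores each candidate distillation spot (teacher layer)."""

    def __init__(self, feat_dims, hidden=64):
        super().__init__()
        # One tiny scorer per candidate spot; input is a pooled teacher feature.
        self.scorers = nn.ModuleList(
            nn.Sequential(nn.Linear(d, hidden), nn.ReLU(), nn.Linear(hidden, 1))
            for d in feat_dims
        )

    def forward(self, teacher_feats):
        # teacher_feats: list of [B, C_i, H, W] tensors, one per candidate spot.
        logits = torch.cat(
            [scorer(f.mean(dim=(2, 3))) for scorer, f in zip(self.scorers, teacher_feats)],
            dim=1,
        )  # [B, num_spots]
        # Soft, differentiable per-sample selection; a hard top-k or
        # Gumbel-softmax choice would be an equally valid variant.
        return logits.softmax(dim=1)


def spot_adaptive_kd_loss(student_feats, teacher_feats, gate):
    """Weight per-spot feature-distillation losses by the per-sample gate."""
    weights = gate(teacher_feats)  # [B, num_spots], recomputed every iteration
    per_spot = torch.stack(
        [F.mse_loss(s, t.detach(), reduction="none").mean(dim=(1, 2, 3))
         for s, t in zip(student_feats, teacher_feats)],
        dim=1,
    )  # [B, num_spots]
    return (weights * per_spot).sum(dim=1).mean()


if __name__ == "__main__":
    # Toy features for 3 candidate spots.
    B, dims = 4, [16, 32, 64]
    teacher_feats = [torch.randn(B, c, 8, 8) for c in dims]
    student_feats = [torch.randn(B, c, 8, 8, requires_grad=True) for c in dims]
    gate = SpotGate(dims)
    loss = spot_adaptive_kd_loss(student_feats, teacher_feats, gate)
    loss.backward()
    print(f"spot-adaptive KD loss: {loss.item():.4f}")
```

In a full training loop this term would be added to the usual task loss (and to whatever "what to distill" objective a base distiller defines), which is what allows a spot-adaptive scheme to sit on top of existing KD methods.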


Related research:

06/11/2023 · Adaptive Multi-Teacher Knowledge Distillation with Meta-Learning
Multi-Teacher knowledge distillation provides students with additional s...

09/12/2022 · Switchable Online Knowledge Distillation
Online Knowledge Distillation (OKD) improves the involved models by reci...

08/04/2020 · Prime-Aware Adaptive Distillation
Knowledge distillation (KD) aims to improve the performance of a student ...

05/13/2022 · Knowledge Distillation Meets Open-Set Semi-Supervised Learning
Existing knowledge distillation methods mostly focus on distillation of ...

12/20/2022 · Adam: Dense Retrieval Distillation with Adaptive Dark Examples
To improve the performance of the dual-encoder retriever, one effective ...

03/16/2022 · Decoupled Knowledge Distillation
State-of-the-art distillation methods are mainly based on distilling dee...

04/25/2023 · Class Attention Transfer Based Knowledge Distillation
Previous knowledge distillation methods have shown their impressive perf...
