Semi-Online Knowledge Distillation

11/23/2021
by Zhiqiang Liu, et al.

Knowledge distillation is an effective and stable method for model compression via knowledge transfer. Conventional knowledge distillation (KD) transfers knowledge from a large, well pre-trained teacher network to a small student network in a one-way process. Recently, deep mutual learning (DML) has been proposed to let student networks learn collaboratively and simultaneously. However, to the best of our knowledge, KD and DML have never been jointly explored in a unified framework to solve the knowledge distillation problem. In this paper, we observe that the teacher model provides more trustworthy supervision signals in KD, while the student captures behaviors more similar to the teacher's in DML. Based on these observations, we first propose to combine KD with DML in a unified framework. Furthermore, we propose a Semi-Online Knowledge Distillation (SOKD) method that effectively improves the performance of both the student and the teacher. In this method, we introduce the peer-teaching training fashion of DML to alleviate the student's imitation difficulty, and we also leverage the supervision signals provided by the well-trained teacher in KD. In addition, we show that our framework can be easily extended to feature-based distillation methods. Extensive experiments on the CIFAR-100 and ImageNet datasets demonstrate that the proposed method achieves state-of-the-art performance.
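To make the unified objective concrete, the sketch below combines the two ingredients the abstract describes: a one-way KD term against a frozen, well-trained teacher and a DML-style mutual term between two trainable peers. This is a minimal PyTorch illustration, not the authors' SOKD implementation; the helper names, the temperature T, and the weights alpha and beta are illustrative assumptions rather than values from the paper.

```python
# Minimal sketch of a combined KD + DML objective (illustrative, not the
# paper's exact SOKD formulation). T, alpha and beta are assumed values.
import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, T=4.0):
    """One-way KD: KL divergence toward a frozen, pre-trained teacher."""
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    # Scale by T^2 to keep gradient magnitudes comparable to the CE term.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)


def dml_loss(logits_a, logits_b):
    """DML-style mutual learning: each peer mimics the other's soft output."""
    kl_ab = F.kl_div(F.log_softmax(logits_a, dim=1),
                     F.softmax(logits_b, dim=1), reduction="batchmean")
    kl_ba = F.kl_div(F.log_softmax(logits_b, dim=1),
                     F.softmax(logits_a, dim=1), reduction="batchmean")
    return kl_ab, kl_ba


def combined_losses(student_logits, peer_logits, teacher_logits, labels,
                    alpha=0.5, beta=0.5, T=4.0):
    """Losses for one step of a unified KD + DML training scheme."""
    ce_student = F.cross_entropy(student_logits, labels)
    ce_peer = F.cross_entropy(peer_logits, labels)

    # Supervision from the well-trained, frozen teacher (KD part).
    kd_student = kd_loss(student_logits, teacher_logits, T)
    kd_peer = kd_loss(peer_logits, teacher_logits, T)

    # Peer-teaching between the two trainable networks (DML part).
    mutual_sp, mutual_ps = dml_loss(student_logits, peer_logits)

    loss_student = ce_student + alpha * kd_student + beta * mutual_sp
    loss_peer = ce_peer + alpha * kd_peer + beta * mutual_ps
    return loss_student, loss_peer
```

In this sketch the teacher's logits would be computed under `torch.no_grad()`, so only the student and its peer are updated, while both still receive the teacher's trustworthy supervision signal alongside the softer peer-teaching signal.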

