Switchable Online Knowledge Distillation

09/12/2022
by   Biao Qian, et al.

Online Knowledge Distillation (OKD) improves the involved models by reciprocally exploiting the difference between teacher and student. Several crucial questions about this gap, e.g., why and when a large gap harms performance (especially for the student), and how to quantify the gap between teacher and student, have received limited formal study. In this paper, we propose Switchable Online Knowledge Distillation (SwitOKD) to answer these questions. Instead of focusing on the accuracy gap at test time, as existing methods do, the core idea of SwitOKD is to adaptively calibrate the gap at training time, namely the distillation gap, via a switching strategy between two modes: expert mode (pause the teacher while keeping the student learning) and learning mode (restart the teacher). To maintain an appropriate distillation gap, we further devise an adaptive switching threshold, which provides a formal criterion for when to switch to learning mode or expert mode, and thus improves the student's performance. Meanwhile, the teacher benefits from the adaptive switching threshold and remains essentially on a par with other online methods. We further extend SwitOKD to multiple networks with two basis topologies. Finally, extensive experiments and analysis validate the merits of SwitOKD for classification over state-of-the-art methods. Our code is available at https://github.com/hfutqian/SwitOKD.
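To make the switching strategy concrete, below is a minimal PyTorch-style sketch of one training step. It assumes the distillation gap is measured as the L1 distance between the teacher's and student's output distributions, and it uses a fixed, hypothetical threshold to decide the mode; the paper's actual gap measure and its adaptive switching threshold are defined in the full text and in the repository linked above.

    import torch
    import torch.nn.functional as F

    def distillation_gap(teacher_logits, student_logits):
        # Hypothetical gap measure: L1 distance between the two output
        # distributions (the paper defines its own distillation gap).
        p_t = F.softmax(teacher_logits, dim=1)
        p_s = F.softmax(student_logits, dim=1)
        return (p_t - p_s).abs().sum(dim=1).mean()

    def switokd_style_step(teacher, student, opt_t, opt_s, x, y, threshold, T=4.0):
        t_logits = teacher(x)
        s_logits = student(x)
        gap = distillation_gap(t_logits.detach(), s_logits.detach())

        # The student always learns: cross-entropy on the labels plus
        # distillation from the (possibly paused) teacher.
        s_loss = F.cross_entropy(s_logits, y) + (T * T) * F.kl_div(
            F.log_softmax(s_logits / T, dim=1),
            F.softmax(t_logits.detach() / T, dim=1),
            reduction="batchmean",
        )
        opt_s.zero_grad()
        s_loss.backward()
        opt_s.step()

        if gap <= threshold:
            # Learning mode: the gap is small enough, so the teacher keeps
            # training (here on labels plus distillation from the student).
            t_loss = F.cross_entropy(t_logits, y) + (T * T) * F.kl_div(
                F.log_softmax(t_logits / T, dim=1),
                F.softmax(s_logits.detach() / T, dim=1),
                reduction="batchmean",
            )
            opt_t.zero_grad()
            t_loss.backward()
            opt_t.step()
        # Expert mode: the gap exceeds the threshold, so the teacher is
        # paused (no update) while the student continues to learn from it.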

Related research:

05/05/2022  Spot-adaptive Knowledge Distillation
06/11/2023  Adaptive Multi-Teacher Knowledge Distillation with Meta-Learning
03/28/2023  DisWOT: Student Architecture Search for Distillation WithOut Training
05/28/2022  Parameter-Efficient and Student-Friendly Knowledge Distillation
08/15/2021  Multi-granularity for knowledge distillation
09/15/2022  On-Device Domain Generalization
12/09/2020  Progressive Network Grafting for Few-Shot Knowledge Distillation
