Learning Student-Friendly Teacher Networks for Knowledge Distillation

02/12/2021
by Dae Young Park, et al.

We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student. In contrast to most existing methods, which focus on effectively training student models given pretrained teachers, we aim to learn teacher models that are friendly to students and, consequently, better suited for knowledge transfer. In other words, while optimizing a teacher model, the proposed algorithm jointly learns student branches so that the teacher acquires student-friendly representations. Since the main goal of our approach lies in training the teacher model, and the subsequent knowledge distillation procedure is straightforward, most existing knowledge distillation algorithms can adopt this technique to improve student models in terms of both accuracy and convergence speed. The proposed algorithm demonstrates outstanding accuracy with several well-known knowledge distillation techniques across various combinations of teacher and student architectures.
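To make the idea concrete, below is a minimal PyTorch-style sketch of a setup in this spirit: a teacher backbone with a lightweight auxiliary student branch trained jointly, together with a standard Hinton-style distillation loss. The names (StudentAwareTeacher, student_branch, kd_loss, teacher_training_step) and the specific loss weighting are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: student-aware teacher training followed by standard KD.
# Names and loss weights are illustrative assumptions, not the paper's
# exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Standard Hinton-style distillation loss: soft KL term + hard CE term."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


class StudentAwareTeacher(nn.Module):
    """Teacher backbone with a small auxiliary 'student branch' on top of the
    shared features; training both heads jointly encourages representations
    that a low-capacity model can also exploit."""

    def __init__(self, feat_dim=256, num_classes=100):
        super().__init__()
        self.backbone = nn.Sequential(              # stand-in for a large CNN trunk
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.teacher_head = nn.Linear(feat_dim, num_classes)
        self.student_branch = nn.Sequential(        # lightweight head mimicking a student
            nn.Linear(feat_dim, feat_dim // 4), nn.ReLU(),
            nn.Linear(feat_dim // 4, num_classes),
        )

    def forward(self, x):
        feats = self.backbone(x)
        return self.teacher_head(feats), self.student_branch(feats)


def teacher_training_step(model, optimizer, images, labels, T=4.0):
    """One joint step: the teacher head is trained on the task while the student
    branch matches the (detached) teacher outputs; its gradients also flow into
    the shared backbone, nudging the features to be student-friendly."""
    t_logits, s_logits = model(images)
    loss = F.cross_entropy(t_logits, labels) + kd_loss(s_logits, t_logits.detach(), labels, T=T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Once such a teacher is trained, any existing distillation method can transfer knowledge from its teacher head to an actual student network, which is why most knowledge distillation algorithms can adopt the technique with no further changes.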

Related research

04/24/2023 · Improving Knowledge Distillation Via Transferring Learning Ability
Existing knowledge distillation methods generally use a teacher-student ...

04/18/2023 · Deep Collective Knowledge Distillation
Many existing studies on knowledge distillation have focused on methods ...

05/18/2023 · Student-friendly Knowledge Distillation
In knowledge distillation, the knowledge from the teacher model is often...

05/28/2022 · Parameter-Efficient and Student-Friendly Knowledge Distillation
Knowledge distillation (KD) has been extensively employed to transfer th...

09/30/2020 · Efficient Kernel Transfer in Knowledge Distillation
Knowledge distillation is an effective way for model compression in deep...

05/09/2023 · DynamicKD: An Effective Knowledge Distillation via Dynamic Entropy Correction-Based Distillation for Gap Optimizing
The knowledge distillation uses a high-performance teacher network to gu...

11/18/2020 · Privileged Knowledge Distillation for Online Action Detection
Online Action Detection (OAD) in videos is proposed as a per-frame label...
