Spherical Knowledge Distillation

10/15/2020
by Jia Guo, et al.

Knowledge distillation aims to obtain a small but effective deep model by transferring knowledge from a much larger one. Previous approaches pursue this goal through "logit-supervised" information transfer between the teacher and student, which can be decomposed into the transfer of the normalized logits and of their l2 norm. We argue that the logit norm is actually interference that damages the efficiency of the transfer process. To address this problem, we propose Spherical Knowledge Distillation (SKD). Specifically, we project the teacher's and the student's logits onto a unit sphere, and then perform knowledge distillation efficiently on the sphere. We verify our argument via theoretical analysis and an ablation study. Extensive experiments demonstrate the superiority and scalability of our method over state-of-the-art approaches.
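To make the idea concrete, below is a minimal PyTorch sketch of a distillation loss computed on sphere-projected logits. It assumes that "projecting onto a unit sphere" means l2-normalizing each logit vector before applying the usual temperature-scaled KL distillation term; the function name, the temperature value, and the absence of any norm rescaling are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def spherical_kd_loss(student_logits, teacher_logits, temperature=4.0, eps=1e-8):
    """Sketch of a distillation loss on sphere-projected logits.

    student_logits, teacher_logits: tensors of shape (batch, num_classes).
    Projecting each logit vector onto the unit sphere removes its l2 norm,
    the component the paper argues interferes with knowledge transfer.
    """
    # Project each logit vector onto the unit sphere (divide by its l2 norm).
    s_sphere = student_logits / (student_logits.norm(dim=1, keepdim=True) + eps)
    t_sphere = teacher_logits / (teacher_logits.norm(dim=1, keepdim=True) + eps)

    # Standard temperature-scaled KL distillation, applied to the projected logits.
    log_p_student = F.log_softmax(s_sphere / temperature, dim=1)
    p_teacher = F.softmax(t_sphere / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (temperature ** 2)
```

In training, this term would typically be combined with the ordinary cross-entropy loss on the ground-truth labels, as in standard logit distillation.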


