Knowledge Distillation ≈ Label Smoothing: Fact or Fallacy?

01/30/2023
by Md Arafat Sultan, et al.

Contrary to its original interpretation as a facilitator of knowledge transfer from one model to another, some recent studies have suggested that knowledge distillation (KD) is instead a form of regularization. Perhaps the strongest support of all for this claim is found in its apparent similarities with label smoothing (LS). This paper investigates the stated equivalence of these two methods by examining the predictive uncertainties of the models they train. Experiments on four text classification tasks involving teachers and students of different capacities show that: (a) In most settings, KD and LS drive model uncertainty (entropy) in completely opposite directions, and (b) In KD, the student's predictive uncertainty is a direct function of that of its teacher, reinforcing the knowledge transfer view.
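The contrast between the two methods can be made concrete by looking at the training targets each one produces. The sketch below is illustrative only and is not taken from the paper: it assumes the common uniform form of label smoothing, a temperature-scaled softmax over teacher logits for KD, and made-up logits for two hypothetical teachers (`confident_teacher`, `uncertain_teacher`). It compares the entropy of the targets, not of trained models, which is what the paper's experiments measure.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed stably by subtracting the max."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def label_smoothing_target(true_class, num_classes, alpha=0.1):
    """Uniform label smoothing: (1 - alpha) * one_hot + alpha / K.

    alpha is an assumed smoothing weight, not a value from the paper.
    """
    return [
        (1 - alpha) * (1.0 if k == true_class else 0.0) + alpha / num_classes
        for k in range(num_classes)
    ]


def kd_target(teacher_logits, temperature=1.0):
    """KD soft target: the teacher's softened output distribution."""
    return softmax(teacher_logits, temperature)


# Hypothetical teacher logits for a single 4-class example (assumed values).
confident_teacher = [8.0, 0.5, 0.2, 0.1]
uncertain_teacher = [1.2, 1.0, 0.9, 0.8]

ls = label_smoothing_target(true_class=0, num_classes=4, alpha=0.1)
kd_conf = kd_target(confident_teacher)
kd_unc = kd_target(uncertain_teacher)

print("LS target entropy:          ", round(entropy(ls), 4))
print("KD target, confident teacher:", round(entropy(kd_conf), 4))
print("KD target, uncertain teacher:", round(entropy(kd_unc), 4))
```

Under these assumptions, the LS target has the same entropy for every example, while the KD target's entropy follows whatever the teacher outputs. A confident teacher yields a sharper target than LS and an uncertain one yields a flatter target, which is consistent with the abstract's point that the student's uncertainty depends on its teacher's.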


