Self-Distillation as Instance-Specific Label Smoothing

06/09/2020
by Mert R. Sabuncu, et al.

It has been recently demonstrated that multi-generational self-distillation can improve generalization. Despite this intriguing observation, reasons for the enhancement remain poorly understood. In this paper, we first demonstrate experimentally that the improved performance of multi-generational self-distillation is in part associated with the increasing diversity in teacher predictions. With this in mind, we offer a new interpretation for teacher-student training as amortized MAP estimation, such that teacher predictions enable instance-specific regularization. Our framework allows us to theoretically relate self-distillation to label smoothing, a commonly used technique that regularizes predictive uncertainty, and suggests the importance of predictive diversity in addition to predictive uncertainty. We present experimental results using multiple datasets and neural network architectures that, overall, demonstrate the utility of predictive diversity. Finally, we propose a novel instance-specific label smoothing technique that promotes predictive diversity without the need for a separately trained teacher model. We provide an empirical evaluation of the proposed method, which, we find, often outperforms classical label smoothing.
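To make the connection concrete, the following is a minimal sketch (not the authors' code) contrasting classical label smoothing, which mixes the one-hot label with a uniform distribution, with teacher-student self-distillation, where the teacher's softmax output acts as an instance-specific smoothing distribution. The hyperparameters (smoothing factor eps, mixing weight alpha, temperature T) and function names are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch only; hyperparameters are assumptions, not the paper's settings.
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, targets, eps=0.1):
    """Classical label smoothing: cross-entropy against
    (1 - eps) * one_hot(y) + eps / K, i.e. a uniform smoothing distribution."""
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    one_hot = F.one_hot(targets, num_classes).float()
    smoothed = (1.0 - eps) * one_hot + eps / num_classes
    return -(smoothed * log_probs).sum(dim=-1).mean()

def self_distillation_loss(student_logits, teacher_logits, targets, alpha=0.5, T=1.0):
    """Teacher-student training: the teacher's (temperature-scaled) probabilities
    replace the uniform term, giving an instance-specific smoothing distribution."""
    hard = F.cross_entropy(student_logits, targets)
    teacher_probs = F.softmax(teacher_logits / T, dim=-1)
    log_probs = F.log_softmax(student_logits / T, dim=-1)
    soft = -(teacher_probs * log_probs).sum(dim=-1).mean() * (T ** 2)
    return (1.0 - alpha) * hard + alpha * soft
```

Under this view, classical label smoothing applies the same flat smoothing distribution to every example, whereas the teacher supplies a different, potentially more diverse distribution per instance.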


Related research

Instance-based Label Smoothing For Better Calibrated Classification Networks (10/11/2021)
Revisit Knowledge Distillation: a Teacher-free Framework (09/25/2019)
Knowledge Distillation ≈ Label Smoothing: Fact or Fallacy? (01/30/2023)
Extending Label Smoothing Regularization with Self-Knowledge Distillation (09/11/2020)
Understanding Self-Distillation in the Presence of Label Noise (01/30/2023)
Churn Reduction via Distillation (06/04/2021)
Efficient One Pass Self-distillation with Zipf's Label Smoothing (07/26/2022)
