Self-Knowledge Distillation via Dropout

08/11/2022
by Hyoje Lee, et al.

To boost performance, deep neural networks require deeper or wider network structures that involve massive computational and memory costs. To alleviate this issue, self-knowledge distillation regularizes the model by distilling the internal knowledge of the model itself. Conventional self-knowledge distillation methods require additional trainable parameters or depend on the data. In this paper, we propose a simple and effective self-knowledge distillation method using dropout (SD-Dropout). SD-Dropout distills the posterior distributions of multiple models obtained through dropout sampling. Our method does not require any additional trainable modules, does not rely on the data, and requires only simple operations. Furthermore, it can easily be combined with various self-knowledge distillation approaches. We provide a theoretical and experimental analysis of the effect of forward and reverse KL-divergences in our work. Extensive experiments on various vision tasks, i.e., image classification, object detection, and distribution shift, demonstrate that the proposed method can effectively improve the generalization of a single network. Further experiments show that the proposed method also improves calibration performance, adversarial robustness, and out-of-distribution detection ability.
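The core idea described above is to draw two stochastic sub-networks of the same model via dropout and align their output distributions with forward and reverse KL-divergence. The PyTorch snippet below is a minimal sketch of that idea under stated assumptions, not the authors' reference implementation: the backbone, dropout rate, temperature, loss weighting, and the choice of which branch to detach are all illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def kl_divergence(p_logits, q_logits, T=1.0):
    """KL(p || q) between temperature-softened posteriors (T is an assumption)."""
    p = F.softmax(p_logits / T, dim=1)
    log_p = F.log_softmax(p_logits / T, dim=1)
    log_q = F.log_softmax(q_logits / T, dim=1)
    return (p * (log_p - log_q)).sum(dim=1).mean()

class SDDropoutSketch(nn.Module):
    """Illustrative model: a feature extractor, dropout, and a linear classifier."""
    def __init__(self, backbone, feat_dim, num_classes, p=0.5):
        super().__init__()
        self.backbone = backbone          # any feature extractor returning (B, feat_dim)
        self.dropout = nn.Dropout(p=p)    # dropout rate p is an assumption
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        feat = self.backbone(x)
        # Two independent dropout samples of the same features act as the
        # posteriors of two sampled sub-networks of the same model.
        logits_a = self.fc(self.dropout(feat))
        logits_b = self.fc(self.dropout(feat))
        return logits_a, logits_b

def sd_dropout_loss(logits_a, logits_b, targets, lam=1.0, T=4.0):
    """Cross-entropy on both views plus forward and reverse KL between the two
    dropout-sampled posteriors. lam, T, and the T**2 scaling are assumptions."""
    ce = F.cross_entropy(logits_a, targets) + F.cross_entropy(logits_b, targets)
    # Detach the "teacher" view in each term so gradients flow only into the other view.
    kl = kl_divergence(logits_a.detach(), logits_b, T) + \
         kl_divergence(logits_b.detach(), logits_a, T)
    return ce + lam * (T ** 2) * kl
```

In a training loop one would call `logits_a, logits_b = model(x)` and backpropagate `sd_dropout_loss(logits_a, logits_b, y)`; how the forward and reverse KL terms are weighted is exactly the question the paper's theoretical and experimental analysis addresses.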

Related research

- Self-Knowledge Distillation: A Simple Way for Better Generalization (06/22/2020)
  The generalization capability of deep neural networks has been substanti...

- Regularizing Class-wise Predictions via Self-knowledge Distillation (03/31/2020)
  Deep neural networks with millions of parameters may suffer from poor ge...

- Embracing the Dark Knowledge: Domain Generalization Using Regularized Knowledge Distillation (07/06/2021)
  Though convolutional neural networks are widely used in different tasks,...

- Imitation networks: Few-shot learning of neural networks from scratch (02/08/2018)
  In this paper, we propose imitation networks, a simple but effective met...

- Efficient training of lightweight neural networks using Online Self-Acquired Knowledge Distillation (08/26/2021)
  Knowledge Distillation has been established as a highly promising approa...

- Rethinking Position Bias Modeling with Knowledge Distillation for CTR Prediction (04/01/2022)
  Click-through rate (CTR) Prediction is of great importance in real-world...

- Towards a Unified View of Affinity-Based Knowledge Distillation (09/30/2022)
  Knowledge transfer between artificial neural networks has become an impo...
