Self-Knowledge Distillation: A Simple Way for Better Generalization

06/22/2020
by Kyungyul Kim, et al.

The generalization capability of deep neural networks has been substantially improved by applying a wide spectrum of regularization methods, e.g., restricting the function space, injecting randomness during training, and augmenting the data. In this work, we propose a simple yet effective regularization method named self-knowledge distillation (Self-KD), which progressively distills a model's own knowledge to soften hard targets (i.e., one-hot vectors) during training. It can therefore be interpreted within the framework of knowledge distillation, with the student becoming its own teacher. The proposed method is applicable to any supervised learning task with hard targets and can easily be combined with existing regularization methods to further improve generalization. Furthermore, we show that Self-KD not only achieves better accuracy but also provides high-quality confidence estimates. Extensive experimental results on three different tasks (image classification, object detection, and machine translation) demonstrate that our method consistently improves on state-of-the-art baselines; in particular, it achieves state-of-the-art BLEU scores of 30.0 and 36.2 on the IWSLT15 English-to-German and German-to-English tasks, respectively.
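As a rough illustration of the idea described above, the sketch below softens the one-hot targets with predictions from a snapshot of the model taken at the end of the previous epoch, with the mixing weight growing over training. This is a minimal sketch under assumptions: the linear alpha schedule, the previous-epoch snapshot as the "teacher", and the helper names (self_kd_loss, train_with_self_kd) are illustrative choices, not the paper's reference implementation.

import copy
import torch
import torch.nn.functional as F


def self_kd_loss(logits, hard_targets, teacher_logits, alpha):
    """Cross-entropy against soft targets that mix the one-hot label with the
    model's own earlier predictions, weighted by alpha in [0, 1]."""
    num_classes = logits.size(1)
    one_hot = F.one_hot(hard_targets, num_classes).float()
    with torch.no_grad():
        teacher_probs = F.softmax(teacher_logits, dim=1)
    soft_targets = (1.0 - alpha) * one_hot + alpha * teacher_probs
    log_probs = F.log_softmax(logits, dim=1)
    return -(soft_targets * log_probs).sum(dim=1).mean()


def train_with_self_kd(model, loader, optimizer, epochs, alpha_max=0.8, device="cpu"):
    # Assumed training loop: the schedule and snapshot strategy are illustrative.
    model.to(device)
    teacher = None  # snapshot of the model from the previous epoch
    for epoch in range(epochs):
        # Mixing weight grows linearly so the self-distillation is "progressive".
        alpha = alpha_max * epoch / max(epochs - 1, 1)
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            logits = model(x)
            if teacher is None:
                # No previous knowledge yet: train on plain hard targets.
                loss = F.cross_entropy(logits, y)
            else:
                with torch.no_grad():
                    teacher_logits = teacher(x)
                loss = self_kd_loss(logits, y, teacher_logits, alpha)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # The student becomes its own teacher for the next epoch.
        teacher = copy.deepcopy(model).eval()
    return model

In this sketch any off-the-shelf classifier, DataLoader, and optimizer can be plugged in; only the target-softening step distinguishes it from standard cross-entropy training.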


Related research

07/06/2021  Embracing the Dark Knowledge: Domain Generalization Using Regularized Knowledge Distillation
Though convolutional neural networks are widely used in different tasks,...

08/11/2022  Self-Knowledge Distillation via Dropout
To boost the performance, deep neural networks require deeper or wider n...

05/27/2021  Selective Knowledge Distillation for Neural Machine Translation
Neural Machine Translation (NMT) models achieve state-of-the-art perform...

02/13/2020  Self-Distillation Amplifies Regularization in Hilbert Space
Knowledge distillation introduced in the deep learning context is a meth...

03/31/2020  Regularizing Class-wise Predictions via Self-knowledge Distillation
Deep neural networks with millions of parameters may suffer from poor ge...

02/14/2022  What is Next when Sequential Prediction Meets Implicitly Hard Interaction?
Hard interaction learning between source sequences and their next target...

04/03/2023  Domain Generalization for Crop Segmentation with Knowledge Distillation
In recent years, precision agriculture has gradually oriented farming cl...
