When Does Label Smoothing Help?

06/06/2019
by Rafael Müller, et al.

The generalization and learning speed of a multi-class neural network can often be significantly improved by using soft targets that are a weighted average of the hard targets and the uniform distribution over labels. Smoothing the labels in this way prevents the network from becoming over-confident, and label smoothing has been used in many state-of-the-art models for image classification, language translation, and speech recognition. Despite its widespread use, label smoothing is still poorly understood. Here we show empirically that, in addition to improving generalization, label smoothing improves model calibration, which can significantly improve beam search. However, we also observe that if a teacher network is trained with label smoothing, knowledge distillation into a student network is much less effective. To explain these observations, we visualize how label smoothing changes the representations learned by the penultimate layer of the network. We show that label smoothing encourages the representations of training examples from the same class to group in tight clusters. This results in a loss of information in the logits about resemblances between instances of different classes. That information is necessary for distillation, but losing it does not hurt generalization or calibration of the model's predictions.
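
As a rough illustration of the soft targets described above, the following NumPy sketch mixes one-hot labels with the uniform distribution over labels and evaluates a cross-entropy loss against the result. The function names `smooth_labels` and `cross_entropy`, the parameter name `alpha`, and the default value 0.1 are assumptions made for this example; they are not taken from the paper's code.

```python
import numpy as np


def smooth_labels(labels, num_classes, alpha=0.1):
    """Return soft targets: (1 - alpha) * one_hot + alpha / num_classes.

    `alpha` is an illustrative smoothing weight; alpha = 0 gives hard targets.
    """
    one_hot = np.eye(num_classes)[labels]  # shape: (batch, num_classes)
    return (1.0 - alpha) * one_hot + alpha / num_classes


def cross_entropy(logits, targets):
    """Mean cross-entropy between softmax(logits) and (possibly soft) targets."""
    # Subtract the row-wise max for a numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())


if __name__ == "__main__":
    labels = np.array([0, 2])
    logits = np.array([[4.0, 1.0, 0.5], [0.2, 0.1, 3.0]])
    soft = smooth_labels(labels, num_classes=3, alpha=0.1)
    hard = smooth_labels(labels, num_classes=3, alpha=0.0)
    print("soft targets:\n", soft)
    print("loss with hard targets:", cross_entropy(logits, hard))
    print("loss with soft targets:", cross_entropy(logits, soft))
```

Because every class receives at least `alpha / num_classes` of the target mass, the loss can no longer be driven toward zero by making one logit arbitrarily larger than the rest, which is one way to read the abstract's claim that smoothing discourages over-confidence.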

research
10/11/2021

Instance-based Label Smoothing For Better Calibrated Classification Networks

Label smoothing is widely used in deep neural networks for multi-class c...
research
10/22/2022

Adaptive Label Smoothing with Self-Knowledge in Natural Language Generation

Overconfidence has been shown to impair generalization and calibration o...
research
03/05/2020

Does label smoothing mitigate label noise?

Label smoothing is commonly used in training deep learning models, where...
research
04/01/2021

Is Label Smoothing Truly Incompatible with Knowledge Distillation: An Empirical Study

This work aims to empirically clarify a recently discovered perspective ...
research
07/23/2021

Similarity Based Label Smoothing For Dialogue Generation

Generative neural conversational systems are generally trained with the ...
research
04/16/2020

Knowledge Distillation for Action Anticipation via Label Smoothing

Human capability to anticipate near future from visual observations and ...
research
09/14/2020

Adaptive Label Smoothing

This paper concerns the use of objectness measures to improve the calibr...
