Adaptive Label Smoothing with Self-Knowledge in Natural Language Generation

10/22/2022
by Dongkyu Lee, et al.

Overconfidence has been shown to impair the generalization and calibration of neural networks. Previous studies remedy this issue by adding a regularization term to the loss function that prevents the model from producing overly peaked distributions. Label smoothing smooths target labels with a pre-defined prior label distribution; the model is then trained to maximize the likelihood of the resulting soft labels. Nonetheless, the amount of smoothing is identical across all samples and remains fixed throughout training; in other words, label smoothing does not reflect how the probability distribution mapped by the model changes over the course of training. To address this issue, we propose a regularization scheme that makes the smoothing parameter dynamic by taking the model's probability distribution into account, so that the parameter varies per instance. A model in training thus self-regulates the extent of smoothing on the fly during forward propagation. Furthermore, inspired by recent work bridging label smoothing and knowledge distillation, our method uses self-knowledge as the prior label distribution for softening target labels, and we present theoretical support for the regularization effect of knowledge distillation combined with a dynamic smoothing parameter. Our regularizer is validated comprehensively, and the results show marked improvements in model generalization and calibration, enhancing the robustness and trustworthiness of a model.
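The abstract does not spell out how the per-instance smoothing weight is computed, so the following is only a minimal NumPy sketch of the general idea: the smoothing weight grows with the model's own confidence (here a hypothetical choice, normalized negative entropy), and the model's current predictive distribution serves as the smoothing prior (self-knowledge) instead of a fixed uniform prior. The function name, the `max_eps` parameter, and the confidence proxy are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def adaptive_smoothing_loss(logits, targets, max_eps=0.3):
    """Cross-entropy against per-instance smoothed targets (illustrative).

    - The smoothing weight eps is computed per sample from the model's
      own confidence, so peaked (overconfident) predictions get smoothed
      more; this is one plausible instantiation, not the paper's exact rule.
    - The prior used to soften the one-hot target is the model's own
      predictive distribution (self-knowledge); in a real training loop
      this prior would be detached from the gradient computation.
    """
    probs = softmax(logits)                     # model distribution p(y|x)
    n, k = probs.shape
    # Confidence proxy: 1 - normalized entropy, in [0, 1].
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)
    confidence = 1.0 - entropy / np.log(k)
    eps = max_eps * confidence                  # per-instance smoothing weight
    one_hot = np.eye(k)[targets]
    # Soft target: mix of the hard label and the self-knowledge prior.
    soft = (1.0 - eps)[:, None] * one_hot + eps[:, None] * probs
    # Cross-entropy of the model distribution against the soft target.
    return -(soft * np.log(probs + 1e-12)).sum(axis=-1).mean()
```

In this sketch a sharply peaked prediction receives a larger eps and is therefore pulled harder toward its own softened distribution, which is the dynamic, per-instance behavior the abstract describes.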

Related Research

When Does Label Smoothing Help? (06/06/2019)
The generalization and learning speed of a multi-class neural network ca...

Instance-based Label Smoothing For Better Calibrated Classification Networks (10/11/2021)
Label smoothing is widely used in deep neural networks for multi-class c...

Diversifying Dialog Generation via Adaptive Label Smoothing (05/30/2021)
Neural dialogue generation models trained with the one-hot target distri...

Knowledge Distillation for Action Anticipation via Label Smoothing (04/16/2020)
Human capability to anticipate near future from visual observations and ...

Normalized Label Distribution: Towards Learning Calibrated, Adaptable and Efficient Activation Maps (12/12/2020)
The vulnerability of models to data aberrations and adversarial attacks ...

Efficient One Pass Self-distillation with Zipf's Label Smoothing (07/26/2022)
Self-distillation exploits non-uniform soft supervision from itself duri...

Label Smoothing is Robustification against Model Misspecification (05/15/2023)
Label smoothing (LS) adopts smoothed targets in classification tasks. Fo...
