Generalized Entropy Regularization or: There's Nothing Special about Label Smoothing

05/02/2020
by Clara Meister, et al.

Prior work has explored directly regularizing the output distributions of probabilistic models to alleviate peaky (i.e. over-confident) predictions, a common sign of overfitting. This class of techniques, of which label smoothing is one, has a connection to entropy regularization. Despite the consistent success of label smoothing across architectures and data sets in language generation tasks, two problems remain open: (1) there is little understanding of the underlying effects entropy regularizers have on models, and (2) the full space of entropy regularization techniques is largely unexplored. We introduce a parametric family of entropy regularizers, which includes label smoothing as a special case, and use it to gain a better understanding of the relationship between the entropy of a model and its performance on language generation tasks. We also find that variance in model performance can be explained largely by the resulting entropy of the model. Lastly, we find that label smoothing provably does not allow for sparsity in an output distribution, an undesirable property for language generation models, and therefore advise the use of other entropy regularization methods in its place.
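
To make the connection between label smoothing and entropy regularization concrete, here is a minimal sketch in plain Python. It is not the paper's parametric family or the authors' code. It only checks two standard identities for a K-class output distribution p and the uniform distribution u: the label-smoothed cross-entropy equals (1 - alpha) * NLL + alpha * (KL(u || p) + log K), and the confidence penalty (NLL minus beta times the entropy of p) equals NLL + beta * (KL(p || u) - log K). All function names and the values of alpha and beta are assumptions chosen for illustration.

```python
import math

# Illustrative sketch only: function names and hyperparameters are
# hypothetical and are not taken from the paper.

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nll(p, target):
    """Negative log-likelihood of the target class."""
    return -math.log(p[target])

def kl(a, b):
    """KL(a || b) for discrete distributions; terms with a_i == 0 contribute 0."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

def label_smoothing_loss(p, target, alpha):
    """Cross-entropy against the target mixed with the uniform distribution."""
    k = len(p)
    smoothed = [(1 - alpha) * (1.0 if i == target else 0.0) + alpha / k
                for i in range(k)]
    return -sum(q * math.log(pi) for q, pi in zip(smoothed, p))

def confidence_penalty_loss(p, target, beta):
    """NLL minus beta times the entropy of the model distribution."""
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return nll(p, target) - beta * entropy

if __name__ == "__main__":
    p = softmax([2.0, 0.5, -1.0])
    u = [1.0 / len(p)] * len(p)
    alpha = beta = 0.1
    # Label smoothing rewritten with KL(u || p).
    print(label_smoothing_loss(p, 0, alpha),
          (1 - alpha) * nll(p, 0) + alpha * (kl(u, p) + math.log(len(p))))
    # Confidence penalty rewritten with KL(p || u).
    print(confidence_penalty_loss(p, 0, beta),
          nll(p, 0) + beta * (kl(p, u) - math.log(len(p))))
```

The two regularizers differ in the direction of the KL term. KL(u || p) grows without bound as any p_i approaches zero, whereas KL(p || u) stays finite for sparse p, which is one way to read the abstract's remark that label smoothing does not allow for sparsity in the output distribution.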

research
03/05/2020

Does label smoothing mitigate label noise?

Label smoothing is commonly used in training deep learning models, where...
research
01/23/2017

Regularizing Neural Networks by Penalizing Confident Output Distributions

We systematically explore regularizing neural networks by penalizing low...
research
10/23/2020

An Investigation of how Label Smoothing Affects Generalization

It has been hypothesized that label smoothing can reduce overfitting and...
research
02/13/2021

Capturing Label Distribution: A Case Study in NLI

We study estimating inherent human disagreement (annotation label distri...
research
07/23/2021

Similarity Based Label Smoothing For Dialogue Generation

Generative neural conversational systems are generally trained with the ...
research
08/05/2017

Inception Score, Label Smoothing, Gradient Vanishing and -log(D(x)) Alternative

In this paper, we study several GAN related topics mathematically, inclu...
research
10/19/2022

A Continuum of Generation Tasks for Investigating Length Bias and Degenerate Repetition

Language models suffer from various degenerate behaviors. These differ b...
