Tailoring Language Generation Models under Total Variation Distance

02/26/2023
by Haozhe Ji, et al.

The standard paradigm of neural language generation adopts maximum likelihood estimation (MLE) as its training objective. From a distributional view, MLE in fact minimizes the Kullback-Leibler divergence (KLD) between the distribution of the real data and that of the model. However, this approach forces the model to assign non-zero (sometimes large) probability mass to all training samples regardless of their quality. Moreover, in attempting to cover the low-probability regions of the data distribution, the model systematically overestimates the probability of corrupted text sequences, which we conjecture is one of the main causes of text degeneration during autoregressive decoding. To remedy this problem, we leverage the total variation distance (TVD), which is robust to outliers, and develop practical bounds that make it applicable to language generation. We then introduce the TaiLr objective, which balances the tradeoff in estimating TVD. Intuitively, TaiLr downweights real data samples that receive low model probability, with tunable penalization intensity. Experimental results show that our method alleviates the overestimation of degenerate sequences without sacrificing diversity and improves generation quality on a wide range of text generation tasks.
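To make the intuition concrete, here is a minimal PyTorch sketch of a token-level loss that downweights gold tokens the model assigns low probability. It is an illustration consistent with the description above, not a reproduction of the paper's exact objective; the weighting function and the hyperparameter `gamma` (standing in for the tunable penalization intensity) are assumptions for this sketch.

```python
import torch
import torch.nn.functional as F

def reweighted_nll(logits, targets, gamma=0.1):
    """Token-level negative log-likelihood with probability-based reweighting.

    logits:  (batch, seq_len, vocab) unnormalized model scores
    targets: (batch, seq_len) gold token ids
    gamma:   penalization intensity; gamma -> 0 recovers plain MLE,
             larger gamma downweights tokens the model finds unlikely.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # log p_theta(y_t | y_<t) for each gold token
    tok_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    p = tok_logp.exp()
    # Illustrative weight: close to 1 for confident tokens, small for
    # low-probability ones. Detached so it scales the MLE gradient
    # without contributing gradients of its own.
    weight = (p / (gamma + (1.0 - gamma) * p)).detach()
    return -(weight * tok_logp).mean()
```

With `gamma=0` the weight is identically 1 and the loss reduces to standard MLE; increasing `gamma` progressively suppresses the gradient contribution of training tokens the model deems improbable, which is the downweighting behavior the abstract describes.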
