Residual Energy-Based Models for Text Generation

04/22/2020
by Yuntian Deng, et al.

Text generation is ubiquitous in NLP, from summarization to dialogue and machine translation. The dominant parametric approach is based on locally normalized models that predict one word at a time. While these work remarkably well, they are plagued by exposure bias due to the greedy nature of the generation process. In this work, we investigate unnormalized energy-based models (EBMs), which operate not at the token level but at the sequence level. To make training tractable, we first work in the residual of a pretrained locally normalized language model, and second, we train using noise contrastive estimation. Furthermore, since the EBM works at the sequence level, we can leverage pretrained bidirectional contextual representations such as BERT and RoBERTa. Our experiments on two large language modeling datasets show that residual EBMs yield lower perplexity than locally normalized baselines. Moreover, generation via importance sampling is very efficient and, according to human evaluation, of higher quality than the baseline models.
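To make the idea concrete: a residual EBM scores a whole sequence as the base language model's log-probability minus a learned energy term, log P(x) = log P_LM(x) - E(x) - log Z, so generation can reuse the base model as a proposal distribution and simply reweight its samples by exp(-E(x)). Below is a minimal, illustrative Python sketch of that importance-sampling step, assuming stand-in components; the function names (residual_importance_sampling, sample_from_lm, energy_fn) and the toy example are assumptions for illustration, not the authors' released implementation.

import math
import random

def residual_importance_sampling(sample_from_lm, energy_fn, num_samples=10):
    """Draw candidate continuations from a locally normalized base LM and
    resample one of them with probability proportional to exp(-E(x)).

    This approximates sampling from a residual EBM P(x) proportional to
    P_LM(x) * exp(-E(x)): since the candidates are drawn from P_LM itself,
    the base-LM probability cancels out of the importance weights.
    """
    candidates = [sample_from_lm() for _ in range(num_samples)]
    energies = [energy_fn(x) for x in candidates]

    # Turn energies into normalized importance weights exp(-E) / sum exp(-E),
    # subtracting the minimum energy for numerical stability.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min)) for e in energies]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Resample a single continuation according to the importance weights.
    return random.choices(candidates, weights=probs, k=1)[0]

# Toy usage with stand-in components (purely illustrative):
if __name__ == "__main__":
    vocab = ["the", "cat", "sat", "on", "mat", "."]
    sample_from_lm = lambda: [random.choice(vocab) for _ in range(6)]
    # Dummy energy that prefers sequences with fewer repeated tokens.
    energy_fn = lambda x: float(len(x) - len(set(x)))
    print(residual_importance_sampling(sample_from_lm, energy_fn, num_samples=16))

Because the proposal distribution is the pretrained base LM, this resampling step only requires scoring a handful of complete candidate sequences with the energy network, which is what makes generation from the residual model efficient relative to sampling from an EBM directly.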

Related research

05/22/2020 - Investigating Label Bias in Beam Search for Open-ended Text Generation
Beam search is an effective and widely used decoding algorithm in many s...

05/29/2022 - CoNT: Contrastive Neural Text Generation
Recently, contrastive learning attracts increasing interests in neural t...

11/10/2019 - Distilling the Knowledge of BERT for Text Generation
Large-scale pre-trained language model, such as BERT, has recently achie...

11/06/2022 - Suffix Retrieval-Augmented Language Modeling
Causal language modeling (LM) uses word history to predict the next word...

10/22/2020 - Autoregressive Modeling is Misspecified for Some Sequence Distributions
Should sequences be modeled autoregressively—one symbol at a time? How m...

11/11/2021 - Self-Normalized Importance Sampling for Neural Language Modeling
To mitigate the problem of having to traverse over the full vocabulary i...

06/04/2021 - Exposing the Implicit Energy Networks behind Masked Language Models via Metropolis–Hastings
While recent work has shown that scores from models trained by the ubiqu...
