Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One

06/26/2022
by Yezhen Wang, et al.

Autoregressive generative models are commonly used, especially for tasks involving sequential data. They have, however, been plagued by a slew of inherent flaws that stem from chain-style conditional modeling (e.g., exposure bias or a lack of long-range coherence), which severely limit their ability to model distributions properly. In this paper, we propose a method termed E-ARM for training autoregressive generative models that takes advantage of a well-designed energy-based learning objective. By leveraging the extra degree of freedom of the softmax operation, we can make the autoregressive model itself an energy-based model that measures the likelihood of the input, without introducing any extra parameters. Furthermore, we show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem and increasing temporal coherence for autoregressive generative models. Extensive empirical results, covering benchmarks such as language modeling, neural machine translation, and image generation, demonstrate the effectiveness of the proposed approach.
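The "extra degree of freedom of the softmax operation" can be illustrated with a small sketch. Softmax is unchanged when the same constant is added to every logit, so the next-token distribution does not pin down the overall level of the logits, and that level is free to carry other information. The abstract does not spell out the E-ARM objective, so the code below is only an illustration under assumptions: the logit values are made up, and using the negative log-sum-exp of the logits as an "energy" is a hypothetical choice made here for demonstration, not necessarily the paper's definition.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logsumexp(logits):
    """Numerically stable log(sum(exp(logits)))."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))


# Made-up logits for one prediction step over a 3-token vocabulary.
logits = [2.0, 0.5, -1.0]
# The same logits shifted by a constant.
shifted = [z + 3.0 for z in logits]

# The conditional (next-token) distribution ignores the shift.
probs = softmax(logits)
probs_shifted = softmax(shifted)
same_conditional = all(abs(p - q) < 1e-12 for p, q in zip(probs, probs_shifted))

# Hypothetical energy (an assumption of this sketch): negative log-sum-exp.
# Unlike the softmax output, this quantity does change with the shift.
energy = -logsumexp(logits)
energy_shifted = -logsumexp(shifted)

print("conditionals identical:", same_conditional)
print("energy difference:", energy - energy_shifted)
```

In this sketch the two sets of logits give the same conditional distribution but different scalar scores, which is the sense in which a single network can serve both roles without extra parameters.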


