Alleviate Exposure Bias in Sequence Prediction with Recurrent Neural Networks

03/22/2021
by Liping Yuan, et al.

A popular strategy for training recurrent neural networks (RNNs), known as "teacher forcing," takes the ground truth as input at each time step, making later predictions partly conditioned on those inputs. Such a training strategy impairs the model's ability to learn rich distributions over entire sequences, because the chosen inputs hinder the gradients from back-propagating to all previous states in an end-to-end manner. We propose a fully differentiable training algorithm for RNNs that better captures long-term dependencies by recovering the probability of the whole sequence. The key idea is that at each time step, the network takes as input a "bundle" of similar words predicted at the previous step instead of a single ground-truth word. The representations of these similar words form a convex hull, which can be viewed as a kind of regularization of the input. Smoothing the inputs in this way keeps the whole process trainable and differentiable. This design makes it possible for the model to explore more feasible combinations (possibly unseen sequences), and can be interpreted as a computationally efficient approximation to beam search. Experiments on multiple sequence generation tasks yield performance improvements, especially on sequence-level metrics such as BLEU and ROUGE-2.
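To make the abstract's idea concrete, here is a minimal sketch in PyTorch of how such a "bundle" input could be wired into an RNN decoder. It is an illustration based only on the description above, not the authors' code: all class and variable names (BundleDecoder, k, etc.) are hypothetical, and the choice of a GRU cell and of top-k softmax renormalization are assumptions. At each step the decoder feeds itself a convex combination of the embeddings of the k most probable words from the previous step, so the next input lies inside the convex hull of those embeddings and the unrolled computation stays differentiable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BundleDecoder(nn.Module):
    """Decoder that feeds itself a convex mixture of top-k word embeddings
    instead of a single ground-truth token (a sketch of the 'bundle' idea)."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, k=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn_cell = nn.GRUCell(embed_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)
        self.k = k

    def forward(self, bos_ids, hidden, max_len=20):
        # bos_ids: (batch,) start-of-sequence token ids
        # hidden:  (batch, hidden_dim) initial decoder state
        inp = self.embed(bos_ids)          # the first input is a real token
        logits_seq = []
        for _ in range(max_len):
            hidden = self.rnn_cell(inp, hidden)
            logits = self.out(hidden)      # (batch, vocab_size)
            logits_seq.append(logits)

            # Build the next input as a convex combination of the embeddings
            # of the k most probable words predicted at this step.
            probs = F.softmax(logits, dim=-1)
            top_p, top_ids = probs.topk(self.k, dim=-1)         # (batch, k)
            weights = top_p / top_p.sum(dim=-1, keepdim=True)   # renormalize
            top_emb = self.embed(top_ids)                       # (batch, k, embed_dim)
            inp = (weights.unsqueeze(-1) * top_emb).sum(dim=1)  # point in the convex hull

        return torch.stack(logits_seq, dim=1)  # (batch, max_len, vocab_size)
```

Because gradients flow through the softmax weights and the mixed embedding at every step, the loss on a later prediction can reach all earlier states, which is the end-to-end property the abstract contrasts with teacher forcing.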


Related research

12/25/2018
Coupled Recurrent Network (CRN)
Many semantic video analysis tasks can benefit from multiple, heterogeno...

11/16/2015
A Neural Transducer
Sequence-to-sequence models have achieved impressive results on various ...

05/14/2018
Token-level and sequence-level loss smoothing for RNN language models
Despite the effectiveness of recurrent neural network language models, t...

06/12/2018
Quaternion Recurrent Neural Networks
Recurrent neural networks (RNNs) are powerful architectures to model seq...

03/22/2021
SparseGAN: Sparse Generative Adversarial Network for Text Generation
It is still a challenging task to learn a neural text generation model u...

02/22/2023
Learning from Predictions: Fusing Training and Autoregressive Inference for Long-Term Spatiotemporal Forecasts
Recurrent Neural Networks (RNNs) have become an integral part of modelin...

06/14/2017
SEARNN: Training RNNs with Global-Local Losses
We propose SEARNN, a novel training algorithm for recurrent neural netwo...
