Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks

06/09/2015
by Samy Bengio, et al.

Recurrent Neural Networks can be trained to produce sequences of tokens given some input, as exemplified by recent results in machine translation and image captioning. The current approach to training them consists of maximizing the likelihood of each token in the sequence given the current (recurrent) state and the previous token. At inference, the unknown previous token is replaced by a token generated by the model itself. This discrepancy between training and inference can yield errors that accumulate quickly along the generated sequence. We propose a curriculum learning strategy to gently change the training process from a fully guided scheme that uses the true previous token towards a less guided scheme that mostly uses the generated token instead. Experiments on several sequence prediction tasks show that this approach yields significant improvements. Moreover, it was used successfully in our winning entry to the 2015 MSCOCO image captioning challenge.
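
To make the training scheme concrete, here is a minimal PyTorch-style sketch of one scheduled-sampling training step. All names (embed, decoder_cell, project, teacher_prob, train_step) and the toy dimensions are illustrative assumptions, not the authors' implementation; the inverse-sigmoid decay is one of the schedules discussed in the paper, and greedy argmax stands in here for sampling from the model's output distribution.

```python
import math
import torch
import torch.nn as nn

# Hypothetical toy dimensions; any values work for the sketch.
vocab_size, emb_dim, hidden_dim = 1000, 32, 64

embed = nn.Embedding(vocab_size, emb_dim)
decoder_cell = nn.GRUCell(emb_dim, hidden_dim)  # stands in for the recurrent decoder
project = nn.Linear(hidden_dim, vocab_size)
loss_fn = nn.CrossEntropyLoss()

def teacher_prob(step, k=100.0):
    """Inverse-sigmoid decay of epsilon, the probability of feeding the
    TRUE previous token. Starts near 1 (fully guided) and decays toward 0
    (mostly model-fed), implementing the curriculum described above."""
    return k / (k + math.exp(min(step / k, 60.0)))  # clamp to avoid overflow

def train_step(targets, step):
    """One scheduled-sampling pass over `targets` of shape [batch, seq_len].
    targets[:, 0] is assumed to be the start token."""
    batch, seq_len = targets.shape
    h = torch.zeros(batch, hidden_dim)
    eps = teacher_prob(step)
    loss = 0.0
    prev = targets[:, 0]
    for t in range(1, seq_len):
        h = decoder_cell(embed(prev), h)
        logits = project(h)
        loss = loss + loss_fn(logits, targets[:, t])
        # Per-token coin flip: with probability eps feed the true token,
        # otherwise feed the model's own greedy prediction.
        use_true = torch.rand(batch) < eps
        prev = torch.where(use_true, targets[:, t], logits.argmax(dim=-1))
    return loss / (seq_len - 1)

# Toy usage: early steps are mostly teacher-forced, later ones mostly model-fed.
targets = torch.randint(0, vocab_size, (8, 12))
loss = train_step(targets, step=500)
loss.backward()
```

Because the coin flip happens independently per token and per example, the mix of true and generated inputs shifts smoothly as epsilon decays, rather than switching the whole batch at once.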



Related research

- 05/14/2018 · Token-level and sequence-level loss smoothing for RNN language models: "Despite the effectiveness of recurrent neural network language models, t..."
- 07/21/2020 · Neural Machine Translation with Error Correction: "Neural machine translation (NMT) generates the next target token given a..."
- 12/28/2020 · Neural Text Generation with Artificial Negative Examples: "Neural text generation models conditioning on given input (e.g. machine ..."
- 01/02/2018 · Character-level Recurrent Neural Networks in Practice: Comparing Training and Sampling Schemes: "Recurrent neural networks are nowadays successfully used in an abundance..."
- 11/01/2019 · Sequence Modeling with Unconstrained Generation Order: "The dominant approach to sequence generation is to produce a sequence in..."
- 12/12/2018 · Sentence-wise Smooth Regularization for Sequence to Sequence Learning: "Maximum-likelihood estimation (MLE) is widely used in sequence to sequen..."
- 05/29/2015 · A Critical Review of Recurrent Neural Networks for Sequence Learning: "Countless learning tasks require dealing with sequential data. Image cap..."
