Token-level and sequence-level loss smoothing for RNN language models

05/14/2018
by Maha Elbayad et al.

Despite the effectiveness of recurrent neural network language models, their maximum likelihood estimation suffers from two limitations. First, it treats all sentences that do not match the ground truth as equally poor, ignoring the structure of the output space. Second, it suffers from "exposure bias": during training tokens are predicted given ground-truth sequences, while at test time prediction is conditioned on generated output sequences. To overcome these limitations we build upon the recent reward augmented maximum likelihood approach, i.e. sequence-level smoothing, which encourages the model to predict sentences close to the ground truth according to a given performance metric. We extend this approach to token-level loss smoothing, and propose improvements to the sequence-level smoothing approach. Our experiments on two different tasks, image captioning and machine translation, show that token-level and sequence-level loss smoothing are complementary and significantly improve results.
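The two ideas in the abstract lend themselves to a compact sketch. The snippet below is a minimal, hypothetical PyTorch illustration of both objectives, not the authors' released code: token-level smoothing replaces the one-hot target with a distribution that places mass on tokens whose embeddings lie close to the ground-truth token, and sequence-level smoothing weights sampled candidate sequences by an exponentiated payoff (e.g., a sentence-BLEU or CIDEr reward). The function names, the cosine-similarity kernel, and the temperature defaults are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def token_smoothed_loss(logits, targets, emb, tau=0.1, pad_idx=0):
    """Token-level loss smoothing (illustrative sketch).

    Replaces the usual one-hot target with a smoothed distribution
    r(y' | y_t) proportional to exp(sim(y', y_t) / tau), where sim is
    cosine similarity between token embeddings.

    logits:  (batch, seq, vocab) raw model scores
    targets: (batch, seq) ground-truth token ids
    emb:     (vocab, dim) embedding table used to measure similarity
    tau:     temperature; smaller values approach the one-hot loss
    """
    emb_n = F.normalize(emb, dim=-1)                  # (vocab, dim)
    tgt_emb = F.normalize(emb[targets], dim=-1)       # (batch, seq, dim)
    sim = tgt_emb @ emb_n.t()                         # (batch, seq, vocab)
    smooth_tgt = F.softmax(sim / tau, dim=-1)         # smoothed target dist.
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -(smooth_tgt * log_probs).sum(dim=-1)       # (batch, seq)
    mask = (targets != pad_idx).float()               # ignore padding
    return (nll * mask).sum() / mask.sum()

def sequence_smoothed_loss(nll_per_seq, rewards, tau=0.9):
    """Sequence-level loss smoothing (illustrative sketch).

    Given candidate sequences sampled around the ground truth, weight
    each candidate's negative log-likelihood by the exponentiated-payoff
    distribution q(y_tilde | y_star) proportional to
    exp(r(y_tilde, y_star) / tau).

    nll_per_seq: (batch, n_samples) NLL of each sampled candidate
    rewards:     (batch, n_samples) task reward, e.g. sentence BLEU
    """
    weights = F.softmax(rewards / tau, dim=-1)
    return (weights * nll_per_seq).sum(dim=-1).mean()
```

In the spirit of the paper, either smoothed objective would typically be interpolated with the standard maximum likelihood loss rather than replace it outright.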


Related research

06/09/2015
Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
Recurrent Neural Networks can be trained to produce sequences of tokens ...

06/28/2017
Generative Bridging Network in Neural Sequence Prediction
Maximum Likelihood Estimation (MLE) suffers from data sparsity problem i...

08/29/2019
Translating Mathematical Formula Images to LaTeX Sequences Using Deep Neural Networks with Sequence-level Training
In this paper we propose a deep neural network model with an encoder-dec...

01/18/2019
Improving Sequence-to-Sequence Learning via Optimal Transport
Sequence-to-sequence models are commonly trained via maximum likelihood ...

06/10/2021
Mode recovery in neural autoregressive sequence modeling
Despite its wide use, recent studies have revealed unexpected and undesi...

08/24/2021
Reducing Exposure Bias in Training Recurrent Neural Network Transducers
When recurrent neural network transducers (RNNTs) are trained using the ...

03/22/2021
Alleviate Exposure Bias in Sequence Prediction with Recurrent Neural Networks
A popular strategy to train recurrent neural networks (RNNs), known as “...
