Differentiable Scheduled Sampling for Credit Assignment

04/23/2017
by Kartik Goyal, et al.

We demonstrate that a continuous relaxation of the argmax operation can be used to create a differentiable approximation to greedy decoding for sequence-to-sequence (seq2seq) models. By incorporating this approximation into the scheduled sampling training procedure (Bengio et al., 2015), a well-known technique for correcting exposure bias, we introduce a new training objective that is continuous and differentiable everywhere and that can provide informative gradients near points where previous decoding decisions change their value. In addition, by using a related approximation, we demonstrate a similar approach to sample-based training. Finally, we show that our approach outperforms cross-entropy training and scheduled sampling procedures on two sequence prediction tasks: named entity recognition and machine translation.
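The relaxation of argmax described above can be illustrated with a small sketch. It assumes the relaxation is a temperature-scaled ("peaked") softmax over the previous step's scores, used to form a weighted average of token embeddings in place of the single embedding that greedy decoding would pick. The names `peaked_softmax`, `soft_argmax_embedding`, and `alpha`, as well as the toy scores and embeddings, are illustrative assumptions and are not taken from the abstract.

```python
import math


def peaked_softmax(scores, alpha):
    """Softmax of alpha * scores, shifted by the max for numerical stability."""
    scaled = [alpha * s for s in scores]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_argmax_embedding(scores, embeddings, alpha):
    """Weighted average of embeddings under the peaked softmax.

    For large alpha the weights concentrate on the highest-scoring token,
    so the result approaches the embedding that a hard argmax would select,
    while remaining a smooth function of the scores.
    """
    weights = peaked_softmax(scores, alpha)
    dim = len(embeddings[0])
    return [
        sum(w * emb[d] for w, emb in zip(weights, embeddings))
        for d in range(dim)
    ]


# Toy example (hypothetical values): three tokens with 2-d embeddings.
scores = [1.0, 2.5, 0.3]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

soft = soft_argmax_embedding(scores, embeddings, alpha=1.0)    # smooth mixture
sharp = soft_argmax_embedding(scores, embeddings, alpha=50.0)  # near hard argmax
```

In this sketch, a small `alpha` gives a mixture of embeddings through which gradients can flow to every score, while a large `alpha` approximates the greedy choice. How the scale is set or scheduled during training is not stated in the abstract.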


