Differentiable Sampling with Flexible Reference Word Order for Neural Machine Translation

04/04/2019
by Weijia Xu, et al.

Despite some empirical success at correcting exposure bias in machine translation, scheduled sampling algorithms suffer from a major drawback: they incorrectly assume that words in the reference translations and in sampled sequences are aligned at each time step. Our new differentiable sampling algorithm addresses this issue by optimizing the probability that the reference can be aligned with the sampled output, based on a soft alignment predicted by the model itself. As a result, the output distribution at each time step is evaluated with respect to the whole predicted sequence. Experiments on IWSLT translation tasks show that our approach improves BLEU compared to maximum likelihood and scheduled sampling baselines. In addition, our approach is simpler to train, as it requires no sampling schedule, and yields models that achieve larger improvements with smaller beam sizes.
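To make the abstract concrete, below is a minimal sketch of the two ingredients it describes: a differentiable draw from the model's output distribution, shown here with the standard Gumbel-softmax straight-through relaxation (one common choice, not necessarily the paper's exact estimator), and a loss that evaluates each reference word against every position of the sampled output under a soft alignment. It assumes PyTorch; the function names, the toy shapes, and the way the alignment is derived from the scores (rather than predicted by the model, as in the paper) are illustrative simplifications, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def differentiable_sample(logits, tau=1.0):
    # Differentiable draw from the output distribution via the
    # Gumbel-softmax straight-through estimator: hard one-hot on the
    # forward pass, soft gradients on the backward pass.
    return F.gumbel_softmax(logits, tau=tau, hard=True)  # (V,) one-hot

def soft_alignment_loss(step_log_probs, reference_ids, tau=1.0):
    # step_log_probs: (T_out, V) log-probs at each step of decoding a
    #                 *sampled* output sequence
    # reference_ids:  (T_ref,)   reference token ids
    #
    # Score every reference word against every output step, then weight
    # by a soft alignment, so the loss tolerates word-order differences
    # between the reference and the sampled sequence.
    scores = step_log_probs[:, reference_ids]   # (T_out, T_ref)

    # Soft alignment: for each reference word, a distribution over
    # output positions. Here it is derived from the scores themselves;
    # the paper predicts it with the model.
    align = F.softmax(scores / tau, dim=0)      # (T_out, T_ref)

    # Expected log-prob of each reference word under its alignment,
    # summed over the reference; negate to obtain a loss to minimize.
    return -(align * scores).sum()

# Toy usage with random tensors standing in for real decoder outputs.
T_out, T_ref, V = 6, 5, 100
logits = torch.randn(T_out, V, requires_grad=True)
step_log_probs = F.log_softmax(logits, dim=-1)
reference_ids = torch.randint(V, (T_ref,))

next_input = differentiable_sample(step_log_probs[-1])  # would be fed back to the decoder
loss = soft_alignment_loss(step_log_probs, reference_ids)
loss.backward()  # gradients flow through the scores at every position
```

Because every reference word can receive credit from any output position, the gradient no longer presumes the step-by-step alignment between reference and sample that scheduled sampling incorrectly assumes.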


Related research

- Rephrasing the Reference for Non-Autoregressive Machine Translation (11/30/2022)
- Mitigating Catastrophic Forgetting in Scheduled Sampling with Elastic Weight Consolidation in Neural Machine Translation (09/13/2021)
- Investigating the Decoders of Maximum Likelihood Sequence Models: A Look-ahead Approach (03/08/2020)
- Differentiable Scheduled Sampling for Credit Assignment (04/23/2017)
- Greedy Search with Probabilistic N-gram Matching for Neural Machine Translation (09/10/2018)
- Attention Forcing for Sequence-to-sequence Model Training (09/26/2019)
- Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement (03/14/2019)
