Bridging the Gap between Training and Inference for Neural Machine Translation

06/06/2019
by Wen Zhang, et al.

Neural Machine Translation (NMT) generates target words sequentially by predicting the next word conditioned on its context words. At training time, the ground-truth words serve as that context, while at inference the model must generate the entire sequence from scratch. This discrepancy between the contexts fed at training and at inference leads to error accumulation along the way. Furthermore, word-level training requires strict matching between the generated sequence and the ground-truth sequence, leading to over-correction of different but reasonable translations. In this paper, we address these issues by sampling context words during training not only from the ground-truth sequence but also from the sequence predicted by the model, where the predicted sequence is selected with a sentence-level optimum. Experimental results on Chinese->English and WMT'14 English->German translation tasks demonstrate that our approach achieves significant improvements on multiple datasets.
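To make the training scheme concrete, below is a minimal sketch of one mixed-context training step, assuming PyTorch. The names `decoder_step` and `embed` are hypothetical stand-ins for a model's decoder and embedding layer, the inverse-sigmoid decay over epochs is one schedule of the family the paper describes, and the sentence-level oracle (re-ranking beam candidates with a sentence-level metric such as BLEU) is omitted for brevity; this is not the authors' released implementation.

```python
# Sketch: training with context words sampled from the ground truth
# or from the model's own predictions (a word-level oracle).
# `decoder_step` and `embed` are hypothetical stand-ins.
import torch
import torch.nn.functional as F

def mixed_context_training_step(decoder_step, embed, gold_tokens,
                                init_state, epoch, mu=12.0):
    """One training pass over a single target sentence.

    At each step t, the context word fed to the decoder is the gold
    word with probability p (which decays as training progresses) and
    the model's own prediction otherwise.
    """
    # Inverse-sigmoid decay in the epoch number: p starts near 1
    # (pure teacher forcing) and decays toward 0; mu is a hyper-parameter.
    p = mu / (mu + torch.exp(torch.tensor(epoch / mu)))

    state = init_state
    prev_token = gold_tokens[0]  # start-of-sentence token
    total_loss = 0.0
    for t in range(1, gold_tokens.size(0)):
        logits, state = decoder_step(embed(prev_token), state)
        total_loss = total_loss + F.cross_entropy(
            logits.unsqueeze(0), gold_tokens[t].unsqueeze(0))
        # Sample the next context word: gold with prob p, predicted otherwise.
        if torch.rand(1).item() < p.item():
            prev_token = gold_tokens[t]
        else:
            # The paper additionally perturbs the logits with Gumbel noise
            # when picking the oracle word; plain argmax is used here
            # for simplicity.
            prev_token = logits.argmax(dim=-1)
    return total_loss
```

With probability p the decoder sees the gold word as context (teacher forcing); otherwise it sees its own prediction, so as p decays the training condition gradually approaches the inference-time condition.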

Related research

11/30/2019 · Modeling Fluency and Faithfulness for Diverse Neural Machine Translation
Neural machine translation models usually adopt the teacher forcing stra...

09/10/2018 · Greedy Search with Probabilistic N-gram Matching for Neural Machine Translation
Neural machine translation (NMT) models are usually trained with the wor...

07/22/2021 · Confidence-Aware Scheduled Sampling for Neural Machine Translation
Scheduled sampling is an effective method to alleviate the exposure bias...

09/26/2017 · Learning to Explain Non-Standard English Words and Phrases
We describe a data-driven approach for automatically explaining new, non...

02/28/2020 · Modeling Future Cost for Neural Machine Translation
Existing neural machine translation (NMT) systems utilize sequence-to-se...

08/30/2021 · Scheduled Sampling Based on Decoding Steps for Neural Machine Translation
Scheduled sampling is widely used to mitigate the exposure bias problem ...

01/18/2019 · Improving Sequence-to-Sequence Learning via Optimal Transport
Sequence-to-sequence models are commonly trained via maximum likelihood ...
