Confidence-Aware Scheduled Sampling for Neural Machine Translation

07/22/2021
by Yijin Liu, et al.

Scheduled sampling is an effective method for alleviating the exposure bias problem in neural machine translation. It simulates the inference scenario by randomly replacing ground-truth target input tokens with predicted ones during training. Despite its success, its schedule strategies depend only on the training step and ignore the model's real-time competence, which limits both its potential performance and its convergence speed. To address this issue, we propose confidence-aware scheduled sampling. Specifically, we quantify real-time model competence by the confidence of model predictions, and design fine-grained schedule strategies based on it. In this way, the model is exposed to predicted tokens at high-confidence positions and to ground-truth tokens at low-confidence positions. Moreover, we observe that vanilla scheduled sampling tends to degenerate into the original teacher-forcing mode, since most predicted tokens are identical to the ground-truth tokens. Therefore, under the above confidence-aware strategy, we further expose noisier tokens (e.g., wordy tokens or tokens in an incorrect word order) instead of predicted ones at high-confidence positions. We evaluate our approach on the Transformer and conduct experiments on the large-scale WMT 2014 English-German, WMT 2014 English-French, and WMT 2019 Chinese-English tasks. Results show that our approach significantly outperforms both the Transformer baseline and vanilla scheduled sampling in translation quality and convergence speed.
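The per-position selection rule described above can be sketched in a few lines. Below is a minimal, hypothetical PyTorch sketch, not the paper's implementation: the function name, the single hard confidence threshold, and its value are illustrative assumptions (the paper designs finer-grained schedule strategies, and additionally substitutes noisy tokens at high-confidence positions, which this sketch omits).

```python
import torch

def confidence_aware_decoder_inputs(logits, gold_tokens, threshold=0.9):
    """Hypothetical sketch of the confidence-aware selection rule.

    logits:      (batch, tgt_len, vocab) outputs from a first forward
                 pass run with ground-truth decoder inputs.
    gold_tokens: (batch, tgt_len) ground-truth target input tokens.
    threshold:   illustrative confidence cutoff (an assumption,
                 not a value from the paper).
    """
    probs = torch.softmax(logits, dim=-1)
    confidence, predicted = probs.max(dim=-1)  # per-position max prob and argmax
    use_prediction = confidence > threshold    # high-confidence positions only
    # Feed the model's own predictions where it is confident,
    # and keep the ground-truth tokens everywhere else.
    return torch.where(use_prediction, predicted, gold_tokens)
```

In a training loop, the returned tokens would serve as the decoder inputs for a second forward pass, so that high-confidence positions see model predictions while low-confidence positions still receive teacher forcing.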

Related research

08/30/2021
Scheduled Sampling Based on Decoding Steps for Neural Machine Translation
Scheduled sampling is widely used to mitigate the exposure bias problem ...

06/06/2019
Bridging the Gap between Training and Inference for Neural Machine Translation
Neural Machine Translation (NMT) generates target words sequentially in ...

07/21/2020
Neural Machine Translation with Error Correction
Neural machine translation (NMT) generates the next target token given a...

05/14/2023
Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation
Knowledge distillation (KD) is a promising technique for model compressi...

11/30/2019
Modeling Fluency and Faithfulness for Diverse Neural Machine Translation
Neural machine translation models usually adopt the teacher forcing stra...

05/26/2021
Bilingual Mutual Information Based Adaptive Training for Neural Machine Translation
Recently, token-level adaptive training has achieved promising improveme...

04/07/2021
FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization
Transducer-based models, such as RNN-Transducer and transformer-transduc...
