Guiding Teacher Forcing with Seer Forcing for Neural Machine Translation

06/12/2021
by Yang Feng, et al.

Although teacher forcing has become the main training paradigm for neural machine translation, it makes each prediction conditioned only on past information and hence lacks global planning for the future. To address this problem, we introduce a second decoder, called the seer decoder, into the encoder-decoder framework during training; it incorporates future information into target predictions. Meanwhile, we force the conventional decoder to simulate the behavior of the seer decoder via knowledge distillation. In this way, at test time the conventional decoder can perform like the seer decoder without the seer decoder being present. Experimental results on Chinese-English, English-German, and English-Romanian translation tasks show that our method significantly outperforms competitive baselines and achieves greater improvements on larger data sets. The experiments also show that, compared to adversarial learning and L2 regularization, knowledge distillation is the best way to transfer knowledge from the seer decoder to the conventional decoder.
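To make the training objective concrete, here is a minimal PyTorch-style sketch of the distillation idea, assuming both decoders expose per-token logits over the target vocabulary. The function name seer_forcing_loss and the hyperparameters alpha and T are hypothetical choices for illustration, not taken from the paper, and detaching the seer logits in the distillation term is one possible design, not necessarily the authors'.

```python
import torch
import torch.nn.functional as F

def seer_forcing_loss(student_logits, seer_logits, targets, pad_id, alpha=0.5, T=1.0):
    """Combine teacher-forcing cross-entropy on the gold targets with a
    distillation term pulling the conventional (student) decoder toward
    the seer decoder. alpha and temperature T are illustrative.

    student_logits, seer_logits: (batch, length, vocab)
    targets: (batch, length) gold token ids
    """
    # Standard cross-entropy against the reference tokens (teacher forcing).
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        targets.view(-1),
        ignore_index=pad_id,
    )
    # KL divergence from the seer decoder's softened distribution to the
    # student's. The seer sees future target context; its logits are
    # detached here so only the student is updated by this term.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(seer_logits.detach() / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return (1 - alpha) * ce + alpha * kd

if __name__ == "__main__":
    # Toy shapes: batch=2, length=5, vocab=100.
    student = torch.randn(2, 5, 100, requires_grad=True)
    seer = torch.randn(2, 5, 100)
    tgt = torch.randint(0, 100, (2, 5))
    print(seer_forcing_loss(student, seer, tgt, pad_id=0))
```

Since the seer decoder is dropped at test time, inference cost is the same as for a standard teacher-forced model.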


Related research

Selective Knowledge Distillation for Neural Machine Translation (05/27/2021)
Neural Machine Translation (NMT) models achieve state-of-the-art perform...

Neural Machine Translation with Word Predictions (08/05/2017)
In the encoder-decoder architecture for neural machine translation (NMT)...

Future-Guided Incremental Transformer for Simultaneous Translation (12/23/2020)
Simultaneous translation (ST) starts translations synchronously while re...

Confidence Based Bidirectional Global Context Aware Training Framework for Neural Machine Translation (02/28/2022)
Most dominant neural machine translation (NMT) models are restricted to ...

Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation (05/14/2023)
Knowledge distillation (KD) is a promising technique for model compressi...

Improving Domain Adaptation Translation with Domain Invariant and Specific Information (04/08/2019)
In domain adaptation for neural machine translation, translation perform...

Sampling and Filtering of Neural Machine Translation Distillation Data (04/01/2021)
In most of neural machine translation distillation or stealing scenarios...
