On the Complementarity between Pre-Training and Back-Translation for Neural Machine Translation

10/05/2021
by Xuebo Liu, et al.

Pre-training (PT) and back-translation (BT) are two simple and powerful methods that use monolingual data to improve the performance of neural machine translation (NMT). This paper takes the first step toward investigating the complementarity between PT and BT. We introduce two probing tasks, one for PT and one for BT, and find that PT mainly contributes to the encoder module while BT brings more benefits to the decoder. Experimental results show that PT and BT are nicely complementary, establishing state-of-the-art performance on the WMT16 English-Romanian and English-Russian benchmarks. Through extensive analyses of sentence originality and word frequency, we also demonstrate that combining Tagged BT with PT better exploits their complementarity, leading to higher translation quality. Source code is freely available at https://github.com/SunbowLiu/PTvsBT.
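To make the Tagged BT setup concrete, below is a minimal sketch of how tagged back-translation training data is typically constructed: synthetic source sentences produced by a reverse (target-to-source) model are paired with their original monolingual targets, with a special tag prepended so the model can distinguish synthetic from genuine bitext. The <BT> token and helper names are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch of Tagged Back-Translation data construction (assumed setup,
# following the general tagged-BT recipe; not the paper's exact code).

BT_TAG = "<BT>"  # hypothetical special token marking synthetic sources


def tag_back_translations(synthetic_src, mono_tgt):
    """Pair back-translated sources with their original monolingual
    targets, prepending a tag to flag each pair as synthetic."""
    for src, tgt in zip(synthetic_src, mono_tgt):
        yield f"{BT_TAG} {src.strip()}", tgt.strip()


def build_training_corpus(parallel_pairs, synthetic_src, mono_tgt):
    """Genuine parallel data stays untagged; synthetic pairs are tagged."""
    corpus = list(parallel_pairs)
    corpus.extend(tag_back_translations(synthetic_src, mono_tgt))
    return corpus


if __name__ == "__main__":
    parallel = [("Guten Morgen .", "Good morning .")]
    # Synthetic German produced by a reverse model from English monolingual text.
    synthetic_de = ["Guten Tag ."]
    mono_en = ["Good afternoon ."]
    for src, tgt in build_training_corpus(parallel, synthetic_de, mono_en):
        print(src, "|||", tgt)
```

The tag gives the decoder an explicit signal about data provenance, which is one plausible reason Tagged BT combines well with a pre-trained initialization: the model can discount noise in synthetic sources without unlearning what pre-training gave the encoder.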


Related research

07/17/2021 · On the Copying Behaviors of Pre-Training for Neural Machine Translation
Previous studies have shown that initializing neural machine translation...

08/21/2019 · Improving Neural Machine Translation with Pre-trained Representation
Monolingual data has been demonstrated to be helpful in improving the tr...

09/17/2020 · Code-switching pre-training for neural machine translation
This paper proposes a new pre-training method, called Code-Switching Pre...

09/07/2022 · On the Complementarity between Pre-Training and Random-Initialization for Resource-Rich Machine Translation
Pre-Training (PT) of text representations has been successfully applied ...

07/30/2021 · ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback
We introduce ChrEnTranslate, an online machine translation demonstration...

04/28/2022 · UniTE: Unified Translation Evaluation
Translation quality evaluation plays a crucial role in machine translati...

08/19/2018 · SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
This paper describes SentencePiece, a language-independent subword token...
