Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation

03/16/2022
by Wenxuan Wang et al.

In this paper, we take a substantial step toward better understanding state-of-the-art (SOTA) sequence-to-sequence (Seq2Seq) pretraining for neural machine translation (NMT). We focus on the impact of the jointly pretrained decoder, which is the main difference between Seq2Seq pretraining and previous encoder-based pretraining approaches for NMT. Through carefully designed experiments on three language pairs, we find that Seq2Seq pretraining is a double-edged sword: on the one hand, it helps NMT models produce more diverse translations and reduces adequacy-related translation errors; on the other hand, the discrepancies between Seq2Seq pretraining and NMT finetuning limit translation quality (i.e., domain discrepancy) and induce an over-estimation issue (i.e., objective discrepancy). Based on these observations, we further propose two simple and effective strategies, in-domain pretraining and input adaptation, to remedy the domain and objective discrepancies, respectively. Experimental results on several language pairs show that our approach consistently improves both translation performance and model robustness over Seq2Seq pretraining.
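The in-domain pretraining idea can be pictured as continuing the Seq2Seq denoising objective on monolingual text drawn from the translation domain before finetuning on parallel data. The sketch below illustrates that idea with an mBART-style text-infilling step using the Hugging Face transformers API; it is a minimal sketch, not the authors' exact recipe, and the checkpoint name, language code, mask ratio, and the tiny in-domain corpus are illustrative assumptions.

```python
"""Minimal sketch of in-domain continued pretraining (illustrative, not the paper's recipe)."""
import random
import torch
from transformers import MBartForConditionalGeneration, MBartTokenizer

MODEL_NAME = "facebook/mbart-large-cc25"   # assumed pretrained Seq2Seq checkpoint
LANG_CODE = "en_XX"                        # assumed language code
MASK_RATIO = 0.35                          # fraction of words to mask (assumption)

tokenizer = MBartTokenizer.from_pretrained(MODEL_NAME, src_lang=LANG_CODE, tgt_lang=LANG_CODE)
model = MBartForConditionalGeneration.from_pretrained(MODEL_NAME)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

def infill_noise(sentence: str) -> str:
    """Replace one random contiguous span of words with a single <mask> token."""
    words = sentence.split()
    span_len = max(1, int(len(words) * MASK_RATIO))
    start = random.randrange(0, max(1, len(words) - span_len + 1))
    return " ".join(words[:start] + ["<mask>"] + words[start + span_len:])

# Tiny illustrative "in-domain" monolingual corpus (assumption).
in_domain_sentences = [
    "The patient was administered 5 mg of the drug twice daily.",
    "Adverse reactions were monitored throughout the clinical trial.",
]

model.train()
for sentence in in_domain_sentences:
    noisy = infill_noise(sentence)
    # Denoising objective: reconstruct the clean sentence from its noised version.
    batch = tokenizer(noisy, text_target=sentence, return_tensors="pt")
    loss = model(**batch).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"in-domain denoising loss: {loss.item():.3f}")
```

After this continued-pretraining step, the model would be finetuned on parallel data as usual; the point of the sketch is only that the denoising objective is kept while the data distribution is shifted toward the target domain.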



Related research

06/10/2021  Exploring Unsupervised Pretraining Objectives for Machine Translation
Unsupervised cross-lingual pretraining has achieved strong results in ne...

09/10/2022  Simple and Effective Gradient-Based Tuning of Sequence-to-Sequence Models
Recent trends towards training ever-larger language models have substant...

09/17/2019  Pointer-based Fusion of Bilingual Lexicons into Neural Machine Translation
Neural machine translation (NMT) systems require large amounts of high q...

02/27/2020  Echo State Neural Machine Translation
We present neural machine translation (NMT) models inspired by echo stat...

09/26/2019  Large-scale Pretraining for Neural Machine Translation with Tens of Billions of Sentence Pairs
In this paper, we investigate the problem of training neural machine tra...

09/16/2021  Improving Neural Machine Translation by Bidirectional Training
We present a simple and effective pretraining strategy – bidirectional t...

07/24/2021  MDQE: A More Accurate Direct Pretraining for Machine Translation Quality Estimation
It is expensive to evaluate the results of Machine Translation (MT), whic...
