Large-scale Pretraining for Neural Machine Translation with Tens of Billions of Sentence Pairs

09/26/2019
by Yuxian Meng, et al.

In this paper, we investigate the problem of training neural machine translation (NMT) systems on a dataset of more than 40 billion bilingual sentence pairs, which is orders of magnitude larger than the largest dataset used to date. This scale raises challenges not encountered in previous NMT work, including severe noise in the data and prohibitively long training time. We propose practical solutions to these issues and demonstrate that large-scale pretraining significantly improves NMT performance. We push the BLEU score on the WMT17 Chinese-English dataset to 32.3, a significant improvement of +3.2 over existing state-of-the-art results.

research
07/30/2016

Supervised Attentions for Neural Machine Translation

In this paper, we improve the attention or alignment accuracy of neural ...
research
06/06/2022

Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation

We introduce Bi-SimCut: a simple but effective training strategy to boos...
research
05/02/2018

KNPTC: Knowledge and Neural Machine Translation Powered Chinese Pinyin Typo Correction

Chinese pinyin input methods are very important for Chinese language pro...
research
05/03/2017

Chunk-Based Bi-Scale Decoder for Neural Machine Translation

In typical neural machine translation (NMT), the decoder generates a sen...
research
05/01/2018

Dynamic Sentence Sampling for Efficient Training of Neural Machine Translation

Traditional Neural machine translation (NMT) involves a fixed training p...
research
03/16/2022

Understanding and Improving Sequence-to-Sequence Pretraining for Neural Machine Translation

In this paper, we present a substantial step in better understanding the...
research
05/04/2023

Unified Model Learning for Various Neural Machine Translation

Existing neural machine translation (NMT) studies mainly focus on develo...
