Towards Making the Most of BERT in Neural Machine Translation

08/15/2019
by Jiacheng Yang, et al.

GPT-2 and BERT demonstrate the effectiveness of using pre-trained language models (LMs) on various natural language processing tasks. However, LM fine-tuning often suffers from catastrophic forgetting when applied to resource-rich tasks. In this work, we introduce a concerted training framework (Cnmt) that is the key to integrating pre-trained LMs into neural machine translation (NMT). Our proposed Cnmt consists of three techniques: a) asymptotic distillation to ensure that the NMT model can retain the previous pre-trained knowledge; b) a dynamic switching gate to avoid catastrophic forgetting of pre-trained knowledge; and c) a strategy to adjust the learning pace according to a scheduled policy. Our experiments on machine translation show gains of up to 3 BLEU points on the WMT14 English-German language pair, which even surpasses the previous state-of-the-art pre-training-aided NMT by 1.4 BLEU points. On the large WMT14 English-French task with 40 million sentence pairs, our base model still significantly improves upon the state-of-the-art Transformer big model by more than 1 BLEU point.
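The abstract names the three techniques without implementation detail. Below is a minimal, hypothetical PyTorch sketch of two of them, the asymptotic distillation objective and the dynamic switching gate; the class and function names, the choice of an MSE distillation loss, and the assumption that BERT and the NMT encoder share a hidden size are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch (not the authors' code): a gated fusion of frozen BERT states with
# NMT encoder states, plus an auxiliary distillation loss that keeps the encoder
# close to the pre-trained representations to mitigate catastrophic forgetting.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicSwitchingGate(nn.Module):
    """Blend BERT representations with NMT encoder states via a learned sigmoid gate."""

    def __init__(self, d_model: int):
        super().__init__()
        self.gate_proj = nn.Linear(2 * d_model, d_model)

    def forward(self, bert_states: torch.Tensor, enc_states: torch.Tensor) -> torch.Tensor:
        # g in (0, 1) decides, per position and dimension, how much pre-trained
        # knowledge to keep versus the task-specific encoder state.
        g = torch.sigmoid(self.gate_proj(torch.cat([bert_states, enc_states], dim=-1)))
        return g * bert_states + (1.0 - g) * enc_states


def asymptotic_distillation_loss(enc_states: torch.Tensor,
                                 bert_states: torch.Tensor) -> torch.Tensor:
    """Auxiliary MSE between NMT encoder states and detached (frozen) BERT states,
    so fine-tuning does not drift too far from the pre-trained LM. (Assumed loss form.)"""
    return F.mse_loss(enc_states, bert_states.detach())


# Toy usage with random tensors standing in for real hidden states.
if __name__ == "__main__":
    batch, seq_len, d_model = 2, 5, 8
    bert_h = torch.randn(batch, seq_len, d_model)
    enc_h = torch.randn(batch, seq_len, d_model)
    fused = DynamicSwitchingGate(d_model)(bert_h, enc_h)
    loss = asymptotic_distillation_loss(enc_h, bert_h)
    print(fused.shape, loss.item())
```

In practice the distillation term would be added to the translation loss with a weight that follows the scheduled policy mentioned in the abstract; the schedule itself is not specified here.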


