Unsupervised Pretraining for Neural Machine Translation Using Elastic Weight Consolidation

10/19/2020
by Dušan Variš, et al.

This work presents our ongoing research on unsupervised pretraining in neural machine translation (NMT). In our method, we initialize the weights of the encoder and decoder with two language models trained on monolingual data, and then fine-tune the model on parallel data using Elastic Weight Consolidation (EWC) to avoid forgetting the original language modeling tasks. We compare this EWC regularization with previous work that regularizes using language modeling objectives. The positive result is that applying EWC to the decoder achieves BLEU scores similar to the previous work, while the model converges 2-3 times faster and does not require the original unlabeled training data during the fine-tuning stage. On the other hand, EWC regularization is less effective when the original and new tasks are not closely related: we show that initializing the bidirectional NMT encoder with a left-to-right language model, and forcing the model to remember the original left-to-right language modeling task, limits the encoder's capacity to learn from the whole bidirectional context.
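Concretely, EWC adds a quadratic penalty that anchors each fine-tuned parameter to its pretrained language-model value, weighted by an estimate of the Fisher information for the original task: L(theta) = L_NMT(theta) + (lambda/2) * sum_i F_i * (theta_i - theta*_i)^2. The following is a minimal PyTorch-style sketch of this idea, not the paper's actual implementation; `model`, `lm_loader`, `lm_loss_fn`, `nmt_loss`, and `lam` are hypothetical stand-ins.

```python
import torch

def fisher_diagonal(model, lm_loader, lm_loss_fn):
    """Estimate the diagonal Fisher information on the original language
    modeling task: squared gradients, averaged over batches."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for batch in lm_loader:
        model.zero_grad()
        lm_loss_fn(model, batch).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / len(lm_loader) for n, f in fisher.items()}

def ewc_penalty(model, fisher, anchor_params, lam):
    """EWC term: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2,
    where theta* are the pretrained language-model weights."""
    penalty = 0.0
    for n, p in model.named_parameters():
        if n in fisher:
            penalty = penalty + (fisher[n] * (p - anchor_params[n]) ** 2).sum()
    return 0.5 * lam * penalty

# During fine-tuning on parallel data, each step optimizes:
#   loss = nmt_loss(model, batch) + ewc_penalty(model, fisher, anchor, lam)
```

Note that the Fisher estimates and anchor weights are computed once, before fine-tuning begins, which is consistent with the abstract's observation that the original unlabeled training data is not needed during the fine-tuning stage.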


Related research

11/20/2015 · Improving Neural Machine Translation Models with Monolingual Data
06/10/2021 · Exploring Unsupervised Pretraining Objectives for Machine Translation
10/16/2021 · Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation
11/01/2018 · Language-Independent Representor for Neural Machine Translation
09/03/2019 · The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives
07/13/2023 · In-context Autoencoder for Context Compression in a Large Language Model
09/13/2022 · Revisiting Neural Scaling Laws in Language and Vision
