Multilingual Translation via Grafting Pre-trained Language Models

09/11/2021
by Zewei Sun, et al.

Can a pre-trained BERT for one language and a pre-trained GPT for another be glued together to translate text? Self-supervised training on monolingual data alone has made pre-trained (masked) language models successful across many NLP tasks. However, directly connecting BERT as an encoder and GPT as a decoder is challenging for machine translation, because GPT-like models lack the cross-attention component that a seq2seq decoder requires. In this paper, we propose Graformer, which grafts separately pre-trained (masked) language models for machine translation. By using monolingual data for pre-training and parallel data for grafting, we make maximal use of both kinds of data. Experiments on 60 directions show that our method achieves average improvements of 5.8 BLEU in x2en and 2.9 BLEU in en2x directions compared with a multilingual Transformer of the same size.
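The core architectural idea, a newly initialised cross-attention bridge that lets a pre-trained decoder-style LM attend to a pre-trained encoder's hidden states and is trained on parallel data while the monolingually pre-trained parts stay fixed, can be sketched in a few lines of PyTorch. The sketch below is illustrative rather than the paper's exact Graformer: the module names CrossAttentionGraft and GraftedSeq2Seq, the single bridge layer, and the choice to freeze every pre-trained parameter are assumptions made for clarity.

```python
import torch
import torch.nn as nn


class CrossAttentionGraft(nn.Module):
    """Newly initialised bridge that lets decoder states attend to encoder states."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, dec_states: torch.Tensor, enc_states: torch.Tensor) -> torch.Tensor:
        attended, _ = self.cross_attn(dec_states, enc_states, enc_states)
        return self.norm(dec_states + attended)  # residual connection around the graft


class GraftedSeq2Seq(nn.Module):
    """Glues a pre-trained encoder-style LM and decoder-style LM with a trainable graft."""

    def __init__(self, encoder_lm: nn.Module, decoder_lm: nn.Module, d_model: int):
        super().__init__()
        self.encoder_lm = encoder_lm  # stand-in for a pre-trained BERT-like model
        self.decoder_lm = decoder_lm  # stand-in for a pre-trained GPT-like model
        self.graft = CrossAttentionGraft(d_model)

        # Freeze the monolingually pre-trained parts; only the graft is
        # trained on parallel data (an assumption made for this sketch).
        for p in self.encoder_lm.parameters():
            p.requires_grad = False
        for p in self.decoder_lm.parameters():
            p.requires_grad = False

    def forward(self, src_embeds: torch.Tensor, tgt_embeds: torch.Tensor) -> torch.Tensor:
        enc_states = self.encoder_lm(src_embeds)   # source-side representations
        dec_states = self.decoder_lm(tgt_embeds)   # target-side LM states
        return self.graft(dec_states, enc_states)  # fused states for an output head


# Toy usage with generic Transformer stacks standing in for the pre-trained LMs.
d_model = 512
make_stack = lambda: nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=2
)
model = GraftedSeq2Seq(make_stack(), make_stack(), d_model)
out = model(torch.randn(4, 20, d_model), torch.randn(4, 15, d_model))
print(out.shape)  # torch.Size([4, 15, 512])
```

In a real system the grafted states would feed a language-model head over the target vocabulary; causal masking in the decoder and any additional grafting layers described in the paper are omitted from this sketch.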
