Emergent Communication Pretraining for Few-Shot Machine Translation

11/02/2020
by Yaoyiran Li, et al.

While state-of-the-art models that rely upon massively multilingual pretrained encoders achieve sample efficiency in downstream applications, they still require abundant amounts of unlabelled text. However, most of the world's languages lack such resources. Hence, we investigate a more radical form of unsupervised knowledge transfer in the absence of linguistic data. In particular, for the first time we pretrain neural networks via emergent communication from referential games. Our key assumption is that grounding communication on images—as a crude approximation of real-world environments—inductively biases the model towards learning natural languages. On the one hand, we show that this substantially benefits machine translation in few-shot settings. On the other hand, this also provides an extrinsic evaluation protocol to probe the properties of emergent languages ex vitro. Intuitively, the closer they are to natural languages, the higher the gains from pretraining on them should be. For instance, in this work we measure the influence of communication success and maximum sequence length on downstream performance. Finally, we introduce a customised adapter layer and annealing strategies for the regulariser of maximum-a-posteriori inference during fine-tuning. These turn out to be crucial to facilitate knowledge transfer and prevent catastrophic forgetting. Compared to a recurrent baseline, our method yields gains of 59.0%∼147.6% in BLEU score with only 500 NMT training instances and 65.1%∼196.7% with 1,000 NMT training instances across four language pairs. These proof-of-concept results reveal the potential of emergent communication pretraining for both natural language processing tasks in resource-poor settings and extrinsic evaluation of artificial languages.
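To make the pretraining setup concrete, below is a minimal sketch of one referential-game step: a sender encodes the target image into a discrete message (via a straight-through Gumbel-Softmax relaxation), and a receiver must pick the target out of a set of candidate images from the message alone. All names, sizes, and hyperparameters (Sender, Receiver, VOCAB, MAX_LEN, etc.) are illustrative assumptions, not the paper's actual architecture or training details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes (assumptions, not the paper's settings).
VOCAB, MAX_LEN, IMG_DIM, HID = 32, 10, 2048, 256


class Sender(nn.Module):
    """Turns a target image feature into a discrete message of MAX_LEN symbols
    via straight-through Gumbel-Softmax (EOS handling omitted for brevity)."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(IMG_DIM, HID)
        self.rnn = nn.GRU(VOCAB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, img_feat, tau=1.0):
        h = torch.tanh(self.proj(img_feat)).unsqueeze(0)      # (1, B, HID)
        inp = torch.zeros(img_feat.size(0), 1, VOCAB)         # start symbol
        symbols = []
        for _ in range(MAX_LEN):
            o, h = self.rnn(inp, h)
            sym = F.gumbel_softmax(self.out(o), tau=tau, hard=True)
            symbols.append(sym)
            inp = sym
        return torch.cat(symbols, dim=1)                      # (B, MAX_LEN, VOCAB)


class Receiver(nn.Module):
    """Encodes the message and scores each candidate image against it."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(VOCAB, HID, batch_first=True)
        self.img = nn.Linear(IMG_DIM, HID)

    def forward(self, message, candidates):                   # candidates: (B, K, IMG_DIM)
        _, h = self.rnn(message)
        return torch.einsum("bkh,bh->bk", self.img(candidates), h.squeeze(0))


sender, receiver = Sender(), Receiver()
opt = torch.optim.Adam(list(sender.parameters()) + list(receiver.parameters()), lr=1e-3)

B, K = 8, 4                                    # games per batch, candidates per game
feats = torch.randn(B, K, IMG_DIM)             # stand-in for pretrained image features
target = torch.randint(0, K, (B,))             # index of the target image
msg = sender(feats[torch.arange(B), target])   # the sender only sees the target
loss = F.cross_entropy(receiver(msg, feats), target)  # communication-success objective
opt.zero_grad()
loss.backward()
opt.step()
```

After pretraining, the sender and receiver weights would initialise the translation model. During fine-tuning, a maximum-a-posteriori view suggests penalising the distance between fine-tuned and pretrained parameters (e.g. an L2 term whose coefficient is annealed over training steps); the customised adapter layer and the exact annealing schedules are described in the paper rather than in this sketch.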

