Examining Scaling and Transfer of Language Model Architectures for Machine Translation

02/01/2022
by Biao Zhang, et al.

Natural language understanding and generation models follow one of two dominant architectural paradigms: language models (LMs), which process concatenated sequences in a single stack of layers, and encoder-decoder models (EncDec), which use separate layer stacks for input and output processing. In machine translation, EncDec has long been the favoured approach, and few studies have investigated the performance of LMs. In this work, we thoroughly examine the effect of several architectural design choices on the performance of LMs on bilingual, (massively) multilingual and zero-shot translation tasks, under systematic variations of data conditions and model sizes. Our results show that: (i) different LMs have different scaling properties, and architectural differences often have a significant impact on model performance at small scales, but the performance gap narrows as the number of parameters increases; (ii) several design choices, including causal masking and language-modeling objectives for the source sequence, have detrimental effects on translation quality; and (iii) when paired with full-visible masking for source sequences, LMs can perform on par with EncDec on supervised bilingual and multilingual translation tasks, and improve greatly on zero-shot directions by facilitating the reduction of off-target translations.
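As an illustration of the masking distinction the abstract draws (not code from the paper), the sketch below contrasts a standard causal LM mask with a "full-visible" (prefix-LM) mask over a concatenated [source; target] sequence; the function names and numpy formulation are assumptions for clarity.

```python
import numpy as np

def causal_mask(seq_len: int) -> np.ndarray:
    """Standard decoder-only LM mask: position i attends only to positions <= i."""
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def prefix_lm_mask(src_len: int, tgt_len: int) -> np.ndarray:
    """Full-visible masking over the source: source tokens attend bidirectionally
    within the source, and target tokens attend to the full source plus causally
    to earlier target tokens."""
    total = src_len + tgt_len
    mask = np.tril(np.ones((total, total), dtype=bool))
    mask[:, :src_len] = True  # the source prefix is visible to every position
    return mask

# Example: a 3-token source concatenated with a 2-token target.
print(prefix_lm_mask(src_len=3, tgt_len=2).astype(int))
# [[1 1 1 0 0]
#  [1 1 1 0 0]
#  [1 1 1 0 0]
#  [1 1 1 1 0]
#  [1 1 1 1 1]]
```

Under this reading, "causal masking for the source sequence" corresponds to using `causal_mask` over the whole concatenation, whereas the full-visible variant the results favour corresponds to `prefix_lm_mask`.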


