AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model

08/02/2022
by   Saleh Soltan, et al.

In this work, we demonstrate that multilingual large-scale sequence-to-sequence (seq2seq) models, pre-trained on a mixture of denoising and Causal Language Modeling (CLM) tasks, are more efficient few-shot learners than decoder-only models on various tasks. In particular, we train a 20-billion-parameter multilingual seq2seq model called the Alexa Teacher Model (AlexaTM 20B) and show that it achieves state-of-the-art (SOTA) performance on 1-shot summarization tasks, outperforming the much larger 540B PaLM decoder-only model. AlexaTM 20B also achieves SOTA in 1-shot machine translation, especially for low-resource languages, across almost all language pairs supported by the model (Arabic, English, French, German, Hindi, Italian, Japanese, Marathi, Portuguese, Spanish, Tamil, and Telugu) on the Flores-101 dataset. We also show that, in the zero-shot setting, AlexaTM 20B outperforms GPT-3 (175B) on the SuperGLUE and SQuADv2 datasets and provides SOTA performance on multilingual tasks such as XNLI, XCOPA, PAWS-X, and XWinograd. Overall, our results present a compelling case for seq2seq models as a powerful alternative to decoder-only models for Large-scale Language Model (LLM) training.
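To make the 1-shot setup concrete, below is a minimal sketch of in-context summarization with a seq2seq model, in the style the abstract describes: one demonstration pair plus the test input are packed into the encoder, and the decoder generates the answer. The model name "google/flan-t5-base", the prompt format, and the example texts are illustrative stand-ins, not AlexaTM 20B or the authors' exact prompts.

```python
# Hedged sketch of 1-shot in-context summarization with a seq2seq model.
# "google/flan-t5-base" is a publicly available stand-in; AlexaTM 20B itself
# is not assumed to be loadable this way.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-base"  # illustrative stand-in model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# One demonstration (the "shot") followed by the test article, all placed in
# the encoder input; the decoder then generates the summary of the test article.
demo_article = "Heavy rain flooded several streets in the city center on Monday..."
demo_summary = "City center streets flooded after heavy rain on Monday."
test_article = "The local council approved plans for a new bridge across the river on Friday..."

prompt = (
    f"Article: {demo_article}\nSummary: {demo_summary}\n\n"
    f"Article: {test_article}\nSummary:"
)

inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
outputs = model.generate(**inputs, max_new_tokens=60, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same pattern extends to the paper's other few-shot tasks (e.g., 1-shot translation) by swapping the demonstration pair and the test input; only the prompt contents change, not the decoding loop.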


Related research

05/23/2023 - Exploring Representational Disparities Between Multilingual and Bilingual Translation Models
Multilingual machine translation has proven immensely useful for low-res...

10/16/2021 - Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation
This paper demonstrates that multilingual pretraining, a proper fine-tun...

08/30/2021 - On the Multilingual Capabilities of Very Large-Scale English Language Models
Generative Pre-trained Transformers (GPTs) have recently been scaled to ...

02/02/2023 - The unreasonable effectiveness of few-shot learning for machine translation
We demonstrate the potential of few-shot translation systems, trained wi...

07/27/2023 - Exploiting the Potential of Seq2Seq Models as Robust Few-Shot Learners
In-context learning, which offers substantial advantages over fine-tunin...

04/15/2022 - mGPT: Few-Shot Learners Go Multilingual
Recent studies report that autoregressive language models can successful...

10/27/2022 - What Language Model to Train if You Have One Million GPU Hours?
The crystallization of modeling methods around the Transformer architect...
