Few-shot Learning with Multilingual Language Models

by Xi Victoria Lin, et al.

Large-scale autoregressive language models such as GPT-3 are few-shot learners that can perform a wide range of language tasks without fine-tuning. While these models are known to be able to jointly represent many different languages, their training data is dominated by English, potentially limiting their cross-lingual generalization. In this work, we train multilingual autoregressive language models on a balanced corpus covering a diverse set of languages, and study their few- and zero-shot learning capabilities on a wide range of tasks. Our largest model, with 7.5 billion parameters, sets a new state of the art in few-shot learning in more than 20 representative languages, outperforming GPT-3 of comparable size in multilingual commonsense reasoning (+7.4 points in 4-shot settings) and natural language inference (+5.4 points in 4-shot settings). On the FLORES-101 machine translation benchmark, our model outperforms GPT-3 on 171 out of 182 translation directions with 32 training examples, while surpassing the official supervised baseline in 45 directions. We present a detailed analysis of where the model succeeds and fails, showing in particular that it enables cross-lingual in-context learning on some tasks, while there is still room for improvement in surface-form robustness and in adaptation to tasks that do not have a natural cloze form. Finally, we evaluate our models on social value tasks such as hate speech detection in five languages, and find that they have limitations similar to those of comparably sized GPT-3 models.
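The few-shot in-context learning setup the abstract refers to can be sketched as follows: the model is shown k labeled demonstrations followed by an unlabeled query, and is expected to continue the pattern without any gradient updates. This is a minimal, hypothetical illustration of prompt construction only; the template, example texts, and labels are invented for demonstration and are not from the paper.

```python
def build_few_shot_prompt(demonstrations, query, template="{text} => {label}"):
    """Concatenate k labeled examples and one unlabeled query into a prompt.

    Each demonstration is a dict with 'text' and 'label' keys. The query
    reuses the same template but leaves the label slot empty, so the
    language model completes it.
    """
    shots = [template.format(**d) for d in demonstrations]
    # Append the query with a blank label for the model to fill in.
    shots.append(template.format(text=query, label="").rstrip())
    return "\n".join(shots)


# Hypothetical 2-shot sentiment example.
demos = [
    {"text": "The movie was wonderful.", "label": "positive"},
    {"text": "I hated every minute.", "label": "negative"},
]
prompt = build_few_shot_prompt(demos, "A delightful surprise.")
```

The resulting string would then be passed to an autoregressive model, whose continuation after the final `=>` is taken as the prediction; a multilingual model additionally allows the demonstrations and the query to be in different languages (the cross-lingual in-context learning the abstract mentions).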


