We present Belebele, a multiple-choice machine reading comprehension (MR...
In this work, we develop and release Llama 2, a collection of pretrained...
We present a theory for the previously unexplained divergent behavior no...
We introduce LLaMA, a collection of foundation language models ranging f...
Large multilingual language models typically rely on a single vocabulary...
Generative language models define distributions over sequences of tokens...
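For readers skimming this list, the factorization such generative models share is the standard autoregressive chain rule (a textbook identity, not a claim specific to this abstract):

$$p(x_1, \dots, x_T) = \prod_{t=1}^{T} p(x_t \mid x_{<t})$$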
Recently, self-supervised learning has seen explosive growth and use in v...
We present BlenderBot 3, a 175B parameter dialogue model capable of open...
Prior work on language model pre-training has explored different archite...
Multilingual pre-trained models are known to suffer from the curse of mu...
Large language models, which are often trained for hundreds of thousands...
A multilingual tokenizer is a fundamental component of multilingual neur...
In this paper, we evaluate the performance of graph neural networks...
We introduce CM3, a family of causally masked generative models trained ...
Mixture of Experts layers (MoEs) enable efficient scaling of language mo...
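As a quick illustration of why MoE layers scale parameter count without scaling per-token compute, here is a minimal top-1-routed MoE feed-forward layer in PyTorch; the class name, sizes, and routing details are illustrative assumptions, not this paper's implementation:

```python
# Minimal sketch of a Mixture-of-Experts feed-forward layer with top-1 routing.
# Each token is processed by a single expert, so adding experts grows capacity
# while per-token compute stays roughly that of one feed-forward block.
# All names and dimensions here are hypothetical, for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1MoE(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # per-token gating scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); send each token to its highest-scoring expert
        gates = F.softmax(self.router(x), dim=-1)   # (tokens, n_experts)
        weight, idx = gates.max(dim=-1)             # top-1 gate value and expert id
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                # scale by the gate so the routing decision stays differentiable
                out[mask] = weight[mask, None] * expert(x[mask])
        return out

tokens = torch.randn(8, 16)
moe = Top1MoE(d_model=16, d_hidden=64, n_experts=4)
print(moe(tokens).shape)  # torch.Size([8, 16])
```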
Large-scale autoregressive language models such as GPT-3 are few-shot le...
This paper presents XLS-R, a large-scale model for cross-lingual speech...
In this paper, we describe our end-to-end multilingual speech translatio...
One of the biggest challenges hindering progress in low-resource and mul...
The scarcity of parallel data is a major obstacle for training high-qual...
Recent work has demonstrated the effectiveness of cross-lingual language...
We introduce a new balanced assignment of experts (BASE) layer for large...
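To make the phrase "balanced assignment" concrete, here is a minimal sketch in the spirit of a BASE layer: token-to-expert routing is posed as a linear assignment problem so every expert receives exactly the same number of tokens. The function name and the use of scipy's assignment solver are assumptions for illustration (the paper itself may use a different solver, e.g. an auction-style algorithm):

```python
# Sketch: balanced token-to-expert assignment via linear assignment.
# Instead of letting each token greedily pick its top expert (which can
# overload a few experts), assign tokens so that load is perfectly equal.
import numpy as np
from scipy.optimize import linear_sum_assignment

def balanced_assign(scores: np.ndarray) -> np.ndarray:
    """scores: (n_tokens, n_experts) token-expert affinities; assumes
    n_tokens is divisible by n_experts. Returns one expert id per token
    under an equal-load constraint. Hypothetical helper, not the paper's API."""
    n_tokens, n_experts = scores.shape
    cap = n_tokens // n_experts
    # Replicate each expert column `cap` times so the problem becomes a
    # one-to-one assignment; maximize affinity by minimizing its negative.
    cost = -np.repeat(scores, cap, axis=1)        # (n_tokens, n_tokens)
    rows, cols = linear_sum_assignment(cost)
    assignment = np.empty(n_tokens, dtype=int)
    assignment[rows] = cols // cap                 # replicated slot -> expert id
    return assignment

scores = np.random.rand(8, 4)
print(balanced_assign(scores))  # each of the 4 experts appears exactly twice
```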
We present mGENRE, a sequence-to-sequence system for the Multilingual En...
This paper describes Facebook AI's submission to WMT20 shared news trans...
Existing work in translation has demonstrated the potential of massively mul...
Although widely adopted, existing approaches for fine-tuning pre-trained...
Recent work demonstrates the potential of multilingual pretraining of cr...
Large pre-trained language models have been shown to store factual knowl...
Building open-domain chatbots is a challenging area for machine learning...
This paper demonstrates that multilingual denoising pre-training produce...
This paper shows that pretraining multilingual language models at scale ...
We present BART, a denoising autoencoder for pretraining sequence-to-seq...
Language model pretraining has led to significant performance gains but...
Language change is a complex social phenomenon, revealing pathways of co...