It is widely acknowledged that large models have the potential to delive...
Large language models, which are often trained for hundreds of thousands...
All-MLP architectures have attracted increasing interest as an alternati...
Mixture of Experts layers (MoEs) enable efficient scaling of language mo...
Large-scale autoregressive language models such as GPT-3 are few-shot le...
This paper explores the environmental impact of the super-linear growth ...
During pretraining, the Pre-LayerNorm transformer suffers from a gradien...
Classical machine learning frameworks assume access to a possibly large ...
Recent work has demonstrated the effectiveness of cross-lingual language...
Few-shot algorithms aim at learning new tasks provided only a handful of...
The state of the art on many NLP tasks is currently achieved by large pr...
Building open-domain chatbots is a challenging area for machine learning...
Text generation is ubiquitous in many NLP tasks, from summarization, to ...
Language models are of considerable importance. They are used for pretra...
This paper shows that pretraining multilingual language models at scale ...
In this work, we study how the large-scale pretrain-finetune framework c...
This paper describes Facebook AI's submission to the WAT 2019 Myanmar-En...
While we live in an increasingly interconnected world, different places ...
Back-translation is a widely used data augmentation technique which leve...
Language model pretraining has led to significant performance gains but ...
This paper describes Facebook FAIR's submission to the WMT19 shared news...
Recent advances in generative modeling of text have demonstrated remarka...
fairseq is an open-source sequence modeling toolkit that allows research...
Mixture models trained via EM are among the simplest, most widely used a...
The vast majority of language pairs in the world are low-resource becaus...
An effective method to improve neural machine translation with monolingu...
Sequence to sequence learning models still require several days to reach...
Machine translation systems achieve near human-level performance on some...
Machine translation is a popular test bed for research in neural sequenc...
There has been much recent work on training neural attention models at t...