Over recent years, an increasing amount of compute and data has been pou...
Autoregressive transformers are spectacular models for short sequences b...
Generative language models define distributions over sequences of tokens...
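As a minimal illustration of that statement (a sketch, not code from any paper listed here), the standard autoregressive factorization scores a sequence by chaining per-token conditionals, log p(x) = sum_t log p(x_t | x_<t); the toy model and function names below are assumptions for demonstration only.

    import math

    def sequence_log_prob(tokens, next_token_dist):
        """Chain rule: log p(x) = sum over t of log p(x_t | x_<t).
        `next_token_dist(prefix)` is any callable returning a dict of
        token -> probability (a stand-in for a real language model head)."""
        total = 0.0
        for t, tok in enumerate(tokens):
            probs = next_token_dist(tokens[:t])
            total += math.log(probs[tok])
        return total

    # Toy usage: a hypothetical uniform "model" over a four-token vocabulary.
    vocab = ["a", "b", "c", "<eos>"]
    uniform = lambda prefix: {tok: 1.0 / len(vocab) for tok in vocab}
    print(sequence_log_prob(["a", "b", "<eos>"], uniform))  # 3 * log(0.25)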
We discover a robust self-supervised strategy tailored towards molecular...
Recent multimodal models such as DALL-E and CM3 have achieved remarkable...
Despite their wide adoption, the underlying training and memorization dy...
We propose a simple and effective re-ranking method for improving passag...
Code is seldom written in a single left-to-right pass and is instead rep...
We introduce CM3, a family of causally masked generative models trained ...
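For context on the term "causally masked" (a rough sketch under assumed details, not the authors' implementation): a document is rewritten by cutting out spans, leaving sentinel tokens in their place, and appending the removed spans at the end, so a left-to-right decoder still learns to generate the masked content with context from both sides. Span selection and token names below are illustrative assumptions.

    import random

    def causally_mask(tokens, n_masks=1, seed=0):
        """Illustrative causal-masking transform: remove a random span,
        leave a <mask:i> sentinel behind, and append the span after the
        document so it is generated last, conditioned on both sides."""
        rng = random.Random(seed)
        tokens = list(tokens)
        tail = []
        for i in range(n_masks):
            start = rng.randrange(len(tokens))
            end = rng.randrange(start + 1, len(tokens) + 1)
            span = tokens[start:end]
            tokens[start:end] = [f"<mask:{i}>"]
            tail += [f"<mask:{i}>"] + span
        return tokens + tail

    print(causally_mask("the cat sat on the mat".split()))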
With the rise of large-scale pre-trained language models, open-domain qu...
We present VideoCLIP, a contrastive approach to pre-train a unified mode...
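As a generic sketch of that kind of contrastive setup (not VideoCLIP's exact objective, sampling, or architecture): given a batch of paired video and text embeddings, a symmetric InfoNCE-style loss pulls matched pairs together and pushes mismatched pairs apart. The function name and temperature value are assumptions.

    import torch
    import torch.nn.functional as F

    def symmetric_contrastive_loss(video_emb, text_emb, temperature=0.07):
        """InfoNCE over a batch of paired (video, text) embeddings,
        each of shape [batch, dim]; matched pairs lie on the diagonal."""
        video_emb = F.normalize(video_emb, dim=-1)
        text_emb = F.normalize(text_emb, dim=-1)
        logits = video_emb @ text_emb.t() / temperature  # [batch, batch]
        targets = torch.arange(logits.size(0), device=logits.device)
        # Average the video-to-text and text-to-video directions.
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

    # Toy usage with random embeddings from two hypothetical encoders.
    loss = symmetric_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))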
We introduce HTLM, a hyper-text language model trained on a large-scale ...
Semantic parsing using sequence-to-sequence models allows parsing of dee...
We propose pre-finetuning, an additional large-scale learning stage betw...
Although pretrained language models can be fine-tuned to produce state-o...
The structured representation for semantic parsing in task-oriented assi...
Although widely adopted, existing approaches for fine-tuning pre-trained...
We introduce MARGE, a pre-trained sequence-to-sequence model learned wit...
When a bilingual student learns to solve word problems in math, we expec...
Initialization of parameters in deep neural networks has been shown to h...