It is widely acknowledged that large models have the potential to delive...
Large language models, which are often trained for hundreds of thousands...
All-MLP architectures have attracted increasing interest as an alternati...
Mixture of Experts layers (MoEs) enable efficient scaling of language mo...
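
The conditional computation named in the entry above is the core MoE idea: each token is routed to a small subset of expert sub-networks, so total parameters grow while per-token compute stays roughly constant. A minimal top-1-routing sketch (class name, layer sizes, and the routing rule are illustrative assumptions, not the architecture of the paper excerpted above):

    import torch
    import torch.nn as nn

    class Top1MoE(nn.Module):
        """Toy Mixture-of-Experts layer with top-1 routing (illustrative)."""
        def __init__(self, d_model: int, n_experts: int):
            super().__init__()
            self.gate = nn.Linear(d_model, n_experts)      # router
            self.experts = nn.ModuleList(
                nn.Linear(d_model, d_model) for _ in range(n_experts)
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
            probs = self.gate(x).softmax(dim=-1)           # routing probabilities
            choice = probs.argmax(dim=-1)                  # one expert per token
            out = torch.zeros_like(x)
            for i, expert in enumerate(self.experts):
                mask = choice == i                         # tokens sent to expert i
                if mask.any():
                    # Only routed tokens touch this expert, so adding experts
                    # grows capacity without growing per-token compute.
                    out[mask] = expert(x[mask]) * probs[mask, i].unsqueeze(-1)
            return out
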
Large-scale autoregressive language models such as GPT-3 are few-shot le...
During pretraining, the Pre-LayerNorm transformer suffers from a gradien...
Stateful optimizers maintain gradient statistics over time, e.g., the ex...
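
The entry above is truncated, but the mechanism it describes is standard: optimizers such as SGD with momentum and Adam carry per-parameter statistics from step to step, so optimizer state costs memory on the order of the model itself. A minimal sketch of both update rules (function names, hyperparameter defaults, and the dict-based state are assumptions for illustration):

    import numpy as np

    def momentum_step(param, grad, state, lr=0.01, beta=0.9):
        # SGD with momentum: an exponentially smoothed sum of past gradients.
        state["m"] = beta * state.get("m", np.zeros_like(param)) + grad
        return param - lr * state["m"]

    def adam_step(param, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        # Adam keeps two buffers (first and second gradient moments), so its
        # state alone uses twice the memory of the parameters.
        t = state["t"] = state.get("t", 0) + 1
        state["m"] = b1 * state.get("m", np.zeros_like(param)) + (1 - b1) * grad
        state["v"] = b2 * state.get("v", np.zeros_like(param)) + (1 - b2) * grad ** 2
        m_hat = state["m"] / (1 - b1 ** t)                 # bias correction
        v_hat = state["v"] / (1 - b2 ** t)
        return param - lr * m_hat / (np.sqrt(v_hat) + eps)
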
Recent state-of-the-art approaches to summarization utilize large pre-tr...
We present a series of modifications which improve upon Graph WaveNet's ...
Generative seq2seq dialogue systems are trained to predict the next word...
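
As a brief illustration of the training objective named in the entry above, next-word prediction is ordinarily implemented as token-level cross-entropy with teacher forcing. A hedged sketch (function name, shapes, and the teacher-forcing convention are assumptions, not the systems from the excerpted abstract):

    import torch
    import torch.nn.functional as F

    def next_word_loss(logits, targets):
        # logits:  (batch, seq_len, vocab) model scores at each position
        # targets: (batch, seq_len) the word that actually comes next
        return F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),  # flatten positions
            targets.reshape(-1),                  # one gold label per position
        )

    # Teacher forcing: the decoder reads tokens[:, :-1] and is scored on
    # tokens[:, 1:], i.e. on predicting each next word of the dialogue.
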
One of the biggest bottlenecks in a machine learning workflow is waiting...
In computer vision, virtually every state-of-the-art deep learning syste...