Sparse is Enough in Scaling Transformers

11/24/2021
by Sebastian Jaszczur, et al.

Large Transformer models yield impressive results on many tasks, but are expensive to train, or even fine-tune, and so slow at decoding that their use and study become out of reach. We address this problem by leveraging sparsity. We study sparse variants for all layers in the Transformer and propose Scaling Transformers, a family of next-generation Transformer models that use sparse layers to scale efficiently and perform unbatched decoding much faster than the standard Transformer as we scale up the model size. Surprisingly, the sparse layers are enough to obtain the same perplexity as the standard Transformer with the same number of parameters. We also integrate prior sparsity approaches to attention and enable fast inference on long sequences even with limited memory. This results in performance competitive with the state-of-the-art on long text summarization.
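
To make the sparse feedforward idea in the abstract concrete, here is a minimal PyTorch sketch, not the authors' code: the class name SparseFFN, the block size, and the single-linear controller are illustrative assumptions (the paper uses a low-rank controller and a trainable discretization during training). A small controller picks one active unit per block of the feedforward dimension, so a decoder would only need the corresponding slices of the weight matrices; the mask below merely emulates that sparsity pattern rather than delivering the decoding speedup.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseFFN(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, block_size=32):
        super().__init__()
        assert d_ff % block_size == 0
        self.n_blocks = d_ff // block_size
        self.block_size = block_size
        self.w_in = nn.Linear(d_model, d_ff)
        self.w_out = nn.Linear(d_ff, d_model)
        # Single linear layer standing in for the paper's low-rank controller.
        self.controller = nn.Linear(d_model, d_ff)

    def forward(self, x):  # x: (batch, d_model)
        scores = self.controller(x).view(-1, self.n_blocks, self.block_size)
        # Select one active unit per block (hard argmax at inference time).
        mask = F.one_hot(scores.argmax(dim=-1), self.block_size).float()
        hidden = F.relu(self.w_in(x)).view(-1, self.n_blocks, self.block_size)
        # Masking emulates the sparsity; a fast decoder would instead gather
        # only the selected columns of w_in and rows of w_out.
        hidden = (hidden * mask).reshape(x.shape[0], -1)
        return self.w_out(hidden)

# Usage: each token activates only d_ff / block_size feedforward units.
layer = SparseFFN()
y = layer(torch.randn(1, 512))  # y has shape (1, 512)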

Related research

10/26/2021 · Hierarchical Transformers Are More Efficient Language Models
Transformer models yield impressive results on many NLP and sequence mod...

01/13/2020 · Reformer: The Efficient Transformer
Large Transformer models routinely achieve state-of-the-art results on a...

06/05/2023 · Representational Strengths and Limitations of Transformers
Attention layers, as commonly used in transformers, form the backbone of...

10/22/2020 · AdapterDrop: On the Efficiency of Adapters in Transformers
Massively pre-trained transformer models are computationally expensive t...

02/14/2023 · Energy Transformer
Transformers have become the de facto models of choice in machine learni...

07/07/2022 · Training Transformers Together
The infrastructure necessary for training state-of-the-art models is bec...

06/26/2019 · Sharing Attention Weights for Fast Transformer
Recently, the Transformer machine translation system has shown strong re...
