CoLT5: Faster Long-Range Transformers with Conditional Computation

03/17/2023
by Joshua Ainslie, et al.

Many natural language processing tasks benefit from long inputs, but processing long documents with Transformers is expensive – not only due to quadratic attention complexity but also from applying feedforward and projection layers to every token. However, not all tokens are equally important, especially for longer documents. We propose CoLT5, a long-input Transformer model that builds on this intuition by employing conditional computation, devoting more resources to important tokens in both feedforward and attention layers. We show that CoLT5 achieves stronger performance than LongT5 with much faster training and inference, achieving SOTA on the long-input SCROLLS benchmark. Moreover, CoLT5 can effectively and tractably make use of extremely long inputs, showing strong gains up to 64k input length.
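The core idea — spending a cheap computation on every token but an expensive one only on tokens a router deems important — can be illustrated with a minimal sketch. This is not the paper's implementation; the router (a hypothetical linear scorer), branch widths, and gating are illustrative assumptions, shown here for a conditional feedforward layer:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, w2):
    # Plain two-layer feedforward block with ReLU.
    return np.maximum(x @ w1, 0.0) @ w2

def conditional_ffn(x, k, d_light=8, d_heavy=32):
    """Sketch of conditional computation in a feedforward layer:
    every token goes through a cheap (narrow) branch; only the k
    highest-scoring tokens also go through an expensive (wide) branch."""
    n, d = x.shape

    # Hypothetical router: a linear score per token (learned in practice).
    w_route = rng.normal(size=d)
    scores = x @ w_route
    top_k = np.argsort(scores)[-k:]  # indices of the "important" tokens

    # Light branch, applied to all n tokens.
    w1_l = rng.normal(size=(d, d_light))
    w2_l = rng.normal(size=(d_light, d))
    out = mlp(x, w1_l, w2_l)

    # Heavy branch, applied only to the k routed tokens and
    # gated by the router score so routing stays differentiable.
    w1_h = rng.normal(size=(d, d_heavy))
    w2_h = rng.normal(size=(d_heavy, d))
    gate = scores[top_k][:, None]
    out[top_k] += gate * mlp(x[top_k], w1_h, w2_h)
    return out, top_k

x = rng.normal(size=(16, 4))      # 16 tokens, model dim 4
y, routed = conditional_ffn(x, k=4)
```

With a long input, the heavy branch's cost scales with k rather than with sequence length, which is why this kind of routing (applied to both feedforward and attention layers in CoLT5) pays off most on long documents.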


Related research:

- Blockwise Parallel Transformer for Long Context Large Models (05/30/2023)
- Long-Range Transformer Architectures for Document Understanding (09/11/2023)
- Unlimiformer: Long-Range Transformers with Unlimited Length Input (05/02/2023)
- Vcc: Scaling Transformers to 128K Tokens or More by Prioritizing Important Tokens (05/07/2023)
- Attention Flows for General Transformers (05/30/2022)
- Chunk, Align, Select: A Simple Long-sequence Processing Method for Transformers (08/25/2023)
- Video Transformers: A Survey (01/16/2022)
