ETC: Encoding Long and Structured Data in Transformers

04/17/2020
by Joshua Ainslie, et al.

Transformer-based models have pushed the state of the art in many natural language processing tasks. However, one of their main limitations is the quadratic computational and memory cost of the standard attention mechanism. In this paper, we present a new family of Transformer models, which we call the Extended Transformer Construction (ETC), that allows for significant increases in input sequence length by introducing a new global-local attention mechanism between a global memory and the standard input tokens. We also show that combining global-local attention with relative position encodings allows ETC to handle structured data with ease. Empirical results on the Natural Questions data set show the promise of the approach.
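
As an illustrative aside (not taken from the paper), the sketch below shows one way the global-local attention pattern described in the abstract can be expressed as a dense attention mask in NumPy: a small set of global memory tokens attends to, and is attended by, every token, while the long input tokens otherwise attend only within a fixed local radius. The function names (global_local_attention_mask, attention) and all parameter choices here are ours; ETC's actual implementation is sparse (it never materializes the full n-by-n mask) and also uses relative position encodings, which this toy version omits.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def global_local_attention_mask(n_global, n_long, local_radius):
    """Boolean attention mask for a sequence laid out as
    [n_global global memory tokens] + [n_long long-input tokens].

    - Global tokens attend to everything (global-to-global, global-to-long).
    - Long tokens attend to all global tokens (long-to-global) and to long
      tokens within +/- local_radius positions (long-to-long, banded).
    """
    n = n_global + n_long
    mask = np.zeros((n, n), dtype=bool)
    mask[:n_global, :] = True            # global rows: full attention
    mask[:, :n_global] = True            # every token can see global memory
    idx = np.arange(n_long)
    band = np.abs(idx[:, None] - idx[None, :]) <= local_radius
    mask[n_global:, n_global:] = band    # long-to-long restricted to a window
    return mask

def attention(q, k, v, mask):
    """Standard scaled dot-product attention with a boolean mask."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(mask, scores, -1e9)  # block disallowed positions
    return softmax(scores, axis=-1) @ v

# Toy usage: 2 global memory tokens, 8 long-input tokens, window radius 2.
rng = np.random.default_rng(0)
n_global, n_long, d = 2, 8, 16
x = rng.normal(size=(n_global + n_long, d))
mask = global_local_attention_mask(n_global, n_long, local_radius=2)
out = attention(x, x, x, mask)           # self-attention over the full sequence
print(out.shape)                         # (10, 16)
```

With the number of global tokens fixed and the local window bounded, the number of allowed attention entries grows linearly in the long-input length rather than quadratically, which is where the claimed gains in feasible sequence length come from.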

Related research

07/14/2022
Recurrent Memory Transformer
Transformer-based models show their effectiveness across multiple domain...

12/15/2021
LongT5: Efficient Text-To-Text Transformer for Long Sequences
Recent work has shown that either (1) increasing the input length or (2)...

07/28/2020
Big Bird: Transformers for Longer Sequences
Transformers-based models, such as BERT, have been one of the most succe...

12/30/2021
ChunkFormer: Learning Long Time Series with Multi-stage Chunked Transformer
The analysis of long sequence data remains challenging in many real-worl...

05/04/2023
AttentionViz: A Global View of Transformer Attention
Transformer models are revolutionizing machine learning, but their inner...

05/25/2023
Landmark Attention: Random-Access Infinite Context Length for Transformers
While transformers have shown remarkable success in natural language pro...

06/05/2020
GMAT: Global Memory Augmentation for Transformers
Transformer-based models have become ubiquitous in natural language proc...
