Longformer: The Long-Document Transformer

04/10/2020
by Iz Beltagy, et al.

Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer. Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local windowed attention with a task-motivated global attention. Following prior work on long-sequence transformers, we evaluate Longformer on character-level language modeling and achieve state-of-the-art results on text8 and enwik8. In contrast to most prior work, we also pretrain Longformer and finetune it on a variety of downstream tasks. Our pretrained Longformer consistently outperforms RoBERTa on long document tasks and sets new state-of-the-art results on WikiHop and TriviaQA.
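The attention pattern described in the abstract can be pictured as a sparse mask: each token attends only to a fixed-size window of neighbors, while a few task-designated tokens (for example, a classification token or the question tokens in QA) attend to, and are attended by, every position. The following is a minimal NumPy sketch of such a mask, with the window size and global positions chosen purely for illustration; the actual Longformer implementation uses banded (and optionally dilated) attention kernels rather than a dense mask.

```python
import numpy as np

def longformer_style_mask(seq_len, window, global_positions):
    """Boolean attention mask: True where query i may attend to key j.

    Illustrative sliding-window + global attention pattern; a dense
    mask like this is only meant to show the sparsity, not to be an
    efficient implementation.
    """
    mask = np.zeros((seq_len, seq_len), dtype=bool)

    # Local windowed attention: each token sees window // 2 neighbors
    # on each side, plus itself.
    half = window // 2
    for i in range(seq_len):
        lo, hi = max(0, i - half), min(seq_len, i + half + 1)
        mask[i, lo:hi] = True

    # Task-motivated global attention: chosen tokens attend to every
    # position, and every position attends back to them.
    for g in global_positions:
        mask[g, :] = True
        mask[:, g] = True
    return mask

# Hypothetical sizes for illustration only.
n, w = 4096, 512
mask = longformer_style_mask(n, w, global_positions=[0])

# Nonzero entries grow roughly as n * (w + 1), i.e. linearly in n,
# versus n * n for full self-attention.
print(mask.sum(), "attended pairs vs", n * n, "for full attention")
```

As a usage note, pretrained Longformer checkpoints are also distributed through the Hugging Face Transformers library, where the per-token choice of global versus local attention is passed to the model as a separate global_attention_mask input.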


Related research

02/07/2021
Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention
Transformers have emerged as a powerful tool for a broad range of natura...

04/19/2022
On the Locality of Attention in Direct Speech Translation
Transformers have achieved state-of-the-art results across multiple NLP ...

05/19/2019
Adaptive Attention Span in Transformers
We propose a novel self-attention mechanism that can learn its optimal a...

02/21/2023
Hyena Hierarchy: Towards Larger Convolutional Language Models
Recent advances in deep learning have relied heavily on the use of large...

01/02/2021
Cross-Document Language Modeling
We introduce a new pretraining approach for language models that are gea...

11/07/2022
How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers
The attention mechanism is considered the backbone of the widely-used Tr...

07/05/2023
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Scaling sequence length has become a critical demand in the era of large...
