The NLP Task Effectiveness of Long-Range Transformers

02/16/2022
by Guanghui Qin, et al.

Transformer models cannot easily scale to long sequences due to the O(N^2) time and space complexity of self-attention. This has motivated Transformer variants that seek to reduce computational complexity, such as Longformer and Performer. While such models are theoretically more efficient, their effectiveness on real NLP tasks has not been well studied. We benchmark 7 variants of Transformer models on 5 difficult NLP tasks and 7 datasets. We design experiments to isolate the effects of pretraining and hyperparameter settings, so as to focus on the models' capacity for long-range attention. Moreover, we present various methods for investigating attention behaviors, to illuminate model details beyond metric scores. We find that the attention of long-range transformers has advantages in content selection and query-guided decoding, but that these models also come with previously unrecognized drawbacks, such as insufficient attention to distant tokens.
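The O(N^2) cost arises because, in full self-attention, every token attends to every other token; efficient variants such as Longformer instead restrict most tokens to a local window. The sketch below, a minimal illustration and not the paper's implementation, contrasts the two attention patterns; the sequence length, window size, and helper names are illustrative assumptions.

    import torch

    def full_attention_mask(seq_len: int) -> torch.Tensor:
        # Full self-attention: every token attends to every other token,
        # so the mask has seq_len * seq_len allowed pairs, i.e. O(N^2).
        return torch.ones(seq_len, seq_len, dtype=torch.bool)

    def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
        # Local (sliding-window) attention: each token attends only to
        # neighbors within +/- window positions, giving roughly
        # seq_len * (2 * window + 1) allowed pairs, i.e. O(N * window).
        positions = torch.arange(seq_len)
        return (positions[None, :] - positions[:, None]).abs() <= window

    if __name__ == "__main__":
        n, w = 4096, 128  # illustrative sequence length and window size
        dense = full_attention_mask(n).sum().item()
        sparse = sliding_window_mask(n, w).sum().item()
        print(f"full attention pairs:   {dense}")
        print(f"sliding-window pairs:   {sparse}")

With a fixed window, the number of attended pairs grows linearly in sequence length rather than quadratically, which is the source of the efficiency gains these variants claim.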


research · 09/19/2021
Do Long-Range Language Models Actually Use Long-Range Context?
Language models are generally trained on short, truncated input sequence...

research · 06/08/2023
RRWKV: Capturing Long-range Dependencies in RWKV
Owing to the impressive dot-product attention, the Transformers have bee...

research · 12/14/2021
Simple Local Attentions Remain Competitive for Long-Context Tasks
Many NLP tasks require processing long contexts beyond the length limit ...

research · 04/12/2021
Updater-Extractor Architecture for Inductive World State Representations
Developing NLP models traditionally involves two stages - training and a...

research · 07/01/2019
Do Transformer Attention Heads Provide Transparency in Abstractive Summarization?
Learning algorithms become more powerful, often at the cost of increased...

research · 06/09/2022
Unveiling Transformers with LEGO: a synthetic reasoning task
We propose a synthetic task, LEGO (Learning Equality and Group Operation...

research · 04/26/2019
Transformers with convolutional context for ASR
The recent success of transformer networks for neural machine translatio...
