Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention

02/07/2021
by Yunyang Xiong, et al.

Transformers have emerged as a powerful tool for a broad range of natural language processing tasks. A key component that drives the impressive performance of Transformers is the self-attention mechanism that encodes the influence or dependence of other tokens on each specific token. While beneficial, the quadratic complexity of self-attention with respect to the input sequence length has limited its application to longer sequences – a topic being actively studied in the community. To address this limitation, we propose Nyströmformer – a model that exhibits favorable scalability as a function of sequence length. Our idea is based on adapting the Nyström method to approximate standard self-attention with O(n) complexity. The scalability of Nyströmformer enables application to longer sequences with thousands of tokens. We perform evaluations on multiple downstream tasks on the GLUE benchmark and IMDB reviews with standard sequence length, and find that our Nyströmformer performs comparably to, or in a few cases even slightly better than, the standard Transformer. Our code is at https://github.com/mlpen/Nystromformer.
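The abstract does not spell out the construction, so below is a minimal sketch of the Nyström-style approximation it refers to: queries and keys are summarized by a small set of landmark vectors (segment means, as in the paper), three small softmax maps are formed, and the full n×n attention matrix is replaced by their product. The function name nystrom_attention, the num_landmarks default, and the use of torch.linalg.pinv in place of the paper's iterative Moore–Penrose pseudoinverse approximation are illustrative assumptions, not the repository's actual API.

```python
import torch

def nystrom_attention(Q, K, V, num_landmarks=64):
    """Sketch of Nystrom-based attention (illustrative, not the repo's API).

    Q, K, V: tensors of shape (batch, seq_len, d). Assumes seq_len is
    divisible by num_landmarks; landmarks are segment means, and the
    pseudoinverse is computed exactly with torch.linalg.pinv rather than
    the paper's iterative approximation.
    """
    b, n, d = Q.shape
    assert n % num_landmarks == 0, "sequence length must divide evenly into segments"
    scale = d ** -0.5

    # Landmarks: mean of each contiguous segment of queries / keys.
    Q_l = Q.reshape(b, num_landmarks, n // num_landmarks, d).mean(dim=2)  # (b, m, d)
    K_l = K.reshape(b, num_landmarks, n // num_landmarks, d).mean(dim=2)  # (b, m, d)

    # Three small softmax kernels in place of the full n x n attention map.
    F1 = torch.softmax(Q   @ K_l.transpose(-1, -2) * scale, dim=-1)  # (b, n, m)
    A  = torch.softmax(Q_l @ K_l.transpose(-1, -2) * scale, dim=-1)  # (b, m, m)
    B  = torch.softmax(Q_l @ K.transpose(-1, -2)   * scale, dim=-1)  # (b, m, n)

    # softmax(QK^T / sqrt(d)) V  ~=  F1 @ pinv(A) @ (B @ V),
    # evaluated right-to-left so the n x n matrix is never materialized.
    return F1 @ (torch.linalg.pinv(A) @ (B @ V))
```

With Q, K, V of shape (2, 4096, 64) and 64 landmarks, the three kernels have shapes 4096×64, 64×64, and 64×4096, so memory and compute grow linearly with sequence length rather than quadratically.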

research · 11/18/2021
You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling
Transformer-based models are widely used in natural language processing ...

research · 04/10/2020
Longformer: The Long-Document Transformer
Transformer-based models are unable to process long sequences due to the...

research · 05/30/2023
Blockwise Parallel Transformer for Long Context Large Models
Transformers have emerged as the cornerstone of state-of-the-art natural...

research · 07/21/2022
Multi Resolution Analysis (MRA) for Approximate Self-Attention
Transformers have emerged as a preferred model for many tasks in natural...

research · 05/22/2023
FIT: Far-reaching Interleaved Transformers
We present FIT: a transformer-based architecture with efficient self-att...

research · 03/30/2020
Code Prediction by Feeding Trees to Transformers
In this paper, we describe how to leverage Transformer, a recent neural ...

research · 05/31/2023
Recasting Self-Attention with Holographic Reduced Representations
In recent years, self-attention has become the dominant paradigm for seq...
