FastRPB: a Scalable Relative Positional Encoding for Long Sequence Tasks

02/23/2022
by Maksim Zubkov, et al.

Transformers achieve remarkable performance in various domains, including NLP, CV, audio processing, and graph analysis. However, they do not scale well to long sequence tasks due to their quadratic complexity w.r.t. the input length. Linear Transformers were proposed to address this limitation, but these models have shown weaker performance on long sequence tasks compared to the original Transformer. In this paper, we explore Linear Transformer models, rethinking their two core components. First, we improve the Linear Transformer with a Shift-Invariant Kernel Function (SIKF), which achieves higher accuracy without loss in speed. Second, we introduce FastRPB, which stands for Fast Relative Positional Bias, and which efficiently adds positional information to self-attention using the Fast Fourier Transform. FastRPB is independent of the self-attention mechanism and can be combined with the original self-attention and all of its efficient variants. FastRPB has O(N log(N)) computational complexity and requires O(N) memory w.r.t. the input sequence length N.
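The FFT trick behind this complexity can be illustrated with a short sketch. Assuming the relative positional bias forms a Toeplitz matrix B with B[i, j] = b[i - j], the product of B with a value vector can be computed in O(N log(N)) time and O(N) memory by embedding B in a circulant matrix and using the FFT. The NumPy-only function below (toeplitz_bias_matvec is an illustrative name, not from the paper) is a minimal sketch of that general idea, not the authors' released implementation, which applies the bias inside self-attention.

# Illustrative sketch (not the authors' code): applying a relative positional
# bias b[i - j] to a length-N vector of values in O(N log N) via the FFT.
import numpy as np

def toeplitz_bias_matvec(bias, values):
    """bias: shape (2N - 1,), holding b[-(N-1)], ..., b[0], ..., b[N-1].
    values: shape (N,). Returns B @ values, B[i, j] = b[i - j], without forming B."""
    n = values.shape[0]
    col = bias[n - 1:]       # first column of B: b[0], b[1], ..., b[N-1]
    row = bias[:n][::-1]     # first row of B:    b[0], b[-1], ..., b[-(N-1)]
    # Embed the N x N Toeplitz matrix into a (2N - 1) x (2N - 1) circulant matrix.
    c = np.concatenate([col, row[1:][::-1]])
    v = np.concatenate([values, np.zeros(n - 1)])
    # A circulant matvec is a circular convolution, i.e. a pointwise product in Fourier space.
    out = np.fft.ifft(np.fft.fft(c) * np.fft.fft(v)).real
    return out[:n]

# Sanity check against the dense product, e.g. for N = 4:
#   b = np.arange(-3, 4, dtype=float); v = np.ones(4)
#   toeplitz_bias_matvec(b, v) matches scipy.linalg.toeplitz(b[3:], b[:4][::-1]) @ v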

Related research

06/29/2020 | Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Transformers achieve remarkable performance in several tasks but due to ...

06/23/2021 | Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding
The attention module, which is a crucial component in Transformer, canno...

05/27/2022 | What Dense Graph Do You Need for Self-Attention?
Transformers have made progress in miscellaneous tasks, but suffer from ...

06/14/2023 | When to Use Efficient Self Attention? Profiling Text, Speech and Image Transformer Variants
We present the first unified study of the efficiency of self-attention-b...

11/29/2022 | Lightweight Structure-Aware Attention for Visual Understanding
Vision Transformers (ViTs) have become a dominant paradigm for visual re...

10/04/2022 | Memory in humans and deep language models: Linking hypotheses for model augmentation
The computational complexity of the self-attention mechanism in Transfor...

08/01/2022 | Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization
Transformers have achieved remarkable success in sequence modeling and b...
