On The Computational Complexity of Self-Attention

09/11/2022
by Feyza Duman Keles et al.

Transformer architectures have led to remarkable progress in many state-of-the-art applications. However, despite their successes, modern transformers rely on the self-attention mechanism, whose time and space complexity is quadratic in the length of the input. Several approaches have been proposed to speed up self-attention and achieve sub-quadratic running time; however, the large majority of these works are not accompanied by rigorous error guarantees. In this work, we establish lower bounds on the computational complexity of self-attention in a number of scenarios. We prove that the time complexity of self-attention is necessarily quadratic in the input length, unless the Strong Exponential Time Hypothesis (SETH) is false. This argument holds even if the attention computation is performed only approximately, and for a variety of attention mechanisms. As a complement to our lower bounds, we show that it is indeed possible to approximate dot-product self-attention using finite Taylor series in linear time, at the cost of an exponential dependence on the polynomial order.
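To make the linear-time Taylor-series idea concrete, here is a minimal NumPy sketch (our illustration, not the authors' construction; all function names and the chosen truncation order are ours). The exponential in dot-product attention is replaced by a truncated Taylor series, which factors through an explicit polynomial feature map, so the sum over keys can be computed once rather than once per query. The feature dimension grows as d**order, which is the exponential dependence on the polynomial order that the abstract refers to.

```python
import math
import numpy as np

def taylor_features(x, order):
    """Feature map phi with phi(q) @ phi(k) = sum_{p=0}^{order} (q @ k)**p / p!,
    i.e. a degree-`order` Taylor truncation of exp(q @ k).
    Output dimension is 1 + d + d**2 + ... + d**order."""
    feats, cur = [np.ones(1)], np.ones(1)
    for p in range(1, order + 1):
        cur = np.kron(cur, x)                        # x tensor-power p, size d**p
        feats.append(cur / math.sqrt(math.factorial(p)))
    return np.concatenate(feats)

def linear_taylor_attention(Q, K, V, order=2):
    """Linear-time (in sequence length n) approximation of softmax attention:
    exp(q @ k) is replaced by its truncated Taylor series, factored through the
    feature map above, so the sum over keys is computed once for all queries."""
    phi_Q = np.stack([taylor_features(q, order) for q in Q])   # (n, D)
    phi_K = np.stack([taylor_features(k, order) for k in K])   # (n, D)
    S = phi_K.T @ V                     # (D, d_v): single pass over the keys
    z = phi_K.sum(axis=0)               # (D,): statistics for the normalizer
    num = phi_Q @ S                     # (n, d_v)
    den = phi_Q @ z                     # (n,)
    return num / den[:, None]

def softmax_attention(Q, K, V):
    """Exact quadratic-time softmax attention (no 1/sqrt(d) scaling, for simplicity)."""
    A = np.exp(Q @ K.T)
    return (A / A.sum(axis=1, keepdims=True)) @ V

# Quick numerical check on small random inputs, where the truncation is accurate.
rng = np.random.default_rng(0)
n, d = 64, 4
Q, K, V = 0.3 * rng.standard_normal((3, n, d))
print(np.max(np.abs(linear_taylor_attention(Q, K, V, order=4)
                    - softmax_attention(Q, K, V))))
```

With small dot products and order 4 the two outputs agree closely, but the feature dimension 1 + d + ... + d**order grows rapidly with the order, and the truncation degrades when the dot products are large, consistent with the trade-off described above.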


Related research

06/08/2020 · Linformer: Self-Attention with Linear Complexity
Large transformer models have shown extraordinary success in achieving s...

12/21/2020 · Sub-Linear Memory: How to Make Performers SLiM
The Transformer architecture has revolutionized deep learning on sequent...

11/08/2022 · Linear Self-Attention Approximation via Trainable Feedforward Kernel
In pursuit of faster computation, Efficient Transformers demonstrate an ...

04/10/2022 · Linear Complexity Randomized Self-attention Mechanism
Recently, random feature attentions (RFAs) are proposed to approximate t...

03/22/2020 · SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection
While the self-attention mechanism has been widely used in a wide variet...

05/20/2019 · Less Memory, Faster Speed: Refining Self-Attention Module for Image Reconstruction
Self-attention (SA) mechanisms can capture effectively global dependenci...

11/15/2022 · Latent Bottlenecked Attentive Neural Processes
Neural Processes (NPs) are popular methods in meta-learning that can est...
