Predicting Attention Sparsity in Transformers

09/24/2021
by Marcos Treviso, et al.

A bottleneck in transformer architectures is their quadratic complexity with respect to the input sequence length, which has motivated a body of work on efficient sparse approximations to softmax. An alternative path, used by entmax transformers, consists of having built-in exact sparse attention; however, this approach still requires quadratic computation. In this paper, we propose Sparsefinder, a simple model trained to identify the sparsity pattern of entmax attention before computing it. We experiment with three variants of our method, based on distances, quantization, and clustering, on two tasks: machine translation (attention in the decoder) and masked language modeling (encoder-only). Our work provides a new angle to study model efficiency through an extensive analysis of the tradeoff between the sparsity and recall of the predicted attention graph. This allows for a detailed comparison between different models, and may guide future benchmarks for sparse models.
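
As a rough, hypothetical illustration of the tradeoff described above, the sketch below builds a ground-truth sparse attention graph with sparsemax (the alpha=2 member of the entmax family), predicts an attention graph by grouping low-dimensional projections of queries and keys (a stand-in for the clustering variant mentioned in the abstract), and then measures that graph's sparsity and recall. The random projection, the nearest-centroid assignment, and all sizes are illustrative assumptions, not the paper's actual learned components.

```python
import numpy as np

def sparsemax(z):
    """Project scores z onto the simplex (Martins & Astudillo, 2016);
    this is the alpha=2 special case of entmax and yields exact zeros."""
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum      # which sorted scores stay nonzero
    k_max = k[support][-1]
    tau = (cumsum[k_max - 1] - 1.0) / k_max  # threshold subtracted from scores
    return np.maximum(z - tau, 0.0)

rng = np.random.default_rng(0)
n, d, proj_dim, n_clusters = 32, 64, 4, 4    # illustrative sizes only

Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))

# Ground-truth sparse attention graph: an edge (i, j) exists iff query i
# gives key j a nonzero sparsemax weight.
scores = Q @ K.T / np.sqrt(d)
true_graph = np.stack([sparsemax(row) for row in scores]) > 0

# Hypothetical predictor: project queries/keys to a low-dimensional space
# and group them by nearest centroid; a query may only attend to keys in
# its own group. (A random projection and randomly chosen centroids stand
# in for the learned projection and clustering used in the paper.)
W = rng.standard_normal((d, proj_dim))
q_low, k_low = Q @ W, K @ W
centroids = k_low[rng.choice(n, n_clusters, replace=False)]

def assign(x):
    """Nearest-centroid cluster id for each row of x."""
    return np.argmin(((x[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)

pred_graph = assign(q_low)[:, None] == assign(k_low)[None, :]

# Sparsity: fraction of query-key pairs the predicted graph prunes.
# Recall: fraction of true entmax edges the predicted graph keeps.
sparsity = 1.0 - pred_graph.mean()
recall = (pred_graph & true_graph).sum() / true_graph.sum()
print(f"sparsity = {sparsity:.2f}, recall = {recall:.2f}")
```

In this framing, higher recall means fewer true entmax edges are dropped, while higher sparsity means more query-key pairs can be skipped entirely; the analysis in the paper studies how these two quantities trade off against each other.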

Related research

03/03/2021 - Random Feature Attention
Transformers are state-of-the-art models for a variety of sequence model...

10/21/2021 - Transformer Acceleration with Dynamic Sparse Attention
Transformers are the mainstream of NLP applications and are becoming inc...

08/30/2019 - Adaptively Sparse Transformers
Attention mechanisms have become ubiquitous in NLP. Recent architectures...

05/15/2020 - Adaptive Transformers for Learning Multimodal Representations
The usage of transformers has grown from learning about language semanti...

03/17/2021 - Value-aware Approximate Attention
Following the success of dot-product attention in Transformers, numerous...

04/14/2021 - Sparse Attention with Linear Units
Recently, it has been argued that encoder-decoder models can be made mor...

10/27/2022 - Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost
To overcome the quadratic cost of self-attention, recent works have prop...
