Skip-Attention: Improving Vision Transformers by Paying Less Attention

This work aims to improve the efficiency of vision transformers (ViTs). While ViTs use computationally expensive self-attention operations in every layer, we observe that these operations are highly correlated across layers, a key redundancy that causes unnecessary computation. Based on this observation, we propose SkipAt, a method that reuses self-attention computation from preceding layers to approximate attention at one or more subsequent layers. To ensure that reusing self-attention across layers does not degrade performance, we introduce a simple parametric function, which outperforms the baseline transformer while running faster. We demonstrate the effectiveness of our method on image classification and self-supervised learning on ImageNet-1K, semantic segmentation on ADE20K, image denoising on SIDD, and video denoising on DAVIS. In all of these tasks, SkipAt achieves higher throughput at the same or better accuracy.
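The core idea can be sketched in a few lines of PyTorch. The block below is a minimal illustration rather than the authors' released implementation: the class name SkipAttentionBlock, the two-layer MLP standing in for the parametric reuse function, and the ViT-S-like shapes in the usage snippet are illustrative assumptions; the paper's actual parametric function and choice of which layers to skip may differ.

```python
# Minimal sketch of skip-attention; not the authors' implementation.
# Assumptions (illustrative only): pre-norm ViT-style blocks, and a small
# per-token MLP standing in for the parametric function that transforms
# the reused attention features.
from typing import Optional

import torch
import torch.nn as nn


class SkipAttentionBlock(nn.Module):
    """Transformer block that either computes self-attention or reuses
    (and cheaply transforms) the attention output cached by an earlier layer."""

    def __init__(self, dim: int, num_heads: int, skip: bool = False):
        super().__init__()
        self.skip = skip
        if skip:
            # Cheap parametric function applied to the reused attention features.
            self.reuse_fn = nn.Sequential(
                nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim)
            )
        else:
            self.norm1 = nn.LayerNorm(dim)
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor, prev_attn: Optional[torch.Tensor] = None):
        if self.skip:
            # Skip the quadratic self-attention: transform the cached output instead.
            attn_out = self.reuse_fn(prev_attn)
        else:
            h = self.norm1(x)
            attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out
        x = x + self.mlp(self.norm2(x))
        return x, attn_out  # return attn_out so a later layer can reuse it


# Usage: the first block computes attention, the second reuses it.
tokens = torch.randn(2, 197, 384)  # (batch, tokens, dim), roughly ViT-S/16 sized
block0 = SkipAttentionBlock(384, 6, skip=False)
block1 = SkipAttentionBlock(384, 6, skip=True)
tokens, cached = block0(tokens)
tokens, _ = block1(tokens, prev_attn=cached)
```

The point of the sketch is that the skipped block pays only a token-wise cost for the reuse function and avoids the quadratic-in-tokens self-attention computation entirely.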

Related research

LazyFormer: Self Attention with Lazy Update (02/25/2021)
Improving the efficiency of Transformer-based language pre-training is a...

Centroid Transformers: Learning to Abstract with Attention (02/17/2021)
Self-attention, as the key block of transformers, is a powerful mechanis...

Transformer Interpretability Beyond Attention Visualization (12/17/2020)
Self-attention techniques, and specifically Transformers, are dominating...

Guiding Attention for Self-Supervised Learning with Transformers (10/06/2020)
In this paper, we propose a simple and effective technique to allow for ...

DeepViT: Towards Deeper Vision Transformer (03/22/2021)
Vision transformers (ViTs) have been successfully applied in image class...

DropKey (08/04/2022)
In this paper, we focus on analyzing and improving the dropout technique...

The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles (06/02/2023)
Transformers use the dense self-attention mechanism which gives a lot of...
