When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism

01/26/2022
by   Guangting Wang, et al.

The attention mechanism has been widely believed to be the key to the success of vision transformers (ViTs), since it provides a flexible and powerful way to model spatial relationships. However, is the attention mechanism truly an indispensable part of ViT? Can it be replaced by some other alternative? To demystify the role of the attention mechanism, we simplify it into an extremely simple case: ZERO FLOPs and ZERO parameters. Concretely, we revisit the shift operation. It contains no parameters and no arithmetic calculation. The only operation is to exchange a small portion of the channels between neighboring features. Based on this simple operation, we construct a new backbone network, namely ShiftViT, where the attention layers in ViT are substituted by shift operations. Surprisingly, ShiftViT works quite well on several mainstream tasks, e.g., classification, detection, and segmentation. Its performance is on par with or even better than the strong baseline Swin Transformer. These results suggest that the attention mechanism might not be the vital factor that makes ViT successful; it can even be replaced by a zero-parameter operation. We should pay more attention to the remaining parts of ViT in future work. Code is available at github.com/microsoft/SPACH.
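For intuition, the sketch below shows what such a zero-parameter shift operation looks like in PyTorch: a small fraction of the channels is moved by one pixel in each of the four spatial directions, the vacated positions are zero-padded, and the remaining channels are left untouched. The channel ratio and grouping here are illustrative assumptions, not the released SPACH implementation.

```python
import torch

def spatial_shift(x, ratio=1/12):
    """Zero-FLOP, zero-parameter shift: move a small fraction of channels
    by one pixel in each of the four directions (a minimal sketch; the
    exact ratio and grouping are assumptions, not the SPACH code).

    x: feature map of shape (B, C, H, W)
    ratio: fraction of channels shifted per direction (assumed value)
    """
    B, C, H, W = x.shape
    g = int(C * ratio)                     # channels per shifted group
    out = torch.zeros_like(x)              # vacated positions are zero-padded

    out[:, 0*g:1*g, :, :-1] = x[:, 0*g:1*g, :, 1:]   # shift left
    out[:, 1*g:2*g, :, 1:]  = x[:, 1*g:2*g, :, :-1]  # shift right
    out[:, 2*g:3*g, :-1, :] = x[:, 2*g:3*g, 1:, :]   # shift up
    out[:, 3*g:4*g, 1:, :]  = x[:, 3*g:4*g, :-1, :]  # shift down
    out[:, 4*g:, :, :]      = x[:, 4*g:, :, :]       # untouched channels
    return out
```

In ShiftViT, an operation of this kind would take the place of the attention layer, leaving the rest of the transformer block (normalization, MLP, residual connections) unchanged.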


