ViT-LSLA: Vision Transformer with Light Self-Limited-Attention

10/31/2022
by Zhenzhe Hechen, et al.

Transformers have demonstrated competitive performance across a wide range of vision tasks, but computing global self-attention is very expensive. Many methods limit the range of attention to a local window to reduce computational complexity. However, these approaches do not reduce the number of parameters; moreover, the self-attention and the inner position bias (inside the softmax function) cause each query to focus on similar and nearby patches. Consequently, this paper presents a light self-limited-attention (LSLA), consisting of a light self-attention mechanism (LSA) to save computation cost and parameters, and a self-limited-attention mechanism (SLA) to improve performance. First, the LSA replaces the K (Key) and V (Value) of self-attention with the original input X; applying it in vision Transformers, which use an encoder architecture with self-attention, simplifies the computation. Second, the SLA has a positional information module and a limited-attention module. The former contains a dynamic scale and an inner position bias to adjust the distribution of the self-attention scores and enhance the positional information. The latter uses an outer position bias, applied after the softmax function, to limit some large attention weights. Finally, a hierarchical Vision Transformer with Light self-Limited-attention (ViT-LSLA) is presented. The experiments show that ViT-LSLA achieves 71.6% and 87.2% top-1 accuracy, improving on Swin-T in both cases. Furthermore, it greatly reduces FLOPs (3.5 GFLOPs vs. 4.5 GFLOPs for Swin-T) and parameters (18.9M vs. 27.6M for Swin-T).
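Below is a minimal, single-head PyTorch sketch of the LSLA idea as described in the abstract. It is an illustration only, not the authors' implementation; the class name, parameter names (dynamic_scale, inner_pos_bias, out_pos_bias), and the exact form of the dynamic scale (a learnable scalar here) are assumptions. It shows Q computed from a learned projection while K and V reuse the raw input X (LSA), a dynamic scale and inner position bias added before the softmax (positional information module), and an outer position bias added after the softmax (limited-attention module).

import torch
import torch.nn as nn

class LightSelfLimitedAttention(nn.Module):
    """Hypothetical single-head sketch of LSLA: only Q is projected; K and V reuse the input X."""

    def __init__(self, dim: int, num_patches: int):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)                                  # LSA: only a Q projection
        self.dynamic_scale = nn.Parameter(torch.ones(1))                   # assumed form of the dynamic scale
        self.inner_pos_bias = nn.Parameter(torch.zeros(num_patches, num_patches))
        self.out_pos_bias = nn.Parameter(torch.zeros(num_patches, num_patches))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, dim)
        q = self.q_proj(x)
        k = v = x                                                          # LSA: K and V are the original input X
        attn = (q @ k.transpose(-2, -1)) * self.dynamic_scale              # scaled attention scores
        attn = attn + self.inner_pos_bias                                  # inner position bias (inside softmax)
        attn = attn.softmax(dim=-1)
        attn = attn + self.out_pos_bias                                    # outer position bias (after softmax)
        return attn @ v

# Usage: a batch of 2 images, each split into 49 patches of dimension 96 (illustrative shapes).
x = torch.randn(2, 49, 96)
out = LightSelfLimitedAttention(dim=96, num_patches=49)(x)
print(out.shape)  # torch.Size([2, 49, 96])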

Related research
09/19/2022

Axially Expanded Windows for Local-Global Interaction in Vision Transformers

Recently, Transformers have shown promising performance in various visio...
08/24/2023

Easy attention: A simple self-attention mechanism for Transformers

To improve the robustness of transformer neural networks used for tempor...
04/07/2023

PSLT: A Light-weight Vision Transformer with Ladder Self-Attention and Progressive Shift

Vision Transformer (ViT) has shown great potential for various visual ta...
04/10/2022

Linear Complexity Randomized Self-attention Mechanism

Recently, random feature attentions (RFAs) are proposed to approximate t...
01/05/2022

Synthesizing Tensor Transformations for Visual Self-attention

Self-attention shows outstanding competence in capturing long-range rela...
08/01/2023

FLatten Transformer: Vision Transformer using Focused Linear Attention

The quadratic computation complexity of self-attention has been a persis...
09/05/2023

A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking

Vision Transformer (ViT) architectures are becoming increasingly popular...
