Axially Expanded Windows for Local-Global Interaction in Vision Transformers

09/19/2022
by Zhemin Zhang, et al.

Recently, Transformers have shown promising performance in various vision tasks. A challenging issue in Transformer design is that global self-attention is very expensive to compute, especially for high-resolution vision tasks. Local self-attention performs attention computation within a local region to improve efficiency, but the receptive field of a single attention layer is then not large enough, resulting in insufficient context modeling. When observing a scene, humans usually focus on a local region while attending to non-attentional regions at coarse granularity. Based on this observation, we develop the axially expanded window self-attention mechanism, which performs fine-grained self-attention within the local window and coarse-grained self-attention along the horizontal and vertical axes, and can thus effectively capture both short- and long-range visual dependencies.
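The sketch below illustrates the attention pattern described in the abstract: each query token attends at fine granularity to the tokens in its own window, and at coarse granularity to window-pooled tokens along its row and column. This is a minimal single-head sketch of the general idea, not the authors' implementation; the function and variable names are illustrative, and a full model would add learned query/key/value projections, multiple heads, and positional bias.

```python
# Minimal sketch of an axially expanded window attention pattern (single head,
# no learned projections). Assumes a square feature map whose side is divisible
# by the window size. Names here are illustrative, not from the paper's code.
import torch
import torch.nn.functional as F


def axially_expanded_attention(x, window=4):
    """x: (H, W, C) feature map. Each query attends to (a) all tokens inside
    its local window at fine granularity and (b) the windows on the same row
    and column, average-pooled to one coarse token each."""
    H, W, C = x.shape
    out = torch.empty_like(x)
    # Coarse tokens: average-pool each non-overlapping window once.
    coarse = F.avg_pool2d(x.permute(2, 0, 1).unsqueeze(0), window).squeeze(0)
    coarse = coarse.permute(1, 2, 0)  # (H/window, W/window, C)
    for wi in range(H // window):
        for wj in range(W // window):
            # Fine keys/values: tokens inside the current window.
            fine = x[wi*window:(wi+1)*window, wj*window:(wj+1)*window].reshape(-1, C)
            # Coarse keys/values: pooled windows on the same row and column,
            # excluding the current window itself.
            row = torch.cat([coarse[wi, :wj], coarse[wi, wj+1:]], dim=0)
            col = torch.cat([coarse[:wi, wj], coarse[wi+1:, wj]], dim=0)
            kv = torch.cat([fine, row, col], dim=0)           # (N_keys, C)
            q = fine                                          # queries: fine tokens
            attn = torch.softmax(q @ kv.T / C**0.5, dim=-1)   # (window*window, N_keys)
            o = attn @ kv                                     # (window*window, C)
            out[wi*window:(wi+1)*window, wj*window:(wj+1)*window] = o.reshape(window, window, C)
    return out


if __name__ == "__main__":
    x = torch.randn(16, 16, 32)
    y = axially_expanded_attention(x, window=4)
    print(y.shape)  # torch.Size([16, 16, 32])
```

Per window, the key set grows only linearly with the number of windows along each axis, which is what keeps this pattern cheaper than global self-attention while still reaching across the whole feature map.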


