AxWin Transformer: A Context-Aware Vision Transformer Backbone with Axial Windows

05/02/2023
by   Fangjian Lin, et al.

Recently, Transformers have shown strong performance on several vision tasks thanks to their powerful modeling capability. To reduce the quadratic complexity of attention, some notable works restrict attention to local regions or extend it with axial interactions. However, these methods often lack interaction between local and global information and fail to balance coarse- and fine-grained information. To address this problem, we propose AxWin Attention, which models context in both local windows and axial views. Based on AxWin Attention, we develop a context-aware vision transformer backbone, named AxWin Transformer, which outperforms state-of-the-art methods in classification as well as downstream segmentation and detection tasks.
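The abstract combines two attention patterns: attention restricted to local windows and attention along axial (row/column) views. The paper itself gives no code here, so the following is a minimal NumPy sketch of that idea, assuming single-head attention with identity projections and a simple summation to fuse the window and axial branches; the function name `axwin_attention` and all parameters are illustrative, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # x: (n, c) tokens; scaled dot-product attention with identity
    # query/key/value projections (a simplification for illustration).
    scores = x @ x.T / np.sqrt(x.shape[-1])
    return softmax(scores) @ x

def axwin_attention(feat, win=2):
    """Sketch of AxWin-style attention: local window attention plus
    axial (row and column) attention, fused by summation.
    feat: (H, W, C) feature map; H and W must be divisible by win."""
    H, W, C = feat.shape
    out = np.zeros_like(feat)
    # Local branch: attention within non-overlapping win x win windows.
    for i in range(0, H, win):
        for j in range(0, W, win):
            patch = feat[i:i + win, j:j + win].reshape(-1, C)
            out[i:i + win, j:j + win] += self_attention(patch).reshape(win, win, C)
    # Axial branch: attention along each row and each column,
    # giving every token a global receptive field along both axes.
    for i in range(H):
        out[i] += self_attention(feat[i])
    for j in range(W):
        out[:, j] += self_attention(feat[:, j])
    return out
```

In a real backbone the two branches would use learned projections and be interleaved across stages; this sketch only shows how the local-window and axial receptive fields complement each other on one feature map.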

