QuadTree Attention for Vision Transformers

01/08/2022
by Shitao Tang, et al.

Transformers have been successful in many vision tasks, thanks to their capability of capturing long-range dependencies. However, their quadratic computational complexity poses a major obstacle to applying them to vision tasks requiring dense predictions, such as object detection, feature matching, and stereo. We introduce QuadTree Attention, which reduces the computational complexity from quadratic to linear. Our quadtree transformer builds token pyramids and computes attention in a coarse-to-fine manner. At each level, the top K patches with the highest attention scores are selected, such that at the next level, attention is only evaluated within the relevant regions corresponding to these top K patches. We demonstrate that quadtree attention achieves state-of-the-art performance in various vision tasks, e.g. a 4.0% improvement in feature matching on ScanNet, about 50% FLOPs reduction in stereo matching, a 0.4–1.5% improvement in self-supervised learning, and a 1.2–1.8% improvement in semantic segmentation over previous state-of-the-art transformers. The code is available at https://github.com/Tangshitao/QuadtreeAttention.
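To make the coarse-to-fine idea concrete, here is a minimal NumPy sketch of a two-level version of the scheme described above: tokens are pooled into quadtree cells (here, four quadrants), coarse attention scores pick the top-K most relevant cells per query, and fine attention is then evaluated only over tokens inside those cells. The function name, the fixed two-level/four-quadrant layout, and the mean-pooling choice are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def quadtree_attention_2level(q, k, v, grid, top_k=2):
    """Hypothetical 2-level sketch of coarse-to-fine quadtree attention.

    q, k, v: (N, d) token features laid out on a grid x grid map (N = grid*grid).
    Coarse level: tokens are mean-pooled into 2x2 quadrants (quadtree cells).
    Fine level: full-resolution attention is evaluated only inside the
    top_k quadrants with the highest coarse attention score per query.
    """
    N, d = q.shape
    # assign each fine token to one of the 4 quadrants (quadtree cells)
    ys, xs = np.divmod(np.arange(N), grid)
    quad = (ys >= grid // 2).astype(int) * 2 + (xs >= grid // 2).astype(int)
    # coarse keys: mean-pool the key tokens within each quadrant
    k_coarse = np.stack([k[quad == c].mean(axis=0) for c in range(4)])
    # coarse attention: score every query against each quadrant
    coarse = softmax(q @ k_coarse.T / np.sqrt(d))  # shape (N, 4)
    out = np.zeros_like(v)
    for i in range(N):
        # keep only the top_k most relevant quadrants for this query
        keep = np.argsort(coarse[i])[-top_k:]
        mask = np.isin(quad, keep)
        # fine attention restricted to tokens inside the selected quadrants
        w = softmax(q[i] @ k[mask].T / np.sqrt(d))
        out[i] = w @ v[mask]
    return out
```

With `top_k` equal to the number of cells, the sketch reduces to ordinary dense attention; with a fixed small `top_k` (and deeper pyramids in the real method), each query attends to a bounded number of fine tokens, which is how the quadratic cost is brought down toward linear.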

