DAT++: Spatially Dynamic Vision Transformer with Deformable Attention

09/04/2023
by Zhuofan Xia, et al.

Transformers have shown superior performance on various vision tasks. Their large receptive field endows Transformer models with higher representation power than their CNN counterparts. Nevertheless, simply enlarging the receptive field also raises several concerns. On the one hand, using dense attention in ViT leads to excessive memory and computational cost, and features can be influenced by irrelevant parts that lie beyond the regions of interest. On the other hand, the handcrafted attention adopted in PVT or Swin Transformer is data-agnostic and may limit the ability to model long-range relations. To resolve this dilemma, we propose a novel deformable multi-head attention module, in which the positions of key and value pairs in self-attention are adaptively allocated in a data-dependent way. This flexible scheme enables the proposed deformable attention to dynamically focus on relevant regions while maintaining the representation power of global attention. On this basis, we present the Deformable Attention Transformer (DAT), a general vision backbone that is both efficient and effective for visual recognition. We further build an enhanced version, DAT++. Extensive experiments show that DAT++ achieves state-of-the-art results on various visual recognition benchmarks, with 85.9% ImageNet accuracy, 54.5 and 47.0 MS-COCO instance segmentation mAP, and 51.5 ADE20K semantic segmentation mIoU.
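The abstract describes deformable attention only in words: keys and values are taken from data-dependent sampling positions rather than from every pixel (dense ViT attention) or a fixed window (PVT/Swin). The following is a minimal single-head NumPy sketch of that idea under our own simplifying assumptions, not the authors' implementation: a uniform reference grid downsampled by a hypothetical `stride`, offsets predicted by a single hypothetical linear map `W_off` from query features at the reference points (the paper uses a small learned offset subnetwork), offsets bounded by `offset_scale * tanh` (an assumption here), bilinear sampling with zero padding, and no positional bias. All function and parameter names are illustrative.

```python
import numpy as np

def bilinear_sample(feat, pts):
    """Sample feat (H, W, C) at continuous (y, x) points pts (M, 2); zeros outside."""
    H, W, C = feat.shape
    y, x = pts[:, 0], pts[:, 1]
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1, x1 = y0 + 1, x0 + 1
    wy1, wx1 = y - y0, x - x0          # fractional parts
    wy0, wx0 = 1.0 - wy1, 1.0 - wx1
    out = np.zeros((pts.shape[0], C))
    # Accumulate the four neighbouring pixels, skipping out-of-bounds ones.
    for yy, xx, w in ((y0, x0, wy0 * wx0), (y0, x1, wy0 * wx1),
                      (y1, x0, wy1 * wx0), (y1, x1, wy1 * wx1)):
        valid = (yy >= 0) & (yy < H) & (xx >= 0) & (xx < W)
        out[valid] += w[valid, None] * feat[yy[valid], xx[valid]]
    return out

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def deformable_attention(x, Wq, Wk, Wv, Wo, W_off, stride=2, offset_scale=2.0):
    """Single-head deformable attention sketch (hypothetical simplification).

    x: (H, W, C) feature map; Wq/Wk/Wv/Wo: (C, C); W_off: (C, 2) hypothetical offset head.
    Returns the (H, W, C) output and the (M, 2) sampled key/value positions.
    """
    H, W, C = x.shape
    q = x.reshape(-1, C) @ Wq                       # (N, C): a query at every pixel
    # Uniform reference points at the centres of stride x stride cells.
    ys = np.arange(stride / 2 - 0.5, H, stride)
    xs = np.arange(stride / 2 - 0.5, W, stride)
    ref = np.stack(np.meshgrid(ys, xs, indexing="ij"), axis=-1).reshape(-1, 2)
    # Data-dependent offsets computed from query features at the reference points.
    q_ref = bilinear_sample(q.reshape(H, W, C), ref)   # (M, C)
    offsets = offset_scale * np.tanh(q_ref @ W_off)    # (M, 2), bounded shifts
    pts = ref + offsets
    # Keys and values come only from the M deformed sampling positions.
    x_s = bilinear_sample(x, pts)                       # (M, C)
    k, v = x_s @ Wk, x_s @ Wv
    attn = softmax(q @ k.T / np.sqrt(C), axis=-1)       # (N, M) instead of (N, N)
    out = (attn @ v) @ Wo
    return out.reshape(H, W, C), pts

# Usage on a random 8x8 feature map with 16 channels.
rng = np.random.default_rng(0)
H, W, C = 8, 8, 16
x = rng.standard_normal((H, W, C))
Wq, Wk, Wv, Wo = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(4))
W_off = rng.standard_normal((C, 2)) * 0.1
out, pts = deformable_attention(x, Wq, Wk, Wv, Wo, W_off)
print(out.shape, pts.shape)  # (8, 8, 16) (16, 2)
```

In this sketch every one of the N = 64 queries still attends globally, but only to M = HW / stride² = 16 sampled keys, so the attention matrix is 64 × 16 rather than the 64 × 64 of dense attention. Because the sampling positions depend on the input, the attended regions can move toward relevant content instead of being fixed in advance.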

Related research
01/03/2022

Vision Transformer with Deformable Attention

Transformers have recently shown superior performances on various vision...
02/03/2023

DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition

As a de facto solution, the vanilla Vision Transformers (ViTs) are encou...
11/27/2018

Deformable ConvNets v2: More Deformable, Better Results

The superior performance of Deformable Convolutional Networks arises fro...
03/31/2022

Deformable Video Transformer

Video transformers have recently emerged as an effective alternative to ...
03/24/2023

Efficient Mixed-Type Wafer Defect Pattern Recognition Using Compact Deformable Convolutional Transformers

Manufacturing wafers is an intricate task involving thousands of steps. ...
03/18/2022

Laneformer: Object-aware Row-Column Transformers for Lane Detection

We present Laneformer, a conceptually simple yet powerful transformer-ba...
07/13/2022

Trans4Map: Revisiting Holistic Top-down Mapping from Egocentric Images to Allocentric Semantics with Vision Transformers

Humans have an innate ability to sense their surroundings, as they can e...