Twins: Revisiting Spatial Attention Design in Vision Transformers

04/28/2021
by Xiangxiang Chu, et al.

Very recently, a variety of vision transformer architectures for dense prediction tasks have been proposed, and they show that the design of spatial attention is critical to their success in these tasks. In this work, we revisit the design of spatial attention and demonstrate that a carefully devised yet simple spatial attention mechanism performs favourably against the state-of-the-art schemes. As a result, we propose two vision transformer architectures, namely Twins-PCPVT and Twins-SVT. Our proposed architectures are highly efficient and easy to implement, involving only matrix multiplications that are highly optimized in modern deep learning frameworks. More importantly, the proposed architectures achieve excellent performance on a wide range of visual tasks, including image-level classification as well as dense detection and segmentation. The simplicity and strong performance suggest that our proposed architectures may serve as stronger backbones for many vision tasks. Our code will be released soon at https://github.com/Meituan-AutoML/Twins.
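As a concrete illustration of the claim that these architectures involve only matrix multiplications, below is a minimal sketch (not the released implementation) of a locally-grouped self-attention block of the kind Twins-SVT builds on: attention is computed independently inside non-overlapping windows using nothing but reshapes and matmuls. The class name, the default window size, and the (B, H, W, C) tensor layout are illustrative assumptions; see the repository linked above for the exact design.

import torch
import torch.nn as nn

class LocallyGroupedAttention(nn.Module):
    # A hypothetical, simplified locally-grouped self-attention block.
    # Each w x w window attends only to itself, so the cost is linear
    # in the number of pixels rather than quadratic.
    def __init__(self, dim, num_heads=8, window=7):
        super().__init__()
        self.num_heads = num_heads
        self.window = window
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):
        # x: (B, H, W, C); H and W are assumed divisible by the window size.
        B, H, W, C = x.shape
        w = self.window
        # Partition the feature map into non-overlapping w x w windows.
        x = x.view(B, H // w, w, W // w, w, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, w * w, C)
        qkv = self.qkv(x).reshape(-1, w * w, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (B*nW, heads, w*w, head_dim)
        # Per-window attention: plain matrix multiplications plus a softmax.
        attn = (q @ k.transpose(-2, -1)) * self.scale
        x = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(-1, w * w, C)
        x = self.proj(x)
        # Undo the window partition, back to (B, H, W, C).
        x = x.view(B, H // w, W // w, w, w, C)
        return x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)

# Example with assumed sizes: a 64x64 feature map, 96 channels, 8x8 windows.
block = LocallyGroupedAttention(dim=96, num_heads=4, window=8)
out = block(torch.randn(2, 64, 64, 96))  # -> (2, 64, 64, 96)

In the full Twins-SVT model, blocks like this alternate with a global sub-sampled attention that lets windows exchange information across the whole feature map, while Twins-PCPVT instead pairs PVT-style global attention with conditional positional encodings.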


Related Research

06/07/2021
Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer
Very recently, Window-based Transformers, which computed self-attention ...

06/25/2021
Vision Transformer Architecture Search
Recently, transformers have shown great superiority in solving computer ...

01/26/2022
When Shift Operation Meets Vision Transformer: An Extremely Simple Alternative to Attention Mechanism
Attention mechanism has been widely believed as the key to success of vi...

01/08/2022
QuadTree Attention for Vision Transformers
Transformers have been successful in many vision tasks, thanks to their ...

07/08/2022
k-means Mask Transformer
The rise of transformers in vision tasks not only advances network backb...

05/28/2022
MDMLP: Image Classification from Scratch on Small Datasets with MLP
The attention mechanism has become a go-to technique for natural languag...

03/17/2022
On Vision Features in Multimodal Machine Translation
Previous work on multimodal machine translation (MMT) has focused on the...

Code Repositories

Twins

Two simple and effective vision transformer designs, which are on par with the Swin Transformer

