Twins: Revisiting Spatial Attention Design in Vision Transformers

04/28/2021
by   Xiangxiang Chu, et al.

Very recently, a variety of vision transformer architectures for dense prediction tasks have been proposed, and they show that the design of spatial attention is critical to their success in these tasks. In this work, we revisit the design of the spatial attention and demonstrate that a carefully devised yet simple spatial attention mechanism performs favourably against state-of-the-art schemes. As a result, we propose two vision transformer architectures, namely Twins-PCPVT and Twins-SVT. Our proposed architectures are highly efficient and easy to implement, involving only matrix multiplications, which are highly optimized in modern deep learning frameworks. More importantly, the proposed architectures achieve excellent performance on a wide range of visual tasks, including image-level classification as well as dense detection and segmentation. The simplicity and strong performance suggest that our proposed architectures may serve as stronger backbones for many vision tasks. Our code will be released soon at https://github.com/Meituan-AutoML/Twins .
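The claim that the architectures involve only matrix multiplications can be illustrated with a minimal sketch of sub-sampled global attention, one of the spatial attention ingredients the Twins paper builds on: keys and values are average-pooled to a coarser grid before standard scaled dot-product attention. This is a hedged illustration, not the paper's exact implementation; the function name `subsampled_attention`, the single-head setting, and the 1-D pooling over the token axis are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def subsampled_attention(x, wq, wk, wv, r):
    """Single-head sub-sampled attention over n tokens of dim d.

    Keys/values are average-pooled by factor r before standard
    scaled dot-product attention, so the whole computation is
    matrix multiplications (plus a softmax). Illustrative only.
    """
    n, d = x.shape
    # average-pool tokens in non-overlapping windows of size r
    pooled = x.reshape(n // r, r, d).mean(axis=1)
    q = x @ wq                            # (n, d) full-resolution queries
    k = pooled @ wk                       # (n/r, d) sub-sampled keys
    v = pooled @ wv                       # (n/r, d) sub-sampled values
    attn = softmax(q @ k.T / np.sqrt(d))  # (n, n/r) attention weights
    return attn @ v                       # (n, d) attended output

rng = np.random.default_rng(0)
n, d, r = 16, 8, 4                        # 16 tokens, dim 8, pool factor 4
x = rng.standard_normal((n, d))
wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))
out = subsampled_attention(x, wq, wk, wv, r)
print(out.shape)  # (16, 8)
```

Sub-sampling the keys and values shrinks the attention matrix from n x n to n x (n/r), which is why such designs scale well to the high-resolution feature maps used in dense prediction.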


