MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens

05/31/2021
by Jiemin Fang, et al.

Transformers offer a new methodology for designing neural networks for visual recognition. Compared to convolutional networks, Transformers can refer to global features at each stage, yet the attention module incurs higher computational overhead, which obstructs applying Transformers to high-resolution visual data. This paper aims to alleviate the conflict between efficiency and flexibility: we propose a specialized token for each region that serves as a messenger (MSG). By manipulating these MSG tokens, one can flexibly exchange visual information across regions while reducing computational complexity. We then integrate the MSG token into a multi-scale architecture named MSG-Transformer. On standard image classification and object detection, MSG-Transformer achieves competitive performance, and inference is accelerated on both GPU and CPU. The code will be available at https://github.com/hustvl/MSG-Transformer.
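The abstract's core idea, attending locally within each window while a per-window MSG token carries information across windows, can be sketched roughly as below. The single-head attention with identity projections, the fixed window count, and the channel-group exchange of MSG tokens are all illustrative assumptions here, not the paper's exact implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_attention(tokens):
    # Plain single-head self-attention inside one window
    # (identity Q/K/V projections for brevity).
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[-1])
    return softmax(scores) @ tokens

def msg_transformer_block(windows, msg):
    """windows: (W, N, C) local patch tokens; msg: (W, C) messenger tokens."""
    W, N, C = windows.shape
    out_windows = np.empty_like(windows)
    out_msg = np.empty_like(msg)
    # 1) Attention is restricted to each window plus its MSG token,
    #    so cost grows with window size, not full image size.
    for w in range(W):
        tokens = np.concatenate([msg[w:w + 1], windows[w]], axis=0)  # (1+N, C)
        attended = local_attention(tokens)
        out_msg[w] = attended[0]
        out_windows[w] = attended[1:]
    # 2) Cross-window exchange: only the MSG tokens are mixed across
    #    windows. Here we split each MSG token's channels into W groups
    #    and shuffle the groups between windows (assumes C % W == 0).
    groups = out_msg.reshape(W, W, C // W)
    out_msg = groups.transpose(1, 0, 2).reshape(W, C)
    return out_windows, out_msg
```

Because patch tokens never attend outside their window, the quadratic attention cost applies only to the (small) window, and global communication happens through the cheap MSG-token shuffle.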

Related research:

- 06/27/2022 — Kernel Attention Transformer (KAT) for Histopathology Whole Slide Image Classification: Transformer has been widely used in histopathology whole slide image (WS…
- 10/21/2022 — Boosting vision transformers for image retrieval: Vision transformers have achieved remarkable progress in vision tasks su…
- 02/22/2021 — Do We Really Need Explicit Position Encodings for Vision Transformers?: Almost all visual transformers such as ViT or DeiT rely on predefined po…
- 05/10/2022 — Reduce Information Loss in Transformers for Pluralistic Image Inpainting: Transformers have achieved great success in pluralistic image inpainting…
- 08/24/2022 — Addressing Token Uniformity in Transformers via Singular Value Transformation: Token uniformity is commonly observed in transformer-based models, in wh…
- 03/20/2022 — Iwin: Human-Object Interaction Detection via Transformer with Irregular Windows: This paper presents a new vision Transformer, named Iwin Transformer, wh…
- 04/01/2023 — Vision Transformers with Mixed-Resolution Tokenization: Vision Transformer models process input images by dividing them into a s…
