Co-Scale Conv-Attentional Image Transformers

04/13/2021
by Weijian Xu, et al.

In this paper, we present Co-scale conv-attentional image Transformers (CoaT), a Transformer-based image classifier equipped with co-scale and conv-attentional mechanisms. First, the co-scale mechanism maintains the integrity of the Transformer encoder branches at individual scales, while allowing representations learned at different scales to communicate effectively with each other; we design a series of serial and parallel blocks to realize the co-scale attention mechanism. Second, we devise a conv-attentional mechanism by realizing a relative position embedding formulation in the factorized attention module with an efficient convolution-like implementation. CoaT empowers image Transformers with enriched multi-scale and contextual modeling capabilities. On ImageNet, relatively small CoaT models attain superior classification results compared with similar-sized convolutional neural networks and image/vision Transformers. The effectiveness of CoaT's backbone is also illustrated on object detection and instance segmentation, demonstrating its applicability to downstream computer vision tasks.
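To make the second mechanism concrete, the following is a minimal PyTorch sketch of a conv-attentional module in the spirit of the abstract: attention is factorized (softmax over the keys, aggregating keys and values before applying the queries, which is linear rather than quadratic in the token count), and a depthwise convolution over the 2D token grid supplies the relative-position term. The class name ConvAttention, the kernel size, and the head layout are illustrative assumptions, not the authors' implementation; the class token is omitted for brevity.

```python
# Illustrative sketch of a conv-attentional block; not the CoaT reference code.
import torch
import torch.nn as nn


class ConvAttention(nn.Module):
    """Factorized attention plus a convolutional relative position
    encoding, following the description in the abstract. Names and
    hyperparameters (e.g. kernel_size=3) are illustrative assumptions."""

    def __init__(self, dim, num_heads=8, kernel_size=3):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Depthwise conv producing the relative-position term from V.
        self.pos_conv = nn.Conv2d(dim, dim, kernel_size,
                                  padding=kernel_size // 2, groups=dim)

    def forward(self, x, H, W):
        # x: (B, N, C) with N == H * W (class token omitted for brevity).
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)      # each (B, heads, N, d)

        # Factorized attention: softmax over keys, then aggregate K^T V
        # before applying Q, giving linear rather than quadratic cost in N.
        k = k.softmax(dim=2)
        context = k.transpose(-2, -1) @ v          # (B, heads, d, d)
        factor_att = self.scale * (q @ context)    # (B, heads, N, d)

        # Convolution-like relative position encoding: Q elementwise-times
        # a depthwise convolution of V over the 2D token grid.
        v2d = v.transpose(-2, -1).reshape(B, C, H, W)
        pos = self.pos_conv(v2d).reshape(B, self.num_heads, self.head_dim, N)
        conv_att = q * pos.transpose(-2, -1)       # (B, heads, N, d)

        out = (factor_att + conv_att).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

As a usage sketch, ConvAttention(dim=256)(x, H=14, W=14) with x of shape (B, 196, 256) returns a tensor of the same shape; the depthwise convolution keeps the position term cheap, which is what makes the convolution-like implementation efficient.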

Related research:

12/21/2021 · MPViT: Multi-Path Vision Transformer for Dense Prediction
Dense computer vision tasks such as object detection and segmentation re...

10/28/2022 · Grafting Vision Transformers
Vision Transformers (ViTs) have recently become the state-of-the-art acr...

05/31/2021 · Dual-stream Network for Visual Recognition
Transformers with remarkable global representation capacities achieve co...

03/20/2022 · Vision Transformer with Convolutions Architecture Search
Transformers exhibit great advantages in handling computer vision tasks....

05/12/2023 · ROI-based Deep Image Compression with Swin Transformers
Encoding the Region Of Interest (ROI) with better quality than the backg...

05/24/2023 · ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers
Recently, plain vision Transformers (ViTs) have shown impressive perform...

11/14/2022 · ParCNetV2: Oversized Kernel with Enhanced Attention
Transformers have achieved tremendous success in various computer vision...
