Dual Vision Transformer

07/11/2022
by Ting Yao, et al.

Prior works have proposed several strategies to reduce the computational cost of the self-attention mechanism. Many of these works decompose self-attention into regional and local feature extraction procedures, each of which incurs a much smaller computational complexity. However, regional information is typically obtained only at the expense of undesirable information loss due to down-sampling. In this paper, we propose a novel Transformer architecture, named Dual Vision Transformer (Dual-ViT), that aims to mitigate this cost. The new architecture incorporates a critical semantic pathway that compresses token vectors into global semantics with reduced order of complexity. The compressed global semantics then serve as useful prior information for learning finer pixel-level details through a second, pixel pathway. The semantic and pixel pathways are integrated and trained jointly, spreading the enhanced self-attention information through both pathways in parallel. Dual-ViT is thus able to reduce computational complexity without compromising much accuracy. We empirically demonstrate that Dual-ViT achieves superior accuracy over SOTA Transformer architectures at reduced training complexity. Source code is available at <https://github.com/YehLi/ImageNetModel>.
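
To make the two-pathway design concrete, below is a minimal PyTorch sketch of one such block. It is not the authors' implementation (see the repository linked above): the compression operator (simple average pooling) and all names here (DualPathwayBlock, num_semantic_tokens) are illustrative assumptions, chosen only to show how a cheap semantic pathway can feed global context back to pixel tokens.

```python
# Minimal sketch of a dual-pathway attention block. NOT the official
# Dual-ViT code (see https://github.com/YehLi/ImageNetModel); the pooling
# compressor and module names are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class DualPathwayBlock(nn.Module):
    """Semantic pathway: pool n pixel tokens into m << n semantic tokens and
    run self-attention among them (O(m^2)). Pixel pathway: pixel tokens
    cross-attend to the refined semantic tokens (O(n*m)), so the compressed
    global context acts as a prior without a full O(n^2) attention pass."""

    def __init__(self, dim, num_heads=4, num_semantic_tokens=16):
        super().__init__()
        self.num_semantic_tokens = num_semantic_tokens
        self.sem_norm = nn.LayerNorm(dim)
        self.sem_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.pix_norm = nn.LayerNorm(dim)
        self.pix_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, 4 * dim),
            nn.GELU(),
            nn.Linear(4 * dim, dim),
        )

    def forward(self, x):  # x: (batch, n, dim)
        # Semantic pathway: compress tokens, then cheap global self-attention.
        s = F.adaptive_avg_pool1d(self.sem_norm(x).transpose(1, 2),
                                  self.num_semantic_tokens).transpose(1, 2)
        s = s + self.sem_attn(s, s, s, need_weights=False)[0]
        # Pixel pathway: pixel queries attend to the semantic tokens,
        # injecting the compressed global context into every pixel token.
        q = self.pix_norm(x)
        x = x + self.pix_attn(q, s, s, need_weights=False)[0]
        return x + self.mlp(x)


tokens = torch.randn(2, 196, 64)        # e.g. 14x14 patch embeddings, dim 64
out = DualPathwayBlock(dim=64)(tokens)
print(out.shape)                        # torch.Size([2, 196, 64])
```

With m semantic tokens and n pixel tokens, attention cost drops from O(n^2) to roughly O(m^2 + n*m); the paper's actual blocks integrate the two pathways jointly as described in the abstract.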

Related research

DAE-Former: Dual Attention-guided Efficient Transformer for Medical Image Segmentation (12/27/2022)
Transformers have recently gained attention in the computer vision domai...

Dynamic Linear Transformer for 3D Biomedical Image Segmentation (06/01/2022)
Transformer-based neural networks have surpassed promising performance o...

RaftMLP: Do MLP-based Models Dream of Winning Over Computer Vision? (08/09/2021)
For the past ten years, CNN has reigned supreme in the world of computer...

Efficient and Explicit Modelling of Image Hierarchies for Image Restoration (03/01/2023)
The aim of this paper is to propose a mechanism to efficiently and expli...

Head-Free Lightweight Semantic Segmentation with Linear Transformer (01/11/2023)
Existing semantic segmentation works have been mainly focused on designi...

Towards Hierarchical Regional Transformer-based Multiple Instance Learning (08/24/2023)
The classification of gigapixel histopathology images with deep multiple...

Sampled Transformer for Point Sets (02/28/2023)
The sparse transformer can reduce the computational complexity of the se...
