A Survey on Visual Transformer

12/23/2020
by Kai Han, et al.

The transformer is a type of deep neural network based mainly on the self-attention mechanism, originally applied in the field of natural language processing. Inspired by the strong representation ability of the transformer, researchers have proposed extending transformers to computer vision tasks. Transformer-based models show competitive and even better performance on various visual benchmarks compared to other network types such as convolutional networks and recurrent networks. In this paper, we provide a literature review of these visual transformer models by categorizing them by task and analyzing the advantages and disadvantages of these methods. The main categories include basic image classification, high-level vision, low-level vision, and video processing. Self-attention in computer vision is also briefly revisited, as self-attention is the base component of the transformer. Efficient transformer methods are covered as well, since efficiency is key to pushing transformers into real applications. Finally, we discuss further research directions for visual transformers.
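Since self-attention is the base component the survey revisits, a minimal sketch of scaled dot-product self-attention may be useful context. This is a generic NumPy illustration, not code from the paper; the function and variable names are ours:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a token sequence.

    X:          (n_tokens, d_model) input embeddings (e.g. image patches)
    Wq, Wk, Wv: (d_model, d_k) learned projection matrices
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (n, n) pairwise similarities
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # (n, d_k) attended output

# Toy example: 4 "patch" tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Because every token attends to every other token, the attention map is quadratic in sequence length, which is why the efficient transformer methods surveyed here matter for real applications.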

