Visformer: The Vision-friendly Transformer

04/26/2021
by Zhengsu Chen, et al.

The past year has witnessed rapid progress in applying the Transformer module to vision problems. While some researchers have demonstrated that Transformer-based models enjoy a favorable ability to fit data, there is a growing body of evidence that these models suffer from over-fitting, especially when the training data is limited. This paper offers an empirical study that performs step-by-step operations to gradually transition a Transformer-based model into a convolution-based model. The results obtained during the transition process deliver useful messages for improving visual recognition. Based on these observations, we propose a new architecture named Visformer, short for 'Vision-friendly Transformer'. With the same computational complexity, Visformer outperforms both Transformer-based and convolution-based models in terms of ImageNet classification accuracy, and the advantage becomes more significant when the model complexity is lower or the training set is smaller. The code is available at https://github.com/danczs/Visformer.
