Augmented Shortcuts for Vision Transformers

06/30/2021
by   Yehui Tang, et al.

Transformer models have recently achieved great progress on computer vision tasks. The rapid development of vision transformers stems mainly from their high representation ability for extracting informative features from input images. However, mainstream transformer models are designed with deep architectures, and feature diversity is continuously reduced as the depth increases, i.e., feature collapse. In this paper, we theoretically analyze the feature collapse phenomenon and study the relationship between shortcuts and feature diversity in these transformer models. We then present an augmented shortcut scheme, which inserts additional paths with learnable parameters in parallel with the original shortcuts. To save computational cost, we further explore an efficient approach that uses a block-circulant projection to implement the augmented shortcuts. Extensive experiments conducted on benchmark datasets demonstrate the effectiveness of the proposed method, which improves state-of-the-art vision transformers by about 1% accuracy without obviously increasing their parameters and FLOPs.
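The core idea can be sketched in a few lines: alongside the identity shortcut, each augmented path applies a learnable projection followed by a nonlinearity, and the projection can be made cheap by restricting it to a circulant structure, since multiplying by a circulant matrix is a circular convolution computable via the FFT. The sketch below is a minimal illustration, not the authors' implementation: it uses a single circulant block (the simplest case of a block-circulant matrix), ReLU as a stand-in activation, and the function names `circulant_matvec` and `augmented_shortcut` are our own.

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply a circulant matrix (first column c) by vector x.

    A circulant matvec equals a circular convolution of c and x,
    so it costs O(d log d) via the FFT instead of O(d^2).
    """
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def augmented_shortcut(x, circulant_cols):
    """Identity shortcut plus parallel learnable augmented paths.

    x:              (tokens, dim) features entering a block.
    circulant_cols: list of (dim,) vectors, each the first column of
                    a learnable circulant projection (one per path).
    """
    out = x.copy()  # the original identity shortcut
    for c in circulant_cols:
        proj = np.stack([circulant_matvec(c, row) for row in x])
        out = out + np.maximum(proj, 0.0)  # ReLU used here as an example activation
    return out
```

In a full block, this sum would be added to the attention (or MLP) output, so the augmented paths run in parallel with the shortcut rather than replacing it; the circulant constraint is what keeps the extra parameter and FLOP cost small.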

