Sliced Recursive Transformer

11/09/2021
by Zhiqiang Shen, et al.

We present a neat yet effective recursive operation on vision transformers that can improve parameter utilization without involving additional parameters. This is achieved by sharing weights across the depth of transformer networks. The proposed method can obtain a substantial gain (~2%) simply using a naive recursive operation, requires no special or sophisticated knowledge for designing principles of networks, and introduces minimal computational overhead to the training procedure. To reduce the additional computation caused by the recursive operation while maintaining the superior accuracy, we propose an approximating method through multiple sliced group self-attentions across recursive layers, which can reduce the cost consumption by 10~30% with minimal performance loss. We call our model Sliced Recursive Transformer (SReT), which is compatible with a broad range of other designs for efficient vision transformers. Our best model establishes significant improvement on ImageNet over state-of-the-art methods while containing fewer parameters. The proposed sliced recursive operation allows us to build a transformer with more than 100 or even 1000 layers effortlessly while keeping the model size small (13~15M), avoiding difficulties in optimization when the model size is too large. The flexible scalability has shown great potential for scaling up and constructing extremely deep and large-dimensionality vision transformers. Our code and models are available at https://github.com/szq0214/SReT.
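
The two core ideas in the abstract, reusing one block's weights across recursive steps and cutting attention cost by slicing tokens into groups, can be sketched in a few lines of PyTorch. The sketch below is illustrative only and is not the authors' implementation (see the linked repository for the official code); names such as GroupSelfAttention and RecursiveBlock, and the specific dimensions, are assumptions made for the example.

```python
# Minimal sketch of the two ideas described in the abstract; not the official SReT code.
import torch
import torch.nn as nn


class GroupSelfAttention(nn.Module):
    """Self-attention applied independently to contiguous slices of the token sequence.

    Splitting N tokens into `groups` slices reduces the quadratic attention cost from
    O(N^2) to roughly O(N^2 / groups), which is the intuition behind the 10~30% compute
    savings mentioned in the abstract (exact savings depend on the full model).
    """

    def __init__(self, dim: int, num_heads: int = 4, groups: int = 2):
        super().__init__()
        self.groups = groups
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape
        assert n % self.groups == 0, "sequence length must be divisible by groups"
        # Fold each slice into the batch dimension so attention stays within a slice.
        x = x.reshape(b * self.groups, n // self.groups, d)
        x, _ = self.attn(x, x, x)
        return x.reshape(b, n, d)


class RecursiveBlock(nn.Module):
    """One transformer block whose weights are reused `loops` times (weight sharing
    across depth), so the effective depth grows without adding parameters."""

    def __init__(self, dim: int, loops: int = 2, groups: int = 2):
        super().__init__()
        self.loops = loops
        self.norm1 = nn.LayerNorm(dim)
        self.attn = GroupSelfAttention(dim, groups=groups)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.loops):  # same parameters applied repeatedly
            x = x + self.attn(self.norm1(x))
            x = x + self.mlp(self.norm2(x))
        return x


if __name__ == "__main__":
    tokens = torch.randn(2, 196, 384)  # (batch, tokens, embed dim), ViT-like shapes
    block = RecursiveBlock(dim=384, loops=2, groups=2)
    print(block(tokens).shape)  # torch.Size([2, 196, 384])
```

Stacking several such blocks gives a network whose effective depth is (number of blocks × loops) while the parameter count stays that of the blocks alone, which is how a 100- or 1000-layer model can remain in the 13~15M range.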
