Super Vision Transformer

05/23/2022
by Mingbao Lin et al.

We attempt to reduce the computational costs in vision transformers (ViTs), which increase quadratically in the token number. We present a novel training paradigm that trains only one ViT model at a time, yet is capable of providing improved image recognition performance at various computational costs. The trained ViT model, termed super vision transformer (SuperViT), is empowered with the versatile ability to handle incoming patches of multiple sizes and to preserve informative tokens at multiple keeping rates (the ratio of tokens kept), achieving good hardware efficiency for inference, given that the available hardware resources often change over time. Experimental results on ImageNet demonstrate that our SuperViT can considerably reduce the computational costs of ViT models while even improving performance. For example, we reduce the FLOPs of DeiT-S by 2x while increasing Top-1 accuracy by 0.2%, and by 0.7% for a 1.5x reduction. Our SuperViT also outperforms existing studies on efficient vision transformers; for example, when consuming the same amount of FLOPs, it surpasses the recent state-of-the-art (SoTA) EViT by 1.1% on ImageNet. Code is publicly available at https://github.com/lmbxmu/SuperViT.
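Because self-attention cost grows quadratically with the token count n (roughly O(n^2 d) per layer), dropping tokens pays off quickly: a keeping rate of 0.7 leaves about 0.7^2 ≈ 0.49 of the attention FLOPs. The following is a minimal sketch of how a single shared ViT could be trained under several (input size, keeping rate) configurations in the spirit described by the abstract; all names here (`vit`, `PATCH_GRIDS`, `KEEP_RATES`, `keep_tokens`, and the `keep_rate` forward argument) are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch only: `vit` is any ViT whose forward is assumed to
# accept a `keep_rate` argument; PATCH_GRIDS/KEEP_RATES are example values.

PATCH_GRIDS = [8, 10, 12, 14]   # token grid sides -> 64 to 196 tokens
KEEP_RATES = [0.5, 0.7, 1.0]    # fraction of tokens kept in later blocks

def keep_tokens(tokens, scores, keep_rate):
    """Keep the top-k tokens by an importance score (e.g., [CLS] attention).
    This is the kind of pruning assumed to happen inside the model when
    keep_rate < 1; shown here only to make the idea concrete."""
    k = max(1, int(tokens.size(1) * keep_rate))
    idx = scores.topk(k, dim=1).indices                      # (B, k)
    idx = idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))  # (B, k, D)
    return tokens.gather(1, idx)                             # (B, k, D)

def train_step(vit, images, labels, optimizer):
    """One optimization step over all (input size, keeping rate) configs,
    so the shared weights stay accurate across compute budgets."""
    optimizer.zero_grad()
    loss = torch.zeros((), device=images.device)
    for grid in PATCH_GRIDS:
        side = grid * 16  # 16x16-pixel patches, as in DeiT
        x = F.interpolate(images, size=(side, side),
                          mode="bilinear", align_corners=False)
        for rate in KEEP_RATES:
            logits = vit(x, keep_rate=rate)  # assumed forward signature
            loss = loss + F.cross_entropy(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference time, one would then pick the single (input size, keeping rate) pair that fits the currently available hardware budget, with no retraining.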


Related research

04/22/2021 · Token Labeling: Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet
This paper provides a strong baseline for vision transformers on the Ima...

11/09/2022 · Training a Vision Transformer from scratch in less than 24 hours with 1 GPU
Transformers have become central to recent advances in computer vision. ...

04/07/2023 · SparseFormer: Sparse Visual Recognition via Limited Latent Tokens
Human visual recognition is a sparse process, where only a few salient v...

03/08/2023 · X-Pruner: eXplainable Pruning for Vision Transformers
Recently vision transformer models have become prominent models for a ra...

04/01/2023 · Vision Transformers with Mixed-Resolution Tokenization
Vision Transformer models process input images by dividing them into a s...

08/09/2022 · CoViT: Real-time phylogenetics for the SARS-CoV-2 pandemic using Vision Transformers
Real-time viral genome detection, taxonomic classification and phylogene...

11/09/2021 · Sliced Recursive Transformer
We present a neat yet effective recursive operation on vision transforme...
