Patch Slimming for Efficient Vision Transformers

06/05/2021
by Yehui Tang, et al.

This paper studies the efficiency problem for vision transformers by identifying redundant computation in given networks. The recent transformer architecture has demonstrated its effectiveness, achieving excellent performance on a series of computer vision tasks. However, as with convolutional neural networks, the huge computational cost of vision transformers is still a severe issue. Considering that the attention mechanism aggregates different patches layer by layer, we present a novel patch slimming approach that discards useless patches in a top-down paradigm. We first identify the effective patches in the last layer and then use them to guide the patch selection process of previous layers. For each layer, the impact of a patch on the final output feature is approximated, and patches with less impact are removed. Experimental results on benchmark datasets demonstrate that the proposed method can significantly reduce the computational cost of vision transformers without affecting their performance. For example, over 45% of the FLOPs of the ViT-Ti model can be reduced with only a 0.2% top-1 accuracy drop on the ImageNet dataset.
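The top-down selection described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how a patch's impact is approximated, so the code below assumes a simple proxy (the total attention a patch receives from the patches kept in the next layer). The function name `top_down_patch_masks`, the parameters `keep_last` and `keep_ratio`, and the rule that a patch kept in a later layer is also kept in every earlier layer are all assumptions made for illustration.

```python
import numpy as np


def top_down_patch_masks(attn, keep_last, keep_ratio):
    """Select patches layer by layer, starting from the last layer.

    attn:       sequence of (N, N) row-stochastic attention matrices,
                ordered from the first layer to the last.
    keep_last:  indices of the patches kept in the last layer (assumed given).
    keep_ratio: hypothetical fraction of patches to consider per earlier layer.
    Returns a list of boolean masks of shape (N,), one per layer.
    """
    num_layers = len(attn)
    n = attn[0].shape[0]
    masks = [None] * num_layers

    # Start from the effective patches of the last layer.
    last = np.zeros(n, dtype=bool)
    last[keep_last] = True
    masks[-1] = last

    # Walk backwards: the mask of layer l+1 guides the selection in layer l.
    for l in range(num_layers - 2, -1, -1):
        kept_next = masks[l + 1]
        # Assumed impact proxy: attention that each patch receives from the
        # patches kept in the next layer (sum over the kept rows).
        impact = attn[l + 1][kept_next].sum(axis=0)
        k = max(int(np.ceil(keep_ratio * n)), int(kept_next.sum()))
        top = np.argsort(-impact, kind="stable")[:k]
        # Assumption: anything kept later is also kept here.
        mask = kept_next.copy()
        mask[top] = True
        masks[l] = mask
    return masks


# Toy usage with random attention maps: 4 layers, 8 patches.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 8, 8))
exp = np.exp(logits)
attn = exp / exp.sum(axis=-1, keepdims=True)  # softmax over each row
masks = top_down_patch_masks(attn, keep_last=[0, 3], keep_ratio=0.5)
kept_per_layer = [int(m.sum()) for m in masks]
```

In this sketch the number of kept patches never increases with depth, so shallow layers retain more patches than deep ones. A real pruning pipeline would replace the attention-based proxy with whatever impact estimate the method actually uses.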

research
04/11/2023

Life Regression based Patch Slimming for Vision Transformers

Vision transformers have achieved remarkable success in computer vision ...
research
05/06/2021

Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet

The strong performance of vision transformers on image classification an...
research
05/16/2023

Ray-Patch: An Efficient Decoder for Light Field Transformers

In this paper we propose the Ray-Patch decoder, a novel model to efficie...
research
12/09/2021

Locally Shifted Attention With Early Global Integration

Recent work has shown the potential of transformers for computer vision ...
research
12/13/2022

OAMixer: Object-aware Mixing Layer for Vision Transformers

Patch-based models, e.g., Vision Transformers (ViTs) and Mixers, have sh...
research
06/30/2021

Augmented Shortcuts for Vision Transformers

Transformer models have achieved great progress on computer vision tasks...
