Life Regression based Patch Slimming for Vision Transformers

04/11/2023
by   Jiawei Chen, et al.

Vision transformers have achieved remarkable success in computer vision tasks by using multi-head self-attention modules to capture long-range dependencies within images. However, their high inference cost poses a new challenge. Several methods have been proposed to address this problem, mainly by slimming patches: in the inference stage, these methods classify patches at multiple layers into two classes, one to keep and the other to discard. This approach incurs additional computation at every layer where patches are discarded, which hinders inference acceleration. In this study, we tackle the patch slimming problem from a different perspective by proposing a life regression module that determines the lifespan of each image patch in one go. During inference, a patch is discarded once the current layer index exceeds its lifespan. Our proposed method avoids additional computation and parameters at multiple layers, enhancing inference speed while maintaining competitive performance. Additionally, our approach requires fewer training epochs than other patch slimming methods.
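The core idea can be illustrated with a minimal sketch. Here a hypothetical life-regression head (a stand-in random projection, not the paper's learned module) assigns each patch a lifespan in one shot; at layer t, a patch is kept only if its lifespan is at least t, so no per-layer scoring is needed. All names (`predict_lifespans`, `surviving_counts`) are illustrative assumptions.

```python
import numpy as np

def predict_lifespans(patch_features, num_layers, rng):
    """Hypothetical life-regression head: map each patch's feature vector
    to a predicted lifespan in [1, num_layers]. A random linear projection
    plus sigmoid stands in for the learned regressor."""
    w = rng.standard_normal(patch_features.shape[1])
    scores = 1.0 / (1.0 + np.exp(-(patch_features @ w)))   # sigmoid -> (0, 1)
    return np.ceil(scores * num_layers).astype(int)        # lifespans in 1..num_layers

def surviving_counts(lifespans, num_layers):
    """At layer t (1-indexed), keep a patch iff its lifespan >= t.
    Discarding is a single comparison, with no extra per-layer module."""
    return [int((lifespans >= t).sum()) for t in range(1, num_layers + 1)]

rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 8))                # 16 patches, 8-dim features
life = predict_lifespans(feats, num_layers=12, rng=rng)
counts = surviving_counts(life, num_layers=12)
# The set of surviving patches can only shrink as depth increases.
assert counts == sorted(counts, reverse=True)
assert life.min() >= 1 and life.max() <= 12
```

Because lifespans are fixed before the forward pass, the per-layer keep mask is a threshold test on precomputed values, in contrast to per-layer keep/discard classifiers that add computation at each pruning layer.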

Related research:

- Patch Slimming for Efficient Vision Transformers (06/05/2021)
- Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition (07/27/2022)
- Making Vision Transformers Truly Shift-Equivariant (05/25/2023)
- TransFER: Learning Relation-aware Facial Expression Representations with Transformers (08/25/2021)
- Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing (06/30/2023)
- Ray-Patch: An Efficient Decoder for Light Field Transformers (05/16/2023)
- FlexiAST: Flexibility is What AST Needs (07/18/2023)
