A Unified Pruning Framework for Vision Transformers

11/30/2021
by Hao Yu, et al.

Recently, vision transformers (ViTs) and their variants have achieved promising performance in various computer vision tasks. Yet the high computational cost and training-data requirements of ViTs limit their application in resource-constrained settings. Model compression is an effective way to speed up deep learning models, but compressing ViTs has been less explored. Many previous works concentrate on reducing the number of tokens; however, this line of attack breaks the spatial structure of ViTs and is hard to generalize to downstream tasks. In this paper, we design a unified framework for structural pruning of ViTs and their variants, namely UP-ViTs. Our method focuses on pruning all components of a ViT while maintaining the consistency of the model structure. Abundant experimental results show that our method can achieve high accuracy on compressed ViTs and variants, e.g., UP-DeiT-T achieves 75.79% top-1 accuracy on ImageNet, outperforming the vanilla DeiT-T by 3.59% with the same computational cost, and UP-PVTv2-B0 improves the accuracy of PVTv2-B0 by 4.83%. Moreover, UP-ViTs maintains the consistency of the token representation and gains consistent improvements on object detection tasks.
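To illustrate the general idea of structural (channel-level) pruning that the abstract contrasts with token pruning, here is a minimal NumPy sketch. It is not the authors' UP-ViTs algorithm; the importance score (a product of fan-in and fan-out weight norms) and the two-layer MLP shapes are illustrative assumptions. The key property it demonstrates is the one the abstract emphasizes: the pruned block keeps the same input/output dimensions, so the surrounding model structure stays consistent.

```python
import numpy as np

def prune_mlp_channels(w1, b1, w2, keep):
    """Structurally prune the hidden channels of a 2-layer MLP block.

    w1: (hidden, d_in) first projection, b1: (hidden,) bias,
    w2: (d_out, hidden) second projection.
    Channels are ranked by an illustrative importance score (product of
    the L2 norms of each channel's fan-in and fan-out weights), and only
    the top-`keep` channels survive. Input and output widths (d_in,
    d_out) are unchanged, so residual connections and neighboring
    layers are untouched.
    """
    importance = np.linalg.norm(w1, axis=1) * np.linalg.norm(w2, axis=0)
    idx = np.sort(np.argsort(importance)[-keep:])  # keep strongest channels
    return w1[idx], b1[idx], w2[:, idx]

# toy example: shrink a 4-channel hidden layer down to 2 channels
rng = np.random.default_rng(0)
w1 = rng.normal(size=(4, 3))
b1 = rng.normal(size=4)
w2 = rng.normal(size=(2, 4))
pw1, pb1, pw2 = prune_mlp_channels(w1, b1, w2, keep=2)
print(pw1.shape, pb1.shape, pw2.shape)  # (2, 3) (2,) (2, 2)
```

Because the block's external interface is preserved, this kind of pruning transfers directly to downstream tasks such as detection, unlike token-reduction methods that alter the spatial token layout.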


Related research

04/21/2023: Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers
Although vision transformers (ViTs) have shown promising results in vari...

01/13/2023: GOHSP: A Unified Framework of Graph and Optimization-based Heterogeneous Structured Pruning for Vision Transformer
The recently proposed Vision transformers (ViTs) have shown very impress...

05/26/2023: COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models
Attention-based vision models, such as Vision Transformer (ViT) and its ...

08/03/2021: Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer
Vision transformers have recently received explosive popularity, but the...

07/20/2022: Model Compression for Resource-Constrained Mobile Robots
The number of mobile robots with constrained computing resources that ne...

09/05/2023: Compressing Vision Transformers for Low-Resource Visual Learning
Vision transformer (ViT) and its variants have swept through visual lear...

12/31/2021: Multi-Dimensional Model Compression of Vision Transformer
Vision transformers (ViT) have recently attracted considerable attention...
