A Fast Training-Free Compression Framework for Vision Transformers

03/04/2023
by Jung Hwan Heo, et al.

Token pruning has emerged as an effective solution to speed up the inference of large Transformer models. However, prior work on accelerating Vision Transformer (ViT) models requires training from scratch or fine-tuning with additional parameters, which prevents simple plug-and-play deployment. To avoid high training costs during the deployment stage, we present a fast training-free compression framework enabled by (i) a dense feature extractor in the initial layers; (ii) a sharpness-minimized model, which is more compressible; and (iii) a local-global token merger that exploits spatial relationships across local and global contexts. We apply our framework to various ViT and DeiT models and achieve up to 2x reduction in FLOPs and 1.8x speedup in inference throughput with <1% accuracy loss, while requiring two orders of magnitude less training time than existing approaches. Code will be available at https://github.com/johnheo/fast-compress-vit
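The abstract does not spell out how the local-global token merger operates. As an illustrative sketch only, the following PyTorch snippet shows one way a training-free merging step could be implemented, assuming a ToMe-style bipartite soft-matching rule that merges the r most similar token pairs; the function name `merge_tokens` and its parameters are hypothetical and not the authors' API, and the local-window restriction of an actual local-global merger is omitted for brevity.

```python
import torch

def merge_tokens(x: torch.Tensor, r: int) -> torch.Tensor:
    """Hypothetical training-free token merge (ToMe-style bipartite soft matching).

    x: (batch, tokens, dim) features from one ViT block.
    r: number of tokens to remove by merging (assumes r <= tokens // 2).
    Returns (batch, tokens - r, dim).
    """
    b, n, d = x.shape
    # Split tokens into two alternating sets A and B.
    # (In a real ViT, the [CLS] token would be excluded from merging.)
    src, dst = x[:, ::2, :], x[:, 1::2, :]
    a = src / src.norm(dim=-1, keepdim=True)
    c = dst / dst.norm(dim=-1, keepdim=True)
    scores = a @ c.transpose(-1, -2)            # cosine similarity, (b, |A|, |B|)
    best_val, best_idx = scores.max(dim=-1)     # best match in B for each A token
    order = best_val.argsort(dim=-1, descending=True)
    merged, kept = order[:, :r], order[:, r:]   # the r most similar A tokens get merged
    kept_a = torch.gather(src, 1, kept.unsqueeze(-1).expand(-1, -1, d))
    merged_a = torch.gather(src, 1, merged.unsqueeze(-1).expand(-1, -1, d))
    target = torch.gather(best_idx, 1, merged)  # destination index in B
    out_b = dst.clone()
    # Average each merged A token into its matched B token.
    out_b.scatter_reduce_(1, target.unsqueeze(-1).expand(-1, -1, d),
                          merged_a, reduce="mean", include_self=True)
    return torch.cat([kept_a, out_b], dim=1)
```

In a plug-and-play setting, a call like this could be inserted after the attention sub-block of each layer so the token count shrinks progressively with depth, without any retraining.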

Related research

07/10/2021
Local-to-Global Self-Attention in Vision Transformers
Transformers have demonstrated great potential in computer vision tasks....

07/20/2023
Learned Thresholds Token Merging and Pruning for Vision Transformers
Vision transformers have demonstrated remarkable success in a wide range...

06/09/2023
Error Feedback Can Accurately Compress Preconditioners
Leveraging second-order information at the scale of deep networks is one...

02/20/2020
Neural Network Compression Framework for fast model inference
In this work we present a new framework for neural networks compression ...

05/29/2023
DiffRate: Differentiable Compression Rate for Efficient Vision Transformers
Token compression aims to speed up large-scale vision transformers (e.g....

10/17/2022
Token Merging: Your ViT But Faster
We introduce Token Merging (ToMe), a simple method to increase the throu...

10/10/2021
NViT: Vision Transformer Compression and Parameter Redistribution
Transformers yield state-of-the-art results across many tasks. However, ...
