DiffRate: Differentiable Compression Rate for Efficient Vision Transformers

05/29/2023
by Mengzhao Chen, et al.

Token compression aims to speed up large-scale vision transformers (e.g., ViTs) by pruning (dropping) or merging tokens. It is an important but challenging task. Although recent advanced approaches have achieved great success, they need to carefully handcraft a compression rate (i.e., the number of tokens to remove), which is tedious and leads to sub-optimal performance. To tackle this problem, we propose Differentiable Compression Rate (DiffRate), a novel token compression method with several appealing properties that prior arts lack. First, DiffRate enables propagating the loss function's gradient onto the compression rate, which was treated as a non-differentiable hyperparameter in previous work. As a result, different layers can automatically learn different compression rates without extra overhead. Second, token pruning and merging can be performed simultaneously and naturally in DiffRate, whereas they were handled in isolation in previous works. Third, extensive experiments demonstrate that DiffRate achieves state-of-the-art performance. For example, by applying the learned layer-wise compression rates to an off-the-shelf ViT-H (MAE) model, we achieve a 40% FLOPs reduction and a 1.5× throughput improvement, with a minor accuracy drop of 0.16% on ImageNet without fine-tuning, even outperforming previous methods with fine-tuning. Code and models are available at https://github.com/OpenGVLab/DiffRate.
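To make the core idea concrete, below is a minimal PyTorch sketch, not the official DiffRate implementation, of one common way to make a discrete compression rate differentiable: reparameterize each layer's keep-rate as a learnable categorical distribution over candidate rates and apply a straight-through estimator, so the forward pass executes a hard rate while gradients still reach the rate logits. All names here (DifferentiableRate, compress_tokens, keep_ratios) are illustrative assumptions, token merging is omitted for brevity, and the importance scores stand in for the paper's attention-based token ranking.

```python
# Minimal sketch (assumed, not the official DiffRate code): a per-layer
# compression rate made differentiable via a learnable categorical
# distribution over candidate keep-ratios plus a straight-through estimator.
import torch
import torch.nn as nn


class DifferentiableRate(nn.Module):
    """Hypothetical module: learns what fraction of tokens a layer keeps."""

    def __init__(self, keep_ratios=(1.0, 0.9, 0.8, 0.7, 0.6, 0.5)):
        super().__init__()
        self.register_buffer("keep_ratios", torch.tensor(keep_ratios))
        self.logits = nn.Parameter(torch.zeros(len(keep_ratios)))

    def forward(self) -> torch.Tensor:
        probs = torch.softmax(self.logits, dim=0)
        soft = (probs * self.keep_ratios).sum()      # differentiable expectation
        hard = self.keep_ratios[probs.argmax()]      # discrete rate actually used
        # Straight-through: forward value is `hard`, gradient flows via `soft`.
        return hard.detach() + soft - soft.detach()


def compress_tokens(x: torch.Tensor, importance: torch.Tensor,
                    rate: DifferentiableRate) -> torch.Tensor:
    """Keep the top-k most important tokens, with k set by the learned rate.

    x:          (B, N, C) token features
    importance: (B, N) per-token scores, e.g. class-attention values
    """
    ratio = rate()                                   # 0-dim tensor in (0, 1]
    n_keep = max(1, int(x.shape[1] * float(ratio)))
    idx = importance.topk(n_keep, dim=1).indices     # (B, n_keep)
    kept = x.gather(1, idx.unsqueeze(-1).expand(-1, -1, x.shape[-1]))
    # Multiplying by the straight-through ratio lets the task loss reach the
    # rate logits; the paper's actual gradient path is more sophisticated.
    return kept * ratio


if __name__ == "__main__":
    tokens = torch.randn(2, 197, 768)                # e.g. a ViT-B token sequence
    scores = torch.rand(2, 197)                      # stand-in importance scores
    rate = DifferentiableRate()
    out = compress_tokens(tokens, scores, rate)
    out.sum().backward()                             # gradients reach rate.logits
    print(out.shape, rate.logits.grad)
```

In a full training setup, a FLOPs or latency budget term would typically be added to the loss so that each layer's learned rate trades accuracy against cost, which is how layer-wise rates can differ without manual tuning.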
