Revisiting Token Pruning for Object Detection and Instance Segmentation

06/12/2023
by Yifei Liu, et al.

Vision Transformers (ViTs) have shown impressive performance in computer vision, but their high computational cost, quadratic in the number of tokens, limits their adoption in computation-constrained applications. However, this large number of tokens may not be necessary, as not all tokens are equally important. In this paper, we investigate token pruning to accelerate inference for object detection and instance segmentation, extending prior work on image classification. Through extensive experiments, we offer four insights for dense tasks: (i) tokens should not be completely pruned and discarded, but rather preserved in the feature maps for later use; (ii) reactivating previously pruned tokens can further enhance model performance; (iii) a dynamic pruning rate based on image content is better than a fixed pruning rate; and (iv) a lightweight, two-layer MLP can prune tokens effectively, achieving accuracy comparable to complex gating networks with a much simpler design. We evaluate the impact of these design choices on the COCO dataset and present a method that integrates these insights and outperforms prior state-of-the-art token pruning models, significantly reducing the performance drop from ~1.5 mAP to ~0.3 mAP for both boxes and masks. Compared to the dense counterpart that uses all tokens, our method achieves up to 34% faster inference speed.
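The four insights map onto a small amount of code. Below is a minimal PyTorch sketch, not the authors' released implementation: the class names, the hidden width, and the sigmoid-threshold scheme are assumptions. A two-layer MLP scores each token; thresholding the scores yields an image-dependent (dynamic) pruning rate; and pruned tokens are preserved in the token map rather than discarded, so a later selector pass can reactivate them.

import torch
import torch.nn as nn

class TokenSelector(nn.Module):
    """Two-layer MLP that scores tokens for pruning (insight iv).
    Thresholding the scores gives a dynamic, per-image pruning
    rate (insight iii) instead of a fixed top-k budget."""

    def __init__(self, dim, hidden_dim=64, threshold=0.5):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, 1),
        )
        self.threshold = threshold

    def forward(self, tokens):
        # tokens: (B, N, C) -> boolean keep mask: (B, N)
        # (Training the selector end-to-end needs a differentiable
        # relaxation, e.g. Gumbel-softmax; omitted in this sketch.)
        keep_prob = torch.sigmoid(self.mlp(tokens)).squeeze(-1)
        return keep_prob > self.threshold

def prune_and_preserve(tokens, keep_mask, block):
    """Run a transformer block only on the kept tokens, then write
    the results back into the full token map (insight i). Pruned
    tokens keep their old values, so downstream detection and
    segmentation heads can still read them, and a later
    TokenSelector pass can reactivate them (insight ii)."""
    out = tokens.clone()
    for b in range(tokens.shape[0]):  # per-image loop: kept counts vary
        kept = tokens[b, keep_mask[b]]                      # (n_kept, C)
        out[b, keep_mask[b]] = block(kept.unsqueeze(0)).squeeze(0)
    return out

# Usage sketch with a stock PyTorch layer standing in for a ViT block:
selector = TokenSelector(dim=256)
block = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
x = torch.randn(2, 196, 256)          # (batch, tokens, channels)
x = prune_and_preserve(x, selector(x), block)

A fixed-rate variant would instead keep the top-k scores per image; the threshold lets easy images shed more tokens while hard images retain more.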

Related research

04/21/2023
Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers
Although vision transformers (ViTs) have shown promising results in vari...

06/08/2023
Muti-Scale And Token Mergence: Make Your ViT More Efficient
Since its inception, Vision Transformer (ViT) has emerged as a prevalent...

11/21/2022
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers
Vision transformers have achieved significant improvements on various vi...

08/03/2021
Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer
Vision transformers have recently received explosive popularity, but the...

08/07/2023
DiT: Efficient Vision Transformers with Dynamic Token Routing
Recently, the tokens of images share the same static data flow in many d...

12/13/2021
A Study on Token Pruning for ColBERT
The ColBERT model has recently been proposed as an effective BERT based ...

08/09/2023
Which Tokens to Use? Investigating Token Reduction in Vision Transformers
Since the introduction of the Vision Transformer (ViT), researchers have...
