SaiT: Sparse Vision Transformers through Adaptive Token Pruning

10/11/2022
by Ling Li, et al.

While vision transformers have achieved impressive results, effectively and efficiently accelerating these models can further boost performance. In this work, we propose a dense/sparse training framework to obtain a unified model, enabling weight sharing across various token densities. Thus one model offers a range of accuracy and throughput tradeoffs for different applications. Besides, we introduce adaptive token pruning to optimize the patch token sparsity based on the input image. In addition, we investigate knowledge distillation to enhance token selection capability in early transformer modules. Sparse adaptive image Transformer (SaiT) offers varying levels of model acceleration by merely changing the token sparsity on the fly. Specifically, SaiT reduces the computation complexity (FLOPs) by 39%-43% and increases the throughput by 67%-91% with marginal accuracy loss for various vision transformer models. Meanwhile, the same model also provides the zero accuracy drop option by skipping the sparsification step. SaiT achieves better accuracy and computation tradeoffs than state-of-the-art transformer and convolutional models.
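To make the token-pruning idea concrete, below is a minimal sketch of keeping only the highest-scoring patch tokens at a chosen sparsity level, which is the kind of operation the abstract describes. It is not the paper's implementation: the helper name prune_tokens, the keep_ratio parameter, and the L2-norm stand-in for the learned token scores are assumptions for illustration only.

```python
import torch

def prune_tokens(tokens, scores, keep_ratio):
    """Keep the top-scoring patch tokens for each image (illustrative sketch).

    tokens:     (B, N, D) patch token embeddings (class token excluded)
    scores:     (B, N) per-token importance scores
    keep_ratio: fraction of tokens to retain; 1.0 skips sparsification
    """
    B, N, D = tokens.shape
    k = max(1, int(N * keep_ratio))
    # Indices of the k most important tokens per image
    top_idx = scores.topk(k, dim=1).indices                      # (B, k)
    # Gather the surviving tokens
    kept = tokens.gather(1, top_idx.unsqueeze(-1).expand(B, k, D))
    return kept, top_idx

# Example: vary token sparsity on the fly with the same weights
tokens = torch.randn(2, 196, 384)        # 14x14 patches, ViT-S-like width (assumed)
scores = tokens.norm(dim=-1)             # stand-in score; the paper learns token selection
dense, _  = prune_tokens(tokens, scores, keep_ratio=1.0)   # no sparsification, no accuracy drop
sparse, _ = prune_tokens(tokens, scores, keep_ratio=0.6)   # accelerated path with fewer tokens
print(dense.shape, sparse.shape)         # (2, 196, 384) and (2, 117, 384)
```

In this sketch, all downstream transformer blocks simply receive fewer tokens when keep_ratio is lowered, which is what allows a single set of weights to trade accuracy for throughput at inference time.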

Related research

10/08/2021
Adversarial Token Attacks on Vision Transformers
Vision transformers rely on a patch token based self attention mechanism...

11/15/2022
HeatViT: Hardware-Efficient Adaptive Token Pruning for Vision Transformers
While vision transformers (ViTs) have continuously achieved new mileston...

08/21/2023
Patch Is Not All You Need
Vision Transformers have achieved great success in computer visions, del...

11/24/2021
Self-slimmed Vision Transformer
Vision transformers (ViTs) have become the popular structures and outper...

10/17/2022
Token Merging: Your ViT But Faster
We introduce Token Merging (ToMe), a simple method to increase the throu...

05/18/2023
Boost Vision Transformer with GPU-Friendly Sparsity and Quantization
The transformer extends its success from the language to the vision doma...

12/17/2020
SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
The attention mechanism is becoming increasingly popular in Natural Lang...
