Searching Intrinsic Dimensions of Vision Transformers

04/16/2022
by Fanghui Xue, et al.

Many researchers have shown that transformers perform as well as convolutional neural networks on many computer vision tasks. However, the large computational cost of their attention modules hinders further study and deployment on edge devices. Several pruning methods have been developed to construct efficient vision transformers, but most of them consider only image classification. Inspired by these results, we propose SiDT, a method for pruning vision transformer backbones for more complex vision tasks such as object detection, based on searching transformer dimensions. Experiments on the CIFAR-100 and COCO datasets show that backbones with 20% or 40% of their dimensions/parameters pruned can match or even exceed the performance of the unpruned models. We also provide a complexity analysis and comparisons with previous pruning methods.
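To make the link between pruned dimensions and pruned parameters concrete, here is a minimal sketch. It is not the SiDT search procedure. It counts the parameters of one standard pre-norm ViT encoder block when the attention (Q/K/V) width and the MLP hidden width are shrunk while the token embedding width stays fixed. Which dimensions are pruned, the function name `vit_block_params`, and the ViT-B-like sizes (768 and 3072) are illustrative assumptions, not details taken from the paper.

```python
def vit_block_params(embed_dim: int, attn_dim: int, mlp_dim: int) -> int:
    """Parameter count of one standard pre-norm ViT encoder block, biases included.

    Hypothetical helper for illustration only.
    embed_dim: token embedding width (kept fixed in this sketch)
    attn_dim:  total Q/K/V projection width (num_heads * head_dim)
    mlp_dim:   hidden width of the two-layer MLP
    """
    qkv = 3 * (embed_dim * attn_dim + attn_dim)      # Q, K, V projections
    proj = attn_dim * embed_dim + embed_dim          # attention output projection
    mlp = (embed_dim * mlp_dim + mlp_dim) + (mlp_dim * embed_dim + embed_dim)
    norms = 2 * (2 * embed_dim)                      # two LayerNorms: weight + bias
    return qkv + proj + mlp + norms


# Assumed ViT-B-like widths; keep 80% of the searched widths (prune 20%).
full = vit_block_params(768, 768, 3072)
keep = 0.8
pruned = vit_block_params(768, round(768 * keep), round(3072 * keep))
ratio = pruned / full
print(full, pruned, f"{ratio:.4f}")
```

Under these assumptions the count is almost linear in the pruned widths, so removing 20% of them removes roughly 20% of the block's parameters. Shrinking the embedding width instead would scale parts of the count quadratically, which is one reason "dimensions" and "parameters" need not be pruned in equal proportion.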
