Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space

01/03/2022
by Arnav Chavan, et al.

This paper explores the feasibility of finding an optimal sub-model within a vision transformer and introduces a pure vision transformer slimming (ViT-Slim) framework that searches for such a sub-structure end-to-end across multiple dimensions, including the input tokens and the MHSA and MLP modules, with state-of-the-art performance. Our method is based on a learnable and unified L1 sparsity constraint with pre-defined factors that reflect the global importance of each dimension in the continuous searching space. The searching process is highly efficient thanks to a single-shot training scheme; on DeiT-S, for instance, ViT-Slim takes only ~43 GPU hours for the searching process, and the searched structure is flexible, with diverse dimensionalities across modules. A budget threshold is then applied according to the accuracy-FLOPs trade-off required on the target device, and a re-training process is performed to obtain the final models. Extensive experiments show that ViT-Slim can compress up to 40% of parameters and 40% of FLOPs on various vision transformers while increasing accuracy by ~0.6%, and we demonstrate the transferability of our searched models on several downstream datasets. Our source code will be publicly available.
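The two-stage procedure sketched in the abstract — learn soft masks over several dimensions under a unified L1 penalty, then keep only the highest-scoring dimensions under a budget threshold — can be illustrated with a minimal numpy sketch. The function names (`l1_sparsity_penalty`, `slim_by_budget`) and the per-dimension `factors` are hypothetical illustrations, not the paper's actual implementation:

```python
import numpy as np

def l1_sparsity_penalty(masks, factors):
    """Unified L1 penalty over learnable soft masks covering several
    dimensions (e.g. input tokens, MHSA heads, MLP neurons), each
    weighted by a pre-defined factor reflecting its global importance.
    Added to the task loss during the single-shot search phase."""
    return sum(f * np.abs(m).sum() for m, f in zip(masks, factors))

def slim_by_budget(mask, keep_ratio):
    """After searching, apply a budget threshold: keep only the
    dimensions whose soft-mask magnitudes are largest, according to
    the desired accuracy-FLOPs trade-off (keep_ratio in (0, 1])."""
    k = max(1, int(round(keep_ratio * mask.size)))
    thresh = np.sort(np.abs(mask))[::-1][k - 1]
    return np.abs(mask) >= thresh

# Toy usage: a 4-dim mask slimmed to a 50% budget keeps the two
# dimensions with the largest learned magnitudes.
mask = np.array([0.9, 0.05, 0.4, 0.01])
keep = slim_by_budget(mask, 0.5)   # -> [True, False, True, False]
```

The surviving boolean mask would then define the slimmed architecture that is re-trained to produce the final model.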

