Searching the Search Space of Vision Transformer

11/29/2021
by Minghao Chen, et al.

Vision Transformer has shown great visual representation power in a wide range of vision tasks, such as recognition and detection, and has thus attracted fast-growing efforts to manually design more effective architectures. In this paper, we propose to use neural architecture search to automate this process, by searching not only the architecture but also the search space. The central idea is to gradually evolve different search dimensions guided by their E-T Error, computed using a weight-sharing supernet. Moreover, we provide design guidelines for general vision transformers, backed by extensive analysis of the space searching process, which could promote the understanding of vision transformers. Remarkably, the models found in the searched space, named S3 (short for Searching the Search Space), achieve superior performance to recently proposed models such as Swin, DeiT and ViT when evaluated on ImageNet. The effectiveness of S3 is also illustrated on object detection, semantic segmentation and visual question answering, demonstrating its generality to downstream vision and vision-language tasks. Code and models will be available at https://github.com/microsoft/Cream.
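
As a rough illustration of guiding a search dimension by an error score, the sketch below compares two candidate ranges of a single dimension and keeps the one with the lower score. The abstract does not define E-T Error, so the definition used here (the average of the mean error over sampled architectures and the mean error of the best few) is an assumption made for this example. The function names, the `top_k` parameter, the subspace names and the error values are all hypothetical; in practice the errors would come from evaluating sampled architectures with the weight-sharing supernet.

```python
from statistics import mean


def et_error(errors, top_k=2):
    """Score a search space from the errors of architectures sampled from it.

    ASSUMPTION: E-T Error is taken to be the average of the expected error
    (mean over all samples) and the top-tier error (mean over the top_k
    lowest-error samples). The abstract does not state this definition.
    """
    if not errors:
        raise ValueError("need at least one sampled error")
    expected = mean(errors)
    top_tier = mean(sorted(errors)[:top_k])
    return 0.5 * (expected + top_tier)


def pick_best_subspace(subspace_errors, top_k=2):
    """Return the candidate subspace with the lowest score, plus all scores."""
    scores = {name: et_error(errs, top_k) for name, errs in subspace_errors.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical error rates for architectures sampled from two candidate
# ranges of one search dimension (illustrative values, not measured results).
example = {
    "depth_12_to_14": [0.30, 0.28, 0.26, 0.24],
    "depth_14_to_16": [0.29, 0.25, 0.23, 0.21],
}
best_name, scores = pick_best_subspace(example)
print(best_name, scores)
```

Repeating this comparison for each search dimension in turn would give one simple form of gradual evolution; the actual update rule used by S3 is described in the full paper, not in this abstract.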


