Searching for Efficient Multi-Stage Vision Transformers

09/01/2021
by Yi-Lun Liao et al.

Vision Transformer (ViT) demonstrates that the Transformer architecture developed for natural language processing can be applied to computer vision tasks and achieve performance comparable to convolutional neural networks (CNNs), which have been studied and adopted in computer vision for years. This naturally raises the question of how the performance of ViT can be advanced with design techniques from CNNs. To this end, we incorporate two such techniques and present ViT-ResNAS, an efficient multi-stage ViT architecture designed with neural architecture search (NAS). First, we propose residual spatial reduction to decrease the sequence length in deeper layers, yielding a multi-stage architecture. When reducing the sequence length, we add skip connections to improve performance and stabilize the training of deeper networks. Second, we propose weight-sharing NAS with multi-architectural sampling. We enlarge a network and use its sub-networks to define a search space; a super-network covering all sub-networks is then trained for fast evaluation of their performance. To train the super-network efficiently, we sample and train multiple sub-networks with a single forward-backward pass. Evolutionary search is then performed to discover high-performance network architectures. Experiments on ImageNet demonstrate that ViT-ResNAS achieves better accuracy-MACs and accuracy-throughput trade-offs than the original DeiT and other strong ViT baselines. Code is available at https://github.com/yilunliao/vit-search.
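Below is a minimal PyTorch sketch of the residual spatial reduction idea: between stages, a strided transformation shrinks the token grid and widens the embedding, while a pooled and projected skip path is added back as a residual. The specific layers (3x3 strided convolution, average-pool skip) and the assumption of an even-sided token grid with no class token are illustrative choices, not the authors' exact design; see the repository above for the real implementation.

```python
import torch
import torch.nn as nn


class ResidualSpatialReduction(nn.Module):
    """Reduce the token sequence between stages, with a skip connection.

    Sketch only: assumes tokens form an even-sided H x W grid and that
    no class token is present.
    """

    def __init__(self, dim_in: int, dim_out: int):
        super().__init__()
        # Main path: strided conv halves each spatial axis (4x fewer tokens)
        # while widening the embedding from dim_in to dim_out.
        self.reduce = nn.Conv2d(dim_in, dim_out, kernel_size=3, stride=2, padding=1)
        # Skip path: pool to the same grid size and project channels so the
        # residual addition is well-typed.
        self.skip = nn.Sequential(
            nn.AvgPool2d(kernel_size=2, stride=2),
            nn.Conv2d(dim_in, dim_out, kernel_size=1),
        )
        self.norm = nn.LayerNorm(dim_out)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        b, n, c = x.shape                          # x: (batch, h * w, dim_in)
        grid = x.transpose(1, 2).reshape(b, c, h, w)
        out = self.reduce(grid) + self.skip(grid)  # residual spatial reduction
        out = out.flatten(2).transpose(1, 2)       # back to (batch, n / 4, dim_out)
        return self.norm(out)


tokens = torch.randn(2, 14 * 14, 256)              # e.g. a 14 x 14 token grid
stage = ResidualSpatialReduction(dim_in=256, dim_out=384)
print(stage(tokens, h=14, w=14).shape)             # torch.Size([2, 49, 384])
```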

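The multi-architectural sampling step admits a similarly short sketch. One reading of "multiple sub-networks with a single forward-backward pass" is to split each mini-batch across several randomly sampled sub-networks, so the shared super-network weights receive gradients from all of them in one backward pass. The `sample_config()` helper and the `config=` argument are hypothetical stand-ins for however the super-network exposes sub-network selection.

```python
def multi_arch_step(super_net, criterion, optimizer, images, targets, num_subnets=4):
    # Split the batch so each chunk flows through a different sub-network.
    optimizer.zero_grad()
    total_loss = 0.0
    for img_chunk, tgt_chunk in zip(images.chunk(num_subnets),
                                    targets.chunk(num_subnets)):
        config = super_net.sample_config()    # hypothetical: draw a random sub-network
        logits = super_net(img_chunk, config=config)
        total_loss = total_loss + criterion(logits, tgt_chunk)
    (total_loss / num_subnets).backward()     # one backward pass updates shared weights
    optimizer.step()
```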