FocusFormer: Focusing on What We Need via Architecture Sampler

08/23/2022
by Jing Liu, et al.

Vision Transformers (ViTs) have underpinned the recent breakthroughs in computer vision. However, designing ViT architectures is laborious and relies heavily on expert knowledge. To automate the design process and provide deployment flexibility, one-shot neural architecture search decouples supernet training from architecture specialization for diverse deployment scenarios. To cope with the enormous number of sub-networks in the supernet, existing methods treat all architectures as equally important and randomly sample a subset of them at each update step during training. During architecture search, however, these methods focus on finding architectures on the Pareto frontier of performance and resource consumption, which creates a gap between training and deployment. In this paper, we devise a simple yet effective method, called FocusFormer, to bridge this gap. To this end, we propose to learn an architecture sampler that assigns higher sampling probabilities to architectures on the Pareto frontier under different resource constraints during supernet training, so that they are sufficiently optimized and hence achieve better performance. During specialization, we can directly use the well-trained architecture sampler to obtain accurate architectures satisfying a given resource constraint, which significantly improves search efficiency. Extensive experiments on CIFAR-100 and ImageNet show that FocusFormer improves the performance of the searched architectures while significantly reducing the search cost. For example, on ImageNet, our FocusFormer-Ti with 1.4G FLOPs outperforms AutoFormer-Ti by 0.5% in Top-1 accuracy.
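The core idea, biasing sub-network sampling toward Pareto-frontier architectures under each resource budget during supernet training, can be sketched as follows. This is a minimal, hypothetical Python illustration, not the paper's implementation: the search space, the FLOPs proxy, and names such as ArchitectureSampler, flops_proxy, and focus_prob are assumptions made for the example, and a score-ranked pool stands in for the learned sampler described in the paper.

```python
import random

# Toy search space for a ViT-like supernet (values are illustrative only).
SEARCH_SPACE = {
    "depth": [12, 13, 14],
    "embed_dim": [192, 216, 240],
    "mlp_ratio": [3.5, 4.0],
    "num_heads": [3, 4],
}

def random_arch():
    """Uniform sampling over the search space, as in standard one-shot NAS."""
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def flops_proxy(arch):
    """Crude stand-in for a FLOPs estimate (illustrative scaling only)."""
    return arch["depth"] * arch["embed_dim"] ** 2 * arch["mlp_ratio"] * 1e-6

class ArchitectureSampler:
    """Keeps a small pool of high-scoring architectures per resource budget
    and samples from it more often than from the uniform distribution."""

    def __init__(self, budgets, focus_prob=0.8, pool_size=50):
        self.budgets = sorted(budgets)   # e.g. FLOPs budgets in GFLOPs
        self.focus_prob = focus_prob     # probability of a focused sample
        self.pool_size = pool_size
        self.pools = {b: [] for b in self.budgets}

    def update(self, arch, score):
        """Record a (score, architecture) pair under the tightest budget it
        satisfies, keeping only the best pool_size entries."""
        for b in self.budgets:
            if flops_proxy(arch) <= b:
                pool = self.pools[b]
                pool.append((score, arch))
                pool.sort(key=lambda item: -item[0])
                del pool[self.pool_size:]
                break

    def sample(self, budget):
        """With probability focus_prob, reuse a promising architecture under
        the budget; otherwise fall back to uniform sampling."""
        pool = self.pools.get(budget, [])
        if pool and random.random() < self.focus_prob:
            return random.choice(pool)[1]
        return random_arch()

# Toy training loop: the supernet update and validation score are stubbed out.
sampler = ArchitectureSampler(budgets=[2.0, 2.5, 3.5])
for step in range(1000):
    budget = random.choice(sampler.budgets)
    arch = sampler.sample(budget)
    score = random.random()  # placeholder for the sub-network's validation accuracy
    sampler.update(arch, score)
print({b: len(p) for b, p in sampler.pools.items()})
```

In the sketch, sampling a focused architecture with probability focus_prob (and uniformly otherwise) mirrors the paper's intent of giving Pareto-frontier candidates more optimization steps during supernet training while still covering the rest of the search space.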


