Deeper Insights into Weight Sharing in Neural Architecture Search

01/06/2020
by Yuge Zhang, et al.

With the success of deep neural networks, Neural Architecture Search (NAS) has attracted wide attention as a way to automate model design. Because training every child model from scratch is very time-consuming, recent works leverage weight sharing to speed up model evaluation. These approaches greatly reduce computation by maintaining a single copy of weights on the super-net and sharing them among all child models. However, weight sharing has no theoretical guarantee, and its impact had not been well studied before. In this paper, we conduct comprehensive experiments to reveal the impact of weight sharing: (1) the best-performing models from different runs, or even from consecutive epochs within the same run, vary significantly; (2) even with this high variance, valuable information can be extracted from training the super-net with shared weights; (3) interference between child models is a key factor behind the high variance; (4) properly reducing the degree of weight sharing effectively reduces variance and improves performance.
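To make the weight-sharing scheme described above concrete, here is a minimal sketch (not the paper's actual implementation) of a super-net in which each layer holds one weight matrix per candidate operation, and every sampled child model indexes into that single shared copy of weights. All names and dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical super-net: one weight matrix per candidate op in each layer.
# There is only ONE copy of these weights; all child models read from it.
NUM_LAYERS, NUM_OPS, DIM = 3, 4, 8
supernet = [[rng.standard_normal((DIM, DIM)) for _ in range(NUM_OPS)]
            for _ in range(NUM_LAYERS)]

def sample_child():
    """Sample an architecture: pick one candidate op per layer."""
    return [int(rng.integers(NUM_OPS)) for _ in range(NUM_LAYERS)]

def forward(arch, x):
    """Evaluate the child model defined by `arch`, reading its weights
    directly from the shared super-net (no per-child training)."""
    for layer, op in enumerate(arch):
        x = np.tanh(supernet[layer][op] @ x)
    return x

child_a, child_b = sample_child(), sample_child()
x = rng.standard_normal(DIM)
out_a, out_b = forward(child_a, x), forward(child_b, x)

# Two children that select the same op in some layer share that layer's
# weights, so a gradient update made while training one child perturbs
# the other -- the interference the paper identifies as a source of variance.
```

Reducing the degree of weight sharing, as the paper's finding (4) suggests, would correspond here to giving groups of child models their own copies of `supernet` instead of a single global one.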


Related research

10/04/2021  An Analysis of Super-Net Heuristics in Weight-Sharing NAS
Weight sharing promises to make neural architecture search (NAS) tractab...

04/12/2021  Landmark Regularization: Ranking Guided Super-Net Training in Neural Architecture Search
Weight sharing has become a de facto standard in neural architecture sea...

03/09/2020  How to Train Your Super-Net: An Analysis of Training Heuristics in Weight-Sharing NAS
Weight sharing promises to make neural architecture search (NAS) tractab...

08/29/2021  Analyzing and Mitigating Interference in Neural Architecture Search
Weight sharing has become the de facto approach to reduce the training c...

03/18/2023  Weight-sharing Supernet for Searching Specialized Acoustic Event Classification Networks Across Device Constraints
Acoustic Event Classification (AEC) has been widely used in devices such...

08/12/2021  DARTS for Inverse Problems: a Study on Hyperparameter Sensitivity
Differentiable architecture search (DARTS) is a widely researched tool f...

02/21/2019  Overcoming Multi-Model Forgetting
We identify a phenomenon, which we refer to as multi-model forgetting, t...
