PA&DA: Jointly Sampling PAth and DAta for Consistent NAS

02/28/2023
by Shun Lu, et al.

Based on the weight-sharing mechanism, one-shot NAS methods train a supernet and then inherit its pre-trained weights to evaluate sub-models, largely reducing the search cost. However, several works have pointed out that the shared weights suffer from different gradient descent directions during training, and we further find that large gradient variance occurs during supernet training, which degrades the supernet's ranking consistency. To mitigate this issue, we propose to explicitly minimize the gradient variance of supernet training by jointly optimizing the sampling distributions of PAth and DAta (PA&DA). We theoretically derive the relationship between the gradient variance and the sampling distributions, and reveal that the optimal sampling probability is proportional to the normalized gradient norm of paths and training data. Hence, we use the normalized gradient norm as the importance indicator for paths and training data, and adopt an importance sampling strategy during supernet training. Our method requires only negligible computation cost to optimize the sampling distributions of path and data, yet achieves lower gradient variance during supernet training and better generalization performance for the supernet, resulting in a more consistent NAS. We conduct comprehensive comparisons with other improved approaches in various search spaces. Results show that our method surpasses others with more reliable ranking performance and higher accuracy of the searched architectures, demonstrating its effectiveness. Code is available at https://github.com/ShunLu91/PA-DA.
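As a rough illustration of the importance-sampling idea described above (sampling probabilities proportional to normalized gradient norms, with reweighting to keep the gradient estimate unbiased), the following self-contained NumPy sketch compares uniform sampling against gradient-norm-based sampling on toy per-item gradients. It is only a conceptual demonstration under simplifying assumptions, not the authors' implementation (see the linked repository for that); the toy gradients, function names, and the single-sample estimator are illustrative.

import numpy as np

rng = np.random.default_rng(0)

# Toy per-item gradients: one row per candidate (a path or a training example),
# with heavy-tailed norms so that some items contribute much larger gradients.
grads = rng.normal(size=(1000, 16)) * rng.lognormal(sigma=1.0, size=(1000, 1))

def sampling_probs(grad_norms, eps=1e-12):
    # Normalized gradient norms as the sampling distribution: p_i proportional to ||g_i||.
    norms = np.asarray(grad_norms, dtype=np.float64) + eps
    return norms / norms.sum()

def estimate(grads, probs, rng, n_trials=20000):
    # Single-sample, importance-weighted gradient estimates.
    # Reweighting by 1 / (n * p_i) keeps each estimate unbiased for the full gradient.
    n = len(grads)
    target = grads.mean(axis=0)                  # exact average gradient
    idx = rng.choice(n, size=n_trials, p=probs)
    est = grads[idx] / (n * probs[idx, None])
    return np.linalg.norm(est.mean(axis=0) - target), est.var(axis=0).sum()

uniform = np.full(len(grads), 1.0 / len(grads))
importance = sampling_probs(np.linalg.norm(grads, axis=1))

bias_u, var_u = estimate(grads, uniform, rng)
bias_i, var_i = estimate(grads, importance, rng)
print(f"uniform sampling:       bias={bias_u:.4f}  total variance={var_u:.4f}")
print(f"gradient-norm sampling: bias={bias_i:.4f}  total variance={var_i:.4f}")

In the paper's setting, such distributions would be maintained over candidate paths and over training examples and refreshed from the observed gradient norms during supernet training; the expected outcome, as in this toy run, is an essentially unbiased gradient estimate with noticeably lower variance than uniform sampling.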


