Analyzing and Mitigating Interference in Neural Architecture Search

08/29/2021
by Jin Xu, et al.

Weight sharing has become the de facto approach to reduce the training cost of neural architecture search (NAS) by reusing the weights of shared operators from previously trained child models. However, the estimated accuracy of those child models has a low rank correlation with the ground-truth accuracy due to the interference among different child models caused by weight sharing. In this paper, we investigate the interference issue by sampling different child models and calculating the gradient similarity of shared operators, and observe that: 1) the interference on a shared operator between two child models is positively correlated with the number of different operators between them; 2) the interference is smaller when the inputs and outputs of the shared operator are more similar. Inspired by these two observations, we propose two approaches to mitigate the interference: 1) rather than randomly sampling child models for optimization, we propose a gradual modification scheme that changes one operator between adjacent optimization steps to minimize the interference on the shared operators; 2) forcing the inputs and outputs of the operator across all child models to be similar to reduce the interference. Experiments on a BERT search space verify that mitigating interference via each of our proposed methods improves the rank correlation of the super-net, and combining both methods achieves even better results. Our searched architecture outperforms RoBERTa_base by 1.1 and 0.6 points and ELECTRA_base by 1.6 and 1.1 points on the dev and test sets of the GLUE benchmark. Extensive results on the BERT compression task, SQuAD datasets, and other search spaces further demonstrate the effectiveness and generality of our proposed methods.
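To make the first mitigation concrete, the sketch below shows one way a gradual modification sampler could look: instead of drawing an independent child model at every optimization step, the next child is derived from the previous one by changing exactly one operator. This is a minimal illustrative sketch, not the paper's implementation; the candidate operator set `CANDIDATE_OPS`, the layer count `NUM_LAYERS`, and the function `sample_gradually` are hypothetical placeholders.

```python
import random

# Hypothetical search space: each of NUM_LAYERS positions picks one candidate operator.
CANDIDATE_OPS = ["mha", "conv3", "conv5", "ffn"]
NUM_LAYERS = 12

def sample_gradually(prev_child):
    """Return a child model that differs from the previous one in exactly one
    position, so shared operators see minimal interference between adjacent steps."""
    child = list(prev_child)
    layer = random.randrange(NUM_LAYERS)
    # Swap the operator at one randomly chosen layer for a different candidate.
    child[layer] = random.choice([op for op in CANDIDATE_OPS if op != child[layer]])
    return child

# Usage: start from a random child model, then walk the space one change at a time.
child = [random.choice(CANDIDATE_OPS) for _ in range(NUM_LAYERS)]
for step in range(5):
    child = sample_gradually(child)
    print(step, child)
```

Compared with fully random sampling, each pair of consecutive child models here shares all but one operator, which is the property the gradual modification scheme relies on to reduce interference on shared weights.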

11/23/2022

NAS-LID: Efficient Neural Architecture Search with Local Intrinsic Dimension

One-shot neural architecture search (NAS) substantially improves the sea...
04/12/2021

Landmark Regularization: Ranking Guided Super-Net Training in Neural Architecture Search

Weight sharing has become a de facto standard in neural architecture sea...
01/06/2020

Deeper Insights into Weight Sharing in Neural Architecture Search

With the success of deep neural networks, Neural Architecture Search (NA...
06/16/2021

Redefining Neural Architecture Search of Heterogeneous Multi-Network Models by Characterizing Variation Operators and Model Components

With neural architecture search methods gaining ground on manually desig...
07/14/2022

NASRec: Weight Sharing Neural Architecture Search for Recommender Systems

The rise of deep neural networks provides an important driver in optimiz...
03/29/2022

Generalizing Few-Shot NAS with Gradient Matching

Efficient performance estimation of architectures drawn from large searc...
02/21/2019

Overcoming Multi-Model Forgetting

We identify a phenomenon, which we refer to as multi-model forgetting, t...
