1 Introduction
Significant progress made by convolutional neural networks (CNNs) on challenging computer vision tasks has raised the demand for powerful neural network designs. Instead of manual design, neural architecture search (NAS) has demonstrated great potential in recent years. Early NAS works by Real et al. [29, 28] and Elsken et al. [11] achieved promising results but could only be applied to small datasets due to their large computation expense. To this end, one-shot methods have drawn much interest thanks to their promising training efficiency and remarkable ability to discover high-performing models. A one-shot method usually utilizes a hypernetwork, which subsumes all architectures in the search space, and uses shared weights to evaluate different architectures. However, the search spaces of previous works (e.g., the one shown in Fig. 2) were usually carefully designed and did not allow much flexibility in network topology. For example, the chain-structured search space [14], one of the most widely used in the one-shot literature, consists of sequentially connected intermediate feature maps, between which the edges are chosen from a set of computation operations. Networks with better operations can be discovered in this search space, but the network topology remains trivial. However, previous works [10, 17]
on network architecture design showed that complex topologies can tremendously enhance the performance of deep learning models. We argue that adding complex topological structures to the search space will improve the performance of the searched network architectures, as shown in Table
1. In this work, we are interested in exploring complex network topologies with the one-shot method. We propose a novel neural architecture search space, shown in Fig. 2, which contains a vast number of
different network topologies, enabling the discovery of complex-topology networks. The search space is obtained by introducing numerous computation modules as edges between nodes. A topology-based architecture sampler is also introduced to sample architectures from the hypernetwork during the one-shot training stage. However, the great diversity introduced by topologies brings difficulties to the one-shot approach. Specifically, we observe high variance of performance estimation through the one-shot shared parameters in two cases: estimation through shared parameters at different epochs of a single run, and estimation through shared parameters obtained in different runs. Zhang et al. [43] explore the reason behind the variance of rankings under the weight-sharing strategy. This variance compromises the ranking ability of the shared parameters. To eliminate the interference of complex topologies, we estimate the expectation of architecture performance in additional training epochs of the hypernetwork via multiple samples of shared parameters. A fast weight sampling method based on Stochastic Gradient Langevin Dynamics is developed to sample shared parameters efficiently.
The resulting Stabilized Topological Neural Architecture Search (STNAS) achieves performance comparable with state-of-the-art NAS methods. The resulting architecture STNAS-A obtains 76.4% top-1 accuracy with only 326M MAdds. A larger architecture, STNAS-B, obtains 77.9% top-1 accuracy with around 503M MAdds.
To summarize, our main contributions are as follows:

We introduce a topology augmented neural architecture search space that enables the discovery of efficient architectures with complex topology.

To relieve the complex topology's interference on model ranking, we modify model evaluation to be based on the expected performance of the shared parameters.

We empirically demonstrate improvements on ImageNet classification under the same MAdds constraints compared with previous work, and show that the searched architectures transfer well to COCO object detection.
2 Related Work
Recently, automated machine learning methods have received a lot of attention due to their ability to design augmentation policies
[9, 22, 19] and network architectures [45, 28, 4, 44, 25, 14, 13, 21, 20]. Early neural architecture search (NAS) works normally involve reinforcement learning [1, 45, 44, 46, 13, 36] or evolutionary algorithms [29, 26] to search for high-performing network architectures. However, these methods are usually computationally expensive, which limits their use in real scenarios. Recent attention has focused on alleviating the computation cost via weight sharing. This approach usually involves a single training process of an over-parameterized hypernetwork which subsumes all candidate models, i.e., weights of the same operators are shared across different sub-models. Notably, Liu et al. [25] propose continuous relaxations which enable optimizing network architectures with gradient descent, Cai et al. [5] propose a proxyless method to search on target datasets directly, and Bender et al. [2] introduce the one-shot method to decouple the training and searching stages. Our work makes use of the weight-sharing hypernetwork but reduces the estimation variance it introduces during training.
Early handcrafted neural networks [15, 35, 33] tend to stack repeated motifs. Works in [34, 15, 17, 16] introduce different manually designed network topologies that yield performance gains.
Motivated by manually designed architectures [15, 35, 33], a widely used search space, adopted in [45, 25, 44, 26, 13], searches for such motifs, dubbed cells or blocks, rather than all possible architectures; it is called the cell-based space. Another widely used search space, adopted in [5, 14, 36, 42], is called the chain-structured space. This space sequentially stacks several operation layers, where each layer feeds its output to the next layer as input, and NAS methods search for the operation at each position. The work in [41] explores randomly wired networks with less human prior and achieves performance comparable with manually designed networks.
3 Approach
NAS methods usually consist of three basic components: a search space, performance estimation, and a search strategy. In this section, we first introduce our novel Topology Augmented Search Space and a new sampling strategy for hypernetwork training in this particular space. Secondly, we provide a new model performance estimation approach to relieve the variance of model rankings during hypernetwork training. Finally, the evolutionary algorithm for network search is described.
3.1 Topology Augmented Search Space
3.1.1 Motivation
To demonstrate the improvement of a complex topology over a sequential structure, we take ResNet-18 as a baseline and show that a subtle change to the topology yields an obvious performance boost. With 3 random seeds, we randomly add 4 residual blocks connecting the feature maps of blocks in ResNet-18's [15] chain structure, and rescale the width to keep the same FLOPs; the results are in Table 1. The 3 complex structures imply the great potential of topology-based structure search.
Architecture | Res-18 | Rand-0 | Rand-1 | Rand-2
Accuracy (%) | 70.2 | 71.5 | 72.0 | 69.6
3.1.2 Search Space
A neural network is denoted as a directed acyclic graph (DAG) G = (V, E), where each node v_i in V represents a feature map and each edge e_{i,j} in E represents a CNN operator connecting v_i to v_j. The nodes are indexed by the order of computation of their corresponding feature maps.
In our formulation, each edge is a minimal search unit, also referred to as a choice block, which contains a set of candidate computation blocks. A hypernetwork is the network which subsumes all the sub-architectures in the search space. Following previous works, we divide our search space into several sub-DAGs (stages), each of which downsamples its input by a fixed factor.
To enable the discovery of complex topology architectures, a novel topology augmented search space is proposed. In our search space, edges are divided into two categories, stem edges and branch edges, detailed in Fig. 4.
Stem Edges are non-removable edges which appear in every candidate architecture. A stem edge exists between each consecutive node pair (v_i, v_{i+1}), so the stem edges are chain-structured, sequentially connecting all consecutive nodes in each stage. We use 9 kinds of linear bottlenecks (LB) [31] as the candidate choices for stem edges. Further, on stem edges between feature maps with the same resolution, an identity operation is added as an extra candidate to enhance topological diversity and depth flexibility, giving these stem edges 10 candidate choices.
Branch Edges are optional and contribute topological diversity to the search space. A branch edge may exist between any node pair (v_i, v_j) with j > i + 1. The candidate choices for branch edges are the same as for stem edges; differently, a branch edge can also be dropped entirely (the none choice).
When v_i and v_j have different resolutions, the stride of the convolution operation on the edge is automatically adapted to align the feature maps. The number of nodes in each stage is required to define the search space. Following previous methods, we set the number of nodes in the stages to 2, 2, 4, 8, 4.
The search space we propose ensures network topology complexity. Network topology in this work is defined as the DAG formed by the nodes and edges. For a stage with n nodes, each of the (n - 1)(n - 2)/2 possible branch edges can be independently kept or dropped, so the number of topologies is 2^((n - 1)(n - 2)/2). The search space used in our experiments contains 20 nodes in total, which yields vastly more topologies than the cell-based search space.
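The topology count above can be sketched in a few lines. This is an illustrative calculation (the function names are ours), assuming branch edges lie only within a stage and each can be independently kept or dropped:

```python
def branch_edge_count(n):
    # Branch edges connect non-consecutive node pairs (i, j) with j > i + 1,
    # so there are C(n, 2) - (n - 1) = (n - 1)(n - 2) / 2 of them.
    return (n - 1) * (n - 2) // 2

def topology_count(stage_sizes):
    # Each branch edge is independently kept or dropped, so a stage with
    # n nodes admits 2 ** branch_edge_count(n) topologies; stages multiply.
    total = 1
    for n in stage_sizes:
        total *= 2 ** branch_edge_count(n)
    return total

# Stage sizes used in this paper: 2, 2, 4, 8, 4 (20 nodes in total).
print(topology_count([2, 2, 4, 8, 4]))  # 2**27 under the within-stage assumption
```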
3.2 Training the One-shot Hypernetwork
The one-shot method uses the hypernetwork to estimate the performance of architectures. Since a huge number of architectures exist in the hypernetwork concurrently, training the hypernetwork as a whole would make the parameters of different architectures correlated with each other. To reduce the correlation, the one-shot method samples a new network architecture at each gradient step and updates only the activated part of the shared parameters:
W ← W − η ∇_W L(N(a, W)(x), y),  a ∼ Γ(A).   (1)
Here N(a, W) makes a prediction for input x using the sampled model a, so the gradients of parameters unused by a remain zero. The architecture sampling distribution Γ(A) is usually set to trivial uniform sampling [14] over the choices of each single edge.
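A toy sketch of the update in Eq. (1) (the dictionary encoding of shared weights, scalar "weights", and `grads_fn` are our illustrative stand-ins, not the paper's implementation):

```python
import random

def oneshot_step(shared_weights, grads_fn, lr=0.1):
    """One training step: sample an architecture uniformly, then update
    only the shared weights its edges activate.
    `shared_weights` maps (edge, choice) -> a toy scalar weight;
    `grads_fn(arch)` returns gradients for the sampled architecture's weights."""
    edges = sorted({e for (e, _) in shared_weights})
    n_choices = {e: len([c for (e2, c) in shared_weights if e2 == e]) for e in edges}
    arch = {e: random.randrange(n_choices[e]) for e in edges}   # uniform sampling
    grads = grads_fn(arch)
    for e, c in arch.items():
        shared_weights[(e, c)] -= lr * grads[(e, c)]            # inactive weights untouched
    return arch
```

Weights belonging to unsampled choices receive no update in this step, matching the "gradients of unused parameters remain zero" property.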
Suppose there are N_s candidate choices for stem edges and N_b candidate choices for branch edges other than none. A simple uniform sampling strategy in our search space can be described as:
P(e = o_i) = 1/N_s for each stem edge,   (2)
P(e = o_i) = 1/(N_b + 1) for each branch edge, including the none choice.   (3)
However, a network sampled under this strategy in our space tends to have high computational cost, because each of the large number of branch edges has a low probability of being
none. Consequently, architectures with low computational cost in the hypernetwork will underfit, which causes a bias in the evaluation stage. Thus, the sampling strategy needs further consideration. The whole training process of the hypernetwork can be found in Algo. 1. Suppose that C_t is our target MAdds and C(a) is the MAdds of architecture a; the sampling strategy should satisfy:
E_{a ∼ Γ(A)}[C(a)] = C_t.   (4)
To meet this constraint on expected computation, the sampling probability of the none choice on branch edges, p_none, is defined to adjust the expected computation cost of sampled networks:
P(e = none) = p_none,  P(e = o_i) = (1 − p_none)/N_b for each branch edge,   (5)
with p_none chosen so that Eq. (4) holds.
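Since the expected cost is linear in p_none, it can be solved in closed form. A minimal sketch (the function name, per-edge mean-cost inputs, and the uniformity assumption over non-none branch choices are ours, not the paper's exact derivation):

```python
def none_probability(target_madds, stem_mean_madds, branch_mean_madds):
    """Solve E[C(a)] = C_t (Eq. (4)) for the branch-edge `none` probability.
    Assuming non-none branch choices stay uniform, the expected cost is
        E[C] = sum(stem means) + (1 - p_none) * sum(branch op means),
    which is linear in p_none."""
    stem = sum(stem_mean_madds)      # stem edges always present
    branch = sum(branch_mean_madds)  # mean op cost of each branch edge
    p_none = 1.0 - (target_madds - stem) / branch
    return min(1.0, max(0.0, p_none))  # clamp to a valid probability
```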
3.3 Stabilizing Performance Estimation
In the search stage, evaluating an architecture through the shared parameters is essential for discovering promising results. Previous work on the one-shot method usually measures network performance with the fully trained hypernetwork weights directly. In this section, we first demonstrate our observation of random shuffling of candidate architecture rankings in our search space. Then we introduce our approach to improve ranking stability.
3.3.1 Instability of One-shot NAS
Since the hypernetwork is trained for T iterations, the shared parameters obtained after training are denoted as W_T. We define an accuracy function ACC(a, W) which maps a model architecture a and hypernetwork weights W to the validation-set accuracy. Its value can be estimated by simply loading the weights used by a and testing the model's performance on the validation set. The score function of previous approaches, denoted S, is simply
S(a) = ACC(a, W_T).   (6)
However, the true score function should be the actual performance of the model on the validation set: S*(a) = ACC(a, W_a), where W_a denotes the weights obtained by sampling a and training it alone. The one-shot approach makes an approximation by reusing the shared parameters for different architectures. Although this is empirically useful, we observe high variance of the model ranking in two cases: rankings at different epochs, and rankings from different runs.
We randomly sample a set of architectures and obtain their independently trained weights. We rank their performance under shared parameters on the validation set using checkpoints from the last 20 epochs of hypernetwork training. As shown in Fig. 5, the rank from a single checkpoint fluctuates a lot during the hypernetwork training process and hardly distinguishes the performance of architectures. We then repeat the hypernetwork training with different random seeds 20 times and obtain 20 sets of shared parameters. We quantify the correlation between the ranking from each run and the ground-truth ranking by Kendall's τ coefficient [18]. Here, we show the ranking performance of the best and worst runs in Fig. 4.
These two observations imply the necessity of a stabilized evaluation strategy. To present our strategy, a formulation of the instability needs to be introduced. In this paper, we model the performance estimation randomness as an unbiased noise ε. Since the shared parameters are fundamentally different from parameters trained independently, we use a function g to model the effect of weight sharing. A general consensus has been reached that the empirical estimate provides an inaccurate but useful ranking, which corresponds to the desired rank-preserving (monotonic) property of g. In summary, our model of the quantitative relationship is:
ACC(a, W_T) = g(S*(a)) + ε,   (7)
E[ε] = 0.   (8)
The noise term ε clearly hurts the model ranking. The most trivial approach to alleviate its negative effect is to train multiple hypernetworks and eliminate the noise by taking an expectation. However, this approach requires several times more computation resources for hypernetwork training.
3.3.2 SGMCMC Sampling
The sampling process is described in Algo. 2. To obtain high-quality, low-correlation samples of the optimized shared parameters efficiently, we turn to the rich literature on Markov Chain Monte Carlo (MCMC) sampling methods [3]. Recently, a few works demonstrated that constant-learning-rate stochastic gradient descent can be modified into Stochastic Gradient Langevin Dynamics (SGLD) to realize a stochastic-gradient MCMC method under mild assumptions [6, 39]. Here, we apply SGLD [39, 38] to approximate i.i.d. samples from the shared-parameter posterior. The update rule we use is simply
W_{t+1} = W_t − η ∇_W L(W_t) + ξ_t,  ξ_t ∼ N(0, (2η/n) I).   (9)
Here n is the number of data points used to compute the gradient (the batch size). The step size η is set to the final learning rate of subnet training. To ensure independence, we generate each sample after the SGLD update iterates over one data epoch.
To generate the i.i.d. samples of shared weights, we load the weights of the hypernetwork after its training finishes and set them as the initial sample. Then, for each subsequent sample, we apply the SGLD rule in Eq. (9) to draw the next sample from the parameter posterior. Thus we obtain multiple samples of the hypernetwork parameters.
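The loop above can be sketched as follows. This is a toy numpy sketch under the reconstructed Eq. (9) (the function signature and the scalar noise scale sqrt(2η/n) are our assumptions; `grad_fn` stands in for a stochastic gradient of the training loss):

```python
import numpy as np

def sgld_samples(w0, grad_fn, n_samples, steps_per_sample, lr, batch_size, rng):
    """Draw approximate posterior samples of shared parameters with an
    SGLD rule: w <- w - lr * grad + Gaussian noise whose variance scales
    with the step size. One sample is emitted per `steps_per_sample`
    updates (one "data epoch" in the paper's setting)."""
    w = np.array(w0, dtype=float)
    samples = []
    noise_scale = np.sqrt(2.0 * lr / batch_size)
    for _ in range(n_samples):
        for _ in range(steps_per_sample):
            w = w - lr * grad_fn(w) + noise_scale * rng.standard_normal(w.shape)
        samples.append(w.copy())
    return samples
```

With a convex toy loss, the chain drifts to the optimum and then fluctuates around it, which is the behavior the sampling relies on.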
3.3.3 Averaged Accuracy and Parameters
We now have samples W^(1), …, W^(K) of the shared parameters, which approximate the parameters obtained by different runs. To eliminate the effect of random noise and stabilize the performance estimation, we propose two approaches: score expectation and parameter expectation.
The expectation-over-scores approach defines the score of each model as the expectation of validation accuracy over the sampled shared weights:
S_acc(a) = (1/K) Σ_k ACC(a, W^(k)).   (10)
The expectation-over-parameters approach takes the average of the sampled shared parameters and uses the averaged parameters to evaluate the performance of each model:
S_param(a) = ACC(a, (1/K) Σ_k W^(k)).   (11)
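The two estimators of Eqs. (10) and (11) can be sketched directly (function names are ours; `acc_fn` stands in for the ACC(a, W) evaluation and weights are flat lists for illustration):

```python
def score_expectation(acc_fn, arch, weight_samples):
    """Eq. (10): average the validation accuracy over sampled shared weights."""
    return sum(acc_fn(arch, w) for w in weight_samples) / len(weight_samples)

def parameter_expectation(acc_fn, arch, weight_samples):
    """Eq. (11): evaluate once with the elementwise average of the samples."""
    n = len(weight_samples)
    mean_w = [sum(ws) / n for ws in zip(*weight_samples)]
    return acc_fn(arch, mean_w)
```

For an `acc_fn` that is linear in the weights the two estimators coincide; in general they differ, and Table 5 compares them empirically.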
3.3.4 Independent Finetuning
When evaluating the performance of a single architecture, loading its weights from the hypernetwork and resuming training the architecture independently should yield more architecture-relevant weights. Thus we also test this approach in our experiments.
3.4 Evolution Algorithm
Inspired by recent work [26, 36], we apply the evolutionary algorithm NSGA-II as the search agent. In this section, we first introduce some basic concepts of NSGA-II, then discuss how we apply it to our search space.
3.4.1 NSGA-II
We seek model architectures with excellent performance under constraints on computational expense. NSGA-II is the most popular multi-objective evolutionary method. Its core component, non-dominated sorting, handles the trade-off between conflicting objectives, which fits our target of simultaneously minimizing MAdds and maximizing architecture performance under different computational constraints.
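The dominance test and first front of non-dominated sorting can be sketched as follows (an illustrative sketch of the standard definitions, not the paper's implementation; both objectives are expressed as minimization, e.g. (MAdds, -accuracy)):

```python
def dominates(a, b):
    """a dominates b when it is no worse on every objective and strictly
    better on at least one (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_front(points):
    """First front of non-dominated sorting: points no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

NSGA-II repeats this peeling to assign every individual a front rank, then uses crowding distance to break ties within a front.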
3.4.2 Initialization
To reduce manual bias and explore the search space better, we use random initialization for all individuals of the first generation. More specifically, each architecture randomly selects a basic operator for each block in the search space.
3.4.3 Crossover and Mutation
Single-point crossover at a random position is adopted in our evolutionary algorithm. For two individuals x = (x_1, …, x_n) and y = (y_1, …, y_n), single-point crossover at position k produces a new individual (x_1, …, x_k, y_{k+1}, …, y_n).
We use random-choice mutation to enhance generation diversity. When a mutation happens to an individual, one of its operation blocks is randomly changed to another available choice.
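Both operators can be sketched in a few lines, with individuals encoded as lists of per-block choice indices (the encoding and function names are our illustrative assumptions):

```python
import random

def single_point_crossover(x, y, k):
    """Single-point crossover at position k: take x[:k] and y[k:]."""
    return x[:k] + y[k:]

def mutate(individual, n_choices, rng):
    """Random-choice mutation: re-draw one block's operation from the
    remaining available choices."""
    pos = rng.randrange(len(individual))
    new = list(individual)
    choices = [c for c in range(n_choices) if c != individual[pos]]
    new[pos] = rng.choice(choices)
    return new
```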
4 Experiments and Results
We verify the effectiveness of our method on a large classification benchmark, ImageNet [30]. In this section, we first describe our implementation details. Second, we present the performance of the searched results on ImageNet as well as a comparison with state-of-the-art methods. Finally, we demonstrate the advantage of our designs via ablation studies.
4.1 Experiments Settings
Datasets. We conduct experiments on ImageNet, a standard benchmark for the classification task, with 1.28M training images and 50K validation images.
Training Details of the Hypernetwork. For the training of the hypernetwork, we adopt a cosine learning rate schedule with the learning rate initialized at 0.1 and decaying to 2.5e-4 over the training epochs. L2 regularization is used with weight 1e-4. The optimizer is mini-batch stochastic gradient descent (SGD) with batch size 512, and we set momentum to 0 to decouple the gradients of architectures sampled in different batches. The hypernetwork is trained on 32 GTX 1080Ti GPUs. We implement the stabilized evaluation of our method by saving checkpoints at the epochs described in Sec. 3.3.2. The finetuning strategy mentioned above is conducted with learning rate 2.5e-4.
Search Details. The evolution agent randomly generates individuals for initialization. It then repeats an exploitation-and-exploration loop in which it generates new individuals via single-point crossover and random mutation and evaluates the sampled models. Finally, we choose the top-ranked 2 models under different MAdds constraints.
Model | Search space | Params (M) | MAdds (M) | Top-1 acc (%) | Top-5 acc (%)
DARTS [25] | Cell-based | 4.7 | 574 | 73.3 | 91.3
Proxyless-R [5] | Chain-structured | - | 320 | 74.6 | 92.2
Single-path NAS [32] | Chain-structured | 4.3 | 365 | 75.0 | 92.2
FairNAS-A [8] | Chain-structured | 4.6 | 388 | 75.3 | 92.4
FBNet-C [40] | Chain-structured | 5.5 | 375 | 74.9 | -
SPOS [14] | Chain-structured | - | 328 | 74.7 | -
BetaNet-A [12] | Chain-structured | 7.2 | 333 | 75.9 | 92.8
STNAS-A (ours) | Topology augmented | 5.2 | 326 | 76.4 | 93.1
Training Details of the Resulting Architectures. For independent training of the resulting architectures, we use a cosine learning rate schedule. We train each model for 300 epochs with batch size 2048 and the SGD optimizer with Nesterov momentum 0.9. To prevent overfitting, we use L2 regularization with weight 1e-4 and standard augmentations including random crop and color jitter.
Model | Search space | Params (M) | MAdds (M) | Top-1 acc (%) | Top-5 acc (%)
*MNASNet-A1 [36] | Chain-structured | 3.9 | 312 | 75.2 | 92.5
*MNASNet-A2 [36] | Chain-structured | 4.8 | 340 | 75.6 | 92.7
*RCNet-B [42] | Chain-structured | 4.7 | 471 | 74.7 | 92.0
*NASNet-B [46] | Cell-based | 5.3 | 488 | 72.8 | 91.3
*EfficientNet-B0 [37] | Chain-structured | 5.3 | 390 | 76.3 | 93.2
STNAS-A (ours) | Topology augmented | 5.2 | 326 | 76.4 | 93.1
1.4-MobileNetV2 [31] | Chain-structured | 6.9 | 585 | 74.7 | 92.5
2.0-ShuffleNetV2 [27] | Chain-structured | 7.4 | 591 | 74.9 | -
*NASNet-C [46] | Cell-based | 4.9 | 558 | 72.5 | 91.0
*NASNet-A [46] | Cell-based | 5.3 | 564 | 74.0 | 91.6
*1.4-MNASNet-A1 [36] | Chain-structured | - | 600 | 77.2 | 93.7
*RENASNet [7] | Cell-based | 5.4 | 580 | 75.7 | 92.6
*PNASNet [24] | Cell-based | 5.1 | 588 | 74.2 | 91.9
STNAS-B (ours) | Topology augmented | 7.8 | 503 | 77.9 | 93.8
4.2 Main Results
STNAS searches for models with the objectives of low MAdds and high accuracy. We select two resulting models under small and large MAdds constraints, namely STNAS-A and STNAS-B. Their architectures and performance compared with state-of-the-art methods are discussed in this subsection.
Performance on ImageNet. We compare STNAS with efficient NAS methods, including DARTS, ProxylessNAS, and FBNet, in Table 2. Our model STNAS-A outperforms all of them with the fewest MAdds and a comparable number of parameters.
For architectures obtained at high cost, i.e., manually designed networks and networks obtained by sample-based methods, we compare STNAS against them in two groups divided by MAdds, as shown in Table 3. At a much lower search cost, STNAS outperforms all the methods in both MAdds groups.
Model | MAdds (backbone, G) | mAP
MobileNetV2 | 0.33 | 31.7
STNAS-A | 0.33 | 33.2
ResNet-18 | 1.81 | 32.2
STNAS-B | 0.50 | 35.3
ResNet-50 | 4.09 | 36.9
STNAS-B | 1.03 | 37.7
Performance on COCO. Our implementation is based on the feature pyramid network (FPN) [23]. Models pretrained on ImageNet are utilized as feature extractors. All models are trained for 13 epochs, known as the 1x schedule. The results are shown in Table 4. Our STNAS-A backbone outperforms MobileNetV2, and STNAS-B performs comparably with ResNet-50 at much lower MAdds.
4.3 Ablation Studies
4.3.1 Rank Fluctuation
To show the importance of our stabilization mechanism, we randomly sample a set of architectures and rank their performance under shared parameters on the validation set over the last 10 epochs of hypernetwork training. As shown in Fig. 5, the rank from a single checkpoint fluctuates a lot during the hypernetwork training process and hardly distinguishes the performance of architectures, implying the necessity of a stabilized evaluation strategy.
Estimation approach | Kendall's tau
Single checkpoint | 0.71
Finetune | 0.64
SGLD-param | 0.84
SGLD-acc | 0.81
4.3.2 Ranking Verification
We further verify this reduction by quantifying the ranking ability of different evaluation strategies via the correlation coefficient between ranks in the hypernetwork and the ground-truth ranks, with Kendall's tau coefficient as the metric. We randomly sample 12 networks and train them from scratch to obtain the ground-truth rank. For the single-checkpoint baseline, we use checkpoints from the 591st, 592nd, …, 600th epochs to generate 10 ranks and thus 10 correlation coefficients with the ground-truth rank, and take the median to compare with the other strategies. As observed in Table 5, SGLD consistently achieves higher correlation coefficients than finetuning and the single checkpoint, which verifies its effectiveness in reducing parameter variance.
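The metric above can be sketched in pure Python (an illustrative implementation of Kendall's tau [18] for tie-free rankings; the function name is ours):

```python
def kendall_tau(rank_a, rank_b):
    """Kendall's tau: (concordant - discordant) pairs over total pairs,
    assuming no ties. +1 means identical rankings, -1 fully reversed."""
    n = len(rank_a)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
            if s > 0:
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Applying this to the hypernetwork ranking versus the ground-truth ranking of the 12 trained-from-scratch networks yields the coefficients reported in Table 5.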
5 Conclusion
We proposed a topology-diverse search space and a novel search method, STNAS. In STNAS, we improve both the sampling strategy during hypernetwork training and the architecture evaluation approach, guided by our analysis of estimation variance. Extensive experiments demonstrate the effectiveness of our designs, achieving consistent improvements under different computation cost constraints.
References
 [1] (2016) Designing neural network architectures using reinforcement learning. arXiv preprint arXiv:1611.02167. Cited by: §2.
 [2] (2018) Understanding and simplifying oneshot architecture search. In International Conference on Machine Learning, pp. 549–558. Cited by: §2.
 [3] (2006) Pattern recognition and machine learning. springer. Cited by: §3.3.2.
 [4] (2017) SMASH: oneshot model architecture search through hypernetworks. arXiv preprint arXiv:1708.05344. Cited by: §2.
 [5] (2018) Proxylessnas: direct neural architecture search on target task and hardware. arXiv preprint arXiv:1812.00332. Cited by: §2, §2, Table 2.
 [6] (2016) Bridging the gap between stochastic gradient mcmc and stochastic optimization. In Artificial Intelligence and Statistics, pp. 1051–1060. Cited by: §3.3.2.
 [7] (2018) Reinforced evolutionary neural architecture search. arXiv preprint arXiv:1808.00193. Cited by: Table 3.
 [8] (2019) FairNAS: rethinking evaluation fairness of weight sharing neural architecture search. arXiv preprint arXiv:1907.01845. Cited by: Table 2.
 [9] (2018) Autoaugment: learning augmentation policies from data. arXiv preprint arXiv:1805.09501. Cited by: §2.
 [10] (2019) SpineNet: learning scalepermuted backbone for recognition and localization. arXiv preprint arXiv:1912.05027. Cited by: §1.
 [11] (2017) Simple and efficient architecture search for convolutional neural networks. arXiv preprint arXiv:1711.04528. Cited by: §1.
 [12] (2019) BETANAS: balanced training and selective drop for neural architecture search. arXiv preprint arXiv:1912.11191. Cited by: Table 2.
 [13] (2019) Irlas: inverse reinforcement learning for architecture search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9021–9029. Cited by: §2, §2, §2.
 [14] (2019) Single path oneshot neural architecture search with uniform sampling. arXiv preprint arXiv:1904.00420. Cited by: §1, §2, §2, §3.2, Table 2.
 [15] (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §2, §2, §3.1.1.
 [16] (2018) Squeezeandexcitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7132–7141. Cited by: §2.
 [17] (2017) Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700–4708. Cited by: §1, §2.
 [18] (1938) A new measure of rank correlation. Biometrika 30 (1/2), pp. 81–93. Cited by: §3.3.1.
 [19] (2019) AM-LFS: AutoML for loss function search. In Proceedings of the IEEE International Conference on Computer Vision, pp. 8410–8419. Cited by: §2.
 [20] (2019) Improving oneshot nas by suppressing the posterior fading. arXiv preprint arXiv:1910.02543. Cited by: §2.
 [21] (2019) Computation reallocation for object detection. arXiv preprint arXiv:1912.11234. Cited by: §2.
 [22] (2019) Online hyperparameter learning for autoaugmentation strategy. In Proceedings of the IEEE International Conference on Computer Vision, pp. 6579–6588. Cited by: §2.
 [23] (2017) Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2117–2125. Cited by: §4.2.
 [24] (2018) Progressive neural architecture search. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 19–34. Cited by: Table 3.
 [25] (2018) Darts: differentiable architecture search. arXiv preprint arXiv:1806.09055. Cited by: §2, §2, §2, Table 2.

 [26] (2018) NSGA-Net: a multi-objective genetic algorithm for neural architecture search. arXiv preprint arXiv:1810.03522. Cited by: §2, §3.4.
 [27] (2018) ShuffleNet V2: practical guidelines for efficient CNN architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 116–131. Cited by: Table 3.

 [28] (2019) Regularized evolution for image classifier architecture search. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, pp. 4780–4789. Cited by: §1, §2.
 [29] (2017) Large-scale evolution of image classifiers. In Proceedings of the 34th International Conference on Machine Learning, Vol. 70, pp. 2902–2911. Cited by: §1, §2.
 [30] (2015) Imagenet large scale visual recognition challenge. International journal of computer vision 115 (3), pp. 211–252. Cited by: §4.
 [31] (2018) Mobilenetv2: inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520. Cited by: §3.1.2, Table 3.
 [32] (2019) Singlepath nas: designing hardwareefficient convnets in less than 4 hours. arXiv preprint arXiv:1904.02877. Cited by: Table 2.

 [33] (2017) Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-First AAAI Conference on Artificial Intelligence. Cited by: §2.
 [34] (2015) Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1–9. Cited by: §2.
 [35] (2016) Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818–2826. Cited by: §2, §2.
 [36] (2019) Mnasnet: platformaware neural architecture search for mobile. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2820–2828. Cited by: §2, §2, §3.4, Table 3.
 [37] (2019) EfficientNet: rethinking model scaling for convolutional neural networks. arXiv preprint arXiv:1905.11946. Cited by: Table 3.
 [38] (2016) Consistency and fluctuations for stochastic gradient langevin dynamics. The Journal of Machine Learning Research 17 (1), pp. 193–225. Cited by: §3.3.2.
 [39] (2011) Bayesian learning via stochastic gradient langevin dynamics. In Proceedings of the 28th international conference on machine learning (ICML11), pp. 681–688. Cited by: §3.3.2.
 [40] (2019) Fbnet: hardwareaware efficient convnet design via differentiable neural architecture search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 10734–10742. Cited by: Table 2.
 [41] (2019) Exploring randomly wired neural networks for image recognition. arXiv preprint arXiv:1904.01569. Cited by: §2.
 [42] (2019) Resource constrained neural network architecture search: will a submodularity assumption help?. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1901–1910. Cited by: §2, Table 3.
 [43] (2020) Deeper insights into weight sharing in neural architecture search. arXiv preprint arXiv:2001.01431. Cited by: §1.
 [44] (2018) Blockqnn: efficient blockwise neural network architecture generation. arXiv preprint arXiv:1808.05584. Cited by: §2, §2, §2.
 [45] (2016) Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578. Cited by: §2, §2, §2.
 [46] (2018) Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 8697–8710. Cited by: §2, Table 3.