1 Introduction
Designing successful handcrafted convolutional neural networks (CNNs) is a laborious task due to the heavy reliance on expert experience and a large number of trials. To reduce the labour of human experts, neural architecture search (NAS) approaches DBLP:conf/cvpr/ZophVSL18/NasNet ; DBLP:conf/nips/NaymanNRFJZ19/XNAS ; DBLP:conf/cvpr/ChenMZXHMW19/RENAS ; DBLP:conf/iclr/LiuSY19/DARTS have been proposed to automatically discover effective CNN architectures. The main idea of existing NAS approaches is to define a search space and design a search strategy to find CNN architectures with high performance, e.g., high validation accuracy. Since the search space of CNNs is huge DBLP:conf/cvpr/DongY19/GDAS , most NAS algorithms use a fixed outer network-level structure, as shown in Figure 1, and search only the repeatable cell structure, so as to reduce search cost. Such fixed structures perform well when enough cells and channels are used. However, when the architecture becomes more lightweight (with fewer parameters), its accuracy decreases significantly DBLP:conf/cvpr/ZophVSL18/NasNet ; DBLP:conf/nips/NaymanNRFJZ19/XNAS ; DBLP:conf/icml/TanL19/ArcICML19 ; DBLP:journals/corr/abs190404123/ASAP . For example, when the initial channel number of ASAP is reduced from 44 to 36, its parameter count drops by 1.2M and its test accuracy on CIFAR10 decreases by 0.25% DBLP:journals/corr/abs190404123/ASAP .
Obviously, such inflexible structures prevent us from obtaining CNNs with fewer parameters and higher accuracy. To get better lightweight architectures, we need to consider more flexible outer-level structures and more diversified cell structures. MNasNet DBLP:conf/cvpr/TanCPVSHL19/MNasNet also noticed this problem. It pointed out that cell structure diversity is significant in resource-constrained CNN models, and studied more flexible CNN architectures, where each block of cells is allowed to contain a different structure and to repeat a different number of times. It searched for the optimal setting of cell structures and cell numbers of the different blocks, and achieved good results. This solution breaks with the traditional inflexible structures, but also has a defect: the search cost is too high. The search space of a single cell is already large, let alone that of multiple cells combined with the parameters of the outer-level structure. The huge search space brings MNasNet considerable search cost, i.e., MNasNet took hundreds of TPU days to accomplish the search process.
In order to control the size of the search space and thus reduce search cost, while at the same time exploring more flexible architectures for better models, i.e., Pareto-optimal CNN models with high accuracy and a small number of parameters, in this paper we put forward the idea of high-performance cell stacking (HCS): utilizing high-performance cells discovered by existing NAS algorithms to construct flexible architectures, as shown in Figure 2.2, and searching for the optimal cell stacking method to obtain better lightweight CNNs. The introduction of existing high-performance cells, on the one hand, ensures the effectiveness of the components, reduces the search cost caused by cell design, and greatly shrinks the search space to avoid searching invalid architectures, thus ensuring search efficiency; on the other hand, it increases the cell diversity as well as the flexibility of CNN architectures. Our HCS-based search space makes full use of existing research results, and thus makes it possible to explore more flexible CNN architectures efficiently, which is superior to existing search spaces.
In addition, in order to efficiently find Pareto-optimal architectures that are lightweight and accurate in our newly designed search space, we design a multi-objective optimization (MOO) algorithm, called Multi-Objective Optimization based on Adaptive Reverse Recommendation (MoARR). The idea of MoARR is to avoid selecting worse architectures by effectively analyzing our historical evaluation information, thereby reducing evaluation cost and accelerating the optimization. More specifically, MoARR utilizes historical information to study the potential relationship between the parameter quantity, the accuracy and the architecture encoding, and thus adaptively learns a reverse recommendation model (RRModel) that is capable of selecting the most suitable architecture code according to the target performance. Then, MoARR recommends better architectures to be evaluated under the guidance of RRModel, i.e., it inputs higher accuracy and a smaller parameter number to RRModel to obtain better architectures. As the number of evaluated architectures increases, RRModel becomes more reliable, and the architectures it recommends approach the Pareto optimality. Using RRModel, MoARR can optimize architectures in a targeted way, and thus greatly reduce useless architecture evaluations.
Compared with existing MOO approaches, our MoARR is more suitable for MOO NAS problems, where architecture evaluations are expensive and time-consuming. More specifically, the existing approaches for seeking the Pareto-optimal front can be classified into two categories: approaches based on mathematical programming DBLP:conf/nips/SenerK18/GredMO18 and those based on genetic algorithms DBLP:journals/tec/ChengJOS16/MO16 ; DBLP:journals/tec/DebJ14/MO14 ; DBLP:journals/isci/QuS10/MO10 ; DBLP:journals/tec/DebAPM02/MO02 . The first class of methods is unable to cope with our black-box MOO NAS problem, where the expressions and gradient information of the two optimization objectives are unknown. The genetic methods can deal with black-box problems, but may evaluate many useless architectures due to the uncertainty brought by many random operations and their neglect of the valuable rules provided by historical evaluation information. They may require many samples and generations to obtain good results, which is unsuitable for expensive MOO NAS problems.

We compare MoARR with classic NAS algorithms (Section 3). Experimental results show that MoARR can find a powerful and lightweight model (with a 1.9% error rate and 2.3M parameters) on CIFAR10 in 6 GPU hours, which outperforms the state-of-the-art. The explored network architecture is transferable to ImageNet and 5 additional datasets, and achieves good results, e.g., 76.0% top-1 accuracy with only 4.9M parameters on ImageNet.
2 Proposed Approach
To lay out our approach, we first give the specific definition of our research objective (Section 2.1), and define a new search space for NAS (Section 2.2). We then introduce MoARR, which views NAS as a multi-objective optimization task and makes full use of historical evaluation information to obtain high-performance lightweight CNN models (Section 2.3). To accelerate evaluation and reduce computational cost, we also design an acceleration strategy that uses a small number of epochs and a few samples to quickly obtain accuracy scores (Section 2.4). Figure 2 shows our overall framework.

2.1 Target
In this paper, we aim to increase the flexibility and diversity of CNN architectures, so as to obtain lightweight architectures with higher accuracy. Formally, our search target is defined as follows:
max_{x ∈ A} ACC(x),  min_{x ∈ A} PAR(x),  s.t. PAR(x) ≤ P    (1)

where A denotes the set of all CNN architecture codes in our new search space (described in Section 2.2), ACC(x) denotes the accuracy score of architecture x, PAR(x) denotes its number of parameters, and P is the upper limit on the parameter amount. This is a multi-objective optimization task, and our goal is to obtain architectures that provide the best accuracy/parameter-amount trade-off.
2.2 Search Space
Structural flexibility and cell diversity are two key points in the design of our search space. To achieve structural flexibility, we make the number of cells and channels in each stage adjustable. In this way, we can get architectures with diversified width and depth. As for cell diversity, we allow cells in different stages to have different structures, and take the existing high-performance cell structures discovered by previous NAS work as the available options. Details are as follows.
We provide a general network architecture for our search space, as shown in Figure 2.2. It consists of 5 stages. Stage 1 extracts common low-level features, stages 2-4 downsample the spatial resolution of the input tensor with a stride of 2, and stage 5 produces the final prediction with a global pooling layer and a fully connected layer. Previous NAS approaches generally use 2 reduction cells in CNN architectures, whereas some DBLP:conf/icml/PhamGZLD18/ENAS use 3. In pursuit of a more general search space, we use a binary flag R to decide whether to use the 3rd reduction cell in stage 4. Stages 1, 2 and 3 consist of L, M and N normal cells respectively, where L, M, N are positive integers. Different settings of L/M/N/R lead to different network depths. The width (number of output channels) of the cells in stage 1 is denoted as W, and that of the cells in stage s (s = 2, 3, 4) is obtained by multiplying the width of the previous stage by a growth ratio r_s. The name of the normal cells in stage s is denoted as NC_s (the normal cells used in the same stage have the same name), the name of the reduction cell used in stage s is denoted as RC_s, and the type of global pooling used in stage 5 is denoted as GP. The options for NC_s, RC_s and GP are shown in Table 1. Therefore, an architecture can be encoded as shown in Figure 2.2. The set of all possible codes is denoted as A and is referred to as our search space.

Table 1: Options of normal and reduction cells.

Cell Source  Normal Cell  Reduction Cell
DARTS (1st) DBLP:conf/iclr/LiuSY19/DARTS  Darts_V1_NC  DARTS_V1_RC
DARTS (2nd) DBLP:conf/iclr/LiuSY19/DARTS  Darts_V2_NC  DARTS_V2_RC
NasNet-A DBLP:conf/cvpr/ZophVSL18/NasNet  NasNet_NC  NasNet_RC
AmoebaNet-A DBLP:conf/aaai/RealAHL19/AmoebaNet  AmoebaNet_NC  AmoebaNet_RC
ENAS DBLP:conf/icml/PhamGZLD18/ENAS  ENAS_NC  ENAS_RC
RENAS DBLP:conf/cvpr/ChenMZXHMW19/RENAS  RENAS_NC  RENAS_RC
GDAS DBLP:conf/cvpr/DongY19/GDAS  GDAS_V1_NC  GDAS_V1_RC
GDAS (FRC) DBLP:conf/cvpr/DongY19/GDAS  GDAS_V2_NC  GDAS_V2_RC
ASAP DBLP:journals/corr/abs190404123/ASAP  ASAP_NC  ASAP_RC
ShuffleNet DBLP:conf/cvpr/ZhangZLS18/ShuffleNet  ShuffleNet_NC  ShuffleNet_RC
Definition  Global Pooling
Global average pooling  Avg_GP
Global max pooling  Max_GP
Combined global average and max pooling  AvgMax_GP
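To make the encoding concrete, the following sketch shows how one such architecture code could expand into a layer stack. The dictionary layout, field names and cell-name strings here are illustrative assumptions, not the exact encoding used in the paper:

```python
# Hypothetical expansion of an HCS architecture code into an ordered layer stack.
# Stages 1-3 stack normal cells; stages 2-3 open with a reduction cell that also
# grows the width; stage 4 optionally adds a 3rd reduction cell; stage 5 pools.

def decode(code):
    """Expand an architecture code into an ordered (cell_name, width) list."""
    layers = []
    width = code["init_width"]                                 # stage-1 width W
    for s in range(3):                                         # stages 1-3
        if s > 0:                                              # reduction cell
            width = int(width * code["growth_ratios"][s - 1])  # grows width
            layers.append((code["reduction_cells"][s - 1], width))
        for _ in range(code["num_cells"][s]):                  # L/M/N normal cells
            layers.append((code["normal_cells"][s], width))
    if code["use_third_reduction"]:                            # optional stage 4
        width = int(width * code["growth_ratios"][2])
        layers.append((code["reduction_cells"][2], width))
    layers.append((code["pooling"], width))                    # stage 5 pooling
    return layers
```

A code with num_cells = (2, 3, 4) and the third reduction cell enabled would thus expand into 13 layers.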
2.3 MoARR: Multi-Objective Optimization based on Adaptive Reverse Recommendation
Let x and y denote two architectures in a set S, and let ACC(·) and PAR(·) denote the accuracy score and the parameter amount respectively. If ACC(x) ≥ ACC(y) and PAR(x) ≤ PAR(y), we say that architecture x Pareto dominates y (x is better than y), denoted as x ≻ y. The elements of S that are not Pareto dominated by any other element of S are called the Pareto boundary of S, denoted as PB(S). Then, the Pareto-optimal solutions of our multi-objective NAS problem are PB(A), where A is our search space.
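This dominance relation translates directly into code; the snippet below is a minimal illustration, assuming each score is an (accuracy, parameter amount) pair:

```python
def dominates(x, y):
    """x Pareto-dominates y: accuracy no lower, parameters no higher,
    and strictly better in at least one of the two objectives."""
    acc_x, par_x = x
    acc_y, par_y = y
    return (acc_x >= acc_y and par_x <= par_y) and (acc_x > acc_y or par_x < par_y)

def pareto_boundary(points):
    """Points of the set that no other point in the set dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]
```

For example, among the scores (0.97, 3.0M), (0.96, 2.0M), (0.95, 2.5M) and (0.97, 2.8M), only (0.96, 2.0M) and (0.97, 2.8M) lie on the boundary.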
In MoARR, our target is to quickly optimize the elements of the Pareto boundary PB(E), where E denotes the set of evaluated architectures, and finally obtain PB(A). More specifically, we aim to select the best possible architectures to evaluate in each iteration, avoiding worse architectures as much as possible, thus accelerating the optimization process and reducing evaluation cost. To achieve this goal, we put forward Adaptive Reverse Recommendation (ARR), an architecture selection strategy that utilizes the historical evaluation information of E for effective and targeted architecture recommendation, i.e., recommending the most suitable architecture code according to the performance demands. Such a performance-oriented architecture selection strategy can greatly reduce useless architecture evaluations and improve the quality of the selected architectures by setting superior performance scores, which coincides with the goal of MoARR. Besides, ARR avoids the defects of the genetic MOO methods mentioned in Section 1, which makes MoARR more suitable for expensive NAS problems. We discuss ARR further as follows.
ARR. The model that maps a performance pair (accuracy, parameter amount) to a suitable architecture code, i.e., one whose performance scores are closest to the pair, is called the reverse recommendation model (RRModel) in ARR. The core idea of ARR is to make full use of the historical evaluation information to adaptively build an effective RRModel, and then utilize RRModel to select superior architectures directly by setting better performance values.

We note that the construction of an effective RRModel is the key point of ARR. A straightforward solution is to use the historical records to construct performance-to-code training data, where the performance scores are the inputs and the corresponding codes are the target outputs, and then train a multi-layer perceptron (MLP) on this data to obtain RRModel. However, there may exist different codes with identical or very similar accuracy scores and parameter amounts in the historical data, and such contradictory outputs may mislead the loss function and make RRModel less effective. To eliminate the influence of contradictory values, this solution would preserve only one code for each performance pair. Such an operation results in two defects: (1) information loss, since the valuable information contained in the deleted records is underutilized; (2) difficulty in selection, since it is unknown how to choose the most suitable code to preserve so as to achieve the best recommendation effect, and many trials would be needed, which is time-consuming.

To avoid these two defects, we propose an auxiliary-model-based loss function for RRModel, which helps RRModel adaptively learn the most suitable output values by making full use of all the historical information. Suppose a forward evaluation model (FEModel) is capable of mapping a given architecture code to its performance pair, i.e., (accuracy score, parameter amount); the mappings of FEModel and RRModel are thus opposite. Then, the new loss function of RRModel is defined as follows:
Loss = Σ_{(a,p) ∈ P_in} || FEModel(RRModel(a, p)) − (a, p) ||²    (2)

where P_in is a set of accuracy-parameter performance scores and RRModel(a, p) denotes the architecture code recommended by RRModel for the score (a, p). Equation 2 measures the differences between the target performance scores and the performance of the codes recommended by RRModel. It helps RRModel automatically determine suitable outputs under the guidance of the auxiliary model FEModel. More specifically, we can input enough accuracy-parameter performance scores to RRModel without giving target outputs, and RRModel adjusts its outputs adaptively according to the performance feedback provided by FEModel, thus achieving reasonable recommendations. As for FEModel, it is unknown, since neural architecture evaluation is a black box. However, we can utilize the historical evaluation information to construct code-to-performance training data and train an MLP on it to approximate FEModel. Note that, unlike the performance-to-code data, the code-to-performance data has no contradiction problem. Therefore, with the new loss function, RRModel can be built automatically and effectively by making full use of all the historical data, and the two defects above are avoided.
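The interaction between the two models can be sketched as follows. Here both models are toy linear stand-ins rather than the MLPs trained in the paper, the squared-error form of the loss is our assumption, and all names are ours:

```python
import numpy as np

def rr_loss(rr_model, fe_model, target_scores):
    """Auxiliary-model loss (Eq. 2, squared-error form assumed): the gap between
    the requested (accuracy, params) scores and the performance that FEModel
    predicts for the codes RRModel recommends."""
    codes = rr_model(target_scores)    # (n, code_dim) recommended codes
    achieved = fe_model(codes)         # (n, 2) predicted (accuracy, params)
    return float(np.mean((achieved - target_scores) ** 2))

# Toy stand-ins: FEModel is a fixed linear map, RRModel its exact inverse.
A = np.array([[0.5, 2.0],
              [0.1, 1.0]])
fe_model = lambda codes: codes @ A
rr_model = lambda scores: scores @ np.linalg.inv(A)
```

A perfect reverse model drives this loss to zero, while an uninformed one does not; that gap is exactly the training signal that lets RRModel learn without ever labeling a single "correct" code.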
The next step is to use the obtained RRModel to select superior architectures. Since the target is to optimize the Pareto boundary PB(E) of the evaluated architectures E, we need to find architecture codes that are not Pareto dominated by the evaluated codes. Thus, we should input more competitive performance scores, i.e., scores with higher accuracy or lower parameter amount than those of E, to RRModel. We denote this set of superior performance scores as P_sup:

P_sup = { (a, p) : (a, p) is not Pareto dominated by the performance score of any architecture in E }    (3)

Figure 4 gives an example of P_sup. Suppose the shaped points are the performance scores of the elements of E; then the shaded area is P_sup. After obtaining P_sup, we randomly sample some superior performance scores from it as inputs to RRModel, and thus get superior architectures.
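A minimal sketch of this superior-score sampling, using rejection sampling over the (accuracy, parameter amount) plane (the paper does not specify the sampler, so this scheme is an assumption):

```python
import random

def is_superior(score, evaluated):
    """True if no evaluated (accuracy, params) score Pareto-dominates the
    candidate, i.e. the candidate lies in the P_sup region of Eq. 3."""
    acc, par = score
    return not any(a >= acc and p <= par for a, p in evaluated)

def sample_superior(evaluated, n, par_limit, rng):
    """Rejection-sample n superior performance targets under the parameter cap."""
    out = []
    while len(out) < n:
        cand = (rng.uniform(0.0, 1.0), rng.uniform(0.0, par_limit))
        if is_superior(cand, evaluated):
            out.append(cand)
    return out
```

Every sampled target demands either higher accuracy or fewer parameters than all evaluated architectures, so feeding them to RRModel asks only for improvements over the current boundary.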
MoARR. Using ARR, we develop the MoARR algorithm, which deals with our multi-objective NAS problem effectively. Algorithm 1 gives the pseudocode of MoARR.
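Since Algorithm 1 itself is not reproduced here, the following schematic loop gives our reading of it; all function names are placeholders, and the model-fitting and score-sampling steps are abstracted behind caller-supplied functions:

```python
def moarr(sample_codes, evaluate, fit_fe_model, fit_rr_model,
          sample_superior_scores, n_init=50, n_iter=5, batch=50):
    """Schematic MoARR loop (our reading of Algorithm 1, details assumed):
    evaluate an initial population, then repeatedly refit FEModel/RRModel on
    the history and evaluate the architectures RRModel recommends for
    superior performance targets."""
    history = [(c, evaluate(c)) for c in sample_codes(n_init)]
    for _ in range(n_iter):
        fe = fit_fe_model(history)                 # approximate FEModel
        rr = fit_rr_model(history, fe)             # train RRModel with Eq. 2
        for target in sample_superior_scores(history, batch):
            code = rr(target)                      # reverse recommendation
            history.append((code, evaluate(code)))
    # Return the Pareto boundary of all evaluated (code, (acc, params)) pairs.
    return [(c, s) for c, s in history
            if not any(t[0] >= s[0] and t[1] <= s[1] and t != s
                       for _, t in history)]
```

The loop never mutates or crosses over architectures at random; every new evaluation is steered by a performance target, which is where the savings over genetic MOO methods come from.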
2.4 Fast Evaluation Strategy
In MoARR, CNN code evaluation is very time-consuming due to the huge training dataset and the large number of training epochs. To reduce the evaluation cost and thus speed up MoARR, we propose the fast evaluation strategy (FES), which quickly estimates the final validation accuracy of CNN architectures using only a few training epochs and a small portion of the training data.

FES. The core idea of FES is to use the following three types of characteristic attributes of an architecture to predict its final validation accuracy, i.e., the validation accuracy obtained after the architecture is fully trained on the whole training dataset:

(1) Model complexity, including the FLOPs and parameter amount of the architecture;

(2) Structural attributes, including the density, layer number and reduction cell number of the architecture, where the density is the number of edges divided by the number of nodes in the DAG of the architecture;

(3) Quick evaluation scores, including the top-1 accuracy, top-5 accuracy and loss value obtained after training the architecture for 12 epochs on 1% of the training data.
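For concreteness, the 8-attribute input vector of the FES predictor might be assembled as below; the key names are our own invention, as the paper only lists the attributes themselves:

```python
def fes_features(arch_stats):
    """Assemble the 8 attributes FES feeds to its accuracy predictor.
    `arch_stats` holds the measurements for one architecture; the key
    names here are hypothetical."""
    return [
        arch_stats["flops"],                 # model complexity
        arch_stats["params"],
        arch_stats["density"],               # edges / nodes in the DAG
        arch_stats["num_layers"],            # structural attributes
        arch_stats["num_reduction_cells"],
        arch_stats["top1_12ep"],             # quick scores after 12 epochs
        arch_stats["top5_12ep"],             # on 1% of the training data
        arch_stats["loss_12ep"],
    ]
```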
This attribute-based prediction method comes from the Easy Stop Strategy (ESS) DBLP:conf/cvpr/ZhongYWSL18/BlockQNN , which has been successfully applied to CNN architectures stacked with many copies of a single discovered cell to reduce the evaluation cost. In FES, we make some adjustments to ESS so that it suits our more complex CNN architectures, which are stacked with diversified cell structures. Our adjustments fall into two categories, for the following reasons: (1) Replacing cell attributes with architecture-level ones. ESS uses the complexity and structural attributes of the single cell to stand for those of the whole architecture, whereas FES must use those of the whole architecture to achieve the same description. (2) Involving more attributes. Our architectures are more flexible, and we use less training data for fast evaluation; thus we need more complexity features and structural attributes to distinguish different architectures, and more performance features to substitute for the fully trained validation accuracy.
To make this prediction method work for the architectures in our search space, FES samples some architectures from it to study the relationship between these attributes and the final accuracy, and finally builds an MLP regression model to predict the final accuracy from the above 8 attribute values. In MoARR, we utilize this model to efficiently estimate the final accuracy of the candidate architectures recommended by the ARR selection strategy. With it, estimating the final accuracy of an architecture takes only about 20 seconds, which greatly reduces our evaluation cost and speeds up our algorithm.
3 Experiments and Results
In this section, we test MoARR on common image classification benchmarks and show its effectiveness compared to other state-of-the-art models (Sections 3.1 to 3.3). We use the CIFAR10 DBLP:/Cifar10 dataset for the main search and evaluation phase, and conduct transferability experiments on well-known benchmarks using the architecture found on CIFAR10. In addition, we conduct an ablation study that confirms the role of MoARR in discovering novel architectures (Section 3.4).
3.1 Details of Architecture Search on CIFAR10
Using MoARR, we search on CIFAR10 DBLP:/Cifar10 for better lightweight CNN architectures. Considering that CNN architectures discovered by existing NAS work generally have more than 2.5M parameters DBLP:conf/cvpr/DongY19/GDAS , we set the upper limit of the parameter amount in Equation 1 to 2.5M. During the search phase, we select 50 architectures to evaluate in each iteration, and train each selected network for a fixed 12 epochs on CIFAR10 using FES as described in Section 2.4. Following DBLP:conf/cvpr/ZhongYWSL18/BlockQNN , we set the batch size to 256 and use the Adam optimizer with β1 = 0.9 and β2 = 0.999. The initial learning rate is set to 0.001 and is reduced by a factor of 0.2 every 2 epochs. MoARR takes about six hours to accomplish the search phase on a single NVIDIA Tesla V100 GPU.
After the search phase, we extract four excellent lightweight architectures from the Pareto boundary obtained by MoARR: (1) the two architectures with the highest accuracy scores among the larger models on the boundary, represented by MoARR-Small1 and MoARR-Small2; (2) the two architectures with the highest accuracy scores among the smaller models, represented by MoARR-Tiny1 and MoARR-Tiny2. We then use these architectures (whose encodings are shown in the supplementary material) to test the effectiveness of MoARR.
3.2 CIFAR10 Evaluation Results
We train the MoARR-Small and MoARR-Tiny networks for 600 epochs using a batch size of 96 and an SGD optimizer with nesterov momentum and weight decay. The learning rate starts at 0.025 and is reduced to 0 with the cosine learning rate scheduler. For regularization we use cutout DBLP:journals/corr/abs170804552/CutOut , scheduled drop-path DBLP:conf/iclr/LarssonMS17/DropPath , auxiliary towers DBLP:conf/cvpr/SzegedyLJSRAEVR15/Tower and random cropping. All the training parameters are the same as in DARTS DBLP:conf/iclr/LiuSY19/DARTS . Table 3.2 shows the performance of the MoARR architectures compared to other state-of-the-art NAS approaches. Our MoARR-Small1 network outperforms previous NAS methods by a large margin, with an error rate of 2.61% and only 2.3M parameters. Moreover, MoARR-Small1 reaches a 1.9% error rate under the settings of ASAP DBLP:journals/corr/abs190404123/ASAP , where 1500 epochs and more regularization methods are applied. Our smallest network variant, MoARR-Tiny2, outperforms most previous models on CIFAR10 while having far fewer parameters, i.e., it contains 33.3% to 94.4% fewer parameters than previous models. By considering more flexible and diversified structures, we discover CNN models with fewer parameters and higher accuracy using existing cell structures, which demonstrates the significance of cell diversity and structural flexibility. In addition, our MoARR is the second fastest among the

Table 3.2: Comparison with state-of-the-art NAS methods on CIFAR10.

Architecture  Venue  Test Error (%)  Params  Search Cost (GPU days)
PNAS DBLP:conf/eccv/LiuZNSHLFYHM18/PNAS  ECCV18  3.41  3.2M  150  
AmoebaNetA DBLP:conf/aaai/RealAHL19/AmoebaNet  AAAI19  3.12  3.1M  3150  
DARTS (1st) DBLP:conf/iclr/LiuSY19/DARTS  ICLR19  3.00  3.3M  1.5  
CARSA DBLP:journals/corr/abs190904977/CARS  CVPR20  3.00  2.4M  0.4  
NAONet DBLP:conf/nips/LuoTQCL18/NAONet  NeurIPS18  2.98  28.6M  200  
GDAS DBLP:conf/cvpr/DongY19/GDAS  CVPR19  2.93  3.4M  0.21  
ENAS DBLP:conf/icml/PhamGZLD18/ENAS  ICML18  2.89  4.6M  0.5  
RENAS DBLP:conf/cvpr/ChenMZXHMW19/RENAS  CVPR19  2.88  3.5M  6  
CARSB DBLP:journals/corr/abs190904977/CARS  CVPR20  2.87  2.7M  0.4  
GDAS (FRC) DBLP:conf/cvpr/DongY19/GDAS  CVPR19  2.82  2.5M  0.17  
DARTS (2nd) DBLP:conf/iclr/LiuSY19/DARTS  ICLR19  2.76  3.3M  4  
DATA DBLP:conf/nips/ChangZGMXP19/DATA  NeurIPS19  2.70  3.2M  1  
NasNetA DBLP:conf/cvpr/ZophVSL18/NasNet  CVPR18  2.65  3.3M  1800  
CARSI DBLP:journals/corr/abs190904977/CARS  CVPR20  2.62  3.6M  0.4  
ASAP DBLP:journals/corr/abs190404123/ASAP  ArXiv19  1.99  2.5M  0.5  
MoARR-Small1  NeurIPS20  2.61  2.3M  0.27  
MoARR-Small2  NeurIPS20  2.69  2.2M  0.27  
MoARR-Tiny1  NeurIPS20  2.74  1.9M  0.27  
MoARR-Tiny2  NeurIPS20  2.76  1.6M  0.27
NAS methods listed in Table 3.2, second only to GDAS DBLP:conf/cvpr/DongY19/GDAS .
3.3 Transferability Evaluation
Using the architecture found by MoARR on CIFAR10, we perform transferability tests on 6 popular classification benchmarks.
ImageNet Results.
Our ImageNet network is composed of two initial stem cells for downscaling and a new variant of the MoARR-Small1 architecture for feature extraction and image classification. Following previous work, in the new variant of MoARR-Small1 we set the initial width to 184 and remove one normal cell from each of stages 1 to 3, so that the total number of network FLOPs is below 600M. Figure 3 in the supplementary material shows our ImageNet architecture. We train the network for 250

Table 3.3: Transferability results on ImageNet.

Architecture  Top-1 Err (%)  Top-5 Err (%)  Params  FLOPs  Search Cost (GPU days)
DARTS DBLP:conf/iclr/LiuSY19/DARTS  26.9  9.0  4.9M  595M  1.5  
ShuffleNet (2x) DBLP:conf/cvpr/ZhangZLS18/ShuffleNet  26.3    5.4M  524M    
GDAS DBLP:conf/cvpr/DongY19/GDAS  26.0  8.5  5.3M  581M  0.21  
NasNetA DBLP:conf/cvpr/ZophVSL18/NasNet  26.0  8.4  5.3M  564M  1800  
PNAS DBLP:conf/eccv/LiuZNSHLFYHM18/PNAS  25.8  8.1  5.1M  588M  150  
ENAS DBLP:conf/icml/PhamGZLD18/ENAS  25.7  8.1  5.1M  523M  0.5  
DATA DBLP:conf/nips/ChangZGMXP19/DATA  25.5  8.3  4.9M  568M  1  
AmoebaNetA DBLP:conf/aaai/RealAHL19/AmoebaNet  25.5  8.0  5.1M  555M  3150  
CARS DBLP:journals/corr/abs190904977/CARS  24.8  7.5  5.1M  591M  0.4  
ASAP DBLP:journals/corr/abs190404123/ASAP  24.4    5.1M    0.2  
RENAS DBLP:conf/cvpr/ChenMZXHMW19/RENAS  24.3  7.4  5.4M  580M  6  
MoARR-Small1  24.0  7.3  4.9M  546M  0.27
epochs with one cycle of the power cosine learning rate schedule DBLP:journals/corr/abs190309900/CosineLR and a nesterov-momentum optimizer. The results are shown in Table 3.3, from which we can observe that MoARR’s transferability results on ImageNet are highly competitive, outperforming all previous NAS models.
Additional Results. We further test the transferability of MoARR on 5 smaller datasets: CIFAR100 DBLP:/Cifar10 , FashionMNIST DBLP:journals/corr/abs170807747/FashionMnist , SVHN DBLP:SVHN , Freiburg DBLP:journals/corr/JundAEB16/Freiburg and CINIC10 DBLP:journals/corr/abs181003505/CINIC10 . We use the MoARR-Small1 architecture, with a training scheme similar to DBLP:journals/corr/abs190404123/ASAP . Table 4 shows the performance of our model compared to other NAS methods. On FashionMNIST, MoARR-Small1 surpasses the next best architecture by 0.04%, achieving the second highest reported score, second only to DBLP:conf/iclr/LiuSY19/DARTS . On CIFAR100, Freiburg and CINIC10, MoARR-Small1 surpasses all the other 6 architectures, achieving the lowest test errors.

Table 4: Transferability results on 5 additional datasets (test error, %).

Architecture  CIFAR100  FashionMNIST  SVHN  Freiburg  CINIC10  Params  Search Cost (GPU days)
PNAS DBLP:conf/eccv/LiuZNSHLFYHM18/PNAS  15.9  3.72  1.83  12.3  7.03  3.2M  150  
AmoebaNetA DBLP:conf/aaai/RealAHL19/AmoebaNet  15.9  3.8  1.93  11.8  7.18  3.2M  3150  
NasNet DBLP:conf/cvpr/ZophVSL18/NasNet  15.8  3.71  1.96  13.4  6.93  3.3M  1800  
NAONet DBLP:conf/nips/LuoTQCL18/NAONet  15.7          10.6M  200  
DARTS DBLP:conf/iclr/LiuSY19/DARTS  15.7  3.68  1.95  10.8  6.88  3.4M  4  
ASAP DBLP:journals/corr/abs190404123/ASAP  15.6  3.73  1.81  10.7  6.83  2.5M  0.2  
MoARR-Small1  14.3  3.69  1.74  7.27  6.21  2.3M  0.27
3.4 The Importance of MoARR
In this part, we analyze the importance of MoARR. We examine whether MoARR is actually capable of finding good CNN architectures, or whether it is the design of our new search space that leads to MoARR’s strong empirical performance.
Comparing to Guided Random Search. We uniformly sample a CNN code from our search space, build the corresponding CNN, and train this random model to convergence using the same settings as in Section 3.2. The random CNN model has 3.5M parameters and achieves a test error of 2.89% on CIFAR10 (using the setting of DBLP:journals/corr/abs190404123/ASAP yields a 2.37% error rate), which is far worse than MoARR-Small1’s 2.61% and 1.90%. This shows that our search space contains not only excellent lightweight architectures but also many architectures with poor performance. Thus, an efficient search strategy is necessary for our MOO NAS problem. The ARR selection strategy designed in MoARR can recommend good CNN codes with fewer parameters by analyzing the known performance information, which is effective.
Comparing to Evolutionary Multi-Objective Search. In addition to random search, we compare with the classic evolutionary multi-objective method RVEA* DBLP:journals/tec/ChengJOS16/MO16 . We set the population size to 50, and use RVEA* in place of MoARR to deal with our multi-objective NAS problem. Figure 3.4 reports the performance scores of the architectures evaluated by RVEA* or MoARR over five generations. We observe that MoARR evaluates far fewer useless architectures and optimizes more quickly than RVEA*. Compared with the evolutionary method, our ARR selection strategy recommends better architectures by utilizing the potential relations learned from historical information. Our optimization process is more efficient and thus reduces the evaluation cost, which makes it more suitable for expensive multi-objective NAS problems, coinciding with the discussion in Section 1.
4 Related Work
NAS is a popular and important research topic in deep learning, and many effective algorithms have been proposed to tackle it. The majority of them DBLP:conf/cvpr/DongY19/GDAS ; DBLP:conf/nips/NaymanNRFJZ19/XNAS ; DBLP:conf/cvpr/ChenMZXHMW19/RENAS ; DBLP:conf/iclr/LiuSY19/DARTS ; DBLP:conf/icml/PhamGZLD18/ENAS ; DBLP:conf/aaai/RealAHL19/AmoebaNet adopt the idea of micro search, which centers on learning cell structures and designs a neural architecture by stacking many copies of the discovered cells; the minority are macro search methods DBLP:conf/icml/PhamGZLD18/ENAS ; DBLP:conf/iclr/BakerGNR17/MacroICLR17 ; DBLP:conf/aaai/CaiCZYW18/MacroAAAI18 ; DBLP:conf/iclr/BrockLRW18/MacroICLR18 , which directly discover entire neural networks. The former greatly reduce the computation cost but may miss some good architectures due to their inflexible network structure, while the latter consider more flexible structures but cannot find good architectures within a short time due to the huge search space. In this paper, we propose to construct more flexible network structures utilizing the good cell structures discovered by previous work, and thus efficiently search a much larger space for better CNN architectures. Our idea combines the merits of both methods and achieves better results.

More recently, with the increasing need to deploy high-quality deep neural networks on real-world devices, multiple objectives have been considered in NAS for real applications. Some works DBLP:conf/cvpr/TanCPVSHL19/MNasNet ; DBLP:conf/cvpr/WuDZWSWTVJK19/MOtoSO1 ; DBLP:conf/iclr/CaiZH19/MOtoSO2 convert the multi-objective NAS tasks into single-objective ones and utilize existing single-objective search methods, such as reinforcement learning DBLP:conf/cvpr/ZophVSL18/NasNet ; DBLP:conf/icml/PhamGZLD18/ENAS , to deal with them. However, the weights in the single objective function are hard to determine; besides, the dimensional disunity of the multiple objectives may result in poor robustness of single-objective optimization. In this paper, we design MoARR to directly optimize multiple objectives, and thus avoid these problems.

5 Conclusion and Future Works
In this paper, we propose MoARR for finding good lightweight CNN architectures. We construct more flexible and diversified network architectures using existing cell structures, and adaptively learn a CNN recommendation model from performance feedback to efficiently optimize architectures. Experimental results show that MoARR can discover more powerful and lightweight CNN models than the state-of-the-art, which demonstrates the importance of structural diversity and the effectiveness of our optimization method. Our cell-reusing idea and the multi-objective NAS optimization method are applicable not only to CNNs but also to other kinds of neural networks. In future work, we will further explore more varied network structures such as GNNs and RNNs, and improve the efficiency of MoARR.
Broader Impact
To the best of our knowledge, our work could benefit areas of society that require image classification. More specifically, it can help to quickly generate high-capacity models by utilizing existing state-of-the-art models when new problems arise and the image datasets are fresh, for example, to quickly generate robust chest CT image classification models after the COVID-19 outbreak. Moreover, our lightweight models can be deployed on light embedded devices, making the technique more accessible to the public. However, due to possible system failures (e.g., misclassification), there may be problems associated with public safety. For example, falsely classified products could enter the market and cause serious problems (e.g., unqualified medicine and agricultural products), and misclassified medical images may harm both patients and society. Such problems are hard to avoid due to the limited accuracy of current state-of-the-art models as well as unavoidable data quality issues, and we strongly hold that a model should be carefully adjusted and exhaustively tested (with manual assistance where necessary) before being put into social practice.
References
 (1) Bowen Baker, Otkrist Gupta, Nikhil Naik, and Ramesh Raskar. Designing neural network architectures using reinforcement learning. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017.
 (2) Andrew Brock, Theodore Lim, James M. Ritchie, and Nick Weston. SMASH: one-shot model architecture search through hypernetworks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, 2018.
 (3) Han Cai, Tianyao Chen, Weinan Zhang, Yong Yu, and Jun Wang. Efficient architecture search by network transformation. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), the 30th Innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pages 2787–2794, 2018.
 (4) Han Cai, Ligeng Zhu, and Song Han. ProxylessNAS: Direct neural architecture search on target task and hardware. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, 2019.
 (5) Jianlong Chang, Xinbang Zhang, Yiwen Guo, Gaofeng Meng, Shiming Xiang, and Chunhong Pan. DATA: differentiable architecture approximation. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pages 874–884, 2019.
 (6) Yukang Chen, Gaofeng Meng, Qian Zhang, Shiming Xiang, Chang Huang, Lisen Mu, and Xinggang Wang. RENAS: reinforced evolutionary neural architecture search. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pages 4787–4796, 2019.
 (7) Ran Cheng, Yaochu Jin, Markus Olhofer, and Bernhard Sendhoff. A reference vector guided evolutionary algorithm for many-objective optimization. IEEE Trans. Evolutionary Computation, 20(5):773–791, 2016.
 (8) Luke Nicholas Darlow, Elliot J. Crowley, Antreas Antoniou, and Amos J. Storkey. CINIC-10 is not ImageNet or CIFAR-10. CoRR, abs/1810.03505, 2018.
 (9) Kalyanmoy Deb, Samir Agrawal, Amrit Pratap, and T. Meyarivan. A fast and elitist multi-objective genetic algorithm: NSGA-II. IEEE Trans. Evolutionary Computation, 6(2):182–197, 2002.
 (10) Kalyanmoy Deb and Himanshu Jain. An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: solving problems with box constraints. IEEE Trans. Evolutionary Computation, 18(4):577–601, 2014.
 (11) Terrance Devries and Graham W. Taylor. Improved regularization of convolutional neural networks with cutout. CoRR, abs/1708.04552, 2017.
 (12) Xuanyi Dong and Yi Yang. Searching for a robust neural architecture in four GPU hours. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pages 1761–1770, 2019.
 (13) Andrew Hundt, Varun Jain, and Gregory D. Hager. sharpDARTS: Faster and more accurate differentiable architecture search. CoRR, abs/1903.09900, 2019.
 (14) Philipp Jund, Nichola Abdo, Andreas Eitel, and Wolfram Burgard. The Freiburg groceries dataset. CoRR, abs/1611.05799, 2016.
 (15) Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
 (16) Gustav Larsson, Michael Maire, and Gregory Shakhnarovich. FractalNet: Ultra-deep neural networks without residuals. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017.
 (17) Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan L. Yuille, Jonathan Huang, and Kevin Murphy. Progressive neural architecture search. In Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part I, pages 19–35, 2018.
 (18) Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: differentiable architecture search. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, 2019.
 (19) Renqian Luo, Fei Tian, Tao Qin, Enhong Chen, and Tie-Yan Liu. Neural architecture optimization. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pages 7827–7838, 2018.
 (20) Niv Nayman, Asaf Noy, Tal Ridnik, Itamar Friedman, Rong Jin, and Lihi Zelnik-Manor. XNAS: neural architecture search with expert advice. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pages 1975–1985, 2019.
 (21) Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011, 2011.
 (22) Asaf Noy, Niv Nayman, Tal Ridnik, Nadav Zamir, Sivan Doveh, Itamar Friedman, Raja Giryes, and Lihi Zelnik-Manor. ASAP: architecture search, anneal and prune. CoRR, abs/1904.04123, 2019.
 (23) Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, and Jeff Dean. Efficient neural architecture search via parameter sharing. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pages 4092–4101, 2018.
 (24) Bo-Yang Qu and Ponnuthurai N. Suganthan. Multi-objective evolutionary algorithms based on the summation of normalized objectives and diversified selection. Inf. Sci., 180(17):3170–3181, 2010.
 (25) Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V. Le. Regularized evolution for image classifier architecture search. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, pages 4780–4789, 2019.
 (26) Ozan Sener and Vladlen Koltun. Multi-task learning as multi-objective optimization. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pages 525–536, 2018.
 (27) Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott E. Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 1–9, 2015.
 (28) Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V. Le. MnasNet: Platform-aware neural architecture search for mobile. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pages 2820–2828, 2019.
 (29) Mingxing Tan and Quoc V. Le. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, pages 6105–6114, 2019.
 (30) Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. FBNet: Hardware-aware efficient convnet design via differentiable neural architecture search. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pages 10734–10742, 2019.
 (31) Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. CoRR, abs/1708.07747, 2017.
 (32) Zhaohui Yang, Yunhe Wang, Xinghao Chen, Boxin Shi, Chao Xu, Chunjing Xu, Qi Tian, and Chang Xu. CARS: continuous evolution for efficient neural architecture search. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2020, 2020.
 (33) Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pages 6848–6856, 2018.
 (34) Zhao Zhong, Junjie Yan, Wei Wu, Jing Shao, and Cheng-Lin Liu. Practical block-wise neural network architecture generation. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pages 2423–2432, 2018.
 (35) Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. Learning transferable architectures for scalable image recognition. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pages 8697–8710, 2018.