Multi-Objective Neural Architecture Search Based on Diverse Structures and Adaptive Recommendation

by   Chunnan Wang, et al.
Harbin Institute of Technology

The search space of neural architecture search (NAS) for convolutional neural networks (CNN) is huge. To reduce search cost, most NAS algorithms use a fixed outer network level structure and search the repeatable cell structure only. Such fixed architectures perform well when enough cells and channels are used. However, when the architecture becomes more lightweight, the performance decreases significantly. To obtain better lightweight architectures, more flexible and diversified neural architectures are in demand, and more efficient methods are needed for the larger search space. Motivated by this, we propose the MoARR algorithm, which utilizes existing research results and historical information to quickly find architectures that are both lightweight and accurate. We use the discovered high-performance cells to construct network architectures. This method increases the network architecture diversity while also reducing the search space of cell structure design. In addition, we design a novel multi-objective method to effectively analyze the historical evaluation information, so as to efficiently search for the Pareto optimal architectures with high accuracy and small parameter number. Experimental results show that MoARR can achieve a powerful and lightweight model (with a 1.9% error rate and 2.3M parameters) on CIFAR-10 in 6 GPU hours, which is better than the state-of-the-art. The explored architecture is transferable to ImageNet and achieves 76.0% top-1 accuracy with 4.9M parameters.




1 Introduction

Designing successful hand-crafted convolutional neural networks (CNN) is a laborious task due to the heavy reliance on expert experience and a large number of trials. To reduce the labour of human experts, neural architecture search (NAS) approaches DBLP:conf/cvpr/ZophVSL18/NasNet ; DBLP:conf/nips/NaymanNRFJZ19/XNAS ; DBLP:conf/cvpr/ChenMZXHMW19/RENAS ; DBLP:conf/iclr/LiuSY19/DARTS have been proposed to automatically discover effective CNN architectures. The main idea of existing NAS approaches is to define a search space and design a search strategy to find CNN architectures with high performance, e.g., high validation accuracy. Since the search space of CNN is huge DBLP:conf/cvpr/DongY19/GDAS , most NAS algorithms choose a fixed outer network level structure, as shown in Figure 1, and search the repeatable cell structure only, so as to reduce search cost. Such fixed structures perform well when enough cells and channels are used. However, when the architecture becomes more lightweight (with fewer parameters), its accuracy decreases significantly DBLP:conf/cvpr/ZophVSL18/NasNet ; DBLP:conf/nips/NaymanNRFJZ19/XNAS ; DBLP:conf/icml/TanL19/ArcICML19 ; DBLP:journals/corr/abs-1904-04123/ASAP . For example, when the initial channel number of ASAP is reduced from 44 to 36, its parameter count decreases by 1.2M and its test accuracy on CIFAR-10 decreases by 0.25% DBLP:journals/corr/abs-1904-04123/ASAP .

Obviously, such inflexible structures prevent us from getting CNNs with fewer parameters and higher accuracy. To get better lightweight architectures, we need to consider more flexible outer level structures and more diversified cell structures. MNasNet DBLP:conf/cvpr/TanCPVSHL19/MNasNet also noticed this problem. MNasNet pointed out that cell structure diversity is significant in resource-constrained CNN models, and studied more flexible CNN architectures, where each block of cells is allowed to contain a different structure and to repeat a different number of times. It searched for the optimal setting of cell structures and cell numbers of the different blocks, and achieved good results. Such a solution breaks the traditional inflexible structure, but also has a defect: the search cost is too high. The search space of one cell is already large, let alone that of multiple cells combined with the parameters of the outer level structure. This huge search space brings MNasNet considerable search cost, i.e., MNasNet took hundreds of TPU days to accomplish the search process.

Figure 1: Examples of outer network level structures. Cell components are identical in these structures and are repeated to get deeper CNN.

In order to control the size of the search space and thus reduce search cost, while at the same time exploring more flexible architectures for better models, i.e., Pareto optimal CNN models with high accuracy and small parameter number, in this paper we put forward the idea of high-performance cell stacking (HCS): we utilize high-performance cells discovered by existing NAS algorithms to construct flexible architectures, as shown in Figure 3, and search for the optimal cell stacking method to obtain better lightweight CNNs. The introduction of existing high-performance cells, on the one hand, ensures the effectiveness of the components, reduces the search cost caused by cell design, and greatly shrinks the search space to avoid searching invalid architectures, thus ensuring search efficiency; on the other hand, it increases the cell diversity as well as the flexibility of CNN architectures. Our HCS-based search space makes full use of existing research results, and thus makes it possible to explore more flexible CNN architectures efficiently, which is superior to existing search spaces.

In addition, in order to efficiently find Pareto optimal architectures that are lightweight and accurate in our newly designed search space, we design a multi-objective optimization (MOO) algorithm, called Multi-Objective Optimization based on Adaptive Reverse Recommendation (MoARR). The idea of MoARR is to avoid selecting worse architectures by effectively analyzing our historical evaluation information, thus reducing evaluation cost and accelerating the optimization. More specifically, MoARR utilizes historical information to study the potential relationship between the parameter quantity, the accuracy and the architecture encoding, and thus adaptively learns a reverse recommendation model (RRModel) that is capable of selecting the most suitable architecture code according to target performance scores. MoARR then recommends better architectures to be evaluated under the guidance of RRModel, i.e., it inputs higher accuracy and smaller parameter numbers to RRModel to obtain better architectures. As the number of evaluated architectures increases, RRModel becomes more reliable, and the architectures it recommends approach the Pareto optimality. Using RRModel, MoARR can optimize architectures in a targeted way, and thus greatly reduce useless architecture evaluations.

Compared with existing MOO approaches, our MoARR is more suitable for MOO NAS problems, where architecture evaluations are expensive and time-consuming. More specifically, existing approaches for seeking the Pareto-optimal front can be classified into two categories: approaches based on mathematical programming and those based on genetic algorithms DBLP:journals/tec/ChengJOS16/MO16 ; DBLP:journals/tec/DebJ14/MO14 ; DBLP:journals/isci/QuS10/MO10 ; DBLP:journals/tec/DebAPM02/MO02 . The first class of methods is unable to cope with our black-box MOO NAS problem, where the expressions and gradient information of the two optimization objectives are unknown. The genetic methods can deal with the black-box problem, but may evaluate many useless architectures due to the uncertainty brought by their many random operations and their neglect of the valuable rules provided by historical evaluation information. They may require many samples and generations to obtain good results, which makes them unsuitable for expensive MOO NAS problems.

We compare MoARR with the classic NAS algorithms (Section 3). Experimental results show that MoARR can find a powerful and lightweight model (with 1.9% error rate and 2.3M parameters) on CIFAR-10 in 6 GPU hours, which outperforms the state-of-the-arts. The explored network architecture is transferable to ImageNet and 5 additional datasets, and achieves good results, e.g., 76.0% top-1 accuracy with only 4.9M parameters on ImageNet.

2 Proposed Approach

To lay out our approach, we first give the specific definition of our research objective (Section 2.1) and define a new search space for NAS (Section 2.2). We then introduce MoARR, which views NAS as a multi-objective optimization task and makes full use of historical evaluation information to obtain high-performance lightweight CNN models (Section 2.3). In order to accelerate the evaluation and reduce computational cost, we also design an acceleration strategy that uses a small number of epochs and a few samples to quickly obtain accuracy scores (Section 2.4). Figure 2 shows our overall framework.

2.1 Target

In this paper, we aim to increase the flexibility and diversity of CNN architectures, so as to obtain lightweight architectures with higher accuracy. Formally, our search target is defined as follows:


max_{c ∈ C} Acc(c),   min_{c ∈ C} Params(c),   s.t. Params(c) ≤ P_max        (1)

where C denotes the set of all CNN architecture codes in our new search space, which is described in Section 2.2, Acc(c) denotes the accuracy score of architecture c, Params(c) denotes its number of parameters, and P_max is the upper limit of the parameter amount. This is a multi-objective optimization task, and our goal is to obtain architectures that provide the best accuracy/parameter-amount trade-off.

Figure 2: Overall framework of MoARR.

2.2 Search Space

Structural flexibility and cell diversity are two key points in the design of our search space. To achieve structural flexibility, we make the number of cells and channels in each stage adjustable. In this way, we can get architectures with diversified width and depth. As for cell diversity, we allow cells in different stages to have different structures, and take the existing high-performance cell structures discovered by previous NAS work as available options. Details are as follows.

Figure 3: General CIFAR network architecture.

We provide a general network architecture for our search space, as shown in Figure 3. It consists of 5 stages. Stage 1 extracts common low-level features, stages 2-4 down-sample the spatial resolution of the input tensor with a stride of 2, and stage 5 produces the final prediction with a global pooling layer and a fully connected layer. Previous NAS approaches generally choose to use 2 reduction cells in CNN architectures, whereas some, e.g., ENAS DBLP:conf/icml/PhamGZLD18/ENAS , use 3. In pursuit of a more general search space, we use a binary variable R to decide whether to use the 3rd reduction cell in stage 4. Stages 1, 2 and 3 consist of L, M and N normal cells respectively, where L, M and N are positive integers. Different settings of L/M/N/R lead to different network depths. The width (number of output channels) of the cells in stage 1 is denoted as W, and that of the cells in stage s (s = 2, 3, 4) is obtained by multiplying the width of the previous stage by a growth ratio g_s. The name of the normal cells in stage s is denoted as NC_s (the normal cells used in the same stage have the same name), the name of the reduction cell used in stage s is denoted as RC_s, and the type of global pooling used in stage 5 is denoted as GP. The options of NC_s, RC_s and GP are shown in Table 1. Therefore, an architecture can be encoded by these choices, and the set of all possible codes is denoted as C and is referred to as our search space.

Cell Source | Normal Cell Symbol | Reduction Cell Symbol
DARTS (1st) DBLP:conf/iclr/LiuSY19/DARTS | Darts_V1_NC | DARTS_V1_RC
DARTS (2nd) DBLP:conf/iclr/LiuSY19/DARTS | Darts_V2_NC | DARTS_V2_RC
NasNet-A DBLP:conf/cvpr/ZophVSL18/NasNet | NasNet_NC | NasNet_RC
AmoebaNet-A DBLP:conf/aaai/RealAHL19/AmoebaNet | AmoebaNet_NC | AmoebaNet_RC
ASAP DBLP:journals/corr/abs-1904-04123/ASAP | ASAP_NC | ASAP_RC
ShuffleNet DBLP:conf/cvpr/ZhangZLS18/ShuffleNet | ShuffleNet_NC | ShuffleNet_RC

Global Pooling Definition | Global Pooling Symbol
Global average pooling | Avg_GP
Global max pooling | Max_GP
The average of Avg_GP and Max_GP | -
Table 1: Options of normal cells, reduction cells and global pooling in network architectures. We extract 10 normal cell structures and 10 reduction cell structures from 10 high-performance CNN architectures discovered by previous work, as the components, and we consider 3 classic global pooling operations in the final stage.
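To make the encoding concrete, the sketch below models one architecture code as a plain record; the field names and example values are our own illustrative choices, not the paper's exact encoding scheme:

```python
from dataclasses import dataclass

@dataclass
class ArchCode:
    """Illustrative encoding of one architecture in the HCS search space."""
    normal_counts: tuple      # numbers of normal cells in stages 1-3
    third_reduction: bool     # whether stage 4 uses a 3rd reduction cell
    width: int                # output channels of the stage-1 cells
    growth: tuple             # width growth ratios for stages 2-4
    normal_cells: tuple       # normal-cell name used in each stage
    reduction_cells: tuple    # reduction-cell name used in each stage
    pooling: str              # global pooling type used in stage 5

code = ArchCode(normal_counts=(3, 4, 4), third_reduction=True,
                width=36, growth=(2, 2, 2),
                normal_cells=("Darts_V1_NC", "NasNet_NC", "ASAP_NC"),
                reduction_cells=("DARTS_V1_RC", "NasNet_RC", "ASAP_RC"),
                pooling="Avg_GP")
```

A code like this is what the search strategy below manipulates and evaluates.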

2.3 MoARR: Multi-Objective Optimization based on Adaptive Reverse Recommendation

Let c1 and c2 denote two elements of the search space C. If Acc(c1) ≥ Acc(c2) and Params(c1) ≤ Params(c2), with at least one of the two inequalities strict, we say that architecture c1 Pareto dominates c2 (c1 is better than c2). For the elements of a set S ⊆ C that are not Pareto dominated by any other element of S, we call them the Pareto boundary of S, denoted as PB(S). Then, the Pareto optimal solutions for our multi-objective NAS problem are PB(C).
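The dominance relation and Pareto boundary defined above can be sketched in a few lines of Python (an illustration, not the authors' implementation); each point is an (accuracy, parameter-count) pair:

```python
def dominates(a, b):
    """a, b are (accuracy, params) pairs; a dominates b if it is at least as
    accurate, at most as large, and strictly better in one of the two."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

def pareto_boundary(points):
    """Keep the points that no other point Pareto dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

scores = [(0.95, 3.3), (0.96, 2.3), (0.94, 1.6), (0.93, 2.5)]
print(pareto_boundary(scores))  # [(0.96, 2.3), (0.94, 1.6)]
```

Here (0.95, 3.3) and (0.93, 2.5) drop out because (0.96, 2.3) is both more accurate and smaller.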

In MoARR, our target is to quickly optimize the elements in the Pareto boundary of the set of evaluated architectures, and finally obtain the Pareto optimal solutions of the whole search space. More specifically, we aim to select the best possible architectures to evaluate in each iteration, avoiding worse architectures as much as possible, thus accelerating the optimization process and reducing the evaluation cost. To achieve this goal, we put forward Adaptive Reverse Recommendation (ARR), an architecture selection strategy that utilizes the historical evaluation information for effective and targeted architecture recommendation, i.e., recommending the most suitable architecture code according to the performance demands. Such a performance-oriented selection strategy can greatly reduce useless architecture evaluations and improve the quality of the selected architectures by setting superior performance scores, which coincides with the goal of MoARR. Besides, ARR avoids the defects of the genetic MOO methods mentioned in Section 1, which makes MoARR more suitable for dealing with expensive NAS problems. We discuss ARR further as follows.

ARR. The model that maps a performance pair (accuracy, parameter amount) to a suitable architecture code with the closest performance scores is called the reverse recommendation model (RRModel) in ARR. The core idea of ARR is to make full use of the historical evaluation information, i.e., the codes and measured performance pairs of all evaluated architectures, to adaptively build an effective RRModel, and then utilize RRModel to select superior architectures directly by setting better performance values.

We note that the construction of an effective RRModel is the key point of ARR. A straightforward solution is to use the historical evaluation information to construct a performance-to-code training dataset, where the performance scores are the inputs and the corresponding codes are the target outputs, and then train a Multi-Layer Perceptron (MLP) on it to obtain RRModel. However, this dataset may contain different codes with identical or very similar accuracy scores and parameter amounts, and such contradictory outputs may mislead the loss function and make RRModel less effective. To eliminate the influence of contradictory values, this solution would preserve only one code for each performance pair. Such an operation, however, results in two defects: (1) information loss: the valuable information contained in the deleted records is underutilized; (2) difficulty of selection: it is unknown how to select the most suitable code to preserve so as to achieve the best recommendation effect, and many trials would be needed, which is time-consuming.

Figure 4: An example of the set of superior performance scores.

To avoid the above two defects, we propose an auxiliary model-based loss function for RRModel, which helps RRModel adaptively learn the most suitable output values by making full use of all the historical evaluation information. Suppose a forward evaluation model (FEModel) is capable of mapping a given architecture code to its performance pair, i.e., (accuracy score, parameter amount); the mappings of FEModel and RRModel are thus opposite. Then, the new loss function of RRModel is defined as follows:

L(RRModel) = Σ_{(a, p) ∈ S} || (a, p) − FEModel(RRModel(a, p)) ||²        (2)

where S is a set of accuracy-parameter performance scores, and RRModel(a, p) denotes the architecture code recommended by RRModel for the target pair (a, p). Equation 2 measures the difference between the target performance and the performance of the codes recommended by RRModel. It helps RRModel automatically determine suitable outputs under the guidance of the auxiliary model FEModel: we can feed enough accuracy-parameter performance scores to RRModel without giving target outputs, and RRModel adjusts its outputs adaptively according to the performance feedback provided by FEModel, thus achieving reasonable recommendation. As for FEModel, it is unknown, since neural architecture evaluation is a black box. However, we can utilize the historical evaluation information to construct code-to-performance training data and use it to train an MLP that approximates FEModel. Note that, unlike the performance-to-code data, the code-to-performance data has no contradiction problem, since each code maps to a unique performance pair. Therefore, with the new loss function, RRModel can be built automatically and effectively by making full use of all historical data, and the two defects are avoided.
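A minimal sketch of this auxiliary-model loss, with toy one-dimensional stand-ins for RRModel and FEModel (the paper uses MLPs for both), might look like:

```python
def rr_loss(rrmodel, femodel, targets):
    """Equation-2-style loss: squared distance between each target
    performance pair and the performance FEModel predicts for the
    architecture code that RRModel recommends for that target."""
    total = 0.0
    for acc, params in targets:
        code = rrmodel(acc, params)             # performance -> code
        pred_acc, pred_params = femodel(code)   # code -> predicted performance
        total += (acc - pred_acc) ** 2 + (params - pred_params) ** 2
    return total

# Toy stand-ins: a 1-D "code" whose accuracy and size both grow with its value.
femodel = lambda code: (0.90 + 0.01 * code, 1.0 + 0.5 * code)
rrmodel = lambda acc, params: (acc - 0.90) / 0.01  # inverts the accuracy map only
targets = [(0.95, 3.0), (0.93, 2.0)]
print(rr_loss(rrmodel, femodel, targets))  # ≈ 0.5: the parameter targets are missed
```

Training would adjust RRModel's weights to drive this loss down; FEModel stays fixed as the differentiable proxy for real evaluation.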

The next step is to use the obtained RRModel to select superior architectures. Since the target is to optimize the Pareto boundary of the evaluated architectures, we need to find architecture codes that are not Pareto dominated by any evaluated code. Thus, we should input more competitive performance scores, i.e., performance scores with higher accuracy or lower parameter amount than those of the evaluated architectures, to RRModel. We denote this set of performance scores as P, given by:

P = { (a, p) | there is no evaluated architecture c with Acc(c) ≥ a and Params(c) ≤ p }

Figure 4 is an example of P: if the marked points are the performance scores of the evaluated architectures, then the shaded area is P. After getting P, we randomly sample some superior performance scores from P as the inputs of RRModel, and thus obtain superior architectures.
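The sampling step can be sketched as follows; the rejection-sampling scheme and the anchor-point heuristic are our own illustrative choices, since the paper does not specify how points are drawn from the shaded region:

```python
import random

def superior_targets(evaluated, n, acc_cap=1.0, seed=0):
    """Sample n performance pairs that no evaluated (accuracy, params) pair
    Pareto dominates: pick a random evaluated anchor, ask for at least its
    accuracy with at most its parameter count, and reject dominated draws."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        acc0, par0 = rng.choice(evaluated)
        cand = (rng.uniform(acc0, acc_cap), rng.uniform(0.0, par0))
        dominated = any(a >= cand[0] and p <= cand[1] and
                        (a > cand[0] or p < cand[1])
                        for a, p in evaluated)
        if not dominated:
            out.append(cand)
    return out

evaluated = [(0.95, 3.3), (0.96, 2.3), (0.94, 1.6)]   # scores of evaluated codes
targets = superior_targets(evaluated, n=5)            # feed these to RRModel
```

Each sampled pair lies in the un-dominated region, so the codes RRModel returns for them are candidates for pushing the Pareto boundary outward.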

MoARR. With the usage of ARR, we develop MoARR algorithm, which deals with our multi-objective NAS problem effectively. Algorithm 1 is the pseudo code of MoARR.
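Since Algorithm 1 itself is not reproduced here, the following is a schematic reconstruction of the MoARR loop as described in this section; the function names and callback structure are our assumptions, not the authors' pseudo code:

```python
def moarr(initial_codes, evaluate, train_models, recommend,
          iterations=5, per_iter=50):
    """Schematic MoARR loop (a reconstruction, not the authors' Algorithm 1).
    evaluate(code) -> (accuracy, params); train_models fits FEModel/RRModel
    on the history; recommend proposes `per_iter` new codes to evaluate."""
    history = [(c, evaluate(c)) for c in initial_codes]
    for _ in range(iterations):
        femodel, rrmodel = train_models(history)        # refit on all history
        candidates = recommend(rrmodel, femodel, history, per_iter)
        history += [(c, evaluate(c)) for c in candidates]
    # return the codes whose scores lie on the Pareto boundary of the history
    return [c for c, s in history
            if not any(t[0] >= s[0] and t[1] <= s[1] and t != s
                       for _, t in history)]
```

The loop alternates between refitting the two models on everything evaluated so far and evaluating the codes RRModel recommends for superior performance targets.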

2.4 Fast Evaluation Strategy

In MoARR, CNN code evaluation is very time-consuming due to the huge training dataset and the large number of training epochs. In order to reduce the evaluation cost and thus speed up MoARR, we propose the fast evaluation strategy (FES) to quickly estimate the final validation accuracy of CNN architectures in our search space using only a few training epochs and a small portion of the training dataset.

FES. The core idea of FES is to use the following three types of characteristic attributes of an architecture c to predict its final accuracy, i.e., the validation accuracy obtained after c is fully trained on the whole training dataset:

  • Model complexity, including the FLOPs and parameter amount of c;

  • Structural attributes, including the density, layer number and reduction cell number of c, where Density(c) is the number of edges divided by the number of nodes in the DAG of c;

  • Quick evaluation scores, including the top-1 accuracy, top-5 accuracy and loss value obtained by c after training for 12 epochs using 1% of the training dataset.

This attribute-based prediction method comes from the Easy Stop Strategy (ESS) DBLP:conf/cvpr/ZhongYWSL18/BlockQNN , which has been successfully applied to CNN architectures stacked with many copies of a single discovered cell to reduce the evaluation cost. In FES, we make some adjustments to ESS so that it suits our more complex CNN architectures, which are stacked with diversified cell structures. Our adjustments fall into two categories, for the following reasons: (1) Replacing cell attributes with architecture ones. ESS uses the complexity and structural attributes of the repeated cell to represent those of the whole architecture, whereas FES must use those of the whole architecture directly to achieve the same description. (2) Involving more attributes. Our architectures are more flexible and we use less training data for fast evaluation, so we need more complexity features and structural attributes to distinguish different architectures, and more performance features to substitute for the fully trained accuracy.

To make this prediction method work for the architectures in our search space, in FES we sample some architectures from the search space to study the relationship between these attributes and the final accuracy. We then build an MLP regression model to predict the final accuracy of an architecture from the above 8 attribute values. In MoARR, we utilize this regression model to efficiently estimate the final accuracy of the candidate architectures recommended by the ARR selection strategy. With it, estimating the final accuracy of an architecture costs only about 20 seconds, which greatly reduces our evaluation cost and speeds up our algorithm.
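A sketch of assembling the 8 attribute values for one architecture might look like this; the dictionary keys are hypothetical names, not the paper's notation:

```python
def density(edges, nodes):
    """Density of an architecture's DAG: edge count divided by node count."""
    return edges / nodes

def fes_features(arch):
    """Assemble the 8 attribute values FES feeds to its MLP regressor.
    `arch` is a plain dict; the key names are illustrative only."""
    return [
        arch["flops"], arch["params"],                 # model complexity
        density(arch["edges"], arch["nodes"]),         # structural attributes
        arch["layers"], arch["reduction_cells"],
        arch["top1_12ep"], arch["top5_12ep"], arch["loss_12ep"],  # quick scores
    ]

feats = fes_features({"flops": 540e6, "params": 2.3e6, "edges": 14, "nodes": 10,
                      "layers": 20, "reduction_cells": 3,
                      "top1_12ep": 0.71, "top5_12ep": 0.97, "loss_12ep": 0.9})
```

The resulting 8-dimensional vector is what the MLP regressor maps to a predicted final accuracy.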

3 Experiments and Results

In this section, we test MoARR on common image classification benchmarks and show its effectiveness compared to other state-of-the-art models (Sections 3.1 to 3.3). We use the CIFAR-10 DBLP:/Cifar10 dataset for the main search and evaluation phase, and perform transferability experiments on well-known benchmarks using the architecture found on CIFAR-10. In addition, we conduct an ablation study that examines the role of MoARR itself in discovering novel architectures (Section 3.4).

3.1 Details of Architecture Search on CIFAR-10

Using MoARR, we search on CIFAR-10 DBLP:/Cifar10 for better lightweight CNN architectures. Considering that the CNN architectures discovered by existing NAS work generally have more than 2.5M parameters DBLP:conf/cvpr/DongY19/GDAS , we set the upper limit of the parameter amount in Equation 1 to 2.5M. During the search phase, we select 50 architectures to evaluate in each iteration, and train each selected network for a fixed 12 epochs on CIFAR-10 using the FES described in Section 2.4. Following DBLP:conf/cvpr/ZhongYWSL18/BlockQNN , we set the batch size to 256 and use the Adam optimizer with β1 = 0.9 and β2 = 0.999. The initial learning rate is set to 0.001 and is reduced with a factor of 0.2 every 2 epochs. MoARR takes about six hours to accomplish the search phase on a single NVIDIA Tesla V100 GPU.
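Assuming "reduced with a factor of 0.2 every 2 epochs" means the rate is multiplied by 0.2, the FES learning-rate schedule can be written as:

```python
def fes_lr(epoch, base=1e-3, factor=0.2, step=2):
    """Learning rate at a given (0-indexed) epoch of the 12-epoch FES
    training: the initial rate decays by `factor` every `step` epochs."""
    return base * factor ** (epoch // step)

schedule = [fes_lr(e) for e in range(12)]   # 0.001, 0.001, 0.0002, 0.0002, ...
```

This is the standard step-decay schedule; deep learning frameworks typically expose it directly (e.g. as a step scheduler with a multiplicative decay factor).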

After the search phase, we extract four excellent lightweight architectures from the Pareto boundary obtained by MoARR: (1) the two most accurate architectures under the larger parameter budget, denoted MoARR-Small1 and MoARR-Small2; (2) the two most accurate architectures under the smaller parameter budget, denoted MoARR-Tiny1 and MoARR-Tiny2. We then use these architectures (whose encodings are given in the supplementary material) to test the effectiveness of MoARR.

3.2 CIFAR-10 Evaluation Results

We train the MoARR-Small and MoARR-Tiny networks for 600 epochs using a batch size of 96 and an SGD optimizer with Nesterov momentum and weight decay. We start with a learning rate of 0.025 and reduce it to 0 with a cosine learning-rate scheduler. For regularization we use cutout DBLP:journals/corr/abs-1708-04552/CutOut , scheduled drop-path DBLP:conf/iclr/LarssonMS17/DropPath , auxiliary towers DBLP:conf/cvpr/SzegedyLJSRAEVR15/Tower and random cropping. All the training parameters are the same as DARTS DBLP:conf/iclr/LiuSY19/DARTS . Table 2 shows the performance of the MoARR architectures compared to other state-of-the-art NAS approaches. Our MoARR-Small1 network outperforms previous NAS methods by a large margin, with an error rate of 2.61% and only 2.3M parameters. Moreover, MoARR-Small1 reaches a 1.9% error rate under the settings of ASAP DBLP:journals/corr/abs-1904-04123/ASAP , where 1500 epochs and more regularization methods are applied. Our smallest network variant, MoARR-Tiny2, outperforms most previous models on CIFAR-10 while having far fewer parameters, i.e., it contains 33.3% to 94.4% fewer parameters than previous models. With the consideration of more flexible and diversified structures, we discover CNN models with fewer parameters and higher accuracy using existing cell structures, which demonstrates the significance of cell diversity and structural flexibility. In addition, our MoARR is the second fastest among the NAS methods listed in Table 2, next to GDAS DBLP:conf/cvpr/DongY19/GDAS .

Model | Venue | Test Error (%) | Params | Search Cost (GPU days)
PNAS DBLP:conf/eccv/LiuZNSHLFYHM18/PNAS | ECCV18 | 3.41 | 3.2M | 150
AmoebaNet-A DBLP:conf/aaai/RealAHL19/AmoebaNet | AAAI19 | 3.12 | 3.1M | 3150
DARTS (1st) DBLP:conf/iclr/LiuSY19/DARTS | ICLR19 | 3.00 | 3.3M | 1.5
CARS-A DBLP:journals/corr/abs-1909-04977/CARS | CVPR20 | 3.00 | 2.4M | 0.4
NAONet DBLP:conf/nips/LuoTQCL18/NAONet | NeurIPS18 | 2.98 | 28.6M | 200
GDAS DBLP:conf/cvpr/DongY19/GDAS | CVPR19 | 2.93 | 3.4M | 0.21
ENAS DBLP:conf/icml/PhamGZLD18/ENAS | ICML18 | 2.89 | 4.6M | 0.5
RENAS DBLP:conf/cvpr/ChenMZXHMW19/RENAS | CVPR19 | 2.88 | 3.5M | 6
CARS-B DBLP:journals/corr/abs-1909-04977/CARS | CVPR20 | 2.87 | 2.7M | 0.4
GDAS (FRC) DBLP:conf/cvpr/DongY19/GDAS | CVPR19 | 2.82 | 2.5M | 0.17
DARTS (2nd) DBLP:conf/iclr/LiuSY19/DARTS | ICLR19 | 2.76 | 3.3M | 4
DATA DBLP:conf/nips/ChangZGMXP19/DATA | NeurIPS19 | 2.70 | 3.2M | 1
NasNet-A DBLP:conf/cvpr/ZophVSL18/NasNet | CVPR18 | 2.65 | 3.3M | 1800
CARS-I DBLP:journals/corr/abs-1909-04977/CARS | CVPR20 | 2.62 | 3.6M | 0.4
ASAP DBLP:journals/corr/abs-1904-04123/ASAP | ArXiv19 | 1.99 | 2.5M | 0.5
MoARR-Small1 | NeurIPS20 | 2.61 | 2.3M | 0.27
MoARR-Small2 | NeurIPS20 | 2.69 | 2.2M | 0.27
MoARR-Tiny1 | NeurIPS20 | 2.74 | 1.9M | 0.27
MoARR-Tiny2 | NeurIPS20 | 2.76 | 1.6M | 0.27
Table 2: Test error (%) of MoARR compared to state-of-the-art methods on CIFAR-10.

3.3 Transferability Evaluation

Using the architecture found by MoARR on CIFAR-10, we perform transferability tests on 6 popular classification benchmarks.

ImageNet Results.

Our ImageNet network is composed of two initial stem cells for downscaling, followed by a new variant of the MoARR-Small1 architecture for feature extraction and image classification. Following previous work, in the new variant of MoARR-Small1 we set the initial width to 184 and remove one normal cell from each of stages 1 to 3, so that the total number of network FLOPs is below 600M. Figure 3 in the supplementary material shows our ImageNet architecture. We train the network for 250 epochs with one cycle of the power cosine learning rate DBLP:journals/corr/abs-1903-09900/CosineLR and a Nesterov-momentum optimizer. Results are shown in Table 3: MoARR's transferability results on ImageNet are highly competitive, outperforming all previous NAS models.

Model | Top-1 Test Error (%) | Top-5 Test Error (%) | Params | FLOPs | Search Cost (GPU days)
DARTS DBLP:conf/iclr/LiuSY19/DARTS | 26.9 | 9.0 | 4.9M | 595M | 1.5
ShuffleNet (2x) DBLP:conf/cvpr/ZhangZLS18/ShuffleNet | 26.3 | - | 5.4M | 524M | -
GDAS DBLP:conf/cvpr/DongY19/GDAS | 26.0 | 8.5 | 5.3M | 581M | 0.21
NasNet-A DBLP:conf/cvpr/ZophVSL18/NasNet | 26.0 | 8.4 | 5.3M | 564M | 1800
PNAS DBLP:conf/eccv/LiuZNSHLFYHM18/PNAS | 25.8 | 8.1 | 5.1M | 588M | 150
ENAS DBLP:conf/icml/PhamGZLD18/ENAS | 25.7 | 8.1 | 5.1M | 523M | 0.5
DATA DBLP:conf/nips/ChangZGMXP19/DATA | 25.5 | 8.3 | 4.9M | 568M | 1
AmoebaNet-A DBLP:conf/aaai/RealAHL19/AmoebaNet | 25.5 | 8.0 | 5.1M | 555M | 3150
CARS DBLP:journals/corr/abs-1909-04977/CARS | 24.8 | 7.5 | 5.1M | 591M | 0.4
ASAP DBLP:journals/corr/abs-1904-04123/ASAP | 24.4 | - | 5.1M | - | 0.2
RENAS DBLP:conf/cvpr/ChenMZXHMW19/RENAS | 24.3 | 7.4 | 5.4M | 580M | 6
MoARR-Small1 | 24.0 | 7.3 | 4.9M | 546M | 0.27
Table 3: Transferability classification error on the ImageNet dataset.

Additional Results. We further test MoARR's transferability on 5 smaller datasets: CIFAR-100 DBLP:/Cifar10 , Fashion-MNIST DBLP:journals/corr/abs-1708-07747/FashionMnist , SVHN DBLP:SVHN , Freiburg DBLP:journals/corr/JundAEB16/Freiburg and CINIC10 DBLP:journals/corr/abs-1810-03505/CINIC10 . We use the MoARR-Small1 architecture with a training scheme similar to DBLP:journals/corr/abs-1904-04123/ASAP . Table 4 shows the performance of our model compared to other NAS methods. On Fashion-MNIST, MoARR-Small1 surpasses the next-best architecture by 0.04%, achieving the second lowest reported error, second only to DARTS DBLP:conf/iclr/LiuSY19/DARTS . On CIFAR-100, Freiburg and CINIC10, MoARR-Small1 surpasses all the other 6 architectures, achieving the lowest test errors.

Model | CIFAR-100 | Fashion-MNIST | SVHN | Freiburg | CINIC10 | Params | Search Cost (GPU days)
PNAS DBLP:conf/eccv/LiuZNSHLFYHM18/PNAS | 15.9 | 3.72 | 1.83 | 12.3 | 7.03 | 3.2M | 150
AmoebaNet-A DBLP:conf/aaai/RealAHL19/AmoebaNet | 15.9 | 3.8 | 1.93 | 11.8 | 7.18 | 3.2M | 3150
NasNet DBLP:conf/cvpr/ZophVSL18/NasNet | 15.8 | 3.71 | 1.96 | 13.4 | 6.93 | 3.3M | 1800
NAONet DBLP:conf/nips/LuoTQCL18/NAONet | 15.7 | - | - | - | - | 10.6M | 200
DARTS DBLP:conf/iclr/LiuSY19/DARTS | 15.7 | 3.68 | 1.95 | 10.8 | 6.88 | 3.4M | 4
ASAP DBLP:journals/corr/abs-1904-04123/ASAP | 15.6 | 3.73 | 1.81 | 10.7 | 6.83 | 2.5M | 0.2
MoARR-Small1 | 14.3 | 3.69 | 1.74 | 7.27 | 6.21 | 2.3M | 0.27
Table 4: Transferability classification error (%) on 5 datasets. Some results are taken from DBLP:journals/corr/abs-1904-04123/ASAP .

3.4 The Importance of MoARR

In this part, we analyze the importance of MoARR. We examine whether MoARR is actually capable of finding good CNN architectures, or whether it is the design of our new search space that leads to MoARR’s strong empirical performance.

Comparing to Guided Random Search. We uniformly sample a CNN code from our search space, build the corresponding CNN, and train this random model to convergence using the same settings as in Section 3.2. The random CNN model has 3.5M parameters and achieves a test error of 2.89% on CIFAR-10 (2.37% using the setting of DBLP:journals/corr/abs-1904-04123/ASAP ), which is far worse than MoARR-Small1's 2.61% and 1.90%. This shows that our search space contains not only excellent lightweight architectures but also many architectures with poor performance; thus, an efficient search strategy is necessary for our MOO NAS problem. The ARR selection strategy designed in MoARR is effective here: it recommends good CNN codes with fewer parameters by analyzing the known performance information.

Comparing to Evolutionary Multi-Objective Search. In addition to random search, we compare with the classic evolutionary multi-objective method RVEA* DBLP:journals/tec/ChengJOS16/MO16 . We set the population size to 50 and use RVEA* in place of ARR to deal with our multi-objective NAS problem. Figure 5 reports the performance scores of the architectures evaluated by RVEA* and MoARR over five generations. We can observe that MoARR evaluates far fewer useless architectures and optimizes more quickly than RVEA*. Compared with the evolutionary method, our ARR selection strategy recommends better architectures by utilizing the potential relations learned from historical information. Our optimization process is more efficient and thus reduces the evaluation cost, making it more suitable for expensive multi-objective NAS problems, which coincides with the discussion in Section 1.

Figure 5: Performance of architectures evaluated by MoARR and RVEA*. The two algorithms start from the same randomly selected initial architectures in Round 1.

4 Related Work

NAS is a popular and important research topic in deep learning, and many effective algorithms have been proposed to tackle it. The majority of them DBLP:conf/cvpr/DongY19/GDAS ; DBLP:conf/nips/NaymanNRFJZ19/XNAS ; DBLP:conf/cvpr/ChenMZXHMW19/RENAS ; DBLP:conf/iclr/LiuSY19/DARTS ; DBLP:conf/icml/PhamGZLD18/ENAS ; DBLP:conf/aaai/RealAHL19/AmoebaNet adopt the idea of micro search, which centers on learning cell structures and designs a neural architecture by stacking many copies of the discovered cells, while the minority are macro search methods DBLP:conf/icml/PhamGZLD18/ENAS ; DBLP:conf/iclr/BakerGNR17/MacroICLR17 ; DBLP:conf/aaai/CaiCZYW18/MacroAAAI18 ; DBLP:conf/iclr/BrockLRW18/MacroICLR18 , which directly discover entire neural networks. The former greatly reduce the computation cost but may miss good architectures due to their inflexible network structure; the latter consider more flexible structures but cannot find good architectures within a short time due to the huge search space. In this paper, we propose to construct more flexible network structures utilizing the good cell structures discovered by previous work, and thus efficiently search a larger space for better CNN architectures. Our idea combines the merits of the two methods and achieves better results.

More recently, with the increasing need to deploy high-quality deep neural networks on real-world devices, multiple objectives have been considered in NAS for real applications. Some works DBLP:conf/cvpr/TanCPVSHL19/MNasNet ; DBLP:conf/cvpr/WuDZWSWTVJK19/MOtoSO1 ; DBLP:conf/iclr/CaiZH19/MOtoSO2 convert the multi-objective NAS task into a single-objective one and apply existing single-objective search methods, such as reinforcement learning DBLP:conf/cvpr/ZophVSL18/NasNet ; DBLP:conf/icml/PhamGZLD18/ENAS , to deal with it. However, the weights in the single objective function are hard to determine; moreover, the dimensional disunity of the multiple objectives may lead to poor robustness of single-objective optimization. In this paper, we design MoARR to directly optimize multiple objectives, thereby avoiding these problems.
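The weight-sensitivity problem described above can be made concrete with a toy example. The following sketch (values and names are invented for illustration) scalarizes accuracy and parameter count into one weighted objective; because the two objectives live on different scales, a small change in the hand-chosen weight flips which architecture wins.

```python
# Hedged illustration of scalarized multi-objective NAS: pick the argmax of
# accuracy - weight * params. The weight is hard to choose in advance.

def scalarized_best(archs, weight):
    """Return the architecture maximizing the weighted single objective.
    Each entry is (name, accuracy, parameter count in millions)."""
    return max(archs, key=lambda a: a[1] - weight * a[2])

archs = [("big", 0.97, 25.0), ("small", 0.94, 2.0)]
# A tenfold change in the weight reverses the winner:
print(scalarized_best(archs, 0.001)[0])  # -> big
print(scalarized_best(archs, 0.01)[0])   # -> small
```

Directly searching for the Pareto set, as MoARR does, sidesteps this weight-selection issue entirely.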

5 Conclusion and Future Works

In this paper, we propose MoARR for finding good lightweight CNN architectures. We construct more flexible and diversified network architectures from existing cell structures, and adaptively learn a CNN recommendation model from performance feedback to efficiently optimize architectures. Experimental results show that MoARR discovers more powerful and lightweight CNN models than the state-of-the-art methods, which demonstrates the importance of structural diversity and the effectiveness of our optimization method. Our cell-reusing idea and multi-objective NAS optimization method are applicable not only to CNNs but also to other kinds of neural networks. In future work, we will explore more varied kinds of network structures, such as GNNs and RNNs, and further improve the efficiency of MoARR.

Broader Impact

To the best of our knowledge, our work could benefit society in areas that require image classification. More specifically, it could help quickly generate high-capacity models by leveraging existing SOTA models when a new problem arises and the image datasets are fresh, for example, producing robust chest CT image classification models quickly after the COVID-19 outbreak. Moreover, our lightweight models can be deployed on lightweight embedded devices, making the technique more accessible to the public. However, due to possible system failures (e.g., misclassification), there may be problems associated with public safety. For example, falsely classified products could enter the market and cause serious problems (e.g., unqualified medicine and agricultural products), and misclassified medical images may harm both patients and society. Such problems are hard to avoid given the limited accuracy of current SOTA models and unavoidable data quality issues, and we strongly hold that a model should be carefully adjusted and exhaustively tested (with manual assistance when necessary) before being put into practice.


  • (1) Bowen Baker, Otkrist Gupta, Nikhil Naik, and Ramesh Raskar. Designing neural network architectures using reinforcement learning. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017.
  • (2) Andrew Brock, Theodore Lim, James M. Ritchie, and Nick Weston. SMASH: one-shot model architecture search through hypernetworks. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, 2018.
  • (3) Han Cai, Tianyao Chen, Weinan Zhang, Yong Yu, and Jun Wang. Efficient architecture search by network transformation. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, pages 2787–2794, 2018.
  • (4) Han Cai, Ligeng Zhu, and Song Han. Proxylessnas: Direct neural architecture search on target task and hardware. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, 2019.
  • (5) Jianlong Chang, Xinbang Zhang, Yiwen Guo, Gaofeng Meng, Shiming Xiang, and Chunhong Pan. DATA: differentiable architecture approximation. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pages 874–884, 2019.
  • (6) Yukang Chen, Gaofeng Meng, Qian Zhang, Shiming Xiang, Chang Huang, Lisen Mu, and Xinggang Wang. RENAS: reinforced evolutionary neural architecture search. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pages 4787–4796, 2019.
  • (7) Ran Cheng, Yaochu Jin, Markus Olhofer, and Bernhard Sendhoff. A reference vector guided evolutionary algorithm for many-objective optimization. IEEE Trans. Evolutionary Computation, 20(5):773–791, 2016.
  • (8) Luke Nicholas Darlow, Elliot J. Crowley, Antreas Antoniou, and Amos J. Storkey. CINIC-10 is not imagenet or CIFAR-10. CoRR, abs/1810.03505, 2018.
  • (9) Kalyanmoy Deb, Samir Agrawal, Amrit Pratap, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evolutionary Computation, 6(2):182–197, 2002.
  • (10) Kalyanmoy Deb and Himanshu Jain. An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: solving problems with box constraints. IEEE Trans. Evolutionary Computation, 18(4):577–601, 2014.
  • (11) Terrance Devries and Graham W. Taylor. Improved regularization of convolutional neural networks with cutout. CoRR, abs/1708.04552, 2017.
  • (12) Xuanyi Dong and Yi Yang. Searching for a robust neural architecture in four GPU hours. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pages 1761–1770, 2019.
  • (13) Andrew Hundt, Varun Jain, and Gregory D. Hager. sharpdarts: Faster and more accurate differentiable architecture search. CoRR, abs/1903.09900, 2019.
  • (14) Philipp Jund, Nichola Abdo, Andreas Eitel, and Wolfram Burgard. The freiburg groceries dataset. CoRR, abs/1611.05799, 2016.
  • (15) Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
  • (16) Gustav Larsson, Michael Maire, and Gregory Shakhnarovich. Fractalnet: Ultra-deep neural networks without residuals. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017.
  • (17) Chenxi Liu, Barret Zoph, Maxim Neumann, Jonathon Shlens, Wei Hua, Li-Jia Li, Li Fei-Fei, Alan L. Yuille, Jonathan Huang, and Kevin Murphy. Progressive neural architecture search. In Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part I, pages 19–35, 2018.
  • (18) Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: differentiable architecture search. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, 2019.
  • (19) Renqian Luo, Fei Tian, Tao Qin, Enhong Chen, and Tie-Yan Liu. Neural architecture optimization. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pages 7827–7838, 2018.
  • (20) Niv Nayman, Asaf Noy, Tal Ridnik, Itamar Friedman, Rong Jin, and Lihi Zelnik-Manor. XNAS: neural architecture search with expert advice. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, 8-14 December 2019, Vancouver, BC, Canada, pages 1975–1985, 2019.
  • (21) Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011, 2011.
  • (22) Asaf Noy, Niv Nayman, Tal Ridnik, Nadav Zamir, Sivan Doveh, Itamar Friedman, Raja Giryes, and Lihi Zelnik-Manor. ASAP: architecture search, anneal and prune. CoRR, abs/1904.04123, 2019.
  • (23) Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, and Jeff Dean. Efficient neural architecture search via parameter sharing. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018, pages 4092–4101, 2018.
  • (24) Bo-Yang Qu and Ponnuthurai N. Suganthan. Multi-objective evolutionary algorithms based on the summation of normalized objectives and diversified selection. Inf. Sci., 180(17):3170–3181, 2010.
  • (25) Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V. Le. Regularized evolution for image classifier architecture search. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, pages 4780–4789, 2019.
  • (26) Ozan Sener and Vladlen Koltun. Multi-task learning as multi-objective optimization. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pages 525–536, 2018.
  • (27) Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott E. Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, pages 1–9, 2015.
  • (28) Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V. Le. Mnasnet: Platform-aware neural architecture search for mobile. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pages 2820–2828, 2019.
  • (29) Mingxing Tan and Quoc V. Le. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, pages 6105–6114, 2019.
  • (30) Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pages 10734–10742, 2019.
  • (31) Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. CoRR, abs/1708.07747, 2017.
  • (32) Zhaohui Yang, Yunhe Wang, Xinghao Chen, Boxin Shi, Chao Xu, Chunjing Xu, Qi Tian, and Chang Xu. CARS: continuous evolution for efficient neural architecture search. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2020, 2020.
  • (33) Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, and Jian Sun. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pages 6848–6856, 2018.
  • (34) Zhao Zhong, Junjie Yan, Wei Wu, Jing Shao, and Cheng-Lin Liu. Practical block-wise neural network architecture generation. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pages 2423–2432, 2018.
  • (35) Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. Learning transferable architectures for scalable image recognition. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018, pages 8697–8710, 2018.