1 Introduction
Deep learning has achieved great success in many areas, such as computer vision, natural language processing, speech, and gaming. The design of the neural network architecture is important for such success. However, this design relies heavily on the knowledge and experience of experts, and even experienced experts cannot always design the optimal architecture. Therefore, Neural Architecture Search (NAS), which aims to design neural network architectures in an automated way, has attracted great attention in recent years. NAS has demonstrated the capability to find neural network architectures with state-of-the-art performance in various tasks [emh19, lsy19, tcpvshl19, xzll19].
Search strategies in NAS are based on several techniques, including reinforcement learning [pham2018efficient, zoph2016neural], evolutionary algorithms [liu2017hierarchical, real2019regularized], Bayesian optimization, and gradient descent [lsy19, xu2019pc, chen2019progressive]. As a representative of gradient-descent-based NAS methods, the Differentiable ARchiTecture Search (DARTS) method [lsy19] has become popular because of its good performance and low search cost. However, these NAS methods are typically designed to optimize only the accuracy during the architecture search process while neglecting other significant objectives, which results in very limited application scenarios. For example, a deep neural network with high computational and storage demands is difficult to deploy on embedded devices (e.g., mobile phones and IoT devices), where resources are limited. Besides, the robustness of deep neural networks is also important. It is well known that trained neural networks are easily misled by adversarial examples [fgsm15, pgd17, fast_fgsm], which makes them risky to deploy in real-world applications. For example, a spammer can easily bypass an anti-spam email filter system by adding some special characters as perturbations, and a self-driving car may fail to recognize a guideboard correctly after some adversarial patches are stuck on it.
Therefore, multi-objective NAS has drawn great attention recently because more than performance needs to be considered when NAS meets real-world applications [cztl20, emh19, tcpvshl19]. In [jin2019rc, emh19, cztl20], the model size and computational cost are considered to satisfy resource constraints. Besides, some works [Dong20, RobNets] search for differentiable architectures that can defend against adversarial attacks. However, to the best of our knowledge, no existing work simultaneously optimizes all three objectives: performance, robustness, and resource consumption.
To fill this gap, this paper proposes an Effective, Efficient, and Robust Neural Architecture Search method (E2RNAS) to balance the trade-off among multiple objectives. Built on DARTS, the proposed E2RNAS method formulates the entire objective function as a bilevel multi-objective optimization problem, where the upper-level problem is a multi-objective optimization problem; this can be viewed as an extension of the objective function proposed in DARTS. To the best of our knowledge, there is little work on solving such bilevel multi-objective optimization problems with gradient descent techniques. To solve this problem, we propose an optimization algorithm that combines the multiple gradient descent algorithm (MGDA) [desideri12] with the bilevel optimization algorithm [colson2007overview].
Specifically, the contributions of this paper are threefold.

We propose the E2RNAS method for searching for effective, efficient, and robust network architectures, leading to a practical DARTS-based framework for multi-objective NAS.

We formulate the objective function of the E2RNAS method as a novel bilevel multi-objective optimization problem and propose an efficient algorithm to solve it.

Experiments on benchmark datasets show that the proposed E2RNAS method can find adversarially robust architectures with optimized model size and comparable classification accuracy.
2 Related Works
2.1 Adversarial Attack and Defense
Deep neural networks are not robust when facing adversarial attacks [sze14]. Most adversarial attacks are white-box attacks, which assume the attack algorithm can access all configurations of the trained neural network, including the architecture and model weights.
Fast Gradient Sign Method
Goodfellow et al. [fgsm15] propose the Fast Gradient Sign Method (FGSM) for generating adversarial examples. It directly uses the sign of the gradient of the loss function with respect to the input as the direction of the adversarial perturbation:

$$x' = x + \epsilon \cdot \mathrm{sign}\big(\nabla_{x}\mathcal{L}(\theta, x, y)\big),$$

where $x$ is the original input, $\epsilon$ is a small scalar representing the strength of the perturbation, $\theta$ denotes the parameters of the victim model, $y$ is the original ground-truth label for input $x$, $\mathrm{sign}(\cdot)$ denotes the element-wise sign function, $\mathcal{L}$ denotes the loss function used for training the victim model, and $\nabla_{x}\mathcal{L}(\theta, x, y)$ denotes its gradient with respect to $x$.
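The FGSM step above can be sketched as follows. `fgsm_perturb` is a hypothetical helper that takes an already-computed loss gradient, since the model and loss are not specified here:

```python
import numpy as np

def fgsm_perturb(x, grad_x, epsilon):
    """One-step FGSM: shift every input coordinate by epsilon in the
    direction of the sign of the loss gradient w.r.t. the input."""
    return x + epsilon * np.sign(grad_x)

# Toy usage: a gradient of [3.0, -0.5, 0.0] only contributes its sign,
# so the perturbation is [+eps, -eps, 0].
x_adv = fgsm_perturb(np.zeros(3), np.array([3.0, -0.5, 0.0]), 0.1)
```

Note that only the sign of the gradient matters, so the magnitude of the perturbation is exactly $\epsilon$ in every coordinate with a nonzero gradient.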
Projected Gradient Descent (PGD)
Instead of generating one-step perturbations as in FGSM, Kurakin et al. [pgd17] propose the PGD method, which applies a small number of iterative steps. To ensure the perturbation stays in a neighborhood of the original image, the PGD method clips the intermediate result after each iteration:

$$x^{t+1} = \mathrm{Clip}_{[x-\epsilon,\,x+\epsilon]}\big(x^{t} + \alpha \cdot \mathrm{sign}(\nabla_{x}\mathcal{L}(\theta, x^{t}, y))\big),$$

where $x^{t}$ is the perturbed input generated at the $t$-th step and $\alpha$ is the attack step size. $\mathrm{Clip}_{[a,b]}(\cdot)$ element-wise clips its input to lie within the interval $[a, b]$.
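As a rough sketch (with a hypothetical `grad_fn` standing in for the loss-gradient computation), the iterative clipped update reads:

```python
import numpy as np

def pgd_attack(x, grad_fn, epsilon, alpha, steps):
    """Iterative FGSM-style steps, each followed by an element-wise
    projection back onto the L-infinity ball of radius epsilon around x."""
    x_adv = np.copy(x)
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)
    return x_adv

# With a constant positive gradient, the iterate saturates at x + epsilon.
out = pgd_attack(np.zeros(4), lambda z: np.ones_like(z), 0.2, 0.05, 10)
```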
Adversarial Training
Adversarial training is an effective method for defending against adversarial attacks [fgsm15, pgd17, pgd7, fast_fgsm]. Goodfellow et al. [fgsm15] leverage FGSM as a regularizer to train deep neural networks and make the model more resistant to adversarial examples. Wong et al. [fast_fgsm] use FGSM adversarial training with random initialization of the perturbation. The proposed method speeds up the adversarial training process and is as effective as PGD-based adversarial training.
2.2 Multi-Objective Optimization
Multiobjective optimization aims to optimize more than one objective function simultaneously. Among different techniques to solve multiobjective problems, we are interested in gradientbased multiobjective optimization algorithms [desideri12, fliege2000steepest, schaffler2002stochastic], which leverage the KarushKuhnTucker (KKT) conditions [kuhn2014nonlinear] to find a common descent direction for all objectives.
In this paper, we utilize one such method, MGDA [desideri12]. With $m$ objective functions $\{\mathcal{L}_i\}_{i=1}^{m}$ to be minimized, MGDA is an iterative method that first solves the following quadratic programming problem:

(1) $$\min_{\lambda_1,\ldots,\lambda_m}\ \Big\| \sum_{i=1}^{m} \lambda_i \nabla \mathcal{L}_i \Big\|_2^2 \quad \text{s.t.}\ \sum_{i=1}^{m} \lambda_i = 1,\ \lambda_i \ge 0\ \forall i,$$

where $\|\cdot\|_2$ denotes the $\ell_2$ norm of a vector and $\lambda_i$ can be viewed as a weight for the $i$-th objective, and then minimizes along the combined direction $\sum_{i} \lambda_i \nabla \mathcal{L}_i$. When convergent, MGDA finds a Pareto-stationary solution.

2.3 Multi-Objective NAS
Most NAS methods focus on searching for architectures with the best accuracy. However, in real-world applications, other factors, such as model size and robustness, must be considered. To take those factors into account, several works on multi-objective NAS have been proposed in recent years. LEMONADE [emh19] considers two objectives: maximizing the validation accuracy and minimizing the number of parameters. It is based on an evolutionary algorithm and thus its search cost is quite high. MnasNet [tcpvshl19] uses a reinforcement learning approach to optimize both the accuracy and the inference latency when searching the architecture. Chen et al. [cztl20] perform neural architecture search based on reinforcement learning to optimize three objectives: maximizing the validation accuracy, minimizing the number of parameters, and minimizing the number of FLOPs. FBNet [wu2019fbnet] also considers both the accuracy and the model latency, searching the architecture via a gradient-based method that solves the corresponding multi-objective problem. Built on DARTS, RC-DARTS [jin2019rc] searches for architectures with high accuracy while constraining the model parameters of the searched architecture below a threshold; the objective function is therefore formulated as a constrained optimization problem, and a projected gradient descent method is proposed to solve it. Also based on DARTS, GOLD-NAS [bi2020gold] considers three objectives (maximizing the validation accuracy, minimizing the number of parameters, and minimizing the number of FLOPs) and, by enlarging the search space, proposes a one-level optimization algorithm instead of bilevel optimization.
3 The E2RNAS Method
In this section, we present the proposed E2RNAS method. We first give an overview of the DARTS method, then introduce how to achieve robustness and how to formulate the objective that constrains the number of parameters in the searched architecture. Finally, we present the bilevel multi-objective problem of the proposed E2RNAS method as well as its optimization.
3.1 Preliminary: DARTS
DARTS [lsy19] aims to learn a Directed Acyclic Graph (DAG) called a cell, which can be stacked to form a neural network architecture. Each cell consists of $N$ nodes $\{x_i\}_{i=0}^{N-1}$, each of which denotes a hidden representation. Let $\mathcal{O}$ denote a discrete operation space. Each edge $(i, j)$ of the DAG represents an operation $o(\cdot)$ (e.g., skip connection or pooling) drawn from $\mathcal{O}$ and applied to node $x_i$, with a probability given by the softmax of the architecture parameters $\alpha^{(i,j)}$. Therefore, each edge can be formulated as a weighted sum combining all the operations in $\mathcal{O}$: $\bar{o}^{(i,j)}(x_i) = \sum_{o \in \mathcal{O}} \frac{\exp(\alpha_o^{(i,j)})}{\sum_{o' \in \mathcal{O}} \exp(\alpha_{o'}^{(i,j)})}\, o(x_i)$. An intermediate node is the sum of its predecessors, $x_j = \sum_{i<j} \bar{o}^{(i,j)}(x_i)$. The output of the cell is the concatenation of the outputs of all intermediate nodes, excluding the two input nodes $x_0$ and $x_1$. Therefore, $\alpha = \{\alpha^{(i,j)}\}_{(i,j) \in E}$ parameterizes the searched architecture, where $E$ denotes the set of all edges from all cells.

Let $X_{tr}$ denote the training dataset and $Y_{tr}$ denote the corresponding set of labels. Similarly, the validation dataset and labels are denoted by $X_{val}$ and $Y_{val}$. We use $w$ to denote all the weights of the neural network and $\mathcal{L}$ to denote the loss function. DARTS solves a bilevel optimization problem:
(2) $$\min_{\alpha}\ \mathcal{L}_{val}(w^*(\alpha), \alpha) \quad \text{s.t.}\ w^*(\alpha) = \arg\min_{w} \mathcal{L}_{train}(w, \alpha),$$

where $\mathcal{L}_{train}$ and $\mathcal{L}_{val}$ represent the training and validation losses, respectively. Here $\min_{\alpha} \mathcal{L}_{val}(w^*(\alpha), \alpha)$ is called the upper-level problem and $w^*(\alpha) = \arg\min_{w} \mathcal{L}_{train}(w, \alpha)$ is called the lower-level problem.
When the search procedure finishes, the final architecture is determined by the operation with the largest probability on each edge, $o^{(i,j)} = \arg\max_{o \in \mathcal{O}} \alpha_o^{(i,j)}$.
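A minimal sketch of the continuous relaxation and the final discretization follows; the operation names and the two-edge example are illustrative, not taken from the paper:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def mixed_edge(alpha_edge, op_outputs):
    """Continuous relaxation: softmax-weighted sum of all candidate ops'
    outputs on one edge."""
    return softmax(alpha_edge) @ np.asarray(op_outputs)

def discretize(alpha, ops):
    """After search: keep, for each edge, the op with the largest weight."""
    return [ops[int(np.argmax(a))] for a in alpha]

ops = ["zero", "skip", "conv3x3"]
alpha = np.array([[0.1, 0.3, 2.0],    # edge 1: conv3x3 dominates
                  [1.5, -0.2, 0.0]])  # edge 2: zero dominates
chosen = discretize(alpha, ops)
```

During search, gradients flow through `mixed_edge`; discretization is applied only once at the end.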
3.2 Adversarial Training for Robustness
In E2RNAS, we expect the searched architecture to be robust, meaning that for a trained model with the searched architecture, performance stays stable when perturbations are added to the input data. To improve the robustness of the searched architecture, we leverage the adversarial training method in [fast_fgsm] to train a robust model.
Following [fast_fgsm], for each sample $x$ and its corresponding label $y$, we can generate a perturbation $\delta$ for $x$ using a single step:

$$\delta = \mathrm{Clip}_{[-\epsilon,\,\epsilon]}\big(\delta_0 + \alpha \cdot \mathrm{sign}(\nabla_{x}\mathcal{L}(w, x + \delta_0, y))\big),$$

where $\epsilon$ is the perturbation size, $\delta_0$ is randomly initialized from a uniform distribution on the interval $[-\epsilon, \epsilon]$, and $\alpha$ is the attack step size. Therefore, we generate the adversarial instance as $x' = x + \delta$. Obviously, FGSM is a special case of this method when $\delta_0$ is initialized with zero and $\alpha = \epsilon$. This FGSM-based adversarial training method with random initialization of $\delta_0$ [fast_fgsm] can effectively defend against the PGD adversarial attack [pgd17] while not adding much computational cost to the architecture search procedure. We use these perturbed data to learn the network parameters so that the trained model can defend against adversarial attacks. Therefore, we aim to minimize the following training loss on the perturbed data:

(3) $$\mathcal{L}_{adv}(w, \alpha) = \mathcal{L}_{train}(w, \alpha;\, X_{tr} + \delta,\, Y_{tr}),$$

where $\delta$ collects the perturbations generated for the training samples. Note that this adversarial training method trains the model only on adversarial examples, which is different from the FGSM-based adversarial training method [fgsm15] that uses them as a regularization term for training.
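The random-start single-step perturbation can be sketched as below; `grad_fn` is a placeholder for the gradient of the training loss with respect to the perturbed input:

```python
import numpy as np

rng = np.random.default_rng(0)

def fast_fgsm_delta(x, grad_fn, epsilon, alpha):
    """Random uniform start in [-eps, eps], one signed gradient step of
    size alpha, then a clip back into [-eps, eps], in the style of fast
    adversarial training."""
    delta = rng.uniform(-epsilon, epsilon, size=x.shape)
    delta = delta + alpha * np.sign(grad_fn(x + delta))
    return np.clip(delta, -epsilon, epsilon)

# With an everywhere-positive gradient and alpha >= 2*eps, the clip
# saturates delta at +eps regardless of the random start.
d = fast_fgsm_delta(np.zeros(5), lambda z: np.ones_like(z), 0.1, 0.2)
```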
3.3 Objective Function of Resource Constraints
Architectures with a small number of parameters have more application scenarios, including resource-constrained mobile devices. Therefore, we regard the resource constraint as one of the desired objectives.
Following DARTS [lsy19], we determine the operation on each edge of the final architecture as the one with the largest probability. So the number of parameters of an architecture can be computed as

(4) $$N(\alpha) = \sum_{(i,j) \in E} N\big(o^{(i,j)}\big), \quad o^{(i,j)} = \arg\max_{o \in \mathcal{O}} \alpha_o^{(i,j)},$$

where $N(o)$ denotes the number of parameters corresponding to operation $o$.

Note that the $\arg\max$ in Eq. (4) is a non-differentiable operation, making the computation of the gradient of $N(\alpha)$ with respect to $\alpha$ infeasible. To make this operation differentiable, we use the softmax trick to approximate the $\arg\max$ operation and formulate the approximation as

(5) $$\tilde{N}(\alpha) = \sum_{(i,j) \in E} \sum_{o \in \mathcal{O}} \frac{\exp(\alpha_o^{(i,j)})}{\sum_{o' \in \mathcal{O}} \exp(\alpha_{o'}^{(i,j)})}\, N(o).$$
Furthermore, to prevent the model from searching oversimplified architectures (ones containing too many parameter-free operations), which leads to unsatisfactory performance, we add a lower bound $L$ to the parameter size in Eq. (5). Therefore, the objective function of the resource constraint can be formulated as

(6) $$\mathcal{L}_{par}(\alpha) = \max\big(\tilde{N}(\alpha),\, L\big).$$
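The softmax-relaxed parameter count can be sketched as follows; the per-operation sizes are made-up numbers for illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def relaxed_param_count(alpha, op_params):
    """Differentiable surrogate for the architecture's parameter count:
    each edge contributes its ops' sizes weighted by softmax(alpha)."""
    op_params = np.asarray(op_params, dtype=float)
    return sum(float(softmax(a) @ op_params) for a in alpha)

# One edge with equal logits averages the two op sizes: (0 + 10) / 2 = 5.
n = relaxed_param_count(np.zeros((1, 2)), [0.0, 10.0])
```

Because the softmax weights are smooth in the architecture parameters, this surrogate admits the gradient that the hard arg-max count does not.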
Different from RC-DARTS [jin2019rc], which directly adds the resource constraint to the original DARTS objective function (2) as a constraint and formulates a constrained optimization problem, here we treat it as an objective.
3.4 Bilevel Multi-Objective Formulation
E2RNAS aims to search the architecture parameter $\alpha$ to minimize the validation loss for effectiveness and the number of parameters for efficiency, while achieving robustness via adversarial training. Thus, we combine Eqs. (3) and (6) as well as the adversarial training to formulate the entire objective function as

(7) $$\min_{\alpha}\ \big(\mathcal{L}_{val}(w^*(\alpha), \alpha),\ \mathcal{L}_{par}(\alpha)\big) \quad \text{s.t.}\ w^*(\alpha) = \arg\min_{w} \mathcal{L}_{adv}(w, \alpha).$$
Problem (7) is similar to the bilevel optimization problem (2) in DARTS: the lower-level problem is similar, but there is a significant difference in that the upper-level problem contains two objectives. So problem (7) is a bilevel multi-objective optimization problem, which is a generalization of problem (2) in DARTS. There are few works on bilevel multi-objective optimization [calvete2010linear, deb2009solving, zhang2012improved, ruuska2012constructing], and to the best of our knowledge, the optimization algorithm introduced in the following is the first gradient-based algorithm to solve general bilevel multi-objective optimization problems.
Problem (7) can be understood as a two-stage optimization. First, given an architecture parameter $\alpha$, we learn a robust model with optimal model weights $w^*(\alpha)$ via empirical risk minimization on adversarial examples. Second, given $w^*(\alpha)$, the architecture parameter $\alpha$ is updated on the validation dataset by making a trade-off between performance and model size. Therefore, we solve problem (7) in two stages, described as follows.
Updating $w$

Given the architecture parameter $\alpha$, $w$ can be simply updated as

(8) $$w_{t+1} = w_t - \eta \nabla_{w} \mathcal{L}_{adv}(w_t, \alpha),$$

where $t$ denotes the index of the iteration and $\eta$ denotes the learning rate.
Updating $\alpha$
After obtaining $w$, we can optimize the upper-level problem to update the architecture parameter $\alpha$. As the upper-level problem is a multi-objective optimization problem, we adopt MGDA to solve it. In MGDA, we first need to solve problem (1), which requires computing the gradients of the two objectives with respect to $\alpha$. The gradient of $\mathcal{L}_{par}(\alpha)$ with respect to $\alpha$ is easy to compute, while the gradient of $\mathcal{L}_{val}(w^*(\alpha), \alpha)$ with respect to $\alpha$ is a bit more complicated, as $w^*(\alpha)$ is also a function of $\alpha$ and it is too expensive to obtain $w^*(\alpha)$ exactly. Therefore, we use a second-order approximation:

(9) $$\nabla_{\alpha}\mathcal{L}_{val}(w^*(\alpha), \alpha) \approx \nabla_{\alpha}\mathcal{L}_{val}(w', \alpha) - \xi \nabla^2_{\alpha, w}\mathcal{L}_{adv}(w, \alpha)\, \nabla_{w'}\mathcal{L}_{val}(w', \alpha), \quad w' = w - \xi \nabla_{w}\mathcal{L}_{adv}(w, \alpha).$$

Obviously, when $\xi = 0$, $w$ becomes an approximation of $w^*(\alpha)$ and Eq. (9) degenerates to the first-order approximation, which can speed up the gradient computation and reduce the memory cost but leads to worse performance [lsy19]. So we use the second-order approximation in Eq. (9). Then, since the upper-level problem of problem (7) has two objectives, we can simplify problem (1) to a one-dimensional quadratic function of $\lambda$:
(10) $$\min_{\lambda \in [0, 1]}\ \big\| \lambda g_1 + (1 - \lambda) g_2 \big\|_2^2,$$

where $g_1$ and $g_2$ denote the gradients of the two objectives with respect to $\alpha$, respectively. Here $\lambda$ can be viewed as the weight for the first objective and $1 - \lambda$ as the weight for the second objective. It is easy to show that problem (10) has an analytical solution:

(11) $$\lambda^* = \mathrm{Clip}_{[0, 1]}\left( \frac{(g_2 - g_1)^{\top} g_2}{\|g_1 - g_2\|_2^2} \right).$$
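A sketch of the closed-form solution of problem (10) and the resulting common descent direction:

```python
import numpy as np

def mgda_weight(g1, g2):
    """Closed-form minimizer of ||lam*g1 + (1-lam)*g2||^2 over lam in [0, 1]."""
    diff = g1 - g2
    denom = float(np.dot(diff, diff))
    if denom == 0.0:
        return 0.5  # identical gradients: any weight gives the same direction
    lam = float(np.dot(g2 - g1, g2)) / denom
    return min(1.0, max(0.0, lam))

# Orthogonal unit gradients are balanced equally.
lam = mgda_weight(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
direction = lam * np.array([1.0, 0.0]) + (1 - lam) * np.array([0.0, 1.0])
```

Clipping to $[0, 1]$ handles the boundary cases where one gradient alone already has the minimum norm.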
After that, we can update $\alpha$ by descending along the combined gradient:

(12) $$\alpha_{t+1} = \alpha_t - \mu \big( \lambda^* g_1 + (1 - \lambda^*) g_2 \big),$$

where $\mu$ denotes the learning rate for $\alpha$.
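The second-order term in Eq. (9) is typically evaluated with a finite-difference Hessian-vector product, as in DARTS. The sketch below assumes a callable `grad_train_alpha` returning the gradient of the (adversarial) training loss with respect to $\alpha$ at given weights; the names and the small-constant choice are illustrative:

```python
import numpy as np

def approx_arch_grad(grad_val_alpha, grad_val_w, grad_train_alpha, w, xi, r=0.01):
    """DARTS-style approximation of Eq. (9): the Hessian-vector product is
    replaced by a symmetric finite difference around the current weights."""
    eps = r / (np.linalg.norm(grad_val_w) + 1e-12)
    g_plus = grad_train_alpha(w + eps * grad_val_w)
    g_minus = grad_train_alpha(w - eps * grad_val_w)
    return grad_val_alpha - xi * (g_plus - g_minus) / (2.0 * eps)

# Toy check: if grad_train_alpha(w) = w, the finite difference recovers
# exactly grad_val_w, so the result is grad_val_alpha - xi * grad_val_w.
g = approx_arch_grad(np.array([1.0, 1.0]), np.array([2.0, 0.0]),
                     lambda w: w, np.zeros(2), 0.5)
```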
Comparison between E2RNAS and DARTS
Though the proposed E2RNAS method is based on DARTS, there are two key differences between them, as shown in Figure 1. Firstly, E2RNAS adopts adversarial training to improve the robustness of the resulting neural network. Secondly, E2RNAS evaluates the model with two objectives: minimizing the validation loss for effectiveness and the number of parameters for efficiency. Therefore, E2RNAS can search for an effective, efficient, and robust architecture. The whole algorithm is summarized in Algorithm 1.
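Putting the two stages together, one search iteration in the style of Algorithm 1 can be sketched as below; all loss-gradient callables are placeholders, since the real implementation operates on network tensors:

```python
import numpy as np

def search_step(w, alpha_p, grad_w_adv, grads_alpha, eta, mu):
    """One E2RNAS-style iteration: descend w on the adversarial training
    loss (Eq. (8)), then descend alpha along the MGDA-weighted combination
    of the two upper-level gradients (Eqs. (11)-(12))."""
    w = w - eta * grad_w_adv(w, alpha_p)
    g1, g2 = grads_alpha(w, alpha_p)
    diff = g1 - g2
    denom = float(np.dot(diff, diff))
    lam = 0.5 if denom == 0.0 else min(1.0, max(0.0, float(np.dot(g2 - g1, g2)) / denom))
    alpha_p = alpha_p - mu * (lam * g1 + (1.0 - lam) * g2)
    return w, alpha_p

# Toy quadratic losses: both gradients point along the current iterate,
# so one step halves w and alpha with eta = mu = 0.5.
w1, a1 = search_step(np.array([2.0]), np.array([2.0]),
                     lambda w, a: w, lambda w, a: (a, a), 0.5, 0.5)
```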
4 Experiments
In this section, we empirically evaluate the proposed E2RNAS method on three image datasets: CIFAR-10 [krizhevsky2009learning], CIFAR-100 [krizhevsky2009learning], and SVHN [netzer2011reading]. Details about these datasets are presented in the Appendix.

4.1 Implementation Details
Search Space
The search space adopts the same setting as DARTS [lsy19]. There are two types of cells: the reduction cell and the normal cell. The reduction cells are located at 1/3 and 2/3 of the total depth of the network, and all other cells are normal cells. For both reduction and normal cells, there are 7 nodes in each cell, including four intermediate nodes, two input nodes, and one output node. In both cell types, the operation set contains eight operations: 3×3 and 5×5 separable convolutions, 3×3 and 5×5 dilated separable convolutions, 3×3 max pooling, 3×3 average pooling, identity, and zero. For the convolution operators, the ReLU-Conv-BN order is used.
Training Settings
Following DARTS [lsy19], half of the standard training set is used for training the model and the other half for validation. A small network of 8 cells is trained via the FGSM-based adversarial training method [fast_fgsm] in Eq. (3) with a batch size of 64 and 16 initial channels for 50 epochs. Following the setting of [fast_fgsm], the perturbation of the FGSM adversary is randomly initialized from the uniform distribution on $[-\epsilon, \epsilon]$, and the attack step size follows [fast_fgsm]. The SGD optimizer with momentum and weight decay is used. The proposed method is implemented in PyTorch 0.3.1 and all experiments are conducted on Tesla V100S GPUs with 32GB of CUDA memory.
Evaluation Settings
A large network of 20 cells is trained on the full training set for 600 epochs, with a batch size of 96, 36 initial channels, a cutout of length 16, a dropout probability of 0.2, and an auxiliary tower of weight 0.4. To make model sizes comparable, we adjust the initial channels of each cell for both DARTS and the proposed E2RNAS method, which is denoted by the suffix "C" followed by the number of initial channels (e.g., "E2RNASC36"). The accuracy is tested on the full testing set. Adversarial examples are generated on the testing set using the PGD attack [pgd17]. The PGD attack takes 10 iterative steps, with the step size set as suggested in [pgd7].
Architecture  Test Err. (%)  Params (MB)  PGD Acc. (%)  Search Cost (GPU days)  Search Method
DenseNetBC [huang2017densely]  3.46  25.6      manual 
NASNetA [zoph2016neural]  2.65  3.3    2000  RL 
AmoebaNetB [real2019regularized]  2.55±0.05  2.8    3150  evolution 
Hierarchical Evolution [liu2017hierarchical]  3.75±0.12  15.7    300  evolution 
PNAS [liu2018progressive]  3.410.09  3.2    225  SMBO 
ENAS [pham2018efficient]  2.89  4.6    0.5  RL 
DARTS [lsy19]  2.59  3.349  6.57  0.595  gradientbased 
DARTSC28  2.68  2.061  5.42  0.595  gradientbased 
DARTSC20  3.15  1.083  3.90  0.595  gradientbased 
DARTSC12  3.09  0.416  3.08  0.595  gradientbased 
PDARTS [chen2019progressive]  2.59  3.434  8.35  0.247  gradientbased 
PCDARTS [xu2019pc]  2.65  3.635  9.53  0.426  gradientbased 
E2RNASC46  3.64  3.383  10.21  0.836  gradientbased 
E2RNASC36  4.19  2.102  9.61  0.836  gradientbased 
E2RNASC25  4.86  1.042  7.76  0.836  gradientbased 
E2RNASC16  6.03  0.449  6.76  0.836  gradientbased 
4.2 Analysis on Experimental Results
Search Architecture on CIFAR10
The normal and reduction cells searched by the E2RNAS method on the CIFAR-10 dataset are presented in Figures 2 and 3, respectively. Different from DARTS [lsy19], the reduction cell in E2RNAS contains many convolution operations, while the normal cell includes only one parameterized operation (a separable convolution). Since the normal cells are far more numerous than the reduction cells, the parameter size of the architecture searched by E2RNAS is lower than that of DARTS.
Architecture Evaluation on CIFAR10
The comparison of the proposed E2RNAS method with state-of-the-art NAS methods on the CIFAR-10 dataset is shown in Table 1. Notably, E2RNAS outperforms the NAS methods in [zoph2016neural, real2019regularized, liu2017hierarchical, liu2018progressive] by searching for a more lightweight architecture with a search cost lower by three to four orders of magnitude and only a slightly higher test error rate. Moreover, although ENAS [pham2018efficient] slightly outperforms E2RNAS in test accuracy and search time, it finds a deeper architecture with about double the model size (4.6MB for "ENAS" vs. 2.102MB for "E2RNASC36").
Dataset  Architecture  Test Err. (%)  Params (MB)  PGD Acc. (%)
CIFAR100 
DARTS [lsy19]  17.17  3.401  2.06 
DARTSC34  17.70  3.047  1.67  
DARTSC27  17.78  1.960  1.70  
DARTSC19  19.15  1.010  1.34  
PDARTS [chen2019progressive]  15.67  3.485  4.58  
PCDARTS [xu2019pc]  16.66  3.687  4.29  
E2RNASC38  19.30  3.459  4.90  
E2RNASC36  19.19  3.120  4.00  
E2RNASC29  19.80  2.075  3.78  
E2RNASC20  22.97  1.041  3.44  
SVHN 
DARTS [lsy19]  2.16  3.449  46.78 
DARTSC34  2.18  2.998  41.32  
DARTSC28  2.13  2.061  35.35  
DARTSC20  2.16  1.083  40.38  
PDARTS [chen2019progressive]  2.12  3.433  49.11  
PCDARTS [xu2019pc]  2.20  3.635  54.81  
E2RNASC39  2.21  3.421  44.15  
E2RNASC36  2.14  2.935  53.82  
E2RNASC30  2.13  2.075  52.38  
E2RNASC21  2.21  1.062  54.96 
Method  adv  nop  MGDA  L  Test Err. (%)  Params (MB)  PGD Acc. (%)
E2RNAS  4.19  2.102  9.61  
w/o adv  2.75  3.733  10.35  
w/o adv (C27)  2.84  2.148  8.91  
w/o adv ()  7.95  1.370  4.00  
w/o nop  8.29  1.370  5.21  
w/o MGDA  5.48  2.105  8.11  
w/o L  8.30  1.370  4.39 
Compared to the original DARTS [lsy19], "E2RNASC36" significantly improves the robustness with a lower model size and comparable search cost, while the classification error increases slightly. Some studies [raghunathan2019adversarial, yang2020closer] show that increased robustness is usually accompanied by decreased test accuracy. Therefore, the increased test error of E2RNAS results from the improved robustness and the decreased parameter size, which indicates that E2RNAS makes a better trade-off among these three goals than DARTS.
Besides, both PDARTS [chen2019progressive] and PCDARTS [xu2019pc] search for a deeper architecture with less search cost than "E2RNASC36", so they slightly outperform it in the test error rate with competitive PGD accuracy. In future work, the E2RNAS method can be applied to PDARTS and PCDARTS to make a trade-off among multiple objectives (accuracy, robustness, and the number of parameters).
To further compare the performance of E2RNAS and DARTS, we change the initial number of channels in the architecture evaluation for both methods to keep roughly similar model sizes. According to the results shown in Table 1, E2RNAS remarkably improves the robustness with comparable classification accuracy. For example, comparing "E2RNASC46" with "DARTS", the PGD accuracy increases by about 1.6 times, while the test error increases by only around 0.9%.
In summary, the experimental results in Table 1 show that E2RNAS can search significantly more robust architectures with a lower model size and comparable classification accuracy, compared with state-of-the-art NAS methods.
Architecture Evaluation on CIFAR100 and SVHN
The comparison of E2RNAS with DARTS on the CIFAR-100 and SVHN datasets is presented in Table 2. The performance of E2RNAS on CIFAR-100 is similar to that on CIFAR-10: E2RNAS can search a robust architecture with a lower model size and a slightly decreased test accuracy. For example, compared to DARTS, "E2RNASC36" reduces the number of parameters by about 0.3MB and nearly doubles the PGD accuracy, though the test error is slightly increased (by about 2%). In addition, E2RNAS shows excellent results on the SVHN dataset: it not only significantly improves the robustness but also achieves competitive test accuracy with a lower parameter size. For instance, compared to DARTS, "E2RNASC36" reduces the model size by about 15% and increases the PGD accuracy, while keeping competitive performance. Therefore, these quantitative experiments indicate that E2RNAS can search robust architectures with a lower model size and comparable performance.



4.3 Ablation Study
In this section, we study how each design choice in E2RNAS influences its performance on different objectives. The corresponding results are presented in Table 3. The adversarial training (abbreviated as adv) in the lower-level problem of problem (7) transforms the training data into adversarial examples, aiming to learn a robust model for a given architecture. The resource constraint (abbreviated as nop) in the upper-level problem of problem (7) constrains the parameter size of the searched architecture. The multiple gradient descent algorithm (abbreviated as MGDA) is applied to solve the upper-level problem of problem (7), a multi-objective problem that minimizes both the validation loss and the model size. Without MGDA, the upper-level problem is solved by minimizing an equally weighted sum of the two objectives (i.e., $\lambda = 0.5$ in Eq. (12)). The lower bound (abbreviated as L) on the number of parameters prevents the model from searching oversimplified architectures.
Impact of Adversarial Training
Adversarial training, which trains a neural network on adversarial examples, is an effective method for improving the robustness of a neural network. Thus, we apply it in the lower-level problem of problem (7) so that the searched architecture can defend against adversarial attacks. Here we discuss two effects of adversarial training in detail.
Firstly, using adversarial training tends to reduce the number of parameters, which may lead to worse accuracy. We notice that the parameter size of the architecture searched by DARTS with adversarial training ("E2RNAS w/o nop" in Table 3) is only 1.37MB, which means that the searched architecture contains many parameter-free operations. Therefore, it has a larger test error because of its oversimplified architecture, although its PGD accuracy is larger than that of DARTS with a comparable model size ("DARTSC20" in Table 1). Besides, compared with E2RNAS, the model size of "E2RNAS w/o adv" increases by 1.631MB, which indicates that adversarial training significantly decreases the number of parameters. However, this can be alleviated by constraining the parameter size with a lower bound $L$.
Secondly, using adversarial training helps E2RNAS make a trade-off between robustness and accuracy. We notice that adversarial training significantly influences the model size. Therefore, to make a fair comparison, we set the number of initial channels of "E2RNAS w/o adv" ("E2RNAS w/o adv (C27)") in the architecture evaluation to keep its model size roughly similar to that of E2RNAS. The results in Table 3 show that "E2RNAS" has better robustness but lower accuracy than "E2RNAS w/o adv (C27)".
Therefore, using adversarial training helps E2RNAS make a trade-off among multiple objectives and search a robust architecture with a lower model size.
Effectiveness of MGDA
MGDA is used to solve the upper-level problem of problem (7). We quantitatively compare the performance of E2RNAS with and without MGDA ("E2RNAS" vs. "E2RNAS w/o MGDA" in Table 3) and find that solving with MGDA achieves much better results on the test accuracy, parameter size, and PGD accuracy. So instead of using equal weights, MGDA can find a good weighting and make a trade-off among multiple objectives.
Necessity of the Lower Bound $L$
We find that training E2RNAS without the minimum constraint ("E2RNAS w/o L" in Table 3) searches an architecture with many parameter-free operations (its parameter size is only 1.370MB). There are three reasons for this phenomenon. Firstly, the instability of DARTS sometimes makes it converge to extreme architectures (full of skip-connects) [zela2019understanding, chen2019progressive]. Secondly, as discussed above, using adversarial training in the lower-level problem tends to reduce the number of parameters. Finally, optimizing only the number of parameters in the upper-level problem ("E2RNAS w/o adv ()" in Table 3) also results in searched architectures with many parameter-free operations. Therefore, it is necessary to constrain the number of parameters with a lower bound $L$ to prevent E2RNAS from searching oversimplified architectures.
Figure 4 shows the architecture evaluation results of E2RNAS on the CIFAR-10 dataset using different values of $L$ in Eq. (6). We set this hyperparameter to 1 in our work because E2RNAS achieves the best performance (the lowest test error rate in Figure 4(a), an acceptable model size in Figure 4(b), and the highest PGD accuracy in Figure 4(c)) when $L = 1$.

5 Conclusions
In this paper, we propose the E2RNAS method, which optimizes multiple objectives simultaneously to search for an effective, efficient, and robust architecture. The objective function is formulated as a bilevel multi-objective problem, and we design an algorithm that integrates MGDA with bilevel optimization. Experiments demonstrate that E2RNAS can find adversarially robust architectures with optimized model sizes and comparable classification accuracy on various datasets. In future work, we are interested in extending the proposed E2RNAS method to search for multiple Pareto-optimal architectures at one time.