1 Introduction
It has been widely acknowledged that deep neural networks (DNNs) have made tremendous breakthroughs benefiting both academia and industry. Despite being effective, many DNN models trained with benign inputs are vulnerable to small and undetectable perturbations added to the original data and tend to make wrong predictions under such threats. These perturbed inputs, known as adversarial examples, can be easily constructed by algorithms such as DeepFool [moosavi2016deepfool], the Fast Gradient Sign Method (FGSM) [goodfellow2014explaining], and the Carlini-Wagner (C&W) attack [carlini2017towards]. Moreover, such adversarial attacks can also be conducted in the black-box setting [brendel2017decision, cheng2018query, cheng2020signopt] and can appear naturally in the physical world [hendrycks2019natural, kurakin2016adversarial2]. This phenomenon can bring about serious consequences in domains such as face recognition and autonomous driving. Therefore, how to train a model resistant to adversarial inputs has become an important topic.
A variety of defense methods have been proposed to improve the performance of DNNs against adversarial attacks [kurakin2016adversarial, samangouei2018defense, wang2019direct, wang2019bilateral, xie2019feature, zhang2019theoretically]. Among them, adversarial training [kurakin2016adversarial] stands out for its effectiveness. Moreover, [madry2017towards] shows that adversarial training can be formulated as a minimax optimization problem, resembling a game between the attacker and the defender. The formulation is intuitive: the inner problem generates adversarial examples by maximizing the training loss, while the outer problem guides the network in the direction that minimizes this loss to resist attacks. However, directly obtaining the optimal value of the inner maximization is infeasible, so one has to run an iterative optimization algorithm for a fixed number of iterations (often 10) to get an approximate inner maximizer.
Existing adversarial training often uses hand-designed general-purpose optimizers, such as the PGD attack, to (approximately) solve the inner maximization. However, there is an essential property of adversarial training that is rarely explored: the maximization problems associated with each sample share a very similar structure, and a good inner maximizer for adversarial training only needs to work well for this set of data-dependent problems. To be specific, there is a finite number $n$ of maximization problems to be solved (where $n$ is the number of training samples). These problems share the same objective function along with identical network structure and weights, and the only difference is their input. Based on this observation, can we have a better optimizer that works particularly well for these very similar and data-dependent problems?
Motivated by this idea, we propose a learned optimizer for improved adversarial training. Instead of using an existing optimizer with a fixed update rule (such as PGD), we aim at learning an inner maximizer that could be faster and more effective for this particular set of maximization problems. Two prior works have already put forward algorithms that combine learning-to-learn (L2L) with adversarial training [jiang2018learning, jang2019adversarial]. Both adopt a convolutional neural network (CNN) generator to produce malicious perturbations, but the CNN structure might complicate the training process and cannot grasp the essence of the update rule in the long term. In contrast, we propose an L2L-based adversarial training method with recurrent neural networks (RNNs). An RNN is capable of capturing long-term dependencies and has shown great potential in predicting update directions and steps adaptively [lv2017learning]. Thus, following the framework in [andrychowicz2016learning], we leverage an RNN as the optimizer to generate perturbations in a coordinate-wise manner. Based on the properties of the inner problem, we tailor our RNN optimizer by removing the bias terms and using a weighted loss, which ameliorates issues such as the short-horizon bias in L2L [wu2018understanding]. Specifically, our main contributions in this paper are summarized as follows:

- We first investigate and confirm that stronger attacks improve model robustness, by searching for a suitable step size for PGD.
- In place of hand-designed algorithms like PGD, we design an RNN-based optimizer, tailored to the properties of the inner problem, to learn a better update rule. In addition to standard adversarial training, the proposed algorithm can also be applied to other minimax defense objectives such as TRADES [zhang2019theoretically].
- Comprehensive experimental results show that the proposed method can noticeably improve the robust accuracy of both adversarial training [madry2017towards] and TRADES [zhang2019theoretically]. Furthermore, our RNN-based adversarial training significantly outperforms previous CNN-based L2L adversarial training and requires far fewer trainable parameters.
2 Related Work
2.1 Adversarial Attack and Defense
Model robustness has recently become a great concern for deploying deep learning models in real-world applications. Goodfellow et al. [goodfellow2014explaining] succeeded in fooling a model into making wrong predictions with the Fast Gradient Sign Method (FGSM). Subsequently, to produce adversarial examples, I-FGSM and Projected Gradient Descent (PGD) [goodfellow2014explaining, madry2017towards] accumulate attack strength by running FGSM iteratively, and the Carlini-Wagner (C&W) attack [carlini2017towards] designs a specific objective function to increase classification errors. Besides these conventional optimization-based methods, several algorithms [reddy2018nag, xiao2018generating] focus on generating malicious perturbations via neural networks. For instance, Xiao et al. [xiao2018generating] exploit a GAN, which was originally designed for crafting deceptive images, to output noise that is added to benign input data. The appearance of various attacks has pushed forward the development of effective defense algorithms to train neural networks that are resistant to adversarial examples. The seminal work on adversarial training has significantly improved adversarial robustness [madry2017towards]. It has inspired various advanced defense algorithms: TRADES [zhang2019theoretically] is designed to minimize a theoretically-driven upper bound, and GAT [lee2017generative] uses generator-based outputs to train the robust classifier. All these methods can be formulated as a minimax problem [madry2017towards], where the defender tries to mitigate the negative effects (outer minimization) brought by adversarial examples from the attacker (inner maximization). However, the performance of such an adversarial game is usually constrained by the quality of the solutions to the inner problem [jang2019adversarial, jiang2018learning]. Intuitively, finding a better maximum for the inner problem can improve the solution of minimax training, leading to improved defensive models.
2.2 Learning to Learn
Recently, learning to learn has emerged as a novel technique to efficiently address a variety of problems such as automatic optimization [andrychowicz2016learning], few-shot learning [finn2017model], and neural architecture search [elsken2018neural]. In this paper, we focus on one subarea of L2L: how to learn an optimizer for better performance. Rather than using human-defined update rules, learning to learn makes use of neural networks to design optimization algorithms automatically. It was originally developed in [cotter1990fixed] and [younger2001meta], in which early attempts were made to model adaptive algorithms on simple convex problems. More recently, [andrychowicz2016learning] proposes an LSTM optimizer for some complex optimization problems, such as training a convolutional neural network classifier. Based on this work, elaborations in [lv2017learning] and [wichrowska2017learned] further improve the generalization and scalability of learned optimizers. Moreover, [ruan2019learning] demonstrates that a zeroth-order optimizer can also be learned using L2L. The potential of learning-to-learn motivates a line of L2L-based defenses that replace hand-designed methods for solving the inner problem with neural network optimizers. [jiang2018learning] uses a CNN generator that maps clean images into corresponding perturbations. Since it only makes a one-step, deterministic attack like FGSM, [jang2019adversarial] modifies the algorithm and produces stronger and more diverse attacks iteratively. Unfortunately, due to the large number of parameters and the inability to capture long-term dependencies, the CNN generator adds too much difficulty to the optimization, especially for the minimax problem in adversarial training. Therefore, we adopt an RNN optimizer in our method for a more stable training process as well as a better grasp of the update rule.
3 Preliminaries
3.1 Notations
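We use bold lowercase letters $\mathbf{x}$ and $\mathbf{y}$ to represent clean images and their corresponding labels. An image classification task is considered in this paper, with the classifier $f$ parameterized by $\theta$. $\mathrm{sign}(\cdot)$ is an elementwise operation that outputs the sign of a given input. $\mathcal{B}(\mathbf{x}, \epsilon)$ denotes the neighborhood of $\mathbf{x}$ as well as the set of admissible perturbed images: $\mathcal{B}(\mathbf{x}, \epsilon) = \{\mathbf{x}' : \|\mathbf{x}' - \mathbf{x}\|_\infty \le \epsilon\}$, where the infinity norm is adopted as the distance metric. We denote by $\Pi$ the projection operator that maps perturbed data to the feasible set. Specifically, $\Pi_{\mathcal{B}(\mathbf{x}, \epsilon)}(\mathbf{x}') = \max(\mathbf{x} - \epsilon, \min(\mathbf{x}', \mathbf{x} + \epsilon))$, which is an elementwise operator. $\mathcal{L}(\cdot, \cdot)$ is a multi-class loss such as cross-entropy.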
3.2 Adversarial Training
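In this part, we present the formulation of adversarial training, together with some hand-designed optimizers to solve this problem. To obtain a robust classifier against adversarial attacks, an intuitive idea is to minimize the robust loss, defined as the worst-case loss within a small neighborhood $\mathcal{B}(\mathbf{x}, \epsilon)$. Adversarial training, which aims to find the weights $\theta$ that minimize the robust loss, can be formulated as a minimax optimization problem in the following way [madry2017towards]:
$$\min_\theta \; \mathbb{E}_{(\mathbf{x}, \mathbf{y}) \sim \mathcal{D}} \Big[ \max_{\mathbf{x}' \in \mathcal{B}(\mathbf{x}, \epsilon)} \mathcal{L}(f_\theta(\mathbf{x}'), \mathbf{y}) \Big] \tag{1}$$
where $\mathcal{D}$ is the empirical distribution of input data. However, (1) only focuses on accuracy over adversarial examples and might cause severe overfitting issues on the training set. To address this problem, TRADES [zhang2019theoretically] investigates the trade-off between natural and robust errors and theoretically puts forward a different objective function for adversarial training:
$$\min_\theta \; \mathbb{E}_{(\mathbf{x}, \mathbf{y}) \sim \mathcal{D}} \Big[ \mathcal{L}(f_\theta(\mathbf{x}), \mathbf{y}) + \max_{\mathbf{x}' \in \mathcal{B}(\mathbf{x}, \epsilon)} \mathcal{L}(f_\theta(\mathbf{x}), f_\theta(\mathbf{x}')) / \lambda \Big] \tag{2}$$
where $\lambda > 0$ balances the natural and robust terms. Note that (1) and (2) are both defined as minimax optimization problems. To solve such saddle point problems, a commonly used approach is to first get an approximate solution $\mathbf{x}'$ of the inner maximization based on the current $\theta$, and then use $\mathbf{x}'$ to update the model weights $\theta$. The adversarial training procedure then iteratively runs this on each batch of samples until convergence. Clearly, the quality and efficiency of the inner maximization are crucial to the performance of adversarial training. The most commonly used inner maximizer is the projected gradient descent (PGD) algorithm, which conducts a fixed number of updates:
$$\mathbf{x}'_{t+1} = \Pi_{\mathcal{B}(\mathbf{x}, \epsilon)} \big( \mathbf{x}'_t + \alpha \, \mathrm{sign}(\nabla_{\mathbf{x}'} \mathcal{L}(\mathbf{x}'_t)) \big) \tag{3}$$
Here $\alpha$ is the step size and $\mathcal{L}(\mathbf{x}'_t)$ represents the maximization term in (1) or (2) with abuse of notation.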
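The PGD update in (3) can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the helper names `sign`, `project` and `pgd`, the toy linear loss, and the convention that the sign of zero is +1 are all assumptions made for illustration.

```python
def sign(v):
    # Assumption: sign(0) is mapped to +1; other conventions are possible.
    return 1.0 if v >= 0 else -1.0


def project(x_adv, x, eps):
    """Elementwise projection onto the L-infinity ball of radius eps around x."""
    return [max(xi - eps, min(ai, xi + eps)) for ai, xi in zip(x_adv, x)]


def pgd(x, grad_fn, eps, alpha, steps):
    """Run `steps` projected sign-gradient ascent updates starting from x."""
    x_adv = list(x)
    for _ in range(steps):
        g = grad_fn(x_adv)
        x_adv = [ai + alpha * sign(gi) for ai, gi in zip(x_adv, g)]
        x_adv = project(x_adv, x, eps)
    return x_adv


# Toy linear loss L(x') = w . x', whose gradient is w everywhere (hypothetical).
w = [1.0, -2.0]


def toy_loss(v):
    return sum(wi * vi for wi, vi in zip(w, v))


x = [0.5, 0.5]
x_adv = pgd(x, grad_fn=lambda v: w, eps=0.3, alpha=0.1, steps=10)
print(x_adv, toy_loss(x), toy_loss(x_adv))
```

On this toy problem the iterate reaches the boundary of the ball after three steps and the projection keeps it there for the remaining iterations.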
3.3 Effects of Adaptive Step Sizes
We found that the performance of adversarial training crucially depends on the optimization algorithm used for the inner maximization, and that the widely used PGD algorithm may not be the optimal choice. Here we demonstrate that even a small modification of PGD, without any change to the adversarial training objective, can boost model robustness. We use the CNN structure in [zhang2019theoretically] to train a classifier on the MNIST dataset. When 10-step PGD (denoted by PGD for simplicity) is used for the inner maximization, a constant step size is always adopted, which may not be suitable for the subsequent updates. Therefore, we make use of backtracking line search (BLS) to select a step size adaptively for adversarial training (abbreviated as AdvTrain). Starting with a maximum candidate step size $\alpha_0$, we iteratively shrink the step size $\alpha$ by a factor $\rho$ until the following condition is satisfied:
$$\mathcal{L}(\mathbf{x}'_t + \alpha \mathbf{p}) \ge \mathcal{L}(\mathbf{x}'_t) + c \, \alpha \, \mathbf{p}^\top \nabla_{\mathbf{x}'} \mathcal{L}(\mathbf{x}'_t) \tag{4}$$
where $\mathbf{p}$ is a search direction. Based on a selected control parameter $c$, the condition tests whether the update with step size $\alpha$ leads to a sufficient increase in the objective function. It is guaranteed that a sufficiently small $\alpha$ will satisfy the condition, so the line search always stops in a finite number of steps. This is standard in gradient ascent (descent) optimization; see [nocedal2006numerical] for more discussion. We set $c$ and $\rho$ following convention. As shown in Table 1, defense with AdvTrain+BLS leads to a more robust model than solving the inner problem by PGD alone. At the same time, the attacker combined with BLS generates stronger adversarial examples: the robust accuracy of the model trained with vanilla adversarial training is lower under PGD+BLS than under the plain PGD attack. This experiment motivates our efforts to find a better inner maximizer for adversarial training.
| Defense \ Attack | Natural | PGD | PGD+BLS |
| --- | --- | --- | --- |
| AdvTrain | | | |
| AdvTrain+BLS | | | |
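The backtracking rule of Section 3.3 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure used in the experiments: the function name `backtracking_step_size`, the one-dimensional concave toy loss, and the values `alpha0=4.0`, `rho=0.5` and `c=1e-4` are all hypothetical choices.

```python
def backtracking_step_size(loss_fn, x, grad, direction, alpha0, rho, c, max_iter=50):
    """Shrink alpha by the factor rho until the sufficient-increase test holds."""
    base = loss_fn(x)
    # Directional derivative p^T grad used on the right-hand side of the test.
    slope = sum(p * g for p, g in zip(direction, grad))
    alpha = alpha0
    for _ in range(max_iter):
        candidate = [xi + alpha * pi for xi, pi in zip(x, direction)]
        if loss_fn(candidate) >= base + c * alpha * slope:
            return alpha
        alpha *= rho
    return alpha


# Toy concave objective L(v) = -(v - 1)^2, maximized at v = 1 (hypothetical).
def toy_loss(v):
    return -(v[0] - 1.0) ** 2


x = [0.0]
grad = [2.0]        # dL/dv at v = 0
direction = [1.0]   # sign of the gradient
alpha = backtracking_step_size(toy_loss, x, grad, direction,
                               alpha0=4.0, rho=0.5, c=1e-4)
print(alpha)
```

Here the candidates 4.0 and 2.0 overshoot the maximum and fail the test, so the search settles on 1.0, which lands exactly on the maximizer of the toy objective.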
4 Proposed Algorithm
4.1 Learning to Learn for Adversarial Training
As mentioned in the previous section, the inner maximizer plays an important role in the performance of adversarial training. However, despite the effectiveness of the BLS introduced in Section 3.3, it is impractical to combine it with adversarial training, because the multiple line searches and the associated loss evaluations increase the computational burden significantly. A question then arises naturally: is there an automatic way of determining a good step size for the inner maximization without too much computational overhead? Moreover, apart from the step size, the question can be extended to whether such a maximizer can be learned for a particular dataset and model, in place of a general optimizer like PGD. Recently, as a subarea of learning-to-learn, researchers have been investigating whether it is possible to use machine learning, especially neural networks, to learn improved optimizers that replace hand-designed ones [andrychowicz2016learning, lv2017learning, wichrowska2017learned]. However, it is commonly believed that such learned general-purpose optimizers are still not practically useful due to several unsolved issues. For instance, exploding gradients [metz2019understanding] in unrolled optimization impede the generalization of these learned optimizers to longer horizons, while truncated optimization induces a short-horizon bias [wu2018understanding].
In this paper, we show that it is possible and practical to learn an optimizer for the inner maximization in adversarial training. Note that in adversarial training, the maximization problems share a very similar form: $\max_{\mathbf{x}' \in \mathcal{B}(\mathbf{x}, \epsilon)} \mathcal{L}(f_\theta(\mathbf{x}'), \mathbf{y})$, where they all have the same loss function $\mathcal{L}$ and the same network (structure and weights) $\theta$, and the only difference is their input $\mathbf{x}$ and label $\mathbf{y}$. Furthermore, we only need the maximizer to perform well on a fixed set of optimization problems for adversarial training. These properties thus enable us to learn a better optimizer that outperforms PGD.
To allow a learned inner maximizer, we parameterize the learned optimizer by an RNN. This follows the learning-to-learn literature [andrychowicz2016learning], but we propose several design changes, described below, that work better for our inner maximization problem, which is a constrained optimization problem rather than the standard unconstrained training task in [andrychowicz2016learning]. We then jointly optimize the classifier parameters ($\theta$) as well as the parameters of the inner maximizer ($\phi$). The overall framework can be found in Figure 1.
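Specifically, the inner problem is to maximize the vanilla adversarial training loss in (1) or the TRADES loss in (2), with the constraint that $\mathbf{x}' \in \mathcal{B}(\mathbf{x}, \epsilon)$. We expand on adversarial training here; more details about TRADES can be found in Appendix 0.A. With an RNN optimizer $m$ parameterized by $\phi$, we propose the following parameterized update rule to mimic the PGD update rule in (3):
$$\boldsymbol{\delta}_t, \mathbf{h}_{t+1} = m_\phi(\mathbf{g}_t, \mathbf{h}_t), \qquad \mathbf{x}'_{t+1} = \Pi_{\mathcal{B}(\mathbf{x}, \epsilon)}(\mathbf{x}'_t + \boldsymbol{\delta}_t) \tag{5}$$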
Here, $\mathbf{g}_t$ is the gradient $\nabla_{\mathbf{x}'} \mathcal{L}(\mathbf{x}'_t)$ and $\mathbf{h}_t$ is the hidden state representation. It has to be emphasized that our RNN optimizer generates perturbations coordinate-wise, in contrast to other L2L-based methods, which take the entire image as input. This property reduces the number of trainable parameters significantly, making training much easier and faster. In addition, note that the hidden state of our RNN optimizer plays an important role in the whole optimization. A separate hidden state for each coordinate allows each coordinate to follow its own update behavior. It also contains richer information, like the trajectory of loss gradients mentioned in [jang2019adversarial], but can produce a recursive update with a simpler structure.
For the RNN design, we mainly follow the structure in [andrychowicz2016learning] but with some modifications to make it more suitable for adversarial training. For a single coordinate with scalar gradient $g_t$ and perturbation $\delta_t$, we can expand the computation of the perturbation at each step as:
$$h_{t+1} = \tanh(W_h h_t + W_g g_t + b_h) \tag{6}$$
$$\delta_t = \tanh(V h_{t+1} + b_\delta) \tag{7}$$
where $h_t, b_h \in \mathbb{R}^d$, $W_h \in \mathbb{R}^{d \times d}$, $W_g \in \mathbb{R}^{d}$, $V \in \mathbb{R}^{1 \times d}$, and $b_\delta \in \mathbb{R}$ in the coordinate-wise update manner. As the optimization proceeds, the gradient becomes much smaller when approaching a local maximum. At that point, a stable value of the perturbation is expected, without much change between two consecutive iterations. However, from (6) and (7), we can clearly see that even when $g_t$ is small, the update rule still produces an update whose magnitude is governed by the bias terms. Imagine the case where the exact optimum is found with an all-zero hidden state (which requires $b_h$ to be zero as well): $\delta_t = \tanh(b_\delta)$ with a nonzero bias will push the adversarial example away from the optimal one. Thus, the two bias terms $b_h$ and $b_\delta$ are problematic for optimization close to the optimal solution. Due to the short horizon of the inner maximization in adversarial training, it is unlikely for the network to learn zero bias terms. Therefore, to ensure stable training, we remove the bias terms in the vanilla RNN in all implementations.
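The effect of the bias terms can be checked with a small pure-Python sketch of one coordinate-wise recurrent step. It assumes a vanilla RNN cell with tanh activations on both the hidden state and the output, which is one plausible reading of the update described above; the names `init_params`, `count_params` and `rnn_step` and the random initialization are hypothetical. With hidden size 10 and no biases, the three weight blocks hold 100 + 10 + 10 = 120 values, matching the parameter count reported in Table 2.

```python
import math
import random


def init_params(d, seed=0):
    """Randomly initialize the bias-free weights (illustrative initialization)."""
    rng = random.Random(seed)
    return {
        "W_h": [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(d)],
        "W_g": [rng.uniform(-0.1, 0.1) for _ in range(d)],
        "V": [rng.uniform(-0.1, 0.1) for _ in range(d)],
    }


def count_params(params):
    """Total number of scalar weights across the three blocks."""
    return (sum(len(row) for row in params["W_h"])
            + len(params["W_g"]) + len(params["V"]))


def rnn_step(g, h, params, b_h=None, b_delta=0.0):
    """One update for a single coordinate: returns (delta, new hidden state)."""
    d = len(h)
    b_h = b_h if b_h is not None else [0.0] * d
    h_new = [
        math.tanh(sum(params["W_h"][i][j] * h[j] for j in range(d))
                  + params["W_g"][i] * g + b_h[i])
        for i in range(d)
    ]
    delta = math.tanh(sum(params["V"][i] * h_new[i] for i in range(d)) + b_delta)
    return delta, h_new


params = init_params(d=10)
zero_state = [0.0] * 10
# Zero gradient and zero hidden state: the bias-free cell leaves the input alone.
delta_no_bias, _ = rnn_step(0.0, zero_state, params)
# The same situation with an output bias still produces a nonzero update.
delta_bias, _ = rnn_step(0.0, zero_state, params, b_delta=0.5)
print(count_params(params), delta_no_bias, delta_bias)
```

Because every coordinate shares the same weights and only the hidden state is kept per coordinate, the parameter count does not depend on the image size.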
With an L2L framework, we train the RNN optimizer parameters $\phi$ and the classifier weights $\theta$ simultaneously. The joint optimization problem can be formulated as follows:
(8)  
s.t.  (9) 
where $\mathbf{x}'_T$ is computed by running Eq. (5) $T$ times iteratively. Since the learned optimizer aims at finding a better solution to the inner maximization term, the objective function for training it over the horizon $T$ is defined as:
$$\mathcal{L}(\phi) = \mathbb{E}_{(\mathbf{x}, \mathbf{y}) \sim \mathcal{D}} \Big[ \sum_{t=1}^{T} w_t \, \mathcal{L}(f_\theta(\mathbf{x}'_t), \mathbf{y}) \Big] \tag{10}$$
Note that if we set $w_t = 0$ for all $t < T$ and $w_T = 1$, then (10) implies that our learned maximizer will maximize the loss after $T$ iterations. However, in practice we found that considering intermediate iterations can further improve the performance, since it makes the maximizer converge faster, even after only one or a few iterations. Therefore, in the experiments we use weights $w_t$ that increase with $t$. Note that [metz2019understanding] showed that this kind of unrolled optimization may lead to issues such as exploding gradients, which is still an unsolved problem in L2L. However, in adversarial training we only need a relatively small $T$ (10 inner steps in our experiments), so we do not encounter that issue.
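The role of the per-step weights can be illustrated with a short sketch of the weighted sum of losses along the unrolled trajectory. The function name `weighted_unrolled_loss`, the example loss values and the linear schedule with the weight at step t equal to t are assumptions for illustration only; the text above states only that the weights increase with the step index.

```python
def weighted_unrolled_loss(step_losses, weights):
    """Weighted sum of the losses recorded after each unrolled attack step."""
    if len(step_losses) != len(weights):
        raise ValueError("need exactly one weight per unrolled step")
    return sum(w * l for w, l in zip(weights, step_losses))


# Hypothetical losses measured after steps 1..4 of an unrolled attack.
step_losses = [0.5, 1.0, 1.5, 2.0]

# Putting all the weight on the last step recovers the final-step objective.
final_only = [0.0, 0.0, 0.0, 1.0]

# Increasing weights also reward progress made in the early steps.
increasing = [1.0, 2.0, 3.0, 4.0]

print(weighted_unrolled_loss(step_losses, final_only))
print(weighted_unrolled_loss(step_losses, increasing))
```

With the final-only weights, an optimizer that reaches the same end point slowly or quickly scores the same; with increasing weights, reaching a high loss early is rewarded as well.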
While updating the learned optimizer, corresponding adversarial examples are produced together. We can then train the classifier by minimizing the loss accordingly. The whole algorithm is presented in Algorithm 1.
4.2 Advantages over Other L2L-based Methods
Previous methods have proposed to use a CNN generator [jang2019adversarial, jiang2018learning] to produce perturbations in adversarial training. However, a CNN-based generator has a larger number of trainable parameters, which makes it hard to train. Table 2 lists the number of parameters and the training time per epoch for different learning-to-learn based methods. Our proposed RNN approach stands out with the fewest parameters as well as the shortest training time. Specifically, our RNN optimizer has only 120 parameters, over 4,000 times fewer than L2LDA (500,944), while the training time per epoch is 268.50s for RNN-Adv (443.52s for RNN-TRADES) vs. 1972.41s for L2LDA. Furthermore, our method also leads to better empirical performance, as shown in our main comparison in Tables 3, 4 and 5. A comparison of our variants with the original adversarial training methods can be found in Appendix 0.B.
Table 2: Number of parameters and training time per epoch for L2L-based methods.
| Method | Number of parameters | Training time per epoch (s) |
| --- | --- | --- |
| RNN-Adv | 120 | 268.50 |
| RNN-TRADES | 120 | 443.52 |
| L2LDA | 500944 | 1972.41 |
5 Experimental Results
In this section, we present experimental results of our proposed RNN-based adversarial training. We compare our method with various baselines against both white-box and black-box attacks. In addition, different datasets and network architectures are also evaluated.
5.1 Experimental Settings

Datasets and classifier networks. We mainly use the MNIST [lecun1998gradient] and CIFAR-10 [krizhevsky2010cifar] datasets for performance evaluation in our experiments. For MNIST, the CNN architecture with four convolutional layers in [carlini2017towards] is adopted as the classifier. For CIFAR-10, we use both the standard VGG-16 [simonyan2014very] and Wide ResNet [zagoruyko2016wide], which have been used in most of the previous defense papers, including adversarial training [madry2017towards] and TRADES [zhang2019theoretically]. We also conduct an additional experiment on Restricted ImageNet [tsipras2018robustness] with ResNet-18; the results are presented in Appendix 0.C.
Baselines for Comparison. Note that our method is an optimization framework that is independent of which minimax objective function is used. Therefore we choose the two most popular minimax formulations, AdvTrain [madry2017towards] (https://github.com/xuanqing94/BayesianDefense) and TRADES [zhang2019theoretically] (https://github.com/yaodongyu/TRADES), and substitute the proposed L2L-based optimization for their original PGD-based algorithm. Moreover, for a thorough comparison we also compare with a previous L2L defense mechanism, L2LDA [jang2019adversarial] (https://github.com/YunseokJANG/l2lda), which outperforms other L2L-based methods. We use the source code provided by the authors on GitHub with their recommended hyperparameters for all these baseline methods.

Evaluation and implementation details. Defense algorithms are usually evaluated by classification accuracy under different attacks. Effective attack algorithms including PGD, C&W and the attacker of L2LDA are used for evaluating model robustness, with a dataset-specific maximum perturbation strength $\epsilon$ for MNIST and for CIFAR-10. For PGD, we run 10 and 100 iterations (PGD-10 and PGD-100) with the step size suggested in [jang2019adversarial]. C&W is implemented with 100 iterations in the infinity norm. The L2LDA attacker is learned from L2LDA [jang2019adversarial] under different settings with 10 attack steps. In addition, we also use the learned optimizer of RNN-Adv to conduct 10-step attacks.
For our proposed RNN-based defense, we use a one-layer vanilla RNN with a hidden size of 10 as the optimizer for the inner maximization. Since we test our method under two different minimax losses, we name the variants RNN-Adv and RNN-TRADES respectively. The classifier and the optimizer are updated alternately according to Algorithm 1. All algorithms are implemented in PyTorch 1.1.0 with four NVIDIA 1080Ti GPUs. Note that all adversarial training methods adopt 10-step inner optimization for fair comparison. We run each defense method five times with different random seeds and report the lowest classification accuracy.
5.2 Performance on White-box Attacks
In this part, we demonstrate the robustness of models trained with different defense methods under the white-box setting. Experimental results are shown in Tables 3, 4 and 5. From these three tables, we can observe that our proposed L2L-based adversarial training with an RNN always outperforms its counterparts.
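Table 3: Robust accuracy (%) under white-box attacks on MNIST.
| Defense \ Attack | Natural | PGD-10 | PGD-100 | CW-100 | L2LDA | RNN-Adv | Min |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Plain | 99.46 | 1.04 | 0.42 | 83.63 | 5.94 | 0.79 | 0.42 |
| AdvTrain | 99.17 | 94.89 | 94.28 | 98.38 | 95.83 | 94.39 | 94.28 |
| TRADES | 99.52 | 95.77 | 95.50 | 98.72 | 96.03 | 95.50 | 95.50 |
| L2LDA | 98.76 | 94.73 | 93.22 | 97.69 | 95.28 | 93.16 | 93.16 |
| RNN-Adv | 99.20 | 95.80 | 95.62 | 98.75 | 96.05 | 95.51 | 95.51 |
| RNN-TRADES | 99.46 | 96.09 | 95.83 | 98.85 | 96.56 | 95.80 | 95.80 |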
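Table 4: Robust accuracy (%) under white-box attacks on CIFAR-10 with VGG-16.
| Defense \ Attack | Natural | PGD-10 | PGD-100 | CW-100 | L2LDA | RNN-Adv | Min |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Plain | 93.66 | 0.74 | 0.09 | 0.08 | 0.89 | 0.43 | 0.08 |
| AdvTrain | 81.11 | 42.32 | 40.75 | 42.26 | 43.55 | 41.07 | 40.75 |
| TRADES | 78.08 | 48.83 | 48.30 | 45.94 | 49.94 | 48.38 | 45.95 |
| L2LDA | 77.47 | 35.49 | 34.27 | 35.31 | 36.27 | 34.54 | 34.27 |
| RNN-Adv | 81.22 | 44.98 | 42.89 | 43.67 | 46.20 | 43.21 | 42.89 |
| RNN-TRADES | 80.76 | 50.23 | 49.42 | 47.23 | 51.29 | 49.49 | 47.23 |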

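Table 5: Robust accuracy (%) under white-box attacks on CIFAR-10 with Wide ResNet.
| Defense \ Attack | Natural | PGD-10 | PGD-100 | CW-100 | L2LDA | RNN-Adv | Min |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Plain | 95.14 | 0.01 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 |
| AdvTrain | 86.28 | 46.64 | 45.13 | 46.64 | 48.46 | 45.41 | 45.13 |
| TRADES | 85.89 | 54.28 | 52.68 | 53.68 | 56.49 | 53.00 | 52.68 |
| L2LDA | 85.30 | 45.47 | 44.35 | 44.19 | 47.16 | 44.54 | 44.19 |
| RNN-Adv | 85.92 | 47.62 | 45.98 | 47.26 | 49.40 | 46.23 | 45.98 |
| RNN-TRADES | 84.21 | 56.35 | 55.68 | 54.11 | 58.86 | 55.80 | 54.11 |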
To be specific, our method achieves a worst-case robust accuracy of 95.80% across the various attacks on the MNIST dataset (Table 3). On CIFAR-10, RNN-TRADES reaches 47.23% and 54.11% for VGG-16 and Wide ResNet, gains of roughly 1.3 and 1.4 percentage points over the strongest baseline (Tables 4 and 5). It should be stressed that our method surpasses L2LDA (the previous CNN-based L2L method) noticeably. For conventional defense algorithms, our L2L-based variant improves on the original method under every attack, as seen by comparing the robust accuracy of AdvTrain and RNN-Adv. A similar phenomenon can also be observed for TRADES and RNN-TRADES. Since previous works on L2L-based defense only concentrate on PGD-based adversarial training, the substantial performance gain indicates that the learned optimizer can contribute to the minimax problem in TRADES as well. Furthermore, apart from traditional attack algorithms, we use our RNN optimizer learned from adversarial training as the attacker (the column RNN-Adv). Results in the three experiments show that, compared with other general attackers run for 10 iterations such as PGD-10 and L2LDA, ours is capable of producing much stronger perturbations, which lead to lower robust accuracy.
5.3 Analysis
Learned Optimizer. As mentioned in Section 5.2, the optimizer learned from PGD-based adversarial training can be regarded as a special attacker. Thus, we primarily investigate the update trajectories of different attackers to obtain an in-depth understanding of our RNN optimizer. For VGG-16 models trained with four defense methods, three attackers are each used to generate perturbations in 10 steps, and the losses are recorded as shown in Figure 2.
We can see clearly from these four figures that the losses obtained from RNN-Adv are always larger than the others within 10 iterations, reflecting the stronger attacks produced by our proposed optimizer. Moreover, it should be noted that the loss gap between RNN-Adv and the other attackers is much more prominent in the first few iterations. This demonstrates an advantage of the learning-to-learn framework: the optimizer can converge faster than hand-designed algorithms.
Generalization to more attack steps. Although our learned RNN optimizer is only trained with 10 steps, we show that it can generalize to more steps as an attacker. From Table 6, we can observe that the attacker is capable of producing much stronger adversarial examples when its attack steps are extended to 40. The performance of our attacker is even comparable with that of PGD-100, which further demonstrates the superiority of our proposed method.
5.4 Performance on Black-box Transfer Attacks
We further test the robustness of the proposed defense method under transfer attacks. As suggested by [athalye2018obfuscated], this can serve as a sanity check to see whether our defense leads to obfuscated gradients and gives a false sense of model robustness. Following the procedures in [athalye2018obfuscated], we first train a surrogate model with the same architecture as the target model using a different random seed, and then generate adversarial examples from the surrogate model to attack the target model.
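Table 6: Robust accuracy (%) of VGG-16 models under the RNN-Adv attacker with 10 and 40 attack steps.
| Defense \ Step | 10 | 40 |
| --- | --- | --- |
| Plain | 0.43 | 0.03 |
| AdvTrain | 41.07 | 40.70 |
| TRADES | 48.38 | 48.27 |
| L2LDA | 34.54 | 34.19 |
| RNN-Adv | 43.21 | 42.89 |
| RNN-TRADES | 49.49 | 49.28 |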
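Table 7: Robust accuracy (%) of VGG-16 models under black-box transfer attacks from two surrogate models.
| Defense \ Surrogate | PlainNet | PGDNet |
| --- | --- | --- |
| AdvTrain | 79.94 | 62.57 |
| TRADES | 77.01 | 65.41 |
| L2LDA | 76.37 | 60.32 |
| RNN-Adv | 80.58 | 63.17 |
| RNN-TRADES | 79.54 | |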
Specifically, we choose VGG-16 models obtained from various defense algorithms as our target models. Meanwhile, we train two surrogate models: PlainNet with natural training and PGDNet with 10-step PGD-based adversarial training. Results are presented in Table 7. We can observe that our method outperforms all other baselines, with RNN-Adv and RNN-TRADES standing out in defending against attacks from PlainNet and PGDNet respectively. This suggests that our L2L defense is highly resistant to transfer attacks.
5.5 Loss Landscape Exploration
To further verify the superior performance of the proposed algorithm, we visualize the loss landscapes of VGG-16 models trained under different defense strategies, as shown in Figure 3. Following the implementation in [engstrom2018evaluating], we modify the input along a linear space defined by the sign of the gradient and a random Rademacher vector, where the x and y axes represent the magnitude of the perturbation added in each direction and the z axis represents the loss. It can be observed that the loss surfaces of models trained with RNN-Adv and RNN-TRADES in Figures 3(e) and 3(f) are much smoother than those of their counterparts in Figures 3(b) and 3(c). Besides, our method significantly reduces the loss value of perturbed data close to the original input. In particular, the maximum loss is noticeably lower for RNN-Adv than for adversarial training. Compared with L2LDA in Figure 3(d), the proposed RNN optimizer yields less bumpy loss landscapes with smaller variance, which further demonstrates the stability and superiority of our L2L-based adversarial training.
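The two-direction probing used for these landscapes can be sketched in pure Python. This is an illustrative reconstruction under assumptions, not the code of [engstrom2018evaluating]: the function name `loss_surface`, the quadratic toy loss, the three-dimensional input and the grid settings are all hypothetical.

```python
import random


def loss_surface(loss_fn, x, grad, radius, n, seed=0):
    """Evaluate loss_fn on an n-by-n grid spanned by two fixed directions."""
    rng = random.Random(seed)
    d1 = [1.0 if g >= 0 else -1.0 for g in grad]   # sign of the gradient
    d2 = [rng.choice((-1.0, 1.0)) for _ in x]      # random Rademacher vector
    # Evenly spaced perturbation magnitudes in [-radius, radius]; needs n >= 2.
    ticks = [-radius + 2.0 * radius * k / (n - 1) for k in range(n)]
    surface = [
        [loss_fn([xi + a * u + b * v for xi, u, v in zip(x, d1, d2)])
         for b in ticks]
        for a in ticks
    ]
    return ticks, surface


# Toy quadratic loss standing in for the classifier loss (hypothetical).
def toy_loss(v):
    return sum(t * t for t in v)


x = [0.1, -0.2, 0.3]
grad = [2.0 * t for t in x]  # analytic gradient of the toy loss at x
ticks, surface = loss_surface(toy_loss, x, grad, radius=0.05, n=5)
print(ticks)
print(surface[2][2], toy_loss(x))
```

The centre of the grid corresponds to the unperturbed input, so its value equals the clean loss; plotting `surface` against `ticks` on both axes gives the kind of 3-D view described above.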
6 Conclusion
For defense mechanisms that can be formulated as a minimax optimization problem, we propose to replace the inner PGD-based maximizer with an automatically learned RNN maximizer, and show that jointly training the RNN maximizer and the classifier can significantly improve the defense performance. Empirical results demonstrate that the proposed approach can be combined with several minimax defense objectives, including adversarial training and TRADES.
For future work, a worthwhile direction is to address the inadequacy of L2L in dealing with long-horizon problems. We could then substitute learned optimizers for hand-designed algorithms in both the inner and outer problems, which would enable an entirely automatic process for adversarial training.
References
Appendix 0.A Algorithm for TRADES
As we have emphasized, our proposed method can be incorporated into any adversarial training scheme that can be formulated as a minimax optimization problem. Here we provide the detailed algorithm of RNN-TRADES in Algorithm 2.
Appendix 0.B Additional Analysis
0.B.1 Time Comparison
In this section, we compare the training time of our proposed methods with that of the original adversarial training. We again analyze VGG-16 on the CIFAR-10 dataset. From the results in Table 8, it can be observed that our methods approximately double the overall training time per epoch, which is not a heavy burden once the improved performance is taken into account.
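Table 8: Training time per epoch of the original methods and their RNN counterparts.
| Method | Training time per epoch (s) | Ratio of RNN counterpart (RNN time in s) |
| --- | --- | --- |
| AdvTrain | 122.24 | 2.20 (268.50) |
| TRADES | 189.43 | 2.34 (443.52) |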
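The ratios in Table 8 follow directly from the reported per-epoch times, as the short check below shows. The dictionary names are illustrative; the numbers are the ones listed in Tables 2 and 8.

```python
# Per-epoch training times in seconds, as reported in Tables 2 and 8.
baseline_time = {"AdvTrain": 122.24, "TRADES": 189.43}
rnn_time = {"AdvTrain": 268.50, "TRADES": 443.52}

# Slowdown factor of each RNN variant relative to its original method.
ratios = {name: round(rnn_time[name] / baseline_time[name], 2)
          for name in baseline_time}
print(ratios)
```

Both factors are a little above two, which is consistent with the statement that the learned optimizer roughly doubles the training time per epoch.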
0.B.2 Trajectory
A similar trajectory can be observed in terms of classification accuracy when the model is attacked by different attackers. In Figure 4, the robust accuracy under RNN-Adv drops most rapidly and also reaches the lowest point after the entire 10-step attack. This further verifies that our learned optimizer can guide the optimization along a better trajectory for the inner problem, meaning that the crafted adversarial examples are much more powerful. This in turn contributes to a more robust model.
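Table 9: Robust accuracy (%) on Restricted ImageNet with ResNet-18.
| Defense \ Attack | Natural | PGD-10 | PGD-100 | CW-100 | Min |
| --- | --- | --- | --- | --- | --- |
| Plain | 97.66 | 0.00 | 0.00 | 0.00 | 0.00 |
| AdvTrain | 91.43 | 8.50 | 4.88 | 9.04 | 4.88 |
| TRADES | 72.51 | 6.96 | 4.80 | 10.47 | 4.80 |
| RNN-Adv | 90.28 | 10.13 | 6.04 | 12.98 | 6.04 |
| RNN-TRADES | 73.84 | 9.76 | 5.79 | 13.22 | 5.79 |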
Appendix 0.C Experiments on Restricted ImageNet
Without loss of generality, we conduct an extra experimental analysis on a larger-scale dataset, Restricted ImageNet [tsipras2018robustness]. It is a subset of 9 different super-classes extracted from the entire ImageNet to reduce the computational burden of adversarial training. We adopt the ResNet-18 structure as the classifier for this dataset. Following the literature [sinha2019harnessing, tsipras2018robustness], the inner attack strength of all defense methods is set to 0.005 in the $\ell_\infty$ ball, while models trained with these mechanisms are evaluated under attacks with a radius of 0.025. Note that in this experiment we only fine-tune the model for 2 epochs, starting from the naturally trained one, to observe the different performance of the defense strategies.
From Table 9, we can clearly see that our proposed methods, RNN-Adv and RNN-TRADES, consistently improve the robust accuracy compared with their original adversarial training algorithms. Specifically, models trained with our methods are consistently more robust than their counterparts under the various attacks.