1 Introduction
Despite the remarkable success of deep neural networks (DNNs) in various applications [15], these models are vulnerable to adversarial examples [3, 12, 16, 24, 38], which are maliciously generated by adding imperceptible perturbations to normal examples and can cause erroneous predictions. As the vulnerability of DNNs raises concerns in security-sensitive applications, a large number of adversarial defense methods have been proposed, including randomization [6, 48], image denoising [26], ensemble learning [32, 41], adversarial detection [25, 30], adversarial training [16, 28], and certified defenses [45]. However, most of these defenses have soon been shown to be ineffective against stronger or adaptive attacks [2, 13, 40, 42, 50], making it challenging to understand the effects of the current defenses and to gauge the progress of the field. Among these defenses, adversarial training, which trains the network on adversarial examples generated by different attacks instead of the natural examples [28], is arguably the most effective approach [2, 11].
The most widely adopted approach to evaluating adversarial robustness is using adversarial attacks. One of the most popular attacks is the projected gradient descent (PGD) method [28], which iteratively generates an adversarial example by performing gradient updates to maximize a classification loss (e.g., the cross-entropy loss) w.r.t. the input. Based on PGD, recent improvements have been made in different aspects, including using different loss functions [3, 7, 18], adjusting the step size during the attack [7], and introducing new initialization strategies [39]. For example, AutoAttack [7] establishes a stronger baseline for reliable evaluation of adversarial robustness, composed of an ensemble of four attack methods: APGD_{CE}, APGD_{DLR}, FAB, and Square Attack. The output diversified initialization (ODI) method [39] improves the diversity of adversarial examples in the output space by adopting a new initialization strategy.

Table 1: Defense models evaluated in Stage I (left) and in Stage II & the Final Stage (right).
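As a concrete reference point, the iterative PGD update described above can be sketched in a few lines of NumPy; `grad_loss` (the gradient of the classification loss w.r.t. the input) is an assumed oracle, not part of any specific library:

```python
import numpy as np

def pgd_linf(x, y, grad_loss, eps=8/255, step=2/255, iters=10, rng=None):
    """Generic L-infinity PGD sketch: ascend the loss, project to the eps-ball."""
    rng = np.random.default_rng() if rng is None else rng
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)   # random start in the ball
    x_adv = np.clip(x_adv, 0.0, 1.0)
    for _ in range(iters):
        g = grad_loss(x_adv, y)                        # d(loss)/d(input)
        x_adv = x_adv + step * np.sign(g)              # gradient ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)       # project back to the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)               # keep a valid pixel range
    return x_adv
```

The improvements surveyed above (loss choice, step-size schedules, initialization) all plug into this basic loop.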

Defense Model  Architecture  Dataset  |  Defense Model  Architecture  Dataset
Madry et al. (2018) [28]  WRN-28-10  CIFAR-10  |  Wang et al. (2020) [44]  WRN-28-10  CIFAR-10
Carmon et al. (2019) [4]  WRN-28-10  CIFAR-10  |  Ding et al. (2020) [9]  WRN-28-10  CIFAR-10
Zhang et al. (2019) [53]  WRN-34-10  CIFAR-10  |  Wang & Zhang (2019) [43]  WRN-28-10  CIFAR-10
Zhang & Wang (2019) [52]  WRN-28-10  CIFAR-10  |  Huang et al. (2020) [20]  WRN-34-10  CIFAR-10
Hendrycks et al. (2019) [19]  WRN-28-10  CIFAR-10  |  Chen et al. (2020) [5]  (3) ResNet-50  CIFAR-10
Pang et al. (2020a) [34]  WRN-34-10  CIFAR-10  |  Gowal et al. (2020) [17]  WRN-28-10  CIFAR-10
Rice et al. (2020) [35]  WRN-34-10  CIFAR-10  |  Alayrac et al. (2019) [1]  WRN-106-8  CIFAR-10
Pang et al. (2020b) [31]  ResNet-v2-47  CIFAR-10  |  Dong et al. (2020) [10]  WRN-28-10  CIFAR-10
Wong et al. (2020) [46]  ResNet-18  CIFAR-10  |  Pang et al. (2021) [33]  DenseNet-121  CIFAR-10
Shafahi et al. (2019) [37]  WRN-34-10  CIFAR-10  |  Pang et al. (2021) [33]  DenseNet-201  CIFAR-10
Wu et al. (2020) [47]  WRN-28-10  CIFAR-10  |  Pang et al. (2021) [33]  DPN  CIFAR-10
Sehwag et al. (2020) [36]  WRN-28-10  CIFAR-10  |  Pang et al. (2021) [33]  WRN-34-10  CIFAR-10
Pang et al. (2021) [33]  WRN-34-10  CIFAR-10  |  Pang et al. (2021) [33]  WRN-34-10  CIFAR-10
Wong et al. (2020) [46]  ResNet-50  ImageNet  |  Pang et al. (2020a) [34]  ResNet-50  ImageNet
Shafahi et al. (2019) [37]  ResNet-50  ImageNet  |  Xie et al. (2019) [49]  ResNet-152  ImageNet
To further accelerate research on reliable evaluation of the adversarial robustness of defense models, we organized this competition, in which the participants were required to develop effective white-box attack algorithms. We focus on image classification models, since most adversarial defenses are developed on typical image benchmarks, e.g., CIFAR-10 [23] and ImageNet [8]. We only consider adversarially trained models as the evaluated defenses since 1) these models are the state-of-the-art defenses; 2) these models do not have randomness in their predictions; and 3) the model gradient can be calculated without encountering non-existent gradients. The whole competition consisted of three stages. In the first stage, we evaluated the submissions of white-box attacks on 15 defense models, including 13 models on CIFAR-10 and 2 models on ImageNet. We used the first 1,000 images of the CIFAR-10 test set and 1,000 randomly chosen images of the ILSVRC 2012 validation set for evaluation. The top-100 teams of the first stage could enter the second one. In the second stage, we evaluated submissions on another set of 15 secret defense models with the same images, to ensure that the submissions could not adapt to the details of the defense models. The top-20 teams could enter the final stage, in which we evaluated the final submissions using the whole CIFAR-10 test set containing 10,000 images and 10,000 randomly chosen images from the ILSVRC 2012 validation set. We evaluated the white-box attack submissions by the misclassification rate of the defense models (higher is better). Below we present more details about the competition and the top-6 solutions.
2 Competition Details
This competition was conducted on an adversarial robustness evaluation platform — ARES (https://github.com/thuml/ares) [11]. The participants needed to implement white-box attack algorithms following the Attacker class in ARES. The whole competition consisted of three stages with different models and datasets.
2.1 Models and Datasets
Stage I. In the first stage, we evaluated the submissions of white-box attacks on 15 defense models, which include 13 models trained on CIFAR-10 and 2 models trained on ImageNet. The models are shown in Table 1. We used the first 1,000 images from the CIFAR-10 test set and 1,000 randomly chosen images from the ILSVRC 2012 validation set for evaluating the submissions on the public leaderboard.
Stage II. The top-100 teams in Stage I could enter Stage II. In this stage, we evaluated submissions on another set of 15 secret models, also shown in Table 1. The datasets are the same as in Stage I.
Final Stage. We evaluated the final scores of the top-20 teams in Stage II. Each participant could select two submissions for the final evaluation. The models are the same as those used in Stage II. We used all 10,000 images from the CIFAR-10 test set and 10,000 randomly chosen images from the ILSVRC 2012 validation set for the final evaluation.
2.2 Evaluation
The participants were required to submit the source code of their developed white-box attacks. The submissions were evaluated under the untargeted $\ell_\infty$ threat model. For CIFAR-10, the perturbation budget is $\epsilon = 8/255$; for ImageNet, the perturbation budget is $\epsilon = 4/255$. The submitted attacks have access to the white-box models, but are limited to using their logit outputs. The participants could design losses and attack algorithms based on the logits for gradient calculations. We encouraged the participants to develop "general" attack algorithms that are not specialized to each model. We evaluated the submitted attacks by the misclassification rate of the defense models (higher is better), which is computed by the following formula
$$R = \frac{1}{N}\sum_{i=1}^{N}\mathbb{1}\big(f(\mathcal{A}(x_i)) \neq y_i\big), \qquad (1)$$
where $\mathcal{A}$ is an attack method that returns the adversarial example given a natural one, $f$ is a defense model, $(x_i, y_i)$ is an image-label pair, and $N$ is the number of evaluated images. The participants needed to ensure that $\mathcal{A}(x_i)$ returns an adversarial example whose $\ell_\infty$ distance from $x_i$ is smaller than $\epsilon$; otherwise we would clip the adversarial example into the allowed range.
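The evaluation protocol above (compute adversarial examples, clip them into the budget, count misclassifications) can be sketched as follows; `attack` and `predict` are illustrative placeholders for a submission and a defense model, not the ARES API:

```python
import numpy as np

def evaluate_attack(attack, predict, xs, ys, eps=8/255):
    """Misclassification rate of Eq. (1): the fraction of images whose clipped
    adversarial example is predicted incorrectly (higher = stronger attack)."""
    wrong = 0
    for x, y in zip(xs, ys):
        x_adv = attack(x, y)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # enforce the budget, as the organizers did
        x_adv = np.clip(x_adv, 0.0, 1.0)          # valid pixel range
        wrong += int(predict(x_adv) != y)
    return wrong / len(xs)
```

An attack that exceeds the budget gains nothing here, since its perturbation is clipped before the defense model is queried.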
2.3 Additional Restrictions
To save computational cost and enable a fair comparison between different attack methods, we restricted the runtime of the attack submissions. The average number of backward gradient calculations per image had to be less than 100, and the average number of forward model predictions less than 200. The total runtime of a submission over all models had to be less than 3 hours on a Tesla V100 GPU (for comparison, the baseline PGD-100 attack needs less than 2 hours for all models). Since a backward gradient calculation necessarily entails a forward model prediction, each backward gradient calculation consumed one backward quota unit and one forward quota unit, even if the attack did not explicitly run the forward model prediction.
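The quota accounting described above might be tracked with a thin wrapper like the following sketch (the class and method names are illustrative, not the ARES API):

```python
class QuotaModel:
    """Wraps a model and enforces per-image query quotas: each backward
    gradient calculation also consumes one forward-prediction unit."""
    def __init__(self, forward_fn, backward_fn, max_forward=200, max_backward=100):
        self.forward_fn, self.backward_fn = forward_fn, backward_fn
        self.max_forward, self.max_backward = max_forward, max_backward
        self.n_forward = self.n_backward = 0

    def forward(self, x):
        if self.n_forward >= self.max_forward:
            raise RuntimeError("forward quota exhausted")
        self.n_forward += 1
        return self.forward_fn(x)

    def backward(self, x):
        if self.n_backward >= self.max_backward or self.n_forward >= self.max_forward:
            raise RuntimeError("quota exhausted")
        self.n_backward += 1
        self.n_forward += 1  # a backward pass implies a forward pass
        return self.backward_fn(x)
```

Several of the winning strategies below are essentially schemes for spending these two budgets where they pay off most.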
Table 2: Scores of the top-6 teams in the three stages.

Team  Stage I  Stage II  Final Stage
green hand  52.53  50.93  51.104
UMSIAT  52.39  50.65  50.961
S3L  52.13  50.56  50.955
BorderLine  52.40  50.69  50.895
Kanra  52.18  50.65  50.888
BalaBala2020  51.89  50.55  50.887
3 Competition Results
More than 1,600 teams participated in the competition, with over 2,500 submissions in total. We received about 100 high-quality attack algorithms during the competition. The scores of the top-6 submissions are shown in Table 2. The final scores of the top-6 submissions are very close: they all lie within [50.887, 51.104]. After the competition, we summarized the results and established a new adversarial robustness benchmark at https://ml.cs.tsinghua.edu.cn/aresbench/. This benchmark includes several typical attacks (e.g., PGD-100, MIM-100, CW-100) as well as the top-5 attacks in the competition to evaluate the adversarial robustness of different defense models. It also allows users to upload adversarial attacks and defenses for evaluation. We will maintain the benchmark after the competition for evaluating and comparing future attack and defense methods.
4 Top Scoring Submissions
4.1 1st place: green hand
Team members: Ye Liu, Qilong Zhang, Lianli Gao
4.1.1 Method
Since the numbers of backward gradient calculations and forward model predictions are limited, we need to make effective use of them. We adopt ODI-PGD [39] as the baseline method and make several improvements upon it.
Step decay: A fixed step size is suboptimal, so we gradually reduce the step size from the maximum to one hundredth of the maximum during the attack. Specifically, we use the SGDR [27] schedule to adjust the step size. In each restart, we use a large step size at the beginning and then quickly reduce it, so that we can converge to a local maximum of the loss. The overall step size can be formalized as:
$$\alpha_t = \alpha_{\min} + \frac{1}{2}\left(\alpha_{\max} - \alpha_{\min}\right)\left(1 + \cos\left(\frac{t}{T}\pi\right)\right), \qquad (2)$$
where $\alpha_t$ indicates the step size at the $t$-th iteration, $\alpha_{\min}$ represents the smallest step size, $\alpha_{\max}$ represents the maximum step size, $t$ denotes the current iteration number, and $T$ represents the total number of iterations.
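The cosine-annealed step-size schedule described above can be computed as in this small sketch (symbol names are illustrative):

```python
import math

def sgdr_step_size(t, T, alpha_max, alpha_min):
    """Cosine-annealed step size within one restart: starts at alpha_max at
    t = 0 and decays to alpha_min at t = T (cf. SGDR [27])."""
    return alpha_min + 0.5 * (alpha_max - alpha_min) * (1 + math.cos(math.pi * t / T))
```

At each restart the counter `t` resets, so the attack again takes large exploratory steps before converging with small ones.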
Delete difficult examples: Different examples have different attack difficulties, and we cannot successfully attack all of them. Therefore, we do not attack the difficult examples. Specifically, we judge the difficulty of an example by its loss: if the loss is relatively small, we consider the example harder to attack; if the loss is relatively large, we consider it easier to attack successfully.
Iteration increase: To use the iteration budget efficiently, we use multiple restarts. At the beginning, the number of iterations per restart is small; as the number of restarts increases, the number of iterations per restart also increases.
Biased output diversity initialization: The output diversity initialization method moves in a random direction. We propose a biased output diversity initialization method, which moves in a better direction to improve the efficiency of the attack.
4.1.2 Submission Details and Results
Submission details: The number of restarts is 17, the number of iterations per restart increases within [10, 60], the maximum step size is 8/255, and the smallest step size is one hundredth of the maximum.
Results: The Stage I score is 52.53, the Stage II score is 50.93, and the final score is 51.104; all three scores ranked first.
4.2 2nd place: team UMSIAT
Team members: Yunrui Yu, Xitong Gao
4.2.1 Method
We used the new surrogate loss function introduced in our earlier work [51], which improves the attack success rate and convergence rate compared to the original loss function used to train the model. For this competition, we divided the models into robust models and non-robust ones, and designed different strategies for each type. To maximize the score, we additionally explored step-size schedules and ensemble strategies, and adapted the number of iterations to different image samples. The next section explains our method in detail.
4.2.2 Submission Details and Results
We found that for many of the defense models, the standard softmax cross-entropy loss function is often easily saturated, i.e., it is prone to underflow in floating-point arithmetic or produces negligible backpropagation signals. In some cases, attacks based on backpropagating through this function will fail. To overcome this, we used the LAFEAT loss function introduced in our earlier work [51], which constrains the range of the difference between the maximum and the second largest value in the logit output of the model:
$$\mathcal{L}(\mathbf{z}, \mathbf{y}) = \tanh\left(\max_i \big(\mathbf{z} \odot (\mathbf{1} - \mathbf{y})\big)_i - \mathbf{z}^\top \mathbf{y}\right), \qquad (3)$$
where $\mathbf{z}$ is the prelogits network output, $\mathbf{y}$ is the one-hot ground-truth label, and $\odot$ denotes the element-wise product. Using this loss function, gradient-based attacks can in general converge significantly faster and achieve higher success rates.
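The following sketch illustrates a bounded logit-margin surrogate in the spirit of this description; it is not the exact LAFEAT loss, whose precise form is given in [51]:

```python
import numpy as np

def bounded_margin_loss(logits, label):
    """Illustrative surrogate: tanh-bounded gap between the best wrong-class
    logit and the true-class logit. Bounded output avoids the saturation and
    underflow that plague softmax cross-entropy on hardened models."""
    z = np.asarray(logits, dtype=float)
    true = z[label]
    wrong = np.max(np.delete(z, label))  # largest logit among the other classes
    return np.tanh(wrong - true)         # positive once the model misclassifies
```

Because `tanh` is bounded yet monotone, the gradient signal stays informative even when the raw logit gap is extreme.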
The second tactic we adopted is a targeted variant of the above loss function. Instead of maximizing the loss w.r.t. the ground-truth label, the targeted variant minimizes the loss w.r.t. the target label:
$$\mathcal{L}_{\mathrm{t}}(\mathbf{z}, \mathbf{y}_{\mathrm{t}}) = \tanh\left(\max_i \big(\mathbf{z} \odot (\mathbf{1} - \mathbf{y}_{\mathrm{t}})\big)_i - \mathbf{z}^\top \mathbf{y}_{\mathrm{t}}\right), \qquad (4)$$
where $\mathbf{y}_{\mathrm{t}}$ is the one-hot target label.
We found that the targeted variant could further reduce the accuracy of the non-robust models. Based on this finding, we designed a simple tactic to automatically distinguish whether a model is robust against our baseline strategy, and to adjust our strategy accordingly. We designed a 10-iteration attack which uses the untargeted surrogate loss, followed by the top-3 targets of the DLR loss [7]. We classify a model as non-robust if its robustness drops sharply during the targeted phase. For non-robust models, we divide the remaining gradient-iteration budget by 9 and attack with the top-9 targets of the new loss function; for robust models, we use all remaining iterations with the untargeted loss function. As the different attack settings are carried out in sequence, we allow each new attack setting to continue from the perturbation generated by the previous setting.
We explored different choices of step-size schedules, and in the final version used a schedule that decays the step size with the current iteration count $t$ relative to the number of iterations $T$ of the current attack. After each attack iteration, if a sample has been attacked successfully, we stop further attacks on it and allocate the remaining iterations to the outstanding samples that have not yet been attacked successfully.
To ensure that the attack strategy can generalize to new models, we search for the optimal configuration with results that are relatively stable under local hyperparameter changes.
Result:
The final score is 50.961.
4.3 3rd place: team S3L
Team members: Zhe Zhao and Daquan Lin
4.3.1 Method
The critical point of this competition is the limitation on queries, which requires the attacker to construct more efficient attack methods. We conducted a simple test of each adversarially trained model using the PGD attack, and found that these models often share similar characteristics when facing an adversarial attack: about 50% of the benign samples can be attacked successfully with ease, and about 40% of the samples admit almost no adversarial examples. The remaining 10% are the key to competing with other teams: these images have adversarial examples, but many iterations are required to find them.
We designed our method for this competition from two perspectives:
- For images that are easy to attack: find adversarial examples using as few iterations as possible.
- For images that cannot be attacked: find and exclude them using as few iterations as possible.
The key point of our method is to estimate local robustness effectively and efficiently.
Definition 4.1 (Local Robustness). Given a sample input $x$, a DNN $f$, and a perturbation threshold $\epsilon$, $f$ is locally robust at $x$ iff for any sample input $x'$ such that $\|x - x'\|_p \leq \epsilon$, we have $f(x') = f(x)$, where $\|\cdot\|_p$ is the $\ell_p$ norm used to measure the distance between two images.
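Definition 4.1 cannot be certified by sampling, but it can be falsified; the following sketch (with illustrative names) searches for a violating input in the L-infinity ball:

```python
import numpy as np

def falsify_local_robustness(predict, x, eps, n_samples=100, rng=None):
    """Random search for a violation of local robustness in the L-inf eps-ball.
    Returns a counterexample if one is found, else None. A None result is
    inconclusive: sampling can disprove robustness but never prove it."""
    rng = np.random.default_rng() if rng is None else rng
    y = predict(x)
    for _ in range(n_samples):
        x2 = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
        if predict(x2) != y:
            return x2  # local robustness is violated at x
    return None
```

Sound verification techniques (interval analysis, abstract interpretation) close this gap, but as the next paragraph explains, they were unavailable under the competition's gradient-only restriction.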
There are techniques based on software analysis and verification (e.g., interval analysis, abstract interpretation, etc. [21, 54]) that can soundly derive the robust accuracy of a neural network, i.e., the accuracy that the network can achieve under the local robustness constraint. In this competition the models are white-box, but the organizers restrict the attackers to using only gradient information, so we cannot use the verification techniques mentioned above. Therefore, we propose a novel attack method, named outside-inside attack (OIA), to estimate whether an input has the potential to be successfully attacked.¹ The inspiration for OIA comes from [55], which uses attack costs to estimate local robustness.
¹The paper based on OIA is in preparation and will be available soon.
OIA first attacks the benign samples using a perturbation budget larger than $\epsilon$, and subsequently projects the perturbation back into the $\epsilon$-ball under the $\ell_\infty$ constraint. In the first step, if an image is successfully attacked with the larger perturbation, we consider it to have an adversarial example within the $\epsilon$-ball. For a failed attack, we conclude that the input has no adversarial example within the $\epsilon$-ball, i.e., that it satisfies local robustness. We remove such images from the subsequent iterations, thus saving the iterations for those images that are likely to be successfully attacked.
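The outside-inside idea can be sketched as follows; the toy `attack` interface and the `outer_scale` factor are illustrative assumptions, not the team's exact BIM configuration:

```python
import numpy as np

def oia_triage(attack, predict, x, y, eps, outer_scale=2.0):
    """OIA sketch: attack with an enlarged budget first. If even the enlarged
    budget fails, treat x as (likely) locally robust and skip it later;
    otherwise project the result back into the true eps-ball."""
    x_outer = attack(x, y, eps * outer_scale)          # "outside" attack
    if predict(x_outer) == y:
        return "skip", None                            # likely no adv. example in the eps-ball
    x_in = np.clip(np.clip(x_outer, x - eps, x + eps), 0.0, 1.0)  # "inside" projection
    if predict(x_in) != y:
        return "done", x_in                            # already successful after projection
    return "continue", x_in                            # promising start for further attack
```

The triage spends only a handful of gradient steps per image, yet routes the remaining budget toward the images worth attacking.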
4.3.2 Submission Details and Results
For OIA, we used a 5-step BIM attack with an enlarged perturbation budget. Subsequently, we used the ODI-PGD attack [39]. Following the hyper-parameters provided by [39], we performed a 2-step ODI initialization, and then executed a 20-step PGD search. It is worth mentioning that we use the method recommended in AutoAttack [7] to select the optimal perturbation obtained during the previous attack as the starting point for the next round.
Using OIA before the ODI-PGD attack, we can optimize the two perspectives mentioned in Sec. 4.3.1 at the same time. First, OIA can quickly attack those vulnerable samples whose perturbations remain effective after clipping. In addition, OIA uses only a few iterations to filter out a large number of images that cannot be successfully attacked under the $\epsilon$ constraint. OIA only provides an approximate estimate of local robustness and cannot give a theoretical guarantee, but according to our statistics, its false negative rate is only about 0.1.
In our implementation of the PGD attack, we hard-coded the number of iterations and the step-size decay. In the first few restarts, we use a smaller number of iterations to search quickly, and in the subsequent restarts, we increase the number of iterations. During the iterations, we keep reducing the step size to avoid oscillation. Inspired by DFA [14], we constrain the minimum step size during the reduction to avoid numerical accuracy problems. In the end, we scored 50.955 points and took third place.
4.4 4th place: team BorderLine
Team members: Jiadong Lin and Chuanbiao Song
4.4.1 Method
In this section, we introduce the random real target (RRT) attack method, which won 4th place in the competition. The idea is to develop a novel sampling strategy for the initial perturbed points to improve the efficiency of adversarial attacks.
Random sampling is crucial for optimization-based white-box attacks, as it helps to find a diverse set of initial perturbed points. Many sampling strategies have been used to improve attack efficiency, such as uniform sampling, Gaussian sampling, and output diversified sampling (ODS) [39]. Currently, ODS is widely used for random restarts in white-box attacks and achieves the best performance among these sampling strategies. The basic idea of ODS is to randomly specify a direction in the output space and then perform gradient-based optimization to generate a perturbation in the input space. In practice, however, the random direction in the output space may not be a good representation of an effective and true direction, which limits the attack strength.
To overcome this drawback, we propose random real target (RRT) sampling, a novel sampling strategy that attempts to enhance sampling diversity in the space of the target model's outputs while maintaining its authenticity. RRT and ODS differ in two ways. First, RRT uses the logits of a real target image as the optimization direction, while ODS uses random noise sampled from a uniform distribution. Second, RRT uses cosine similarity as the distance measure for the optimization, while ODS directly uses the sampled direction.
Given an original image $x$ and a random real target image $x'$, we define the optimization objective of RRT as the cosine similarity between their logits:
$$v_{\mathrm{RRT}}(x, x') = \cos\big(f(x), f(x')\big) = \frac{f(x)^\top f(x')}{\|f(x)\|_2 \, \|f(x')\|_2},$$
where $f: \mathbb{R}^d \to \mathbb{R}^C$ is a classifier that maps the input image to the logits, $d$ is the input dimension, and $C$ is the number of classes.
In the competition, we utilize RRT for initialization to generate diversified initial perturbed points while maintaining the representativeness and authenticity of their logits. Following ODS, we perform RRT initialization by gradient-based optimization. Given an original image $x$ and a random real target image $x'$, we find a restart point via the following iterative update:
$$x_{k+1} = \Pi_{\mathcal{S}}\Big(x_k + \eta \, \mathrm{sign}\big(\nabla_{x_k} v_{\mathrm{RRT}}(x_k, x')\big)\Big),$$
where $\mathcal{S}$ is the set of allowed perturbations, $\Pi_{\mathcal{S}}$ indicates the projection onto $\mathcal{S}$, and $\eta$ is the step size. Similar to ODS, we randomly sample a new target image for each random restart of the attack. After obtaining the starting points, we perform the projected gradient descent (PGD) attack to generate adversarial examples.
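A minimal sketch of the RRT initialization, assuming a gradient oracle `grad_fn(x, v)` for the cosine-similarity objective (in practice an autograd framework would supply this):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two logit vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rrt_init(x, x_target, logits_fn, grad_fn, eps, eta=8/255, steps=2):
    """RRT initialization sketch: sign-gradient steps that increase the cosine
    similarity between the current logits and those of a real target image."""
    v = logits_fn(x_target)   # logits of the random real target image
    x_k = x.copy()
    for _ in range(steps):
        g = grad_fn(x_k, v)   # assumed oracle: d/dx [cos(f(x), v)]
        x_k = np.clip(x_k + eta * np.sign(g), x - eps, x + eps)  # project to the eps-ball
        x_k = np.clip(x_k, 0.0, 1.0)
    return x_k
```

Compared with ODS, the direction `v` comes from an actual image rather than uniform noise, which is the "real target" part of the name.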
4.4.2 Submission Details and Results
In this section, we summarize the submission details, including the hyper-parameter settings and some tricks, as well as the submission results.
Iterative number. The number of iterations is a crucial parameter for adversarial attacks. In this competition, an average of 100 iterations was allowed per sample. We set the number of RRT initialization iterations to 2 and the number of PGD attack iterations to 18.
Learning rate. Following the MD attack method [22], we use a large step size in the first stage (steps 0–4) for better exploration, and a small step size in the second stage (steps 5–17) to ensure stable optimization.
Momentum update. Following the MIM attack [12], we also integrate the momentum idea into PGD to accelerate the attack. Since the update direction is highly inaccurate in the first stage, we only perform the momentum update in the second stage (steps 5–17).
Multi-targeted attack. Following the MT attack [18], at each restart we pick a new target class for the targeted attack. We first sort the target classes by their logit outputs, from highest to lowest. Then the target classes are selected to attack in turn, until the allowed number of iterations runs out.
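The target-class ordering used by this multi-targeted strategy can be sketched as:

```python
import numpy as np

def target_order(logits, label):
    """Order candidate target classes by their logit, highest first,
    excluding the ground-truth class (as in multi-targeted attacks [18])."""
    order = np.argsort(np.asarray(logits))[::-1]  # classes by descending logit
    return [int(c) for c in order if c != label]
```

Attacking the highest-logit wrong classes first spends the budget on the targets the model is already closest to confusing with the true class.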
Submission results. Table 3 reports the major milestones of the changes in our attack method. At Stage I, the ODI, MT, and MIM strategies each boost the attack success of the PGD attack, and our RRT provides better performance than ODI, which we attribute to the diverse and real initial perturbed points.
Table 3: Scores of the major variants of our attack (only the final submission was evaluated in Stage II and the Final Stage).

Scenario  Stage I  Stage II  Final
RRT+PGD+MT+MIM  52.40  50.69  50.895
ODI+PGD+MT+MIM  52.23  –  –
ODI+PGD+MT  52.09  –  –
ODI+PGD  51.76  –  –
PGD  46.18  –  –
4.5 5th place: team Kanra
Team members: Zihao Wang, Zhennan Wu, Yang Guo
4.5.1 Method
Since the average numbers of backward gradient calculations and forward model predictions per image are restricted, we propose Fast Restart Projected Gradient Descent (FR-PGD), a two-phase strategy that efficiently utilizes the limited resources to improve the attack.
Figure 1 demonstrates a high-level workflow of our attack, which can be divided into two phases: a fast restart phase and a multi-step convergence phase. Compared with previous restart methods, each restart of FR-PGD takes only a small number of backward gradient calculations, so a sufficient number of restarts can be guaranteed even under resource constraints. Then, the adversarial samples from the fast restart phase with the highest loss values (the number of such samples depends on how many have not been successfully attacked in the restart phase) are sent to the multi-step convergence phase. Finally, the convergence results are returned.
It is worth noting that although we choose the samples with the highest loss values, the loss value is only positively correlated with the final success probability of the attack to a certain extent, not absolutely. The more backward gradient calculations in the restart phase, the stronger this correlation; however, more calculations per restart reduce the number of restarts or convergence steps, since the resources are limited. The trade-off therefore needs to be weighed according to the available resources.
4.5.2 Submission Details and Results
Specifically, an Output Diversified Initialization (ODI) [39] version of the Margin Decomposition (MD) attack [22] is used in the fast restart phase. The loss function in each restart is defined as follows:
$$\mathcal{L}_t^{(r)} = \begin{cases} -\mathbf{z}_y, & t < K/2 \text{ and } r \bmod 2 = 0, \\ \max_{i \neq y} \mathbf{z}_i, & t < K/2 \text{ and } r \bmod 2 = 1, \\ \max_{i \neq y} \mathbf{z}_i - \mathbf{z}_y, & t \geq K/2, \end{cases} \qquad (5)$$
where $K$ is the number of backward gradient calculations in the restart phase, $r$ indexes the $r$-th restart, $\bmod$ is the modulo operation used for alternating optimization, and $\mathcal{L}_t^{(r)}$ is the loss function used at the $t$-th step of the $r$-th restart. The loss function switches from the individual margin terms back to the full margin loss at step $K/2$. In each restart, we perform 2 steps of ODI and 4 steps of gradient ascent. After each restart, the computing resources are reallocated: the resources of the samples that have been successfully attacked are allocated to the samples that have not, so that the unfinished samples can get more restarts.
In the multi-step convergence phase, the Momentum Iterative Method (MIM) [12] with the C&W loss [3] is performed. We adopt a scheduled step size instead of a fixed one, because we found that starting from a large step size brings better results: the step size is initialized to a large value and decayed at fixed iterations.
The ratio of resources between the fast restart phase and the multi-step convergence phase is set to 4:1 empirically. Our method got a score of 50.888 in the Final Stage.
4.6 6th place: team BalaBala2020
Team members: Jiequan Cui, Xiaogang Xu, Pengguang Chen
4.6.1 Method
The team proposed a novel attack strategy called Difficulty-Hierarchical Attack (DH Attack) for attacking target models under a constrained number of attack iterations. In this strategy, more attack iterations are conducted for the robust samples that are harder to perturb, improving the overall attack success rate. Moreover, a global perturbation [29] is utilized to attack a portion of the samples at the beginning of each attack iteration, leaving more attack iterations for the robust samples. Furthermore, several effective tricks are explored and adopted in DH Attack.
Hierarchical Attack. To optimally utilize the constrained attack iterations, the team distributes the attack iterations to each sample according to its robustness to adversarial perturbations. The attack is conducted hierarchically with multiple restarts, and once a sample is successfully attacked, the attack process for this sample is cancelled, preserving more attack iterations for the robust samples.
Global Perturbation. The team also exploits the transferability of adversarial examples and creates a global perturbation by accumulating the perturbations of different samples. Creating the global perturbation does not involve any backward pass of the network, and it can be applied at the beginning of each restart to attack a part of the samples.
Other Tricks.
I. Loss function ensemble. The team observed that different robust models are sensitive to different loss functions for generating adversarial examples. After each restart with ODI initialization [39], DH Attack uniformly chooses a loss function from {margin loss [3], DLR loss [7]}.
II. Decaying step size. Adopting a cosine schedule for the step size is better than a fixed value.
III. ODI initialization with restricted search directions. In the ODI algorithm, the initialization generated with one random vector lies uniformly in the whole search space, which keeps the diversity of initialization at the cost of attack efficiency. In experiments, the team observed that, for most successfully attacked images, the corresponding adversarial examples are finally misclassified into the first or the second most probable class. Based on this phenomenon, after each ODI initialization, DH Attack applies 5 attack iterations with a multi-targeted attack loss, i.e., the sum of the targeted losses for the first and the second most probable classes that the image could be misclassified into.
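The summed top-2 targeted loss described in trick III can be sketched as follows (function and argument names are illustrative):

```python
import numpy as np

def top2_targeted_loss(logits, label):
    """Sum of targeted margins for the two most likely wrong classes:
    maximizing it pushes the input toward the classes it is already
    closest to being flipped into."""
    z = np.asarray(logits, dtype=float)
    wrong = [c for c in np.argsort(z)[::-1] if c != label][:2]  # top-2 wrong classes
    return float(sum(z[c] - z[label] for c in wrong))
```

Restricting the targets to two classes narrows the search directions after each ODI initialization, trading some diversity for efficiency as the trick intends.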
4.6.2 Submission Details and Results
For ODI initialization, the team applied 5 update iterations. The initial step size is 4/255 and is decayed with a cosine schedule within each restart. At the beginning, the team used a 10-iteration attack to filter out the easiest samples to attack, and then 20 iterations for each remaining restart while the backward quota sufficed. The team achieved a score of 50.887 in the final testing and ranked sixth.
5 Conclusion
We successfully organized the Adversarial Attacks on ML Defense Models competition, which attracted thousands of participating teams. We obtained many effective attack algorithms for evaluating adversarial robustness reliably and correctly, which helps to accelerate research on robustness evaluation. The technical details of the top-scoring submissions are presented in this paper. We also established an adversarial robustness benchmark to summarize the results, which can be further used to evaluate newly developed attack and defense methods.
References
 [1] Jean-Baptiste Alayrac, Jonathan Uesato, Po-Sen Huang, Alhussein Fawzi, Robert Stanforth, and Pushmeet Kohli. Are labels required for improving adversarial robustness? In Advances in Neural Information Processing Systems (NeurIPS), pages 12192–12202, 2019.
 [2] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In International Conference on Machine Learning (ICML), 2018.
 [3] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57, 2017.
 [4] Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, and John C Duchi. Unlabeled data improves adversarial robustness. In Advances in Neural Information Processing Systems (NeurIPS), 2019.

 [5] Tianlong Chen, Sijia Liu, Shiyu Chang, Yu Cheng, Lisa Amini, and Zhangyang Wang. Adversarial robustness: From self-supervised pre-training to fine-tuning. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 699–708, 2020.
 [6] Jeremy M Cohen, Elan Rosenfeld, and J Zico Kolter. Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning (ICML), 2019.
 [7] Francesco Croce and Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameterfree attacks. In International Conference on Machine Learning (ICML), pages 2206–2216, 2020.
 [8] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
 [9] Gavin Weiguang Ding, Yash Sharma, Kry Yik Chau Lui, and Ruitong Huang. Mma training: Direct input space margin maximization through adversarial training. In International Conference on Learning Representations (ICLR), 2020.

 [10] Yinpeng Dong, Zhijie Deng, Tianyu Pang, Hang Su, and Jun Zhu. Adversarial distributional training for robust deep learning. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
 [11] Yinpeng Dong, Qi-An Fu, Xiao Yang, Tianyu Pang, Hang Su, Zihao Xiao, and Jun Zhu. Benchmarking adversarial robustness on image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 321–331, 2020.
 [12] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 9185–9193, 2018.
 [13] Yinpeng Dong, Tianyu Pang, Hang Su, and Jun Zhu. Evading defenses to transferable adversarial examples by translation-invariant attacks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 4312–4321, 2019.
 [14] Yuchao Duan, Zhe Zhao, Lei Bu, and Fu Song. Things you may not know about adversarial example: A black-box adversarial image attack. arXiv preprint arXiv:1905.07672, 2019.
 [15] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
 [16] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015.
 [17] Sven Gowal, Chongli Qin, Jonathan Uesato, Timothy Mann, and Pushmeet Kohli. Uncovering the limits of adversarial training against norm-bounded adversarial examples. arXiv preprint arXiv:2010.03593, 2020.
 [18] Sven Gowal, Jonathan Uesato, Chongli Qin, Po-Sen Huang, Timothy Mann, and Pushmeet Kohli. An alternative surrogate loss for PGD-based adversarial testing. arXiv preprint arXiv:1910.09338, 2019.
 [19] Dan Hendrycks, Kimin Lee, and Mantas Mazeika. Using pre-training can improve model robustness and uncertainty. In International Conference on Machine Learning (ICML), pages 2712–2721, 2019.
 [20] Lang Huang, Chao Zhang, and Hongyang Zhang. Self-adaptive training: Beyond empirical risk minimization. arXiv preprint arXiv:2002.10319, 2020.
 [21] Xiaowei Huang, Daniel Kroening, Wenjie Ruan, James Sharp, Youcheng Sun, Emese Thamo, Min Wu, and Xinping Yi. A survey of safety and trustworthiness of deep neural networks: Verification, testing, adversarial attack and defence, and interpretability. Computer Science Review, 37:100270, 2020.
 [22] Linxi Jiang, Xingjun Ma, Zejia Weng, James Bailey, and YuGang Jiang. Imbalanced gradients: A new cause of overestimated adversarial robustness. arXiv preprint arXiv:2006.13726, 2020.
 [23] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
 [24] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. In International Conference on Learning Representations (ICLR) Workshops, 2017.
 [25] Kimin Lee, Kibok Lee, Honglak Lee, and Jinwoo Shin. A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In Advances in Neural Information Processing Systems (NeurIPS), 2018.
 [26] Fangzhou Liao, Ming Liang, Yinpeng Dong, Tianyu Pang, Xiaolin Hu, and Jun Zhu. Defense against adversarial attacks using high-level representation guided denoiser. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
 [27] Ilya Loshchilov and Frank Hutter. SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016.
 [28] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations (ICLR), 2018.
 [29] Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
 [30] Tianyu Pang, Chao Du, Yinpeng Dong, and Jun Zhu. Towards robust detection of adversarial examples. In Advances in Neural Information Processing Systems (NeurIPS), pages 4579–4589, 2018.
 [31] Tianyu Pang, Kun Xu, Yinpeng Dong, Chao Du, Ning Chen, and Jun Zhu. Rethinking softmax crossentropy loss for adversarial robustness. In International Conference on Learning Representations (ICLR), 2020.
 [32] Tianyu Pang, Kun Xu, Chao Du, Ning Chen, and Jun Zhu. Improving adversarial robustness via promoting ensemble diversity. In International Conference on Machine Learning (ICML), 2019.
 [33] Tianyu Pang, Xiao Yang, Yinpeng Dong, Hang Su, and Jun Zhu. Bag of tricks for adversarial training. In International Conference on Learning Representations (ICLR), 2021.
 [34] Tianyu Pang, Xiao Yang, Yinpeng Dong, Kun Xu, Hang Su, and Jun Zhu. Boosting adversarial training with hypersphere embedding. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
 [35] Leslie Rice, Eric Wong, and J Zico Kolter. Overfitting in adversarially robust deep learning. In International Conference on Machine Learning (ICML), 2020.
 [36] Vikash Sehwag, Shiqi Wang, Prateek Mittal, and Suman Jana. Hydra: Pruning adversarially robust neural networks. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
 [37] Ali Shafahi, Mahyar Najibi, Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S Davis, Gavin Taylor, and Tom Goldstein. Adversarial training for free! In Advances in Neural Information Processing Systems (NeurIPS), 2019.
 [38] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.
 [39] Yusuke Tashiro, Yang Song, and Stefano Ermon. Diversity can be transferred: Output diversification for white- and black-box attacks. In Advances in Neural Information Processing Systems (NeurIPS), pages 4536–4548, 2020.
 [40] Florian Tramer, Nicholas Carlini, Wieland Brendel, and Aleksander Madry. On adaptive attacks to adversarial example defenses. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
 [41] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.
 [42] Jonathan Uesato, Brendan O’Donoghue, Aaron van den Oord, and Pushmeet Kohli. Adversarial risk and the dangers of evaluating against weak attacks. In International Conference on Machine Learning (ICML), 2018.
 [43] Jianyu Wang and Haichao Zhang. Bilateral adversarial training: Towards fast training of more robust models against adversarial attacks. In IEEE International Conference on Computer Vision (ICCV), pages 6629–6638, 2019.
 [44] Yisen Wang, Difan Zou, Jinfeng Yi, James Bailey, Xingjun Ma, and Quanquan Gu. Improving adversarial robustness requires revisiting misclassified examples. In International Conference on Learning Representations (ICLR), 2020.
 [45] Eric Wong and Zico Kolter. Provable defenses against adversarial examples via the convex outer adversarial polytope. In International Conference on Machine Learning (ICML), 2018.
 [46] Eric Wong, Leslie Rice, and J. Zico Kolter. Fast is better than free: Revisiting adversarial training. In International Conference on Learning Representations (ICLR), 2020.
 [47] Dongxian Wu, Shu-Tao Xia, and Yisen Wang. Adversarial weight perturbation helps robust generalization. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
 [48] Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. Mitigating adversarial effects through randomization. In International Conference on Learning Representations (ICLR), 2018.
 [49] Cihang Xie, Yuxin Wu, Laurens van der Maaten, Alan Yuille, and Kaiming He. Feature denoising for improving adversarial robustness. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
 [50] Xiao Yang, Yinpeng Dong, Tianyu Pang, Hang Su, and Jun Zhu. Boosting transferability of targeted adversarial examples via hierarchical generative networks. arXiv preprint arXiv:2107.01809, 2021.
 [51] Yunrui Yu, Xitong Gao, and Cheng-Zhong Xu. LAFEAT: Piercing through adversarial defenses with latent features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
 [52] Haichao Zhang and Jianyu Wang. Defense against adversarial attacks using feature scattering-based adversarial training. In Advances in Neural Information Processing Systems (NeurIPS), pages 1829–1839, 2019.
 [53] Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric P Xing, Laurent El Ghaoui, and Michael I Jordan. Theoretically principled trade-off between robustness and accuracy. In International Conference on Machine Learning (ICML), 2019.
 [54] Yedi Zhang, Zhe Zhao, Guangke Chen, Fu Song, and Taolue Chen. BDD4BNN: A BDD-based quantitative analysis framework for binarized neural networks. arXiv preprint arXiv:2103.07224, 2021.
 [55] Zhe Zhao, Guangke Chen, Jingyi Wang, Yiwei Yang, Fu Song, and Jun Sun. Attack as defense: Characterizing adversarial examples using robustness. arXiv preprint arXiv:2103.07633, 2021.