1 Introduction
Deep neural networks are well known to be vulnerable to adversarial examples (Szegedy et al., 2013): a small perturbation of the original input can lead to misclassification or erroneous prediction. Many defense methods have been developed to mitigate the threat of adversarial examples (Guo et al., 2018; Xie et al., 2018; Song et al., 2018; Ma et al., 2018; Samangouei et al., 2018; Dhillon et al., 2018; Madry et al., 2018; Zhang et al., 2019), among which robust training methods, such as adversarial training (Madry et al., 2018) and TRADES (Zhang et al., 2019), are currently the most effective. Specifically, adversarial training (Madry et al., 2018) trains a model on adversarial examples by solving a min-max optimization problem:
$$\min_\theta \frac{1}{n}\sum_{i=1}^{n} \max_{\|\delta_i\|_\infty \le \epsilon} \ell\big(f_\theta(x_i + \delta_i), y_i\big), \tag{1}$$
where $\{(x_i, y_i)\}_{i=1}^n$ is the training dataset, $f_\theta(\cdot)$ denotes the logits output of the neural network parameterized by $\theta$, $\{\delta : \|\delta\|_\infty \le \epsilon\}$ denotes the perturbation ball, and $\ell$ is the cross-entropy loss. On the other hand, instead of directly training on adversarial examples, TRADES (Zhang et al., 2019) further improves model robustness with a trade-off between natural accuracy and robust accuracy, by solving an empirical risk minimization problem with a robust regularization term:
$$\min_\theta \frac{1}{n}\sum_{i=1}^{n}\Big[\ell\big(f_\theta(x_i), y_i\big) + \beta \max_{\|\delta_i\|_\infty \le \epsilon} \mathrm{KL}\big(p(f_\theta(x_i)) \,\|\, p(f_\theta(x_i + \delta_i))\big)\Big], \tag{2}$$
where $p(\cdot)$ denotes the softmax function, and $\beta$ is a regularization parameter. The goal of this robust regularization term (i.e., the KL-divergence term) is to ensure that the outputs are stable within a local neighborhood. Both adversarial training and TRADES achieve good model robustness, as shown on recent model robustness leaderboards (Croce & Hein, 2020b; Chen & Gu, 2020), e.g., https://github.com/fra31/autoattack and https://github.com/uclaml/RayS. However, a major drawback is that both are highly time-consuming to train, limiting their usefulness in practice. This is largely because both methods perform iterative adversarial attacks (i.e., Projected Gradient Descent) to solve the inner maximization problem in each outer minimization step.
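The iterative inner maximization can be made concrete with a small sketch. The following runs $\ell_\infty$ PGD on a toy logistic model with an analytic gradient; the model, data, and function names are ours for illustration, not the paper's network.

```python
import numpy as np

def pgd_inner_max(x, y, w, eps=0.1, alpha=0.02, steps=10):
    """Approximately solve max_{||delta||_inf <= eps} CE(f(x+delta), y)
    for a toy logistic model f(x) = sigmoid(w @ x), by projected
    signed-gradient ascent (an l_inf PGD sketch)."""
    delta = np.zeros_like(x)
    for _ in range(steps):
        z = w @ (x + delta)
        p = 1.0 / (1.0 + np.exp(-z))           # sigmoid probability
        grad = (p - y) * w                     # d CE / d input
        delta = delta + alpha * np.sign(grad)  # ascent step
        delta = np.clip(delta, -eps, eps)      # project onto the l_inf ball
    return delta
```

Each outer training step of adversarial training would run such a multi-step loop, which is the source of the training cost discussed above.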
Recently, Wong et al. (2020) showed that it is possible to use single-step adversarial attacks to solve the inner maximization problem, which was previously believed impossible. The key ingredient in their approach is adding a random initialization step before the single-step adversarial attack. This simple change leads to a reasonably robust model that outperforms other fast robust training techniques, e.g., Shafahi et al. (2019). However, it remains a mystery why random initialization is empirically effective. Furthermore, compared to state-of-the-art robust training models (Madry et al., 2018; Zhang et al., 2019), this approach still lags behind on model robustness.
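A Fast-AT-style attack step can be sketched on the same kind of toy logistic model: random initialization inside the $\ell_\infty$ ball, followed by a single signed-gradient (FGSM) step. The values eps = 8/255 and alpha = 10/255 follow Wong et al. (2020); the model itself is our stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)

def fast_at_perturb(x, y, w, eps=8/255, alpha=10/255):
    """One Fast-AT-style attack on a toy logistic model
    f(x) = sigmoid(w @ x): random init, single FGSM step, projection."""
    delta = rng.uniform(-eps, eps, size=x.shape)   # random initialization
    p = 1.0 / (1.0 + np.exp(-(w @ (x + delta))))
    grad = (p - y) * w                              # d CE / d input
    delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)
    return x + delta
```

The only difference from a plain FGSM attack is the random starting point, which is exactly the ingredient analyzed in this paper.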
In this work, we aim to understand the role of random initialization, as well as to close the robustness gap between adversarial training and Fast Adversarial Training (Fast AT) (Wong et al., 2020). We propose a new principle towards understanding Fast AT: random initialization can be viewed as performing randomized smoothing for better optimization of the inner maximization problem. We demonstrate that the smoothing effect of random initialization is not sufficient under the adversarial perturbation constraint. By proposing a new initialization strategy, backward smoothing, which strengthens the smoothing effect within the perturbation ball, we present a new fast robust training method based on TRADES (Zhang et al., 2019). The resulting method significantly improves both stability and model robustness over the single-step version of TRADES (Zhang et al., 2019), while consuming much less training time (a roughly 3x speed-up with the same training schedule).
2 Related Work
There exists a large body of work on adversarial attacks and defenses. In this section, we only review the most relevant work to ours.
Adversarial Attack  The concept of adversarial examples was first proposed in Szegedy et al. (2013). Since then, many attack methods have been proposed, such as the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2015) and Projected Gradient Descent (PGD) (Kurakin et al., 2016; Madry et al., 2018). Later on, various attacks (Papernot et al., 2016; Moosavi-Dezfooli et al., 2016; Carlini & Wagner, 2017; Athalye et al., 2018; Chen et al., 2020; Croce & Hein, 2020a) were proposed for better effectiveness or efficiency. Many attacks also target different threat models. Chen et al. (2017) proposed a black-box attack for settings where the gradient is not available, estimating it via finite differences. Various methods (Ilyas et al., 2018; Al-Dujaili & O'Reilly, 2020; Moon et al., 2019; Andriushchenko et al., 2019) have been developed to improve the query efficiency of Chen et al. (2017). Other methods (Brendel et al., 2018; Cheng et al., 2019; 2020) focus on the more challenging hard-label attack setting, where only the prediction labels are available. On the other hand, recent work (Croce & Hein, 2020b; Chen & Gu, 2020) aims to evaluate model robustness accurately via an ensemble of attacks or an effective hard-label attack.
Robust Training  Many heuristic defenses (Guo et al., 2018; Xie et al., 2018; Song et al., 2018; Ma et al., 2018; Samangouei et al., 2018; Dhillon et al., 2018) were proposed when the concept of adversarial examples was first introduced. However, Athalye et al. (2018) later showed that they are not truly robust. Adversarial training (Madry et al., 2018) was the first effective method for defending against adversarial examples. Wang et al. (2019) proposed a new convergence quality criterion. Zhang et al. (2019) characterized the trade-off between natural accuracy and robust accuracy. Wang et al. (2020) proposed to improve model robustness by better utilizing misclassified examples. Another line of research utilizes extra information, e.g., pre-trained models (Hendrycks et al., 2019) or extra unlabeled data (Carmon et al., 2019; Alayrac et al., 2019), to further improve robustness. Other work focuses on improving training efficiency, such as free adversarial training (Shafahi et al., 2019) and Fast AT (Wong et al., 2020), which uses a single-step attack (FGSM) with random initialization. Li et al. (2020) proposed a hybrid approach for improving Fast AT that is orthogonal to ours. Andriushchenko & Flammarion (2020) proposed a new regularizer promoting gradient alignment, yet it does not focus on closing the robustness gap with the state of the art.
Randomized Smoothing  Duchi et al. (2012) proposed the randomized smoothing technique and proved variance-based convergence rates for non-smooth optimization. Later on, this technique was applied to certified adversarial defenses (Cohen et al., 2019; Salman et al., 2019) for building robust models with certified robustness guarantees. In this paper, we do not target certified defenses. Instead, we use the randomized smoothing concept from optimization to explain Fast AT.
3 Why Random Initialization Helps?
We aim to explain why random initialization in Fast AT is effective, and propose a new understanding that random initialization can be viewed as performing randomized smoothing on the inner maximization problem in adversarial training (Madry et al., 2018). Below, we first introduce the randomized smoothing technique (Duchi et al., 2012) in optimization.
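Before the formal discussion, a one-dimensional numerical example illustrates the smoothing effect: averaging $f(x) = |x|$ over uniform noise yields a function that is differentiable at $0$. The function and constants here are ours for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return np.abs(x)                  # non-smooth at x = 0

def f_smooth(x, mu=0.5, n=200_000):
    """Monte-Carlo estimate of the randomized-smoothed objective
    E_u[f(x + u)] with u ~ Uniform[-mu, mu]."""
    u = rng.uniform(-mu, mu, size=n)
    return float(np.mean(f(x + u)))

# For |x| <= mu the smoothed function has the closed form
# (x**2 + mu**2) / (2 * mu), which is differentiable at 0,
# while f itself is not.
```

The smoothed objective trades a small amount of bias for a Lipschitz-continuous gradient, which is what permits larger step sizes in the discussion below.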
It is well known from optimization theory (Boyd et al., 2004) that non-smooth objectives are generally harder to optimize than smooth ones. In general, a smoother loss function allows a larger step size while still guaranteeing the convergence of gradient-based algorithms. The randomized smoothing technique (Duchi et al., 2012) was proposed based on the observation that random perturbation of the optimization variable transforms the loss into a smoother one. Instead of using only the gradient at the original iterate, randomized smoothing randomly generates perturbed iterates and uses their gradients in the optimization procedure. More details are provided in Appendix A. Let us recall the inner maximization problem in adversarial training:
$$\max_{\|\delta\|_\infty \le \epsilon} \ell\big(f_\theta(x + \delta), y\big). \tag{3}$$
Here, $f_\theta$ denotes a neural network parameterized by $\theta$. In general, neural networks are non-smooth due to ReLU activations and pooling layers. This suggests that (3) can be difficult to solve, and that gradient ascent with a large step size can diverge in the maximization problem. It also explains why directly using single-step projected gradient ascent without random initialization fails (Wong et al., 2020). Now, let us apply randomized smoothing to (3):
$$\max_{\|\delta\|_\infty \le \epsilon} \mathbb{E}_{\eta}\, \ell\big(f_\theta(x + \eta + \delta), y\big), \tag{4}$$
where $\eta$ is the random perturbation vector for randomized smoothing, and $\delta$ is the perturbation vector for the subsequent gradient update step (initialized as zero). Suppose we solve (4) in a stochastic fashion, i.e., we sample a single random perturbation $\eta$ instead of computing the expectation over $\eta$, and use only a one-step gradient update. This reduces exactly to the Fast AT formulation. It suggests that Fast AT can be viewed as performing a stochastic single-step attack on a randomized-smoothed objective function, which allows a larger step size. This explains why random initialization helps Fast AT: it makes the loss objective smoother and thus easier to optimize.
4 Proposed Approach
4.1 Drawbacks of the Random Initialization Strategy
Although Fast AT achieves much faster robust training than standard adversarial training (Madry et al., 2018), it exposes several major weaknesses. For demonstration, we exclude the additional acceleration techniques introduced in Wong et al. (2020) (e.g., mixed-precision training, cyclic learning rate), and instead apply the standard piecewise learning rate decay schedule used in Madry et al. (2018); Zhang et al. (2019), with two fixed decay epochs.
Performance Stability  As observed in Li et al. (2020), Fast AT can be highly unstable (i.e., exhibit large variance in robust performance) when using the traditional piecewise learning rate decay schedule. We argue that this is because Wong et al. (2020) utilize a drastically large attack step size ($\alpha = 10/255$, even larger than the perturbation limit $\epsilon = 8/255$), which causes unstable training behavior.
To validate this, we run Fast AT on CIFAR-10 using a ResNet-18 model (He et al., 2016) 10 times with different step sizes. Note that we adopt early stopping and record the best-performing model across training epochs. As shown in Figure 1, although the single best robustness performance is obtained with step size $10/255$, the variance is very high. Moreover, most trials lead to weak robust performance, with low average and median robust accuracy. On the other hand, we observe that with step size $8/255$, model robustness is more stable and higher on average. Note that a step size that is too small naturally hurts model robustness. These observations suggest that Fast AT cannot achieve the best robust performance and stability simultaneously.
Potential for Robustness Improvement  Fast AT uses standard adversarial training (Madry et al., 2018) as its baseline and obtains similar robustness. However, later work (Rice et al., 2020) shows that adversarial training suffers from overfitting, and that early stopping can largely improve robustness. Zhang et al. (2019) achieves even better model robustness, much higher than what Fast AT obtains. From Table 1, we observe an 8.2% robust accuracy gap between Fast AT (averaged over 10 runs) and the best-performing TRADES model. Even for the best of the 10 trials, there is still a 6.4% gap. This indicates that Fast AT is still far from optimal, and there is large room for further robustness improvement.
Method  Nat (%)  Rob (%) 
Fast AT (avg. over 10 runs)  84.58  44.52 
Fast AT (best over 10 runs)  84.79  46.30 
AT (early-stop)  82.36  51.14 
TRADES  82.33  52.74 
4.2 A Naive Try: Randomized Smoothing for TRADES
As shown in Table 1, TRADES enjoys better model robustness compared with standard adversarial training. A naive attempt is to apply randomized smoothing to TRADES and see if this leads to better robustness than Fast AT. Let us recall the inner maximization formulation for TRADES:
$$\max_{\|\delta\|_\infty \le \epsilon} \mathrm{KL}\big(p(f_\theta(x)) \,\|\, p(f_\theta(x + \delta))\big). \tag{5}$$
Similarly, we can smooth this objective and solve the following problem instead:
$$\max_{\|\delta\|_\infty \le \epsilon} \mathbb{E}_{\eta}\, \mathrm{KL}\big(p(f_\theta(x)) \,\|\, p(f_\theta(x + \eta + \delta))\big). \tag{6}$$
This leads to the same adversarial example formulation as using random initialization and then performing single-step projected gradient ascent. We refer to this strategy as Fast TRADES.
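Under the assumption of a toy linear-softmax model (our stand-in, not the paper's network), a single stochastic solve of (6), i.e., one Fast TRADES step, might look like this; eps, alpha, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fast_trades_perturb(x, W, eps=8/255, alpha=8/255):
    """One stochastic single-step solve of the smoothed TRADES inner
    problem on a toy model f(x) = W @ x: sample a random init eta, then
    take one signed-gradient ascent step on KL(p(f(x)) || p(f(x+eta+delta)))
    and project back onto the l_inf ball."""
    p0 = softmax(W @ x)
    eta = rng.uniform(-eps, eps, size=x.shape)   # random smoothing draw
    p = softmax(W @ (x + eta))
    grad = W.T @ (p - p0)                        # d KL(p0 || p) / d input
    delta = np.clip(eta + alpha * np.sign(grad), -eps, eps)
    return delta
```

The gradient formula uses the standard identity that the derivative of $\mathrm{KL}(p_0 \| \mathrm{softmax}(z))$ with respect to the logits $z$ is $p - p_0$.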
We experiment with Fast TRADES using different step sizes and find that its performance is sensitive to the step size, similar to Fast AT. As shown in Figure 3, an overly large step size also leads to very low average and median robust accuracy. Reducing the step size yields better on-average robustness, slightly higher than the average of Fast AT. However, in terms of best-run performance, there is no significant gain over Fast AT. This suggests that directly applying randomized smoothing to TRADES does not lead to a major improvement.
Recall the results from Section 4.1: applying an overly large step size in Fast AT and Fast TRADES leads to unstable training with deteriorated robustness. This suggests that the randomized smoothing effect is not strong enough (i.e., the objective function is not smooth enough) to enable the use of a larger step size. However, unlike the general randomized smoothing setting, a special constraint in the adversarial setting is that the random perturbation on the input is subject to the $\epsilon$-ball constraint and therefore cannot be too large. This means that we cannot strengthen the smoothing effect by simply using larger random perturbations.
To further validate the claim that the current smoothing effect is insufficient, we carefully study the changes made by randomized smoothing. Figure 3 shows the KL divergence between the original logits and the logits after random initialization, i.e., $\mathrm{KL}\big(p(f_\theta(x)) \,\|\, p(f_\theta(x + \eta))\big)$, as well as the KL divergence between the original logits and the logits after both random initialization and adversarial perturbation, i.e., $\mathrm{KL}\big(p(f_\theta(x)) \,\|\, p(f_\theta(x + \eta + \delta))\big)$. The KL divergence after random initialization is negligible compared with that after adversarial perturbation. In fact, it is nearly zero almost all the time, suggesting that the network output at the randomly perturbed point is almost the same as at the original point. This further indicates that the current smoothing effect of random initialization is insufficient, which motivates us to consider how to further boost the smoothing effect within the perturbation ball.
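This diagnostic can be reproduced in miniature. With hand-picked logits standing in for a real network's outputs (all values here are ours for illustration), the KL divergence after a tiny random perturbation is orders of magnitude smaller than after an adversarial one:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

# logits at the original point, after a tiny random perturbation,
# and after an adversarial perturbation that flips the prediction
z_orig = np.array([2.0, 0.5, -1.0])
z_rand = z_orig + np.array([0.01, -0.02, 0.015])
z_adv  = np.array([0.5, 2.0, -1.0])

kl_rand = kl(softmax(z_orig), softmax(z_rand))   # nearly zero
kl_adv  = kl(softmax(z_orig), softmax(z_adv))    # large
```

The same asymmetry, measured on a trained network, is what motivates strengthening the smoothing effect beyond plain random initialization.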
4.3 Backward Smoothing
Now we introduce our proposed method to address the above issue. The goal is to further boost the smoothing effect of randomized smoothing without violating the perturbation constraint. Note that if we were allowed to use larger random perturbations, we would expect $\mathrm{KL}\big(p(f_\theta(x)) \,\|\, p(f_\theta(x + \eta))\big)$ to be larger as well, meaning that the neural network output at the random initialization would differ more from the original output (as shown in Figure 4). This inspires us to generate the initialization point in a backward fashion. Specifically, let us call the input domain the input space, and the corresponding neural network outputs in $\mathbb{R}^K$ the output space, where $K$ is the number of classes of the classifier. We first generate random points in the output space, just as randomized smoothing does in the input space, i.e., $p(f_\theta(x)) + \sigma \xi$, where $\xi$ is a random variable and $\sigma$ is a small number. Then we find the corresponding input perturbation in a backward fashion and use it as our initialization. An illustrative sketch of our proposed method is provided in Figure 4.
Now we formalize our proposed method. The key step is to find an input perturbation $\delta_0$ such that:
$$p\big(f_\theta(x + \delta_0)\big) \approx p\big(f_\theta(x)\big) + \sigma \xi. \tag{7}$$
In order to find the best $\delta_0$ satisfying (7), we solve the following problem:
$$\delta_0 = \mathop{\mathrm{argmin}}_{\|\delta\|_\infty \le \epsilon} \mathrm{KL}\big(p(f_\theta(x)) + \sigma\xi \,\|\, p(f_\theta(x + \delta))\big). \tag{8}$$
For the sake of computational efficiency, we solve (8) using single-step PGD in practice. Then, similar to Wong et al. (2020), we use a single-step gradient update for the inner maximization problem:
$$\delta = \Pi_{\|\delta\|_\infty \le \epsilon}\Big(\delta_0 + \alpha \cdot \mathrm{sign}\big(\nabla_{\delta}\,\mathrm{KL}\big(p(f_\theta(x)) \,\|\, p(f_\theta(x + \delta_0))\big)\big)\Big). \tag{9}$$
Finally, we update the neural network parameters using stochastic gradients at the adversarial point $x + \delta$. A summary of our proposed algorithm is provided in Algorithm 1. Note that the proposed Backward Smoothing also seems compatible with Adversarial Training. However, Adversarial Training does not contain a KL-divergence loss term, which may hinder its performance in this setting. We show this empirically in Section 5.
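The two steps, the backward initialization of (8) and the forward update of (9), can be sketched on a toy linear-softmax model. This is a minimal illustration under our own simplifications (in particular, the output-space noise is injected in logit space, and all names and constants are ours), not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def backward_smoothing_init(x, W, eps=8/255, alpha=8/255, sigma=1.0):
    """Backward Smoothing sketch on a toy model f(x) = W @ x."""
    p0 = softmax(W @ x)
    # 1) sample a random target in the output space (logit-space noise
    #    here, as a simplification of perturbing the output distribution)
    target = softmax(W @ x + sigma * rng.standard_normal(W.shape[0]))
    # 2) backward step: one signed-gradient step *minimizing*
    #    KL(target || p(f(x + delta))), pulling the init toward the target
    delta = np.zeros_like(x)
    p = softmax(W @ (x + delta))
    grad = W.T @ (p - target)              # d KL(target || p) / d delta
    delta = np.clip(delta - alpha * np.sign(grad), -eps, eps)
    # 3) forward step: one signed-gradient step *maximizing* the
    #    TRADES KL term KL(p0 || p(f(x + delta)))
    p = softmax(W @ (x + delta))
    grad = W.T @ (p - p0)                  # d KL(p0 || p) / d delta
    delta = np.clip(delta + alpha * np.sign(grad), -eps, eps)
    return delta
```

Compared with Fast TRADES, only the initialization differs: instead of a uniform random draw in the input space, the starting point is pulled toward a random target in the output space, which strengthens the smoothing effect within the same $\epsilon$-ball.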
5 Experiments
In this section, we empirically evaluate the performance of our proposed method. We first compare it with other robust training baselines on the CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009) and Tiny ImageNet (Deng et al., 2009) datasets. (We do not test on the full ImageNet dataset mainly because TRADES does not perform well on ImageNet, as mentioned in Qin et al. (2019).) We also provide multiple ablation studies, as well as robustness evaluation with state-of-the-art adversarial attack methods, to validate that our proposed method provides effective robustness improvement.
5.1 Experimental Setting
Following previous work on robust training (Madry et al., 2018; Zhang et al., 2019; Wong et al., 2020), we set $\epsilon = 8/255$ for all three datasets. In terms of model architecture, we adopt the standard ResNet-18 model (He et al., 2016) for both CIFAR-10 and CIFAR-100, and a ResNet-50 model for Tiny ImageNet. We follow the standard piecewise learning rate decay schedule used in Madry et al. (2018); Zhang et al. (2019), with two decay epochs. The starting learning rate for all methods is set to the same value as in previous work (Madry et al., 2018; Zhang et al., 2019). For the Adversarial Training and TRADES baselines, we adopt a multi-step iterative PGD attack. For TRADES, we adopt the regularization parameter $\beta$ reported in the original paper for the best performance. For our proposed method, we set the backward smoothing parameter $\sigma = 1.0$ and the attack step size to $8/255$. For robust accuracy evaluation, we adopt a multi-step PGD attack. To ensure that the model robustness improvement is not due to obfuscated gradients (Athalye et al., 2018), we further test our method with current state-of-the-art attacks (Croce & Hein, 2020b; Chen & Gu, 2020).
5.2 Performance Comparison with Robust Training Baselines
We compare the adversarial robustness of Backward Smoothing against standard Adversarial Training (Madry et al., 2018) and TRADES (Zhang et al., 2019), as well as fast training methods such as Fast AT (Wong et al., 2020) and our naive baseline Fast TRADES. We also compare with the recently proposed Fast AT+ method (Li et al., 2020), which achieves high robustness with reduced training time. (Since Li et al. (2020) have not released code, we only compare with their reported numbers in the same setting, i.e., combined with acceleration techniques.) Since our proposed backward smoothing initialization utilizes an extra step of gradient back-propagation, we also compare with Fast TRADES using a 2-step attack for a fair comparison.
Method  Nat (%)  Rob (%)  Time (m) 
AT  82.36  51.14  430 
Fast AT  84.79  46.30  82 
TRADES  82.33  52.74  482 
Fast TRADES  84.80  46.25  126 
Fast TRADES (2step)  83.46  48.08  164 
Backward Smoothing  82.38  52.50  164 
Method  Nat (%)  Rob (%)  Time (m) 
AT  55.22  28.53  428 
Fast AT  60.35  24.64  83 
TRADES  56.99  29.41  480 
Fast TRADES  60.22  19.40  126 
Fast TRADES (2step)  58.53  23.87  165 
Backward Smoothing  56.96  30.50  164 
Table 2 shows the performance comparison on the CIFAR-10 dataset using the ResNet-18 model. Our Backward Smoothing method significantly closes the robustness gap to state-of-the-art robust training methods, achieving robust accuracy almost as high as TRADES while consuming much less (about 3x) training time. Compared with Fast AT, Backward Smoothing typically costs twice the training time, yet achieves significantly higher model robustness. Our method achieves a similar gain over Fast TRADES. Note that even compared with Fast TRADES using a 2-step attack, which costs about the same training time as ours, our method still achieves a 4.4% improvement.
Method  Nat (%)  Rob (%)  Time (m) 
AT  44.50  21.34  2666 
Fast AT  49.58  18.56  575 
TRADES  47.02  21.04  2928 
Fast TRADES  49.20  15.50  805 
Fast TRADES (2step)  46.40  18.20  1045 
Backward Smoothing  46.68  22.32  1035 
Table 3 shows the performance comparison on CIFAR-100 using the ResNet-18 model. We observe patterns similar to the CIFAR-10 experiments. Backward Smoothing achieves slightly higher robustness than TRADES while costing much less training time. Compared with Fast TRADES using a 2-step attack, our method achieves a 6.6% robustness improvement at roughly the same training cost. Table 4 shows that on Tiny ImageNet with the ResNet-50 model, Backward Smoothing also achieves significant robustness improvement over other single-step robust training methods and even outperforms the state-of-the-art robust training methods.
5.3 Evaluation with Stateoftheart Attacks
To ensure that Backward Smoothing does not suffer from the obfuscated gradient problem (Athalye et al., 2018) or present a false sense of security, we further evaluate it using state-of-the-art attacks, considering two evaluation methods: AutoAttack (Croce & Hein, 2020b), an ensemble of four diverse (white-box and black-box) attacks (APGD-CE, APGD-DLR, FAB (Croce & Hein, 2020a) and Square Attack (Andriushchenko et al., 2019)) for reliable robustness evaluation; and the RayS attack (Chen & Gu, 2020), which only requires the prediction labels of the target model (completely gradient-free) and is able to detect falsely robust models. RayS also measures another robustness metric, the average decision boundary distance (ADBD), defined as the examples' average distance to their closest decision boundary. ADBD reflects overall model robustness beyond the $\epsilon$ constraint. Both evaluations provide online robustness leaderboards for public comparison with other models.
We train our method with the WideResNet-34-10 model (Zagoruyko & Komodakis, 2016) and evaluate via AutoAttack and RayS. Table 5 shows that under state-of-the-art attacks, Backward Smoothing still retains high robustness comparable to TRADES. Specifically, in terms of robust accuracy, Backward Smoothing is only about 2% behind TRADES, while significantly higher than AT (Madry et al., 2018) and Fast AT (Wong et al., 2020). In terms of the ADBD metric, Backward Smoothing achieves the same level of overall model robustness as TRADES, much higher than the other two methods.
Method  AutoAttack  RayS  
Metric  Rob (%)  Rob (%)  ADBD 
AT  44.04  50.70  0.0344 
Fast AT  43.21  50.10  0.0334 
TRADES  53.08  57.30  0.0403 
Backward Smoothing  51.13  55.08  0.0403 
5.4 Stability and Compatibility
Figure 5 shows that Backward Smoothing is much more stable than single-step robust training methods. Compared with Fast AT and Fast TRADES, Backward Smoothing has much smaller variance while maintaining much higher average model robustness. This demonstrates the superior robustness stability of the Backward Smoothing method. We also wonder whether Backward Smoothing is compatible with Adversarial Training, i.e., can a similar initialization strategy improve Fast AT? We test this on CIFAR-10 using the ResNet-18 model; the resulting model improves the stability of Fast AT as well as its average robustness. However, the best of the trials does not achieve better robustness than the best run of Fast AT. We conjecture that the main reason for the deteriorated performance is the different choice of inner maximization loss in Adversarial Training (cross-entropy) versus TRADES (KL divergence). Considering the random perturbation generated in the output space, the cross-entropy loss mainly focuses on the true-class logit, while the KL divergence depends on all logits. This partially explains the above observations.
5.5 Ablation Studies
We also perform a set of ablation studies to provide a more in-depth analysis of Backward Smoothing. Due to space limits, we present the sensitivity analysis on the smoothing parameter $\sigma$ and the attack step size here, and leave further ablation studies to the supplementary materials.
Effect of $\sigma$: We analyze the effect of $\sigma$ in Backward Smoothing by fixing $\epsilon$ and the attack step size. Table 6 summarizes the results. In general, $\sigma$ does not have a significant effect on the final model robustness; however, a $\sigma$ that is too large or too small leads to slightly worse robustness. Empirically, $\sigma = 1.0$ achieves the best performance on both datasets.
Dataset  CIFAR-10  CIFAR-100  
σ  Nat (%)  Rob (%)  Nat (%)  Rob (%)  
0.1  82.43  52.13  56.62  29.34 
0.5  82.53  52.34  56.95  29.85 
1.0  82.38  52.50  56.96  30.50 
2.0  82.29  52.42  56.16  29.88 
5.0  81.50  52.32  56.10  29.83 
Dataset  CIFAR-10  CIFAR-100  
Step Size  Nat (%)  Rob (%)  Nat (%)  Rob (%) 
6/255  81.38  52.38  56.83  29.78 
7/255  81.96  52.40  56.61  29.82 
8/255  82.38  52.50  56.96  30.50 
9/255  82.47  52.16  56.45  29.35 
10/255  81.71  52.04  60.85  24.21 
11/255  67.43  42.45  40.40  20.92 
12/255  65.56  41.12  37.90  18.83 
Effect of Attack Step Size: To verify the effect of the attack step size, we fix $\sigma = 1.0$ and $\epsilon = 8/255$. From Table 7, we observe that, different from single-step robust training methods, Backward Smoothing achieves similar robustness with slightly smaller step sizes, and the best performance is obtained with step size $8/255$. This suggests that we do not need to pursue an overly large step size for better robustness, as Fast AT does, which avoids the stability issue of Fast AT.
5.6 Combining with Other Acceleration Techniques
Method  Nat (%)  Rob (%)  Time (m) 
AT  81.48  50.32  62 
Fast AT  83.26  45.30  12 
Fast AT+  83.54  48.43  28 
TRADES  79.64  50.86  88 
Fast TRADES  85.14  44.98  18 
Fast TRADES (2step)  81.44  47.10  24 
Backward Smoothing  78.76  50.58  24 
Aside from random initialization, Wong et al. (2020) also adopt two additional acceleration techniques to further improve training efficiency at a minor sacrifice in robustness: a cyclic learning rate decay schedule (Smith, 2017) and mixed-precision training (Micikevicius et al., 2017). We show that these strategies are also applicable to Backward Smoothing. Table 8 provides the results when the acceleration techniques are applied. Both work universally well for all methods, significantly reducing training time (compared with Table 2). Yet they do not alter the conclusion that Backward Smoothing achieves robustness similar to TRADES with much less training time. Also, compared with the recently proposed Fast AT+ method, Backward Smoothing achieves higher robustness and training efficiency. Note that the idea of Fast AT+ is orthogonal to ours, and we could also adopt such a hybrid approach to further reduce training time.
6 Conclusions
In this paper, we propose a new understanding of Fast Adversarial Training by viewing random initialization as performing randomized smoothing on the inner maximization problem. We then show that the smoothing effect of random initialization is not sufficient under the adversarial perturbation constraint. To address this issue, we propose a new initialization strategy, Backward Smoothing. The resulting method closes the robustness gap to state-of-the-art robust training methods and significantly improves model robustness over single-step robust training methods.
References
 Al-Dujaili & O'Reilly (2020) Abdullah Al-Dujaili and Una-May O'Reilly. Sign bits are all you need for black-box attacks. In ICLR, 2020.
 Alayrac et al. (2019) Jean-Baptiste Alayrac, Jonathan Uesato, Po-Sen Huang, Alhussein Fawzi, Robert Stanforth, and Pushmeet Kohli. Are labels required for improving adversarial robustness? In NeurIPS, pp. 12214–12223, 2019.
 Andriushchenko & Flammarion (2020) Maksym Andriushchenko and Nicolas Flammarion. Understanding and improving fast adversarial training. arXiv preprint arXiv:2007.02617, 2020.
 Andriushchenko et al. (2019) Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, and Matthias Hein. Square attack: a query-efficient black-box adversarial attack via random search. arXiv preprint arXiv:1912.00049, 2019.
 Athalye et al. (2018) Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In ICML, 2018.
 Boyd et al. (2004) Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge University Press, 2004.

 Brendel et al. (2018) Wieland Brendel, Jonas Rauber, and Matthias Bethge. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. In ICLR, 2018.
 Carlini & Wagner (2017) Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In SP, pp. 39–57. IEEE, 2017.
 Carmon et al. (2019) Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, John C Duchi, and Percy S Liang. Unlabeled data improves adversarial robustness. In NeurIPS, pp. 11192–11203, 2019.
 Chen & Gu (2020) Jinghui Chen and Quanquan Gu. RayS: A ray searching method for hard-label adversarial attack. In SIGKDD, 2020.
 Chen et al. (2020) Jinghui Chen, Dongruo Zhou, Jinfeng Yi, and Quanquan Gu. A Frank-Wolfe framework for efficient and effective adversarial attacks. In AAAI, 2020.
 Chen et al. (2017) Pin-Yu Chen, Huan Zhang, Yash Sharma, Jinfeng Yi, and Cho-Jui Hsieh. ZOO: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In AISec, pp. 15–26. ACM, 2017.
 Cheng et al. (2019) Minhao Cheng, Thong Le, Pin-Yu Chen, Huan Zhang, Jinfeng Yi, and Cho-Jui Hsieh. Query-efficient hard-label black-box attack: An optimization-based approach. In ICLR, 2019.
 Cheng et al. (2020) Minhao Cheng, Simranjit Singh, Patrick H. Chen, Pin-Yu Chen, Sijia Liu, and Cho-Jui Hsieh. Sign-OPT: A query-efficient hard-label adversarial attack. In ICLR, 2020.
 Cohen et al. (2019) Jeremy Cohen, Elan Rosenfeld, and Zico Kolter. Certified adversarial robustness via randomized smoothing. In ICML, pp. 1310–1320, 2019.
 Croce & Hein (2020a) F. Croce and M. Hein. Minimally distorted adversarial examples with a fast adaptive boundary attack. In ICML, 2020a.
 Croce & Hein (2020b) Francesco Croce and Matthias Hein. Reliable evaluation of adversarial robustness with an ensemble of diverse parameterfree attacks. In ICML, 2020b.
 Deng et al. (2009) Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, pp. 248–255. IEEE, 2009.
 Dhillon et al. (2018) Guneet S Dhillon, Kamyar Azizzadenesheli, Zachary C Lipton, Jeremy Bernstein, Jean Kossaifi, Aran Khanna, and Anima Anandkumar. Stochastic activation pruning for robust adversarial defense. ICLR, 2018.
 Duchi et al. (2012) John C Duchi, Peter L Bartlett, and Martin J Wainwright. Randomized smoothing for stochastic optimization. SIAM Journal on Optimization, 22(2):674–701, 2012.
 Goodfellow et al. (2015) Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. ICLR, 2015.
 Guo et al. (2018) Chuan Guo, Mayank Rana, Moustapha Cisse, and Laurens Van Der Maaten. Countering adversarial images using input transformations. ICLR, 2018.
 He et al. (2016) Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, pp. 770–778, 2016.
 Hendrycks et al. (2019) Dan Hendrycks, Kimin Lee, and Mantas Mazeika. Using pretraining can improve model robustness and uncertainty. In ICML, pp. 2712–2721, 2019.
 Ilyas et al. (2018) Andrew Ilyas, Logan Engstrom, Anish Athalye, and Jessy Lin. Black-box adversarial attacks with limited queries and information. In ICML, 2018.
 Krizhevsky et al. (2009) Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
 Kurakin et al. (2016) Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
 Li et al. (2020) Bai Li, Shiqi Wang, Suman Jana, and Lawrence Carin. Towards understanding fast adversarial training. arXiv preprint arXiv:2006.03089, 2020.
 Ma et al. (2018) Xingjun Ma, Bo Li, Yisen Wang, Sarah M Erfani, Sudanthi Wijewickrema, Grant Schoenebeck, Dawn Song, Michael E Houle, and James Bailey. Characterizing adversarial subspaces using local intrinsic dimensionality. ICLR, 2018.
 Madry et al. (2018) Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. ICLR, 2018.
 Micikevicius et al. (2017) Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, et al. Mixed precision training. arXiv preprint arXiv:1710.03740, 2017.
 Moon et al. (2019) Seungyong Moon, Gaon An, and Hyun Oh Song. Parsimonious black-box adversarial attacks via efficient combinatorial optimization. In ICML, pp. 4636–4645, 2019.
 Moosavi-Dezfooli et al. (2016) Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. DeepFool: a simple and accurate method to fool deep neural networks. In CVPR, pp. 2574–2582, 2016.
 Papernot et al. (2016) Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In EuroS&P, pp. 372–387. IEEE, 2016.
 Qin et al. (2019) Chongli Qin, James Martens, Sven Gowal, Dilip Krishnan, Krishnamurthy Dvijotham, Alhussein Fawzi, Soham De, Robert Stanforth, and Pushmeet Kohli. Adversarial robustness through local linearization. In NeurIPS, pp. 13847–13856, 2019.
 Rice et al. (2020) Leslie Rice, Eric Wong, and J Zico Kolter. Overfitting in adversarially robust deep learning. ICML, 2020.
 Salman et al. (2019) Hadi Salman, Jerry Li, Ilya Razenshteyn, Pengchuan Zhang, Huan Zhang, Sebastien Bubeck, and Greg Yang. Provably robust deep learning via adversarially trained smoothed classifiers. In NeurIPS, pp. 11292–11303, 2019.
 Samangouei et al. (2018) Pouya Samangouei, Maya Kabkab, and Rama Chellappa. Defense-GAN: Protecting classifiers against adversarial attacks using generative models. ICLR, 2018.
 Shafahi et al. (2019) Ali Shafahi, Mahyar Najibi, Mohammad Amin Ghiasi, Zheng Xu, John Dickerson, Christoph Studer, Larry S Davis, Gavin Taylor, and Tom Goldstein. Adversarial training for free! In NeurIPS, pp. 3358–3369, 2019.
 Smith (2017) Leslie N Smith. Cyclical learning rates for training neural networks. In WACV, pp. 464–472. IEEE, 2017.
 Song et al. (2018) Yang Song, Taesup Kim, Sebastian Nowozin, Stefano Ermon, and Nate Kushman. PixelDefend: Leveraging generative models to understand and defend against adversarial examples. ICLR, 2018.
 Szegedy et al. (2013) Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
 Wang et al. (2019) Yisen Wang, Xingjun Ma, James Bailey, Jinfeng Yi, Bowen Zhou, and Quanquan Gu. On the convergence and robustness of adversarial training. In ICML, pp. 6586–6595, 2019.
 Wang et al. (2020) Yisen Wang, Difan Zou, Jinfeng Yi, James Bailey, Xingjun Ma, and Quanquan Gu. Improving adversarial robustness requires revisiting misclassified examples. In ICLR, 2020.
 Wong et al. (2020) Eric Wong, Leslie Rice, and J. Zico Kolter. Fast is better than free: Revisiting adversarial training. In ICLR, 2020.
 Xie et al. (2018) Cihang Xie, Jianyu Wang, Zhishuai Zhang, Zhou Ren, and Alan Yuille. Mitigating adversarial effects through randomization. ICLR, 2018.
 Zagoruyko & Komodakis (2016) Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.
 Zhang et al. (2019) Hongyang Zhang, Yaodong Yu, Jiantao Jiao, Eric Xing, Laurent El Ghaoui, and Michael Jordan. Theoretically principled trade-off between robustness and accuracy. In ICML, pp. 7472–7482, 2019.
Appendix A Randomized Smoothing
The randomized smoothing technique (Duchi et al., 2012) was originally proposed for solving convex non-smooth optimization problems. It is based on the observation that a random perturbation of the optimization variable can transform the loss into a smoother one. Instead of using only the function value $f(\theta)$ and its (sub)gradient to solve $\min_{\theta} f(\theta)$, randomized smoothing solves the following objective, which utilizes more global information from the neighboring area:
$$\min_{\theta} f_{\gamma}(\theta) := \mathbb{E}_{u}\big[f(\theta + \gamma u)\big], \qquad (10)$$
where $u$ is a random variable and $\gamma > 0$ is a small smoothing parameter. Duchi et al. (2012) showed that the smoothed loss $f_{\gamma}$ in (10) is smoother than the original loss $f$. Hence, even if $f$ is non-smooth, (10) can still be solved by stochastic gradient descent with provable guarantees.
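Equation (10) can be made concrete with a small Monte Carlo sketch. This is illustrative code, not from the paper: the Gaussian choice of $u$, the absolute-value loss, and all function names here are assumptions for the example.

```python
import numpy as np

def loss(theta):
    """A non-smooth loss: the absolute value has a kink at theta = 0."""
    return np.abs(theta)

def smoothed_loss(theta, gamma=0.1, n_samples=10_000, rng=None):
    """Monte Carlo estimate of f_gamma(theta) = E_u[f(theta + gamma * u)]
    from Eq. (10), with u drawn from a standard Gaussian."""
    rng = np.random.default_rng(0) if rng is None else rng
    u = rng.standard_normal(n_samples)
    return loss(theta + gamma * u).mean()

# At the kink the smoothed loss is strictly positive (about gamma * E|u|)
# and shrinks toward the original loss value as gamma -> 0.
print(smoothed_loss(0.0, gamma=0.1))
print(smoothed_loss(0.0, gamma=0.01))
```

The smoothed objective is differentiable in $\theta$ even though $|\theta|$ is not, which is precisely what makes stochastic-gradient methods applicable with guarantees.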
Appendix B Additional Ablation Studies
In this section, we conduct additional ablation studies to provide a more comprehensive view of the Backward Smoothing method.
B.1 The Effect of β
We conduct ablation studies on the effect of β in the Backward Smoothing method, holding the remaining hyperparameters (such as the attack step size) fixed. Table 9 shows the experimental results. As in TRADES (Zhang et al., 2019), β in Backward Smoothing controls the trade-off between natural accuracy and robust accuracy: natural accuracy decreases steadily as β grows, and the best robustness on both datasets is obtained at β = 10.0.
Table 9: The effect of β on natural (Nat) and robust (Rob) accuracy.
β      CIFAR-10             CIFAR-100
       Nat (%)   Rob (%)    Nat (%)   Rob (%)
2.0    84.87     46.46      62.22     24.83
4.0    84.58     50.01      59.03     27.58
6.0    83.96     51.65      57.46     28.66
8.0    82.48     51.88      57.51     29.38
10.0   82.38     52.50      56.96     30.50
12.0   81.63     52.38      56.46     29.95
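To see concretely how β trades off the two terms, here is a minimal numerical sketch of a TRADES-style objective (cross-entropy plus β times a KL term, mirroring the structure of Eq. (2)). The helper functions and example logits are illustrative, not the authors' implementation, and the KL direction follows the common TRADES convention of KL(natural ‖ adversarial).

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kl_div(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def trades_loss(logits_nat, logits_adv, label, beta):
    """Cross-entropy on the natural example plus beta times the KL
    divergence between natural and adversarial output distributions."""
    p_nat = softmax(logits_nat)
    p_adv = softmax(logits_adv)
    ce = -np.log(p_nat[label] + 1e-12)
    return float(ce + beta * kl_div(p_nat, p_adv))

logits_nat = np.array([2.0, 0.5, -1.0])   # made-up natural logits
logits_adv = np.array([1.0, 1.5, -1.0])   # made-up perturbed logits
for beta in (2.0, 10.0):
    print(beta, trades_loss(logits_nat, logits_adv, label=0, beta=beta))
```

A larger β places more weight on output stability within the neighborhood (the KL term) at the expense of fitting the natural labels, which matches the monotone drop in natural accuracy seen in Table 9.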
B.2 Does Backward Smoothing Alone Work?
To further understand the role of Backward Smoothing in robust training, we conduct experiments using Backward Smoothing alone, i.e., applying only the Backward Smoothing initialization without performing any gradient-based attack. Table 10 and Table 11 show the experimental results. We observe that Backward Smoothing as an initialization by itself provides only a limited level of robustness (worse than a single-step attack). This is expected, since the loss used for Backward Smoothing does not directly promote adversarial attacks; it therefore serves only as an initialization that helps single-step attacks better solve the inner maximization problem.
Table 10: Backward Smoothing alone vs. single-step baselines (CIFAR-10).
Method                     Nat (%)   Rob (%)
Fast AT                    84.79     46.30
Fast TRADES                84.80     46.25
Backward Smoothing Alone   69.87     39.26
Table 11: Backward Smoothing alone vs. single-step baselines (CIFAR-100).
Method                     Nat (%)   Rob (%)
Fast AT                    60.35     24.64
Fast TRADES                60.22     19.40
Backward Smoothing Alone   43.47     18.51
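The observation that an initialization alone "does not directly promote adversarial attacks" can be illustrated on a toy linear model. Everything in this sketch is hypothetical (the weights w, input x, and margin-style loss are made up for illustration): a single gradient-sign step of ℓ∞ budget ε increases a linear loss by exactly ε‖w‖₁, the maximum possible, whereas a random sign perturbation of the same budget yields a random, typically much smaller, increase.

```python
import numpy as np

# Toy linear "classifier": loss = -w.x (a stand-in for a margin loss,
# so a larger loss means a more successful attack).
rng = np.random.default_rng(0)
w = rng.standard_normal(50)
x = rng.standard_normal(50)
eps = 0.1

loss = lambda z: float(-w @ z)
grad = -w  # gradient of the loss w.r.t. the input

delta_rand = eps * rng.choice([-1.0, 1.0], size=50)  # random init only
delta_fgsm = eps * np.sign(grad)                     # gradient-sign step

print(loss(x + delta_rand) - loss(x))  # random-sign increase
print(loss(x + delta_fgsm) - loss(x))  # equals eps * ||w||_1, the maximum
```

The gradient-based step saturates the ℓ∞ budget in the worst-case direction, which is why an initialization without any gradient step, as in the "Backward Smoothing Alone" rows above, yields a much weaker inner maximizer and hence a less robust model.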