1 Introduction
It is easy to generate human-imperceptible perturbations that fool the prediction of a deep neural network (DNN). Such perturbed samples are called adversarial examples Szegedy et al. (2014), and algorithms for generating adversarial examples are called adversarial attacks. Adversarial attacks can be divided into two types: white-box attacks Goodfellow et al. (2015); Madry et al. (2018); Carlini and Wagner (2017); Croce and Hein (2020a) and black-box attacks Papernot et al. (2016, 2017); Chen et al. (2017); Ilyas et al. (2018); Papernot et al. (2018). In the white-box setting, an adversary has access to the model architecture and the parameter values of a given DNN, while in the black-box setting, the adversary has no access to them and can only query the outputs of the prediction model for given inputs.
It is well known that adversarial attacks can greatly reduce the accuracy of DNNs, for example from about 96% accuracy on clean data to almost zero accuracy on adversarial examples Madry et al. (2018). This vulnerability of DNNs can cause serious security problems when DNNs are applied to security-critical applications Kurakin et al. (2017); Jiang et al. (2019) such as medicine Ma et al. (2020); Finlayson et al. (2019) and autonomous driving Kurakin et al. (2017); Deng et al. (2020); Morgulis et al. (2019); Li et al. (2020).
The aim of this paper is to develop a new adversarial training algorithm for DNNs that is theoretically well founded and empirically superior to existing competitors. A novel feature of the proposed algorithm is its use of a data-adaptive regularization for robustifying a prediction model. We impose more regularization on data more vulnerable to adversarial attacks and vice versa. Even though the idea of data-adaptive regularization is not new, our data-adaptive regularization has a firm theoretical basis of reducing an upper bound of the robust risk, while most other data-adaptive regularizations are heuristically motivated.
1.1 Related Works
Most existing adversarial training algorithms can be understood as procedures for estimating the optimal robust prediction model, i.e., the minimizer of the robust population risk. The two most representative adversarial training algorithms are PGD-Training of Madry et al. (2018) and TRADES of Zhang et al. (2019). Madry et al. (2018) develop the PGD-Training algorithm, which minimizes the robust empirical risk directly. In contrast, Zhang et al. (2019) decompose the robust population risk into the sum of two misclassification errors, one for clean data and the other for adversarial data, and treat the second term as a regularization term to propose a regularized empirical risk minimization algorithm called TRADES.

Data-adaptive modifications of existing adversarial training algorithms have also received much attention. Zhang et al. (2020) decide the number of iterations of the PGD algorithm, a white-box attack algorithm for generating adversarial examples, data-adaptively when generating an adversarial example (early-stopped PGD). Ding et al. (2020) generate an adversarial example only when a given datum is correctly classified and use the datum itself as an adversarial example when it is misclassified. Zhang et al. (2021) propose to minimize a weighted robust empirical risk, where the weight is inversely proportional to the distance to the decision boundary. Wang et al. (2020) devise a regularized robust empirical risk as the sum of the robust empirical risk and a regularization term whose weights are proportional to the conditional probability of a given datum being misclassified. See Section 2 for details of data-adaptive adversarial training algorithms.

1.2 Our Contributions
We propose a new data-adaptive adversarial training algorithm. The novel features of our algorithm compared to the aforementioned data-adaptive adversarial training algorithms are that it is theoretically well motivated, easier to implement and empirically superior. First, we derive an upper bound of the robust risk. Then, we devise a data-adaptive regularized empirical risk which is a surrogate version of our theoretical upper bound. Finally, we learn a robust prediction model by minimizing the proposed data-adaptive regularized empirical risk. By analyzing benchmark data sets, we show that our proposed algorithm is superior to other competitors in view of generalization (accuracy on clean samples) and robustness (accuracy on adversarial examples) simultaneously, achieving state-of-the-art performance. In addition, we illustrate that our algorithm helps improve the fairness of the prediction model in the sense that the per-class error rates become more similar compared to a non-adaptive adversarial training algorithm.
A summary of our contributions is as follows:

- Theoretically, we derive an upper bound of the robust risk.
- We propose a data-adaptive regularized empirical risk which is a surrogate version of the derived upper bound of the robust risk.
- Numerical experiments are conducted to show that our algorithm improves robustness and generalization simultaneously and outperforms the existing state-of-the-art methods.
- Our algorithm can mitigate the unfairness due to the disparity between class-wise accuracies.
2 Preliminaries
2.1 Robust Population Risk
Let $\mathcal{X}$ be the input space, $\mathcal{Y} = \{1, \ldots, C\}$ be the set of output labels and $f_{\theta}: \mathcal{X} \to \mathbb{R}^{C}$ be the score function parametrized by neural network parameters $\theta$ such that $\mathrm{softmax}(f_{\theta}(x))$ is the vector of the conditional class probabilities. Let $B(x, \epsilon) = \{x' : \|x' - x\|_{\infty} \le \epsilon\}$ and $\mathbb{1}(\cdot)$ be the indicator function. The robust population risk used in adversarial training is defined as

$$\mathcal{R}_{\mathrm{rob}}(\theta) = \mathbb{E}_{(X,Y)}\Big[\max_{X' \in B(X,\epsilon)} \mathbb{1}\big(F_{\theta}(X') \neq Y\big)\Big], \qquad (1)$$

where $F_{\theta}(x) = \operatorname{argmax}_{c} f_{\theta,c}(x)$ is the induced classifier. Most adversarial training algorithms learn $\theta$ by minimizing an empirical version of the above robust population risk. In turn, most empirical versions of (1) require generating an adversarial example $x^{\mathrm{adv}} \in B(x, \epsilon)$, an empirical counterpart of the inner maximizer in (1).
Any method of generating an adversarial example is called an adversarial attack.
2.2 Algorithms for Generating Adversarial Examples
Existing adversarial attacks can be categorized into either white-box attacks Goodfellow et al. (2015); Madry et al. (2018); Carlini and Wagner (2017); Croce and Hein (2020a) or black-box attacks Papernot et al. (2016, 2017); Chen et al. (2017); Ilyas et al. (2018); Papernot et al. (2018). For a white-box attack, the model structure and parameters are known to the adversary, who uses this information to generate adversarial examples.
The most popular method for the white-box attack is PGD (Projected Gradient Descent) Madry et al. (2018). Let $\ell$ be a surrogate loss of the 0-1 loss. For a given example $(x, y)$, PGD finds the adversarial example defined as

$$x^{\mathrm{adv}} = \operatorname{argmax}_{x' \in B(x,\epsilon)} \ell(f_{\theta}(x'), y)$$

by applying the gradient ascent algorithm and projecting the iterate back onto $B(x, \epsilon)$. That is, the update rule of PGD is

$$x^{(t+1)} = \Pi_{B(x,\epsilon)}\Big(x^{(t)} + \alpha \cdot \mathrm{sign}\big(\nabla_{x}\ell(f_{\theta}(x^{(t)}), y)\big)\Big), \qquad (2)$$

where $\Pi_{B(x,\epsilon)}$ is the projection operator onto $B(x, \epsilon)$ and $\alpha > 0$ is the step size. For the surrogate loss $\ell$, the cross-entropy Madry et al. (2018) or the KL divergence Zhang et al. (2019) is used.
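As a concrete illustration of (2), the update can be sketched on a toy logistic model for which the input gradient of the cross-entropy is available in closed form. The model, step size and radius below are illustrative choices of ours, not the settings used in the paper.

```python
import numpy as np

def pgd_attack(x, y, w, b, eps=0.1, alpha=0.02, steps=10):
    """L-infinity PGD against a logistic model p(y=1|x) = sigmoid(w.x + b).

    For the cross-entropy loss, the input gradient is (p - y) * w, so each
    iteration takes a signed gradient-ascent step and projects the iterate
    back onto the eps-ball around the clean input (and onto [0, 1]).
    """
    x_adv = x + np.random.uniform(-eps, eps, size=x.shape)  # random start
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x_adv @ w + b)))  # current prediction
        grad = (p - y) * w                          # d CE / d x
        x_adv = x_adv + alpha * np.sign(grad)       # ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)    # project onto B(x, eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)            # keep inputs in [0, 1]
    return x_adv
```

The same two-line pattern (signed ascent step, then projection) underlies the PGD used throughout the paper; only the model and the surrogate loss change.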
There are cases where an adversary can only observe outputs for given inputs but cannot access the structure and parameters of the model. In this case, the adversary typically generates a dataset of input-output pairs by querying the model, trains a substitute model on this dataset, and generates adversarial examples from the substitute model Papernot et al. (2017). These kinds of attacks are called black-box attacks.
2.3 Review of Adversarial Training Algorithms
We review some adversarial training algorithms which, we think, are related to our proposed algorithm. Typically, adversarial training algorithms consist of a maximization step and a minimization step. In the maximization step, we generate adversarial examples for given $\theta$. In the minimization step, we fix the adversarial examples and update $\theta$. For notational simplicity, we drop the dependence on $\theta$ in the adversarial examples.
PGD-Training

Madry et al. (2018) propose PGD-Training, which updates $\theta$ by minimizing

$$\frac{1}{n}\sum_{i=1}^{n} \ell_{\mathrm{ce}}\big(f_{\theta}(x_i^{\mathrm{adv}}), y_i\big),$$

where $\ell_{\mathrm{ce}}$ is the cross-entropy loss and $x_i^{\mathrm{adv}}$ is an adversarial example obtained by PGD.
TRADES

The robust risk, natural risk and boundary risk are defined by

$$\mathcal{R}_{\mathrm{rob}}(\theta) = \mathbb{E}\,\mathbb{1}\big(\exists X' \in B(X,\epsilon): F_{\theta}(X') \neq Y\big), \qquad (3)$$
$$\mathcal{R}_{\mathrm{nat}}(\theta) = \mathbb{E}\,\mathbb{1}\big(F_{\theta}(X) \neq Y\big), \qquad (4)$$
$$\mathcal{R}_{\mathrm{bdy}}(\theta) = \mathbb{E}\,\mathbb{1}\big(\exists X' \in B(X,\epsilon): F_{\theta}(X') \neq F_{\theta}(X),\ F_{\theta}(X) = Y\big). \qquad (5)$$

Zhang et al. (2019) show that $\mathcal{R}_{\mathrm{rob}}(\theta) = \mathcal{R}_{\mathrm{nat}}(\theta) + \mathcal{R}_{\mathrm{bdy}}(\theta)$ and propose the following regularized empirical risk, which is a surrogate version of the upper bound of the robust risk:

$$\frac{1}{n}\sum_{i=1}^{n}\Big[\ell_{\mathrm{ce}}\big(f_{\theta}(x_i), y_i\big) + \lambda \cdot \mathrm{KL}\big(p_{\theta}(\cdot \mid x_i)\,\big\|\,p_{\theta}(\cdot \mid x_i^{\mathrm{adv}})\big)\Big],$$

where $p_{\theta}(\cdot \mid x) = \mathrm{softmax}(f_{\theta}(x))$ and $x_i^{\mathrm{adv}}$ is an adversarial example generated by PGD with the KL divergence.
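On a single example, the TRADES objective can be sketched as follows (the function and variable names are ours; `logits_clean` and `logits_adv` stand for the score vectors on the clean and adversarial inputs):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

def trades_loss(logits_clean, logits_adv, y, beta=6.0):
    """Per-example TRADES objective: the natural cross-entropy on the clean
    logits plus beta times KL(p_clean || p_adv), the surrogate of the
    boundary risk.  beta plays the role of the regularization parameter."""
    p_clean, p_adv = softmax(logits_clean), softmax(logits_adv)
    ce = -np.log(p_clean[y])                                  # natural term
    kl = np.sum(p_clean * (np.log(p_clean) - np.log(p_adv)))  # boundary term
    return ce + beta * kl
```

When the adversarial logits coincide with the clean ones, the KL term vanishes and the loss reduces to the natural cross-entropy, mirroring the decomposition of the robust risk into natural and boundary risks.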
Data-Adaptive Methods Zhang et al. (2020); Ding et al. (2020); Zhang et al. (2021); Wang et al. (2020)

Geometry-Aware Instance-Reweighted Adversarial Training (GAIRAT) Zhang et al. (2021) increases the weights of samples that are close to the decision boundary. In other words, GAIRAT minimizes a weighted robust empirical risk, where the weight of sample $(x_i, y_i)$ is a decreasing function of $\kappa(x_i, y_i)$, the least number of PGD iterations needed to misclassify $x_i$, for a pre-specified maximum iteration $K$.
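The geometry-aware weight proposed in Zhang et al. (2021), reproduced here from that paper with $\lambda$ a bias hyperparameter, can be computed as:

```python
import numpy as np

def gairat_weight(kappa, K, lam=-1.0):
    """Geometry-aware weight from Zhang et al. (2021): kappa is the least
    number of PGD steps needed to flip the prediction (small kappa means
    the sample is close to the decision boundary) and K is the maximum
    number of PGD iterations.  Closer samples receive larger weights."""
    return (1.0 + np.tanh(lam + 5.0 * (1.0 - 2.0 * kappa / K))) / 2.0
```

In the full GAIRAT algorithm, these weights are further normalized within each mini-batch; we omit that step here.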
Zhang et al. (2020) suggest early-stopped PGD, which stops the iteration of PGD as soon as the model misclassifies the current adversarial example. Friendly Adversarial Training (FAT) minimizes the empirical risk over adversarial examples $x_i^{\mathrm{adv}}$ generated by early-stopped PGD with a pre-specified maximum iteration $K$.
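Early-stopped PGD differs from standard PGD only in its stopping rule. Below is a minimal sketch on a toy logistic model (an illustrative model of ours, not the paper's setup):

```python
import numpy as np

def early_stopped_pgd(x, y, w, b, eps=0.1, alpha=0.02, max_steps=10):
    """Early-stopped PGD (the attack behind FAT): run PGD on a logistic
    model p(y=1|x) = sigmoid(w.x + b), but return as soon as the current
    iterate is misclassified, so the 'friendly' adversarial example stays
    close to the decision boundary instead of maximizing the loss."""
    x_adv = x.copy()
    for _ in range(max_steps):
        p = 1.0 / (1.0 + np.exp(-(x_adv @ w + b)))
        if (p >= 0.5) != (y == 1):                  # already misclassified
            break
        grad = (p - y) * w                          # cross-entropy gradient
        x_adv = np.clip(x_adv + alpha * np.sign(grad), x - eps, x + eps)
    return x_adv
```

If the clean input is already misclassified, the loop exits immediately and the datum itself is returned, which is exactly the behavior that keeps FAT's adversarial examples "friendly".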
3 Adaptive Regularization
In this section, we develop a new data-adaptive regularization algorithm for adversarial training called Anti-Robust Weighted Regularization (ARoW).
3.1 An Upper Bound of the Robust Population Risk
In this subsection, we consider the case of a binary response $Y \in \{-1, 1\}$ and a surrogate loss function given as $\ell(f(x), y) = \phi(y f(x))$, where $\phi$ is a function bounded below by the 0-1 loss $\mathbb{1}(z \le 0)$. Examples of $\phi$ include the binary cross-entropy. We take into account the regularized robust risk defined as

$$\mathcal{R}_{\lambda}(\theta) = \mathcal{R}_{\mathrm{nat}}(\theta) + \lambda\, \mathcal{R}_{\mathrm{bdy}}(\theta)$$

for a given $\lambda > 0$. The robust risk of (3) is the regularized robust risk with $\lambda = 1$. In the regularized robust risk, the boundary risk is considered a regularization term, and the regularization parameter $\lambda$ controls the trade-off between generalization ability and robustness to adversarial attacks.
The following theorem provides an upper bound of the regularized robust risk.
Theorem 1.
Let
For any score function , we have
(7) 
The upper bound (7) consists of two terms: the first term is a surrogate loss of the natural risk and the second term is a surrogate version of the boundary risk. The upper bound in (7) will serve as a surrogate regularized robust risk in the next subsection.
The second term on the right-hand side of (7) can be reformulated as the expectation of
(8)
with respect to $X$ and $Y$. For a given datum, the weight in (8) can be considered a measure of the vulnerability of the prediction model at that datum, and therefore the second term serves as a data-adaptive regularization term. That is, it enforces the prediction model to be more robust for data whose weights are large.
3.2 AntiRobust Weighted Regularization (ARoW) Algorithm
Motivated by the upper bound (7) of the robust risk in Section 3.1, we propose in this subsection a new data-adaptive adversarial training algorithm called the Anti-Robust Weighted Regularization (ARoW) algorithm, which learns $\theta$ by minimizing the following regularized empirical risk:
(9)
where $e_y$ denotes the one-hot vector whose $y$-th entry is 1 and $\mathbf{1}$ is the vector whose entries are all 1.
The empirical counterpart (9) of the upper bound (7) is obtained by three modifications after the population expectation is replaced by the empirical average.

First, we employ label smoothing to estimate the conditional class probabilities more accurately Müller et al. (2019). It is well known that DNNs trained on the original labels are poorly calibrated Guo et al. (2017). Accurate estimation of the conditional class probabilities is important since they are used in the regularization term of ARoW.

Second, we replace the binary surrogate term by the KL divergence, which can be considered a multiclass generalization of that term. This modification is also used in TRADES Zhang et al. (2019).

The last modification is to cap the weight in the regularization term. We employ this modification since it is helpful not to regularize highly robust samples; that is, we want to treat samples beyond the cap as equally robust.
The ARoW algorithm is summarized in Algorithm 1.
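To make the three modifications concrete, the per-example objective can be sketched as follows. This is a schematic of our own: the label-smoothing rate `delta`, the cap `tau` and, in particular, the exact form of the anti-robust weight are illustrative stand-ins for the quantities defined in (9), chosen only to reflect the qualitative behavior described above (more weight on vulnerable samples, zero weight on highly robust ones).

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def arow_style_loss(logits_clean, logits_adv, y, num_classes,
                    lam=6.0, delta=0.2, tau=0.2):
    """Schematic ARoW-style per-example objective (illustrative weight,
    not the exact Eq. (9)): a label-smoothed cross-entropy on the clean
    input plus a KL term whose weight grows as the sample becomes more
    vulnerable and vanishes for highly robust samples."""
    p_clean, p_adv = softmax(logits_clean), softmax(logits_adv)
    # label smoothing: (1 - delta) mass on the true class, delta spread uniformly
    target = np.full(num_classes, delta / num_classes)
    target[y] += 1.0 - delta
    ce_ls = -np.sum(target * np.log(p_clean))
    kl = np.sum(p_clean * (np.log(p_clean) - np.log(p_adv)))
    weight = max(0.0, (1.0 - tau) - p_adv[y])  # zero for highly robust samples
    return ce_ls + lam * weight * kl
```

Samples that the attacked model still classifies with probability above `1 - tau` receive no extra regularization, while the weight, and hence the regularization, increases as the sample becomes more vulnerable.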
3.3 Remarks
An Alternative Upper Bound
An alternative adaptive upper bound of the robust population risk can be derived by replacing the weight term in (7) with a confidence-based weight. See Appendix B for the derivation. Based on this alternative upper bound, we propose an adversarial training algorithm which minimizes the following regularized empirical risk:
(10) 
We call the algorithm Confidence Weighted Regularization (CoW).
The roles of the two data-adaptive regularization terms in (9) and (10) are quite different. The regularization term in (9) encourages the prediction model to be more robust for samples which are more vulnerable to adversarial attacks. In contrast, the regularization term in (10) puts more regularization on highly confident data. The idea of focusing more on highly confident data is reasonable since adversarial attacks on highly confident data result in serious damage. However, the numerical studies in Section 4 show that ARoW is superior to CoW.
Comparison to MART
The objective functions of MART (6) and ARoW (9) are similar, but there are three main differences. First, the supervised loss term of ARoW is the label-smoothing loss with clean samples, whereas MART uses the margin cross-entropy loss with adversarial examples. Second, the surrogate loss functions used in PGD differ: MART uses the cross-entropy while ARoW uses the KL divergence. Third, the weights in the regularization terms are defined differently. In the numerical studies, we find that ARoW outperforms MART by a large margin. This would be partly because ARoW is theoretically well motivated.
4 Experiments
In this section, we investigate the ARoW algorithm in view of robustness and generalization by analyzing benchmark data sets. We show that ARoW is superior to other competitors such as Madry et al. (2018); Zhang et al. (2019); Wang et al. (2020); Rade and Moosavi-Dezfooli (2022); Zhang et al. (2021) as well as CoW. In addition, we carry out ablation studies to illustrate that the adaptive regularization is a key component of the success of ARoW compared to TRADES. We show that ARoW improves the robustness of data vulnerable to adversarial attacks more than TRADES does. Moreover, ARoW improves the fairness of the prediction model in the sense that the per-class error rates become more similar. For benchmark data sets, we use CIFAR-10 without and with unlabeled data Carmon et al. (2019), FMNIST Xiao et al. (2017) and SVHN Netzer et al. (2011). For CIFAR-10, we use two CNN architectures, WideResNet-34-10 Zagoruyko and Komodakis (2016) and ResNet-18 He et al. (2016), to investigate how well ARoW works depending on the capacity of the model; WideResNet-34-10 is more complex than ResNet-18. For FMNIST and SVHN, ResNet-18 He et al. (2016) is used. In the main manuscript, we only present the results for CIFAR-10 with WideResNet-34-10, and defer the results for CIFAR-10 with the unlabeled data of Carmon et al. (2019) with WideResNet-34-10, CIFAR-10 with ResNet-18, FMNIST and SVHN to Appendices D.1.5, D.2, E.1 and E.2, respectively.
Experimental Setup
The datasets are normalized into [0, 1]. For generating adversarial examples in the training phase, PGD with random start, as in (2), is used. For learning prediction models, SGD with momentum, weight decay, an initial learning rate of 0.1 and a batch size of 128 is used, and the learning rate is reduced by a factor of 10 at epochs 60 and 90. The final model is the best model against PGD on the test data among those obtained within 120 epochs. Random crop and random horizontal flip with probability 0.5 are applied for data augmentation. For CIFAR-10, stochastic weight averaging (SWA) Izmailov et al. (2018) is employed after 50 epochs to prevent robust overfitting Rice et al. (2020), as Chen et al. (2021) do.
For evaluating the robust accuracy in the test phase, PGD and AutoAttack are used as adversarial attacks, where AutoAttack consists of three white-box attacks, APGD-CE and APGD-DLR in Croce and Hein (2020b) and FAB Croce and Hein (2020a), and one black-box attack, Square Attack Andriushchenko et al. (2020). To the best of our knowledge, AutoAttack is the strongest attack.
| Method | Standard | PGD Madry et al. (2018) | AutoAttack Croce and Hein (2020b) |
|---|---|---|---|
| PGD-Training Madry et al. (2018) | 87.02 (0.20) | 57.50 (0.12) | 53.98 (0.14) |
| TRADES Zhang et al. (2019) | 85.86 (0.09) | 56.79 (0.08) | 54.31 (0.08) |
| HAT Rade and Moosavi-Dezfooli (2022) | 86.98 (0.10) | 56.81 (0.17) | 54.63 (0.07) |
| GAIRAT Zhang et al. (2021) | 85.44 (0.10) | 67.27 (0.07) | 46.41 (0.07) |
| ARoW | 88.59 (0.02) | 58.18 (0.09) | 54.82 (0.14) |
| CoW | 88.20 (0.09) | 57.33 (0.05) | 54.63 (0.12) |

Table 1: We conduct the experiment three times with different seeds and present the averages of the accuracies with the standard errors in brackets.
| Method | Standard | APGD | APGD-DLR | FAB Croce and Hein (2020a) | Square Andriushchenko et al. (2020) |
|---|---|---|---|---|---|
| GAIRAT Zhang et al. (2021) | 85.44 (0.170) | 63.14 (0.16) | 46.48 (0.07) | 49.35 (0.05) | 55.19 (0.16) |
| ARoW | 88.59 (0.03) | 57.78 (0.08) | 54.83 (0.13) | 55.69 (0.15) | 62.31 (0.06) |

Table 2: Robustness against each of the four attacks constituting AutoAttack.
4.1 Performance Evaluation
| Initial Robustness | Rob. TRADES | Rob. ARoW | Diff. | Improvement Rate (%) |
|---|---|---|---|---|
| Highly Vulnerable | 317 | 357 | 40 | 12.62 |
| Vulnerable | 945 | 1008 | 63 | 6.67 |
| Robust | 969 | 1027 | 58 | 5.99 |
| Highly Robust | 3524 | 3529 | 5 | 0.142 |
| Method | Standard | PGD Madry et al. (2018) | AutoAttack Croce and Hein (2020b) |
|---|---|---|---|
| TRADES w/o LS | 85.86 (0.09) | 56.79 (0.08) | 54.31 (0.08) |
| TRADES w/ LS | 86.83 (0.08) | 57.75 (0.02) | 54.76 (0.08) |
| ARoW w/o LS | 87.68 (0.16) | 57.54 (0.09) | 54.58 (0.10) |
| ARoW w/ LS | 88.59 (0.02) | 58.18 (0.09) | 54.83 (0.14) |
| Method | Acc (Standard) | WC-Acc (Standard) | SD (Standard) | Acc (Robust) | WC-Acc (Robust) | SD (Robust) |
|---|---|---|---|---|---|---|
| TRADES () | 87.73 | 70.70 | 8.17 | 57.17 | 26.40 | 16.75 |
| TRADES () | 85.69 | 67.10 | 9.27 | 57.38 | 27.10 | 16.97 |
| TRADES () | 84.94 | 65.90 | 9.58 | 58.01 | 27.30 | 16.92 |
| ARoW | 88.58 | 75.10 | 7.16 | 59.23 | 30.80 | 15.68 |
| CoW | 88.41 | 72.20 | 7.22 | 58.34 | 26.40 | 17.09 |

Table 5: We report the accuracy (Acc), the worst-class accuracy (WC-Acc) and the standard deviation of class-wise accuracies (SD) for each method.
| Class | Rob. Both (%) | Rob. TRADES (%) | Rob. ARoW (%) |
|---|---|---|---|
| 0 (Airplane) | 61.3 | 3.5 | 5.4 |
| 1 (Automobile) | 75.8 | 1.7 | 1.7 |
| 2 (Bird) | 36.6 | 1.9 | 6.5 |
| 3 (Cat) | 23.1 | 3.0 | 7.1 |
| 4 (Deer) | 32.2 | 3.4 | 8.1 |
| 5 (Dog) | 44.7 | 3.9 | 2.5 |
| 6 (Frog) | 61.4 | 6.4 | 2.2 |
| 7 (Horse) | 67.1 | 2.6 | 2.2 |
| 8 (Ship) | 60.4 | 1.9 | 9.7 |
| 9 (Truck) | 73.0 | 2.3 | 3.3 |
Table 1 reports the standard accuracies on clean samples (generalization) and the robust accuracies against adversarial examples generated by PGD and AutoAttack Croce and Hein (2020b) (robustness) for various adversarial training algorithms including ARoW.

The regularization parameters, if any, are set to the ones given in the corresponding articles, while the regularization parameter in ARoW and CoW is set to 6, the value used in TRADES. The other regularization parameters in ARoW and CoW are selected so that the robust accuracy against PGD is maximized. The regularization parameters used in the training phase are summarized in Appendix D.1.1.

HAT Rade and Moosavi-Dezfooli (2022) in Table 1 is the state-of-the-art algorithm against AutoAttack Croce and Hein (2020b). HAT is a variation of TRADES with an additional regularization term based on helper examples, which restrains the decision boundary from having excessive margins. The objective function of HAT is given in (12) in Appendix C.2.

The results indicate that ARoW is superior to the other competitors in view of robustness as well as generalization. Moreover, the robustness of ARoW can be improved considerably at the cost of a small sacrifice in generalization; these results are presented in Appendix D.1.2.

In Table 1, we observe that GAIRAT Zhang et al. (2021) is robust to PGD but not to AutoAttack Croce and Hein (2020b). To check whether gradient masking occurs, we evaluate the robustness of GAIRAT against the four attacks in AutoAttack. In Table 2, the robustness of GAIRAT degrades greatly for the three attacks in AutoAttack other than APGD, while the robustness of ARoW remains stable regardless of the adversarial attack. These observations suggest that gradient masking occurs in GAIRAT but not in ARoW.
4.2 Ablation Study: Comparison of ARoW and TRADES
In this subsection, we compare ARoW and TRADES Zhang et al. (2019) since ARoW is a dataadaptive modification of TRADES.
Performance
In Figure 1, we compare the performance (standard accuracy vs. robust accuracy) of ARoW and TRADES Zhang et al. (2019) for various choices of the regularization parameter. ARoW uniformly dominates TRADES with respect to the regularization parameter, regardless of the adversarial attack method.
The Effect of Adaptive Regularization
We investigate how the adaptive regularization in ARoW affects the robustness of each sample of CIFAR-10. First, we divide the test data into four groups, highly vulnerable, vulnerable, robust and highly robust, according to their degree of robustness under the parameter learned by PGD-Training Madry et al. (2018). Then, for the samples of each group, we check how many become robust under ARoW and TRADES, respectively; the results are presented in Table 3. ARoW is superior to TRADES in making non-robust samples (highly vulnerable or vulnerable) robust. We believe that this improvement is mainly due to the adaptive regularization term in ARoW, which enforces more regularization on more vulnerable samples.
The Effect of Label Smoothing
In Table 4, we investigate the effect of label smoothing in ARoW and TRADES Zhang et al. (2019). Label smoothing is helpful not only for ARoW but also for TRADES. This would be partly because the regularization terms depend on the conditional class probabilities, and it is well known that label smoothing helps calibrate the conditional class probabilities Pereyra et al. (2017). Note that ARoW is superior to TRADES even without label smoothing.
Improved Fairness
Xu et al. (2021) report that TRADES Zhang et al. (2019) increases the variation of the per-class accuracies (the accuracy in each class), which is undesirable in view of fairness, and propose the Fair-Robust-Learning (FRL) algorithm to alleviate this problem. Although FRL improves fairness, its standard and robust accuracies are worse than those of TRADES.
In contrast, Table 5 shows that ARoW improves fairness as well as the standard and robust accuracies compared to TRADES. Also, Table 6 shows that ARoW is highly effective on difficult classes such as Bird, Cat and Deer, which contain many non-robust samples. These desirable properties of ARoW can be partly understood as follows. The main idea of ARoW is to impose more robust regularization on non-robust samples, and samples in less accurate classes tend to be vulnerable to adversarial attacks. Thus, ARoW improves the robustness of samples in less accurate classes, which results in improved robust as well as standard accuracies for such classes.
5 Conclusion and Future Works
In this paper, we derived an upper bound of the robust risk, developed an adaptive regularization algorithm for adversarial training based on it, and showed by numerical experiments that the adaptive regularization improves the robust accuracy as well as the standard accuracy.

Our proposed algorithms can be considered modifications of TRADES Zhang et al. (2019), which is a non-adaptive regularization algorithm. The idea of adaptive regularization, however, is not limited to TRADES and could be applied to other existing adversarial training algorithms, including HAT Rade and Moosavi-Dezfooli (2022), GAIRAT Zhang et al. (2021), MMA Ding et al. (2020) and FAT Zhang et al. (2020), without much difficulty.

We saw in Section 4.2 that ARoW improves fairness as well as accuracy compared to TRADES. The advantage of ARoW in terms of fairness is an unexpected byproduct, and it would be interesting to develop a more principled way of enhancing fairness further without hampering accuracy.
References

- Square attack: a query-efficient black-box adversarial attack via random search. In ECCV.
- Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy.
- Unlabeled data improves adversarial robustness. In NeurIPS.
- ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In ACM.
- Robust overfitting may be mitigated by properly learned smoothening. In ICLR.
- Minimally distorted adversarial examples with a fast adaptive boundary attack. In ECCV.
- Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In ICML.
- An analysis of adversarial attacks and defenses on autonomous driving models. In IEEE PerCom.
- MMA training: direct input space margin maximization through adversarial training. In ICLR.
- Adversarial attacks against medical deep learning systems. In Science.
- Explaining and harnessing adversarial examples. In ICLR.
- On calibration of modern neural networks. In ICML.
- Deep residual learning for image recognition. In CVPR.
- Black-box adversarial attacks with limited queries and information. In ICML.
- Averaging weights leads to wider optima and better generalization. In UAI.
- Black-box adversarial attacks on video recognition models. In ACM.
- Adversarial examples in the physical world. In ICLR.
- Adaptive square attack: fooling autonomous cars with adversarial traffic signs. IEEE Internet of Things Journal.
- SGDR: stochastic gradient descent with warm restarts. In ICLR.
- Understanding adversarial attacks on deep learning based medical image analysis systems. Pattern Recognition.
- Towards deep learning models resistant to adversarial attacks. In ICLR.
- Fooling a real car with adversarial traffic signs. arXiv.
- When does label smoothing help? In NeurIPS.
- Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011.
- Practical black-box attacks against machine learning. In ACM.
- Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv.
- Towards the science of security and privacy in machine learning. In IEEE European Symposium on Security and Privacy (EuroS&P) 2018.
- Regularizing neural networks by penalizing confident output distributions. In ICLR.
- Reducing excessive margin to achieve a better accuracy vs. robustness trade-off. In ICLR.
- Overfitting in adversarially robust deep learning. In ICML.
- Intriguing properties of neural networks. In ICLR.
- Improving adversarial robustness requires revisiting misclassified examples. In ICLR.
- Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv.
- To be robust or to be fair: towards fairness in adversarial training. In ICML.
- Wide residual networks. In BMVC 2016.
- Theoretically principled trade-off between robustness and accuracy. In ICML.
- Attacks which do not kill training make adversarial learning stronger. In ICML.
- Geometry-aware instance-reweighted adversarial training. In ICLR.
Appendix A Implementation Details
All experiments are performed on NVIDIA TITAN RTX and NVIDIA Quadro RTX 4000 GPUs. In the training phase, the automatic mixed precision package in PyTorch is used to speed up training.
Appendix B Theoretical Results
In this section, we provide the proofs of Theorem 1 and the alternative upper bound.
Theorem 1 (restated).
Proof.
Note that and . It suffices to show that , which holds since
∎
Theorem 2 (Alternative Upper Bound).
For any score function , we have
Proof.
Note that and . It suffices to show that , which holds since
∎
Appendix C Adversarial Training Algorithms
In this section, we explain the CoW algorithm and extra adversarial training algorithms included in our experiments but not discussed in the manuscript.
C.1 Confidence Weighted Regularization (CoW)
For notational simplicity, we define as
(11) 
The CoW algorithm is summarized in Algorithm 2.
C.2 Helper Based Adversarial Training (HAT)

HAT Rade and Moosavi-Dezfooli [2022] is a variation of TRADES with an additional regularization term using helper examples. The role of helper examples is to restrain the decision boundary from having excessive margins. HAT minimizes the following regularized empirical risk:
(12)
where the helper examples are constructed from a model pre-trained only on clean samples, and the adversarial examples are generated by PGD with the KL divergence.
C.3 Helper Based Adversarial Training with Anti-Robust Weighted Regularization (HAT-ARoW)

We consider the combination of HAT and ARoW to improve performance when SWA is not used. We call it HAT-ARoW, which minimizes the following regularized empirical risk:
(13) 
In Appendix D.1.3, we investigate the performance of HATARoW.
Appendix D Additional Experiments: CIFAR-10

In this section, we present the results of additional experiments and ablation studies for CIFAR-10.

D.1 CIFAR-10 with WideResNet-34-10

In this subsection, we present the results for analyzing CIFAR-10 with the WideResNet-34-10 architecture.

D.1.1 Hyperparameter Selection
| | Method | Reg. param. | Helper param. | Label smoothing | Cap | Weight decay |
|---|---|---|---|---|---|---|
| SWA | PGD-Training | | | | | |
| | TRADES | 6 | | | | |
| | TRADES w/ LS | 6 | | 0.2 | | |
| | HAT | 4 | 0.25 | | | |
| | GAIRAT | | | | | |
| | ARoW | 6 | | 0.2 | 0.2 | |
| | CoW | 6 | | 0.2 | 0.2 | |
| | ARoW w/o LS | 7 | | | 0.2 | |
| | HAT-ARoW | 7 | 0.15 | 0.2 | 0.2 | |
| w/o SWA | PGD-Training | | | | | |
| | TRADES | 6 | | | | |
| | MART | 6 | | | | |
| | HAT | 4 | 0.25 | | | |
| | GAIRAT | | | | | |
| | ARoW | 8 | | 0.2 | 0.2 | |
| | CoW | 8 | | 0.2 | 0.2 | |
| | HAT-ARoW | 8 | 0.15 | 0.2 | 0.2 | |
The regularization parameters are set to the ones given in the corresponding articles if available, while the regularization parameter in MART, ARoW and CoW is set to 6, the value used in TRADES. The other regularization parameters in ARoW and CoW are selected so that the robust accuracy against PGD is maximized for the fixed regularization parameter. Likewise, the regularization parameters in HAT-ARoW (13) are selected so that the robust accuracy against PGD is maximized. In HAT (12), the regularization parameters are set to 4 and 0.25, the values used in Rade and Moosavi-Dezfooli [2022] for CIFAR-10. When SWA is not used, the regularization parameter in ARoW and CoW is selected so that the robust accuracy against PGD is similar to that of HAT. In HAT-ARoW, the regularization parameter is set to 8, the value used in ARoW and CoW. The weight decay parameter for MART is set to the value used in Wang et al. [2020], while a different value is used for the other methods since MART works poorly with theirs.
D.1.2 Comparison of TRADES, HAT and ARoW for Various Values of the Regularization Parameter
In this subsection, we present the generalization and robustness of TRADES, HAT and ARoW against PGD and AutoAttack while varying the regularization parameter. The experiments use SWA, and the HAT-specific parameter in (12) is set to 0.25, the value used in Rade and Moosavi-Dezfooli [2022].
Table 8: Performance of TRADES, HAT and ARoW for various values of the regularization parameter (CIFAR-10, WideResNet-34-10, with SWA).

| Method | Reg. param. | Standard | PGD | AutoAttack |
|---|---|---|---|---|
| TRADES | 2 | 89.69 | 54.81 | 53.52 |
| TRADES | 3 | 88.45 | 55.42 | 53.88 |
| TRADES | 4 | 87.73 | 56.34 | 54.23 |
| TRADES | 5 | 86.45 | 56.86 | 54.23 |
| TRADES | 6 | 85.86 | 56.86 | 54.31 |
| TRADES | 7 | 85.71 | 57.17 | 54.75 |
| TRADES | 8 | 84.94 | 57.15 | 54.46 |
| HAT | 3 | 88.29 | 56.97 | 54.10 |
| HAT | 3.5 | 87.60 | 57.15 | 54.24 |
| HAT | 4 | 87.15 | 56.95 | 54.64 |
| HAT | 4.5 | 86.60 | 57.52 | 54.44 |
| HAT | 5 | 86.27 | 57.65 | 54.89 |
| HAT | 5.5 | 85.45 | 57.04 | 54.71 |
| HAT | 6 | 84.87 | 56.98 | 54.44 |
| ARoW | 4 | 89.49 | 56.96 | 54.00 |
| ARoW | 5 | 88.78 | 57.60 | 54.52 |
| ARoW | 6 | 88.59 | 58.18 | 54.82 |
| ARoW | 7 | 87.90 | 58.63 | 54.94 |
| ARoW | 8 | 87.59 | 58.61 | 55.21 |
| ARoW | 9 | 87.24 | 58.91 | 55.22 |
| ARoW | 10 | 86.51 | 58.90 | 55.41 |
D.1.3 Effect of Stochastic Weight Averaging (SWA)
We conduct experiments with and without SWA to assess the effectiveness of SWA.
Table 9: Effect of SWA on standard and robust accuracies (CIFAR-10, WideResNet-34-10).

| Setting | Method | Standard | PGD | AutoAttack |
|---|---|---|---|---|
| SWA Izmailov et al. [2018] | PGD-Training Madry et al. [2018] | 87.02 (0.20) | 57.50 (0.12) | 53.98 (0.14) |
| SWA | TRADES Zhang et al. [2019] | 85.86 (0.09) | 56.79 (0.08) | 54.31 (0.08) |
| SWA | HAT Rade and Moosavi-Dezfooli [2022] | 86.98 (0.10) | 56.81 (0.17) | 54.63 (0.07) |
| SWA | GAIRAT Zhang et al. [2021] | 85.44 (0.10) | 67.27 (0.07) | 46.41 (0.07) |
| SWA | ARoW | 88.59 (0.02) | 58.18 (0.09) | 54.82 (0.14) |
| SWA | CoW | 88.20 (0.09) | 57.33 (0.05) | 54.63 (0.12) |
| SWA | HAT-ARoW | 87.77 (0.03) | 58.54 (0.11) | 54.95 (0.15) |
| w/o SWA | PGD-Training Madry et al. [2018] | 86.88 (0.09) | 54.15 (0.16) | 51.35 (0.14) |
| w/o SWA | TRADES Zhang et al. [2019] | 85.48 (0.12) | 56.06 (0.08) | 53.16 (0.17) |
| w/o SWA | MART Wang et al. [2020] | 84.69 (0.18) | 55.67 (0.13) | 50.95 (0.09) |
| w/o SWA | HAT Rade and Moosavi-Dezfooli [2022] | 87.53 (0.02) | 56.41 (0.09) | 53.38 (0.10) |
| w/o SWA | GAIRAT Zhang et al. [2021] | 84.49 (0.06) | 62.11 (0.12) | 38.48 (0.36) |
| w/o SWA | ARoW | 87.60 (0.02) | 56.47 (0.10) | 52.95 (0.06) |
| w/o SWA | CoW | 86.94 (0.08) | 56.19 (0.13) | 53.39 (0.08) |
| w/o SWA | HAT-ARoW | 87.90 (0.05) | 57.28 (0.08) | 53.56 (0.05) |
In Table 9, we observe that SWA is effective for most methods. Note that ARoW performs well even without SWA, but HAT, which is known to be a state-of-the-art method, has the best robust accuracy against AutoAttack without SWA. In this case, combining ARoW and HAT (C.3) improves the accuracies further. We omit the results for MART with SWA since SWA degrades its performance.
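SWA itself is simple: an equal-weight running average of the parameters visited during the later epochs of training. A minimal sketch of the averaging step, with parameters treated as flat vectors (the cyclic learning-rate schedule and the batch-norm statistics re-estimation of Izmailov et al. [2018] are omitted):

```python
import numpy as np

class SWAAverager:
    """Equal-weight running average of model parameters (flat vectors here)."""

    def __init__(self):
        self.avg = None
        self.n = 0

    def update(self, weights):
        """Fold one epoch's weights into the running average."""
        w = np.asarray(weights, dtype=float)
        if self.avg is None:
            self.avg = w.copy()
        else:
            # incremental mean: avg <- avg + (w - avg) / (n + 1)
            self.avg += (w - self.avg) / (self.n + 1)
        self.n += 1
        return self.avg
```

At evaluation time the averaged weights replace the final-epoch weights, which tends to land the model in a flatter region of the loss surface.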
D.1.4 Per-Class Accuracies of ARoW and TRADES
In Table 10, we present the per-class robust and standard accuracies of the prediction models trained by ARoW and TRADES.
Table 10: Per-class robust and standard accuracies of ARoW and TRADES.

| Class | Rob. ARoW (%) | Rob. TRADES (%) | Std. ARoW (%) | Std. TRADES (%) |
|---|---|---|---|---|
| 0 (Airplane) | 67.3 | 65.6 | 91.6 | 88.3 |
| 1 (Automobile) | 77.8 | 77.8 | 95.3 | 93.7 |
| 2 (Bird) | 43.9 | 39.3 | 80.6 | 72.5 |
| 3 (Cat) | 30.9 | 27.2 | 75.1 | 65.9 |
| 4 (Deer) | 41.6 | 37.3 | 87.5 | 83.4 |
| 5 (Dog) | 48.1 | 48.8 | 79.3 | 76.0 |
| 6 (Frog) | 64.2 | 68.8 | 95.2 | 94.2 |
| 7 (Horse) | 70.1 | 70.4 | 92.7 | 91.0 |
| 8 (Ship) | 70.7 | 63.3 | 94.9 | 90.9 |
| 9 (Truck) | 76.7 | 75.7 | 93.5 | 93.5 |
In Table 10, we can see that ARoW is highly effective for classes that are difficult to classify, such as Bird, Cat and Deer. For such classes, ARoW substantially improves not only the standard accuracies but also the robust accuracies. For example, on the class 'Cat', which is the most difficult class (the lowest standard accuracy for TRADES), the robustness and generalization are improved by 3.7% and 9.2%, respectively, by ARoW compared with TRADES. This phenomenon would be partly due to the data-adaptive regularization used in ARoW. Usually, difficult classes are less robust to adversarial attacks. In turn, ARoW puts more regularization on non-robust classes and thus improves the accuracies of non-robust classes more.
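Per-class accuracies such as those in Table 10 are just the accuracy conditioned on the true label; a minimal sketch:

```python
import numpy as np

def per_class_accuracy(y_true, y_pred, num_classes):
    """Return a vector acc where acc[c] is the accuracy over the samples
    whose true label is c (NaN if class c does not occur)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    acc = np.zeros(num_classes)
    for c in range(num_classes):
        mask = (y_true == c)
        acc[c] = np.mean(y_pred[mask] == c) if mask.any() else np.nan
    return acc
```

Running this once with clean predictions and once with predictions on adversarial examples yields the standard and robust per-class accuracies, respectively.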
D.1.5 Additional Unlabeled Data
In this subsection, we present the results on CIFAR-10 with the additional unlabeled data used in Carmon et al. [2019]. In the training phase, SWA is not used and the cosine annealing learning rate scheduler of Loshchilov and Hutter [2017] is used. The final model is the model with the best robust accuracy against PGD on the test data among those obtained during 400 epochs of training.
Table 11: Results on CIFAR-10 with additional unlabeled data.

| Method | Standard | PGD | AutoAttack |
|---|---|---|---|
| PGD-Training Madry et al. [2018] | 91.97 | 61.42 | 58.90 |
| TRADES Zhang et al. [2019] | 90.59 | 62.72 | 59.99 |
| MART Wang et al. [2020] | 91.04 | 64.10 | 59.33 |
| HAT Rade and Moosavi-Dezfooli [2022] | 91.54 | 63.45 | 60.15 |
| ARoW | 92.09 | 63.72 | 60.12 |
| CoW | 91.14 | 63.27 | 60.12 |
In Table 11, we observe that ARoW outperforms the other competitors in terms of generalization while maintaining good robustness. MART has the best robustness against PGD but poor robustness against AutoAttack.
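Carmon et al. [2019] incorporate the unlabeled data by pseudo-labeling it with a standard classifier trained on the labeled data and then running adversarial training on the combined set. A minimal sketch of that preprocessing step (array shapes and function names are illustrative assumptions):

```python
import numpy as np

def pseudo_label(unlabeled_logits):
    # Argmax prediction of a standard model trained on the labeled data
    # serves as the pseudo-label for each unlabeled example.
    return np.argmax(unlabeled_logits, axis=1)

def augment_training_set(x_lab, y_lab, x_unlab, unlab_logits):
    """Append unlabeled examples with their pseudo-labels to the labeled set;
    adversarial training then proceeds on the combined set."""
    y_pseudo = pseudo_label(unlab_logits)
    x_all = np.concatenate([x_lab, x_unlab], axis=0)
    y_all = np.concatenate([y_lab, y_pseudo], axis=0)
    return x_all, y_all
```

The pseudo-labels are noisy, but the robust-training objective mainly needs the extra inputs to smooth the decision boundary, which is why this simple scheme helps robustness.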
D.2 CIFAR-10 with ResNet-18
In this subsection, we summarize the results of analyzing CIFAR-10 with the ResNet-18 architecture, which is smaller than WideResNet-34-10.
D.2.1 Hyperparameter Selection and Performance
Table 12: Hyperparameter settings for CIFAR-10 with ResNet-18.

| Method | Reg. param. | Helper param. | Label smoothing | Weighting param. | Weight decay |
|---|---|---|---|---|---|
| PGD-Training | – | – | – | – | – |
| TRADES | 6 | – | – | – | – |
| MART | 6 | – | – | – | – |
| HAT | 4 | 0.5 | – | – | – |
| GAIRAT | – | – | – | – | – |
| ARoW | 6 | – | 0.2 | 0.2 | – |
| CoW | 6 | – | 0.2 | 0.2 | – |
The main regularization parameter in TRADES, MART, HAT, ARoW and CoW is set to the value used in Table 7. For HAT, the two regularization parameters are set to 4 and 0.5, the values used in Rade and Moosavi-Dezfooli [2022].
Table 13: Performance on CIFAR-10 with ResNet-18.

| Method | Standard | PGD | AutoAttack |
|---|---|---|---|
| PGD-Training Madry et al. [2018] | 82.42 (0.05) | 53.48 (0.11) | 49.30 (0.07) |
| TRADES Zhang et al. [2019] | 82.41 (0.07) | 52.68 (0.22) | 49.63 (0.25) |
| MART Wang et al. [2020] | 74.87 (0.95) | 53.68 (0.30) | 46.61 (0.24) |
| HAT Rade and Moosavi-Dezfooli [2022] | 83.05 (0.03) | 52.91 (0.08) | 49.60 (0.02) |
| GAIRAT Zhang et al. [2021] | 81.09 (0.12) | 64.89 (0.04) | 41.35 (0.16) |
| ARoW | 85.30 (0.13) | 54.10 (0.16) | 49.66 (0.18) |
| CoW | 85.24 (0.14) | 52.91 (0.25) | 49.86 (0.23) |
Table 13 shows that ARoW outperforms the other competitors in terms of generalization and robustness against PGD while maintaining comparable robustness against AutoAttack.
D.2.2 Ablation Studies on the Two Regularization Parameters in (3.2)
We present the results of ablation studies on the two regularization parameters in the ARoW algorithm, with the main regularization parameter fixed at 6.
Table 14: Ablation study on the two regularization parameters in ARoW (CIFAR-10, ResNet-18).

| Param. 1 | Param. 2 | Standard | PGD | AutoAttack |
|---|---|---|---|---|
| 0.10 | 0 | 84.10 | 53.29 | 49.75 |
| 0.10 | 0.05 | 84.40 | 53.13 | 49.67 |
| 0.10 | 0.10 | 84.49 | 53.13 | 49.55 |
| 0.10 | 0.15 | 84.02 | 52.92 | 49.24 |
| 0.10 | 0.20 | 85.30 | 53.37 | 49.35 |
| 0.10 | 0.25 | 85.48 | 52.98 | 49.38 |
| 0.10 | 0.30 | 85.96 | 52.53 | 48.83 |
| 0.20 | 0 | 84.52 | 53.68 | 49.96 |
| 0.20 | 0.05 | 84.49 | 53.77 | 49.86 |
| 0.20 | 0.10 | 84.21 | 53.17 | 49.12 |
| 0.20 | 0.15 | 85.15 | 53.66 | 49.96 |
| 0.20 | 0.20 | 85.31 | 54.29 | 49.67 |
| 0.20 | 0.25 | 85.42 | 53.27 | 49.52 |
| 0.20 | 0.30 | 85.88 | 53.41 | 48.86 |
| 0.30 | 0 | 84.55 | 53.53 | 49.89 |
| 0.30 | 0.05 | 84.96 | 54.23 | 49.85 |
| 0.30 | 0.10 | 85.07 | 53.90 | 49.88 |
| 0.30 | 0.15 | 85.02 | 54.18 | 49.77 |
| 0.30 | 0.20 | 85.55 | 53.92 | 49.40 |
| 0.30 | 0.25 | 85.71 | 53.57 | 49.18 |
| 0.30 | 0.30 | 86.05 | 53.62 | 49.17 |
| 0.40 | 0 | 84.65 | 54.23 | 50.11 |
| 0.40 | 0.05 | 84.80 | 53.67 | 49.66 |
| 0.40 | 0.10 | 85.10 | 53.55 | 49.66 |
| 0.40 | 0.15 | 85.30 | 53.40 | 49.61 |
| 0.40 | 0.20 | 85.59 | 53.30 | 49.73 |
| 0.40 | 0.25 | 85.98 | 53.31 | 49.41 |
| 0.40 | 0.30 | 86.15 | 52.92 | 49.20 |
Table 14 suggests that the choice of the two parameters does not affect the standard and robust accuracies much. However, the results with the first parameter being either 0.2 or 0.3 and the second parameter slightly larger than 0 compare favorably with the other choices. That is, ARoW is not very sensitive to the choice of these two parameters, but fine-tuning them can be helpful.
D.2.3 Combination of FAT and ARoW
FAT-TRADES Zhang et al. [2020] is an adversarial training algorithm that uses early-stopped PGD to generate the adversarial examples in TRADES. Similarly, we can apply early-stopped PGD to ARoW to improve performance. We propose FAT-ARoW, an adversarial training algorithm combining ARoW with early-stopped PGD. It minimizes the following regularized empirical risk:
(14) 
where