It is easy to generate human-imperceptible perturbations that fool the prediction of a deep neural network (DNN). Such perturbed samples are called adversarial examples Szegedy et al. (2014), and algorithms for generating adversarial examples are called adversarial attacks. Adversarial attacks can be divided into two types - white-box attacks Goodfellow et al. (2015); Madry et al. (2018); Carlini and Wagner (2017); Croce and Hein (2020a) and black-box attacks Papernot et al. (2016, 2017); Chen et al. (2017); Ilyas et al. (2018); Papernot et al. (2018). In the white-box setting, an adversary has access to the model architecture and the parameter values of a given DNN, while in the black-box setting, the adversary has no access to them and can only observe the outputs of the prediction model for given inputs.
It is well known that adversarial attacks can greatly reduce the accuracy of DNNs, for example from about 96% accuracy on clean data to almost zero accuracy on adversarial examples Madry et al. (2018). This vulnerability of DNNs can cause serious security problems when DNNs are applied to security-critical applications Kurakin et al. (2017); Jiang et al. (2019) such as medicine Ma et al. (2020); Finlayson et al. (2019) and autonomous driving Kurakin et al. (2017); Deng et al. (2020); Morgulis et al. (2019); Li et al. (2020).
The aim of this paper is to develop a new adversarial training algorithm for DNNs which is theoretically well founded and empirically superior to existing competitors. A novel feature of the proposed algorithm is the use of a data-adaptive regularization for robustifying a prediction model. We impose more regularization on data more vulnerable to adversarial attacks and vice versa. Even though the idea of data-adaptive regularization is not new, our data-adaptive regularization has a firm theoretical basis, namely reducing an upper bound of the robust risk, while most other data-adaptive regularizations are rather heuristically motivated.
1.1 Related Works
Most existing adversarial training algorithms can be understood as procedures for estimating the optimal robust prediction model, that is, the minimizer of the robust population risk. The two most representative adversarial training algorithms are PGD-Training of Madry et al. (2018) and TRADES of Zhang et al. (2019). Madry et al. (2018) develops the PGD-Training algorithm, which minimizes the robust empirical risk directly. In contrast, Zhang et al. (2019) decomposes the robust population risk as the sum of two misclassification errors - one for clean data and the other for adversarial data - and treats the second term as a regularization term, proposing a regularized empirical risk minimization algorithm called TRADES.
Data-adaptive modifications of existing adversarial training algorithms have also received much attention. Zhang et al. (2020) data-adaptively chooses the number of iterations of the PGD algorithm, a white-box attack algorithm for generating adversarial examples, when generating an adversarial example (early-stopped PGD). Ding et al. (2020) generates an adversarial example only when a given datum is correctly classified and uses the datum itself as the adversarial example when it is misclassified. Zhang et al. (2021) proposes to minimize a weighted robust empirical risk, where the weight is inversely proportional to the distance to the decision boundary. Wang et al. (2020) devises the regularized robust empirical risk as the sum of the robust empirical risk and a regularization term in which the weights are proportional to the conditional probability of a given datum being misclassified. See Section 2 for details of data-adaptive adversarial training algorithms.
1.2 Our Contributions
We propose a new data-adaptive adversarial training algorithm. Novel features of our algorithm compared to the aforementioned data-adaptive adversarial training algorithms are that it is theoretically well motivated, easier to implement and empirically superior. First, we derive an upper bound of the robust risk. Then, we devise a data-adaptive regularized empirical risk which is a surrogate version of our theoretical upper bound. Finally, we learn a robust prediction model by minimizing the proposed data-adaptive regularized empirical risk. By analyzing benchmark data sets, we show that our proposed algorithm simultaneously improves on other competitors in terms of generalization (accuracy on clean samples) and robustness (accuracy on adversarial examples), achieving state-of-the-art performance. In addition, we illustrate that our algorithm helps improve the fairness of the prediction model in the sense that the error rates of each class become more similar compared to a non-adaptive adversarial training algorithm.
A summary of our contributions is as follows:
Theoretically, we derive an upper bound of the robust risk.
We propose a data-adaptive regularized empirical risk which is a surrogate version of the derived upper bound of the robust risk.
Numerical experiments are conducted to show that our algorithm improves the robustness and generalization simultaneously and outperforms the existing state-of-the-art methods.
Our algorithm can mitigate the unfairness due to the disparity between class-wise accuracies.
2.1 Robust Population Risk
Let be the input space, be the set of output labels and be the score function parametrized by neural network parameters such that is the vector of the conditional class probabilities. Let be the indicator function.
The robust population risk used in adversarial training is defined as
Most adversarial training algorithms learn by minimizing an empirical version of the above robust population risk. In turn, most empirical versions of (1) require generating an adversarial example, which is an empirical counterpart of
Any method of generating an adversarial example is called an adversarial attack.
2.2 Algorithms for Generating Adversarial Examples
Existing adversarial attacks can be categorized into either white-box attacks Goodfellow et al. (2015); Madry et al. (2018); Carlini and Wagner (2017); Croce and Hein (2020a) or black-box attacks Papernot et al. (2016, 2017); Chen et al. (2017); Ilyas et al. (2018); Papernot et al. (2018). In a white-box attack, the model structure and parameters are known to the adversary, who uses this information to generate adversarial examples.
The most popular method for the white-box attack is PGD (Projected Gradient Descent) Madry et al. (2018). Let be a surrogate loss of with for given . PGD finds the adversarial example defined as
by applying the gradient ascent algorithm to to update and projecting it onto . That is, the update rule of PGD is
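As a concrete illustration, the PGD update above can be sketched in PyTorch as follows; the step size `alpha`, radius `eps`, and iteration count are illustrative defaults, not the paper's exact settings:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """l_inf PGD: repeat a signed gradient-ascent step on the loss,
    then project back onto the eps-ball around x (and onto [0, 1])."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)          # surrogate loss
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()     # ascent step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)  # projection
    return x_adv.detach()
```

The projection step keeps every iterate within distance `eps` of the clean input, which is what makes the perturbation imperceptible.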
There are cases where an adversary can only observe the outputs for given inputs but cannot access the structure and parameters of the model. In this case, the adversary usually generates a dataset where is the output for a given input . Then, the adversary trains a substitute model on this dataset and generates adversarial examples from the substitute model Papernot et al. (2017). These kinds of attacks are called black-box attacks.
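The substitute-model strategy can be sketched as below; the query set, the linear substitute architecture, and the training schedule are all illustrative assumptions (a real attack would then run a white-box method such as PGD on the returned substitute):

```python
import torch
import torch.nn.functional as F

def train_substitute(victim, queries, epochs=300, lr=0.2):
    """Black-box setting: only the victim's outputs are observable.
    Label the query set with the victim's predictions and fit a
    local substitute model on those (input, predicted-label) pairs."""
    with torch.no_grad():
        out = victim(queries)            # the only access to the victim
        labels = out.argmax(dim=1)
    substitute = torch.nn.Linear(queries.shape[1], out.shape[1])
    opt = torch.optim.SGD(substitute.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        F.cross_entropy(substitute(queries), labels).backward()
        opt.step()
    return substitute
```

Adversarial examples crafted against the substitute often transfer to the victim, which is the premise of this family of attacks.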
2.3 Review of Adversarial Training Algorithms
We review some of the adversarial training algorithms which, we think, are related to our proposed algorithm. Typically, adversarial training algorithms consist of the maximization and minimization steps. In the maximization step, we generate adversarial examples for given . In the minimization step, we fix the adversarial example and update . For notational simplicity, we drop in the adversarial examples.
Madry et al. (2018) proposes PGD-Training which updates by minimizing
where is the cross-entropy loss and is an adversarial example obtained by PGD.
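A minimal PyTorch sketch of one PGD-Training step is given below; `attack` is any adversarial-example generator (e.g. a PGD routine), and the interface is an assumption rather than the authors' code:

```python
import torch
import torch.nn.functional as F

def pgd_training_step(model, optimizer, x, y, attack):
    """One step of PGD-Training: the maximization step generates
    adversarial examples with the current model, and the minimization
    step updates the parameters on the cross-entropy of those examples."""
    model.eval()
    x_adv = attack(model, x, y)                     # maximization step
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)         # robust empirical risk
    loss.backward()
    optimizer.step()
    return loss.item()
```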
Robust risk, natural risk and boundary risk are defined by
Zhang et al. (2019) shows
and proposes the following regularized empirical risk which is a surrogate version of the upper bound of the robust risk:
where is an adversarial example generated by PGD with the KL-divergence.
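In code, the TRADES objective has roughly the following shape; `beta` plays the role of the regularization parameter, and the reduction conventions are assumptions:

```python
import torch
import torch.nn.functional as F

def trades_loss(model, x, x_adv, y, beta=6.0):
    """TRADES-style objective: cross-entropy on clean inputs (surrogate of
    the natural risk) plus beta times the KL divergence between the clean
    and adversarial predictive distributions (surrogate of the boundary risk)."""
    logits = model(x)
    natural = F.cross_entropy(logits, y)
    boundary = F.kl_div(F.log_softmax(model(x_adv), dim=1),
                        F.softmax(logits, dim=1),
                        reduction="batchmean")
    return natural + beta * boundary
```

When `x_adv` equals `x` the KL term vanishes and the objective reduces to the natural cross-entropy, which matches the decomposition above.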
Data-Adaptive Methods Zhang et al. (2020); Ding et al. (2020); Zhang et al. (2021); Wang et al. (2020)
Geometry-Aware Instance-Reweighted Adversarial Training (GAIR-AT) Zhang et al. (2021) is a method that increases the weights of samples close to the decision boundary. In other words, GAIR-AT minimizes
where for a prespecified maximum iteration and .
Zhang et al. (2020) suggests early-stopped PGD, which stops the iteration of PGD when the model first misclassifies the adversarial example. Friendly Adversarial Training (FAT) minimizes
where is an adversarial sample by PGD and for a prespecified maximum iteration .
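Early-stopped PGD can be sketched as follows for 2-D input batches; the per-example stopping mask is the essential idea, while the defaults are illustrative:

```python
import torch
import torch.nn.functional as F

def early_stopped_pgd(model, x, y, eps=8/255, alpha=2/255, max_steps=10):
    """Early-stopped PGD (as in FAT): each example stops being perturbed
    as soon as the model first misclassifies it, so still-correct
    ("friendly") examples are the only ones that keep being attacked."""
    x_adv = x.clone()
    for _ in range(max_steps):
        with torch.no_grad():
            active = model(x_adv).argmax(dim=1) == y   # not yet misclassified
        if not active.any():
            break
        x_adv = x_adv.detach().requires_grad_(True)
        grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
        step = alpha * grad.sign() * active[:, None].float()   # freeze finished rows
        x_adv = torch.min(torch.max(x_adv.detach() + step, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```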
3 Adaptive Regularization
In this section, we develop a new data-adaptive regularization algorithm for adversarial training called Anti-Robust Weighted Regularization (ARoW).
3.1 An Upper Bound of the Robust Population Risk
In this subsection, we consider the case of binary response
and the surrogate loss function given as where is a function bounded below by . Examples of include the binary cross-entropy
We consider the regularized robust risk defined as for a given . The robust risk in (3) is the regularized robust risk with . In the regularized robust risk, is considered a regularization term, and the regularization parameter controls the trade-off between the generalization ability and robustness to adversarial attacks.
The following theorem provides an upper bound of the regularized robust risk.
For any score function , we have
The upper bound (7) consists of two terms, where the first term is a surrogate loss of the natural risk and the second term is a surrogate version of the boundary risk. The upper bound in (7) can serve as a surrogate regularized robust risk of , which is what we do in the next subsection.
The second term on the right-hand side of (7) can be reformulated as the expectation of
with respect to and . For given , can be considered a measure of the robustness of the prediction model at , and therefore the second term can be regarded as a data-adaptive regularization term. That is, this term enforces the prediction model to be more robust for data whose weights are large.
3.2 Anti-Robust Weighted Regularization (ARoW) Algorithm
Motivated by the upper bound (7) of the robust risk in Section 3.1, in this subsection, we propose a new data-adaptive adversarial training algorithm called the Anti-Robust Weighted Regularization (ARoW) algorithm, which learns by minimizing the following regularized empirical risk:
where , , is the one-hot vector whose -th entry is 1 and is the vector whose entries are all 1.
First, we employ label smoothing to estimate the conditional class probabilities more accurately Müller et al. (2019). It is well known that DNNs trained on the original hard labels are poorly calibrated Guo et al. (2017). Accurate estimation of is important since is used in the regularization term of ARoW.
Second, we replace by because the KL divergence can be considered a multi-class version of the term . This modification is also used in TRADES Zhang et al. (2019).
The last modification is to replace by . We employ this modification since it is helpful not to regularize highly robust samples too strongly; that is, we want to treat samples with and as equally robust.
The ARoW algorithm is summarized in Algorithm 1.
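An ARoW-style objective incorporating the three modifications can be sketched as below. This is a hedged illustration, not the paper's exact Eq. (3.2): the per-example weight formula and the clipping threshold `tau` here are assumptions chosen only to convey the idea of weighting the KL term by an anti-robustness measure of each sample:

```python
import torch
import torch.nn.functional as F

def arow_loss(model, x, x_adv, y, lam=6.0, smooth=0.2, tau=0.3):
    """Illustrative ARoW-style objective: label-smoothed cross-entropy on
    clean samples plus a per-example weighted KL term, with larger weights
    for samples that are more vulnerable to the adversarial attack."""
    logits = model(x)
    # label-smoothed supervised loss on clean samples
    natural = F.cross_entropy(logits, y, label_smoothing=smooth)
    # per-example KL(p_clean || p_adv), the multi-class boundary surrogate
    p_clean = F.softmax(logits, dim=1)
    log_p_adv = F.log_softmax(model(x_adv), dim=1)
    kl = (p_clean * (p_clean.clamp_min(1e-12).log() - log_p_adv)).sum(dim=1)
    # anti-robust weight (assumed form): low adversarial confidence in the
    # true class => large weight; highly robust samples are treated alike
    with torch.no_grad():
        p_adv_y = log_p_adv.exp().gather(1, y[:, None]).squeeze(1)
        weight = (1.0 - p_adv_y).clamp(min=tau)
    return natural + lam * (weight * kl).mean()
```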
An Alternative Upper Bound
An alternative adaptive upper bound of the robust population risk can be derived by replacing the term in (7) with . See Appendix B for derivation. From this alternative upper bound, we propose an adversarial training algorithm which minimizes the following regularized empirical risk:
We call the algorithm Confidence Weighted Regularization (CoW).
The roles of the two data-adaptive regularization terms in (3.2) and (10) are quite different. The regularization term in (3.2) encourages the prediction model to be more robust for samples that are more vulnerable to adversarial attacks (i.e., is small). In contrast, the regularization term in (10) puts more regularization on highly confident data (i.e., is large). The idea of focusing more on highly confident data would be reasonable since adversarial attacks on highly confident data can cause serious damage. However, the numerical studies in Section 4 show that ARoW is superior to CoW.
Comparison to MART
The objective functions of MART (6) and ARoW (3.2) are similar, but there are three main differences. First, the supervised loss term of ARoW is the label smoothing loss with clean samples, whereas MART uses the margin cross-entropy loss with adversarial examples. Second, the surrogate loss functions used in PGD are different: the cross-entropy is used in MART, while the KL divergence is used in ARoW. Third, the weight in the regularization term of MART is proportional to , while the weight in ARoW is proportional to . In the numerical studies, we find that ARoW outperforms MART by a large margin. This would be partly because ARoW is theoretically well motivated.
In this section, we investigate the ARoW algorithm in view of robustness and generalization by analyzing benchmark data sets. We show that ARoW is superior to other competitors such as Madry et al. (2018); Zhang et al. (2019); Wang et al. (2020); Rade and Moosavi-Dezfolli (2022); Zhang et al. (2021) as well as CoW. In addition, we carry out ablation studies to illustrate that the adaptive regularization is a key component of the success of ARoW compared to TRADES. We show that ARoW improves the robustness of data vulnerable to adversarial attacks more than TRADES does. Moreover, ARoW improves the fairness of the prediction model in the sense that the error rates of each class become more similar. For benchmark data sets, we use CIFAR10 without and with unlabeled data Carmon et al. (2019), F-MNIST Xiao et al. (2017)
and the SVHN dataset Netzer et al. (2011). For CIFAR10, both ResNet18 and WideResNet34-10 are used to investigate how well ARoW works depending on the capacity of the model; WideResNet34-10 is more complex than ResNet18. For F-MNIST and SVHN, ResNet18 He et al. (2016) is used.
In the main manuscript, we only present the results for CIFAR10 with WideResNet34-10, and defer the results for unlabeled CIFAR10 of Carmon et al. (2019) with WideResNet34-10, CIFAR10 with ResNet18, F-MNIST and SVHN to Appendices D.1.5, D.2, E.1 and E.2, respectively.
The datasets are normalized into [0, 1]. For generating adversarial examples in the training phase, PGD with random start, , and is used, where PGD is PGD in (2) with iterations. For learning prediction models, SGD with momentum , weight decay , an initial learning rate of 0.1 and a batch size of 128 is used, and the learning rate is reduced by a factor of 10 at epochs 60 and 90. The final model is set to be the best model against PGD on the test data among those obtained within 120 epochs. Random crop and random horizontal flip with probability 0.5 are applied for data augmentation. For CIFAR10, stochastic weight averaging (SWA) Izmailov et al. (2018) is employed after epoch 50 to prevent robust overfitting Rice et al. (2020), as Chen et al. (2021) does.
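The SWA component can be sketched with PyTorch's built-in utilities as follows; the epoch counts mirror the setup above, while the model, optimizer and training loop body are placeholders:

```python
import torch
from torch.optim.swa_utils import AveragedModel

# Sketch of stochastic weight averaging (SWA) applied after epoch 50;
# the tiny linear model stands in for the actual network.
model = torch.nn.Linear(4, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
swa_model = AveragedModel(model)

for epoch in range(120):
    # ... one epoch of adversarial training would update `model` here ...
    if epoch >= 50:
        swa_model.update_parameters(model)  # running average of the weights
# `swa_model` is then evaluated in place of `model`
```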
For evaluating the robust accuracy in the test phase, PGD and AutoAttack are used as adversarial attacks, where AutoAttack consists of three white-box attacks - APGD and APGD-DLR in Croce and Hein (2020b) and FAB Croce and Hein (2020a) - and one black-box attack - Square Attack Andriushchenko et al. (2020). To the best of our knowledge, AutoAttack is the strongest attack.
|Method||Standard||PGD Madry et al. (2018)||AutoAttack Croce and Hein (2020b)|
|PGD-Training Madry et al. (2018)||87.02(0.20)||57.50(0.12)||53.98(0.14)|
|TRADES Zhang et al. (2019)||85.86(0.09)||56.79(0.08)||54.31(0.08)|
|HAT Rade and Moosavi-Dezfolli (2022)||86.98(0.10)||56.81(0.17)||54.63(0.07)|
|GAIR-AT Zhang et al. (2021)||85.44(0.10)||67.27(0.07)||46.41(0.07)|
We conduct the experiment three times with different seeds and present the averages of the accuracies with the standard errors in the brackets.
|Method||Standard||APGD||APGD-DLR||FAB Croce and Hein (2020a)||SQUARE Andriushchenko et al. (2020)|
|GAIR-AT Zhang et al. (2021)||85.44(0.170)||63.14(0.16)||46.48(0.07)||49.35(0.05)||55.19(0.16)|
4.1 Performance Evaluation
|Initial Robustness||Rob.TRADES||Rob.ARoW||Diff.||Rate of Improvement (%)|
|Method||Standard||PGD Madry et al. (2018)||AutoAttack Croce and Hein (2020b)|
We report the accuracy (ACC), the worst-class accuracy (WC-Acc) and the standard deviation of the class-wise accuracies (SD) for each method.
|Class||Rob.Both (%)||Rob.TRADES (%)||Rob.ARoW (%)|
Table 1 reports the standard accuracies on clean samples (generalization) and the robust accuracies on adversarial examples generated by PGD and AutoAttack Croce and Hein (2020b) (robustness) for various adversarial training algorithms including ARoW.
The regularization parameters, if any, are set to the ones given in the corresponding articles, while the regularization parameter in ARoW and CoW is set to 6, which is the value used in TRADES. The other regularization parameters and in ARoW and CoW are selected so that the robust accuracy against PGD is maximized. The regularization parameters used in the training phase are summarized in Appendix D.1.1.
HAT Rade and Moosavi-Dezfolli (2022) in Table 1 is the state-of-the-art algorithm against AutoAttack Croce and Hein (2020b). HAT is a variation of TRADES with an additional regularization term involving helper examples. The additional regularization term restrains the decision boundary from having excessive margins. The objective function of HAT is given in (12) in the Appendix.
The results indicate that ARoW is superior to the other competitors in view of robustness as well as generalization. Moreover, the robustness of ARoW can be improved considerably if a little generalization is sacrificed; these results are presented in Appendix D.1.2.
We observe in Table 1 that GAIR-AT Zhang et al. (2021) is robust to PGD but not to AutoAttack Croce and Hein (2020b). To check whether gradient masking occurs, we evaluate the robustness of GAIR-AT against the four attacks in AutoAttack. In Table 2, the robustness of GAIR-AT degrades substantially for the three attacks in AutoAttack other than APGD, while the robustness of ARoW remains stable regardless of the adversarial attack. These observations suggest that gradient masking occurs in GAIR-AT but not in ARoW.
4.2 Ablation Study : Comparison of ARoW and TRADES
In this subsection, we compare ARoW and TRADES Zhang et al. (2019) since ARoW is a data-adaptive modification of TRADES.
In Figure 1, we compare the performances (standard accuracy vs. robust accuracy) of ARoW and TRADES Zhang et al. (2019) for various choices of the regularization parameter . We can see that ARoW uniformly dominates TRADES with respect to the regularization parameter, regardless of the adversarial attack method.
The Effect of Adaptive Regularization
We investigate how the adaptive regularization in ARoW affects the robustness of each sample of CIFAR10. First, we divide the test data into four groups - highly vulnerable, vulnerable, robust and highly robust - according to the values of ( , and ), where is the parameter learned by PGD-Training Madry et al. (2018). Then, for the samples of each group, we check how many become robust under ARoW and TRADES, respectively; the results are presented in Table 3. We can see that ARoW is superior to TRADES in turning non-robust samples (highly vulnerable or vulnerable) into robust ones. We believe this improvement is mainly due to the adaptive regularization term in ARoW, which enforces more regularization on more vulnerable samples.
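The grouping step can be sketched as below; the margin statistic and the three cut points are illustrative assumptions, since the paper's thresholds are elided in the text:

```python
import torch

def robustness_groups(margins, cuts=(-0.5, 0.0, 0.5)):
    """Bucket test samples into four groups - highly vulnerable,
    vulnerable, robust, highly robust - by a per-sample robustness
    margin computed under a fixed PGD-trained model. The cut points
    are placeholders for the paper's (elided) thresholds."""
    m = torch.as_tensor(margins, dtype=torch.float32)
    return torch.bucketize(m, torch.tensor(cuts))  # group index 0..3
```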
The Effect of Label Smoothing
In Table 4, we investigate the effects of label smoothing in ARoW and TRADES Zhang et al. (2019). Label smoothing is helpful not only for ARoW but also for TRADES. This would be partly because the regularization terms depend on the conditional class probabilities, and it is well known that label smoothing is helpful for the calibration of the conditional class probabilities Pereyra et al. (2017). Note that ARoW is superior to TRADES even without label smoothing.
Xu et al. (2021) reports that TRADES Zhang et al. (2019) increases the variation of the per-class accuracies (the accuracy in each class), which is not desirable in view of fairness. In turn, Xu et al. (2021) proposes the Fair-Robust-Learning (FRL) algorithm to alleviate this problem. Even though the fairness improves, the standard and robust accuracies of FRL are worse than those of TRADES.
In contrast, Table 5 shows that ARoW improves the fairness as well as the standard and robust accuracies compared to TRADES. Also, in Table 6, we can see that ARoW is highly effective in difficult classes such as Bird, Cat and Deer, which contain many non-robust samples. These desirable properties of ARoW can be partly understood as follows. The main idea of ARoW is to impose more robust regularization on non-robust samples. In turn, samples in less accurate classes tend to be vulnerable to adversarial attacks. Thus, ARoW improves the robustness of samples in less accurate classes, which results in improved robust as well as standard accuracies for such classes.
5 Conclusion and Future Works
In this paper, we derived an upper bound of the robust risk to develop the adaptive regularization algorithm for adversarial training and showed by numerical experiments that the adaptive regularization improves the robust accuracy as well as the standard accuracy.
Our proposed algorithms can be considered modifications of TRADES Zhang et al. (2019), which is a non-adaptive regularization algorithm. The idea of adaptive regularization, however, is not limited to TRADES and could be applied to other existing adversarial training algorithms, including HAT Rade and Moosavi-Dezfolli (2022), GAIR Zhang et al. (2021), MMA Ding et al. (2020), FAT Zhang et al. (2020) and so on, without much difficulty.
We have seen in Section 4.2 that ARoW improves the fairness as well as the accuracy compared to TRADES. The advantage of ARoW in terms of the fairness is an unexpected by-product, and it would be interesting to develop a more principled way of enhancing the fairness further without hampering the accuracy.
- Square attack: a query-efficient black-box adversarial attack via random search. In ECCV. Cited by: §4, Table 2.
- Towards evaluating the robustness of neural networks. IEEE Symposium on Security and Privacy. Cited by: §1, §2.2.
- Unlabeled data improves adversarial robustness. In NeurIPS . Cited by: §D.1.5, §4, §4.
- ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In ACM. External Links: Cited by: §1, §2.2.
- Robust overfitting may be mitigated by properly learned smoothening. In ICLR. Cited by: §4.
- Minimally distorted adversarial examples with a fast adaptive boundary attack. In The European Conference on Computer Vision (ECCV). Cited by: §1, §2.2, §4, Table 2.
- Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. Cited by: §D.2.3, §4, §4.1, §4.1, §4.1, Table 1, Table 4.
- An analysis of adversarial attacks and defenses on autonomous driving models. IEEE International Conference on Pervasive Computing and Communications(PerCom). Cited by: §1.
- MMA training: direct input space margin maximization through adversarial training. In International Conference on Learning Representations (ICLR). Cited by: §1.1, 4th item, §2.3, §3.1, §5.
- Adversarial attacks against medical deep learning systems. In Science. Cited by: §1.
- Explaining and harnessing adversarial examples. In ICLR . Cited by: §1, §2.2.
- On calibration of modern neural networks. In ICML. Cited by: §3.2.
- Deep residual learning for image recognition. In CVPR. Cited by: §4.
- Black-box adversarial attacks with limited queries and information. In ICML. Cited by: §1, §2.2.
- Averaging weights leads to wider optima and better generalization. In Proceedings of the International Conference on Uncertainty in Artificial Intelligence. Cited by: Table 9, §4.
- Black-box adversarial attacks on video recognition models. In ACM. Cited by: §1.
- Adversarial examples in the physical world. In ICLR . Cited by: §1.
- Adaptive square attack: fooling autonomous cars with adversarial traffic signs. IEEE Internet of Things Journal. Cited by: §1.
- SGDR: stochastic gradient descent with warm restarts. In ICLR. Cited by: §D.1.5.
- Understanding adversarial attacks on deep learning based medical image analysis systems. Pattern Recognition. Cited by: §1.
- Towards deep learning models resistant to adversarial attacks. In ICLR. Cited by: Table 11, Table 13, Table 9, Table 18, Table 20, §1.1, §1, §1, §2.2, §2.2, §2.3, §4.2, Table 1, Table 4, §4.
- Fooling a real car with adversarial traffic signs. ArXiv. Cited by: §1.
- When does label smoothing help?. In NeurIPS. Cited by: §3.2.
- Reading digits in natural images with unsupervised feature learning. NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011. Cited by: Appendix E, §4.
- Practical black-box attacks against machine learning. In ACM. Cited by: §1, §2.2, §2.2.
- Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv. Cited by: §1, §2.2.
- Towards the science of security and privacy in machine learning. 2018 IEEE European Symposium on Security and Privacy (EuroS&P). Cited by: §1, §2.2.
- Regularizing neural networks by penalizing confident output distributions. In ICLR. Cited by: §4.2.
- Reducing excessive margin to achieve a better accuracy vs. robustness trade-off. In ICLR. Cited by: §C.2, §D.1.1, §D.1.2, §D.1.2, §D.2.1, Table 11, Table 13, Table 8, Table 9, Table 18, Table 20, Appendix E, §4.1, Table 1, §4, §5.
- Overfitting in adversarially robust deep learning. In ICML . Cited by: §4.
- Intriguing properties of neural networks. In ICLR. Cited by: §1.
- Improving adversarial robustness requires revisiting misclassified examples. In International Conference on Learning Representations (ICLR). Cited by: §D.1.1, Table 11, Table 13, Table 9, Table 18, Table 20, §1.1, 3rd item, §2.3, §3.1, §4.
- Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv. Cited by: Appendix E, §4.
- To be robust or to be fair: towards fairness in adversarial training. In ICML. Cited by: §4.2.
- Wide residual networks. Proceedings of the British Machine Vision Conference 2016. Cited by: §4.
- Theoretically principled trade-off between robustness and accuracy. In International Conference on Machine Learning (ICML) . Cited by: §D.1.2, Table 11, Table 13, Table 8, Table 9, Table 18, Table 20, Appendix E, §1.1, §2.2, §2.3, §3.1, §3.2, §4.2, §4.2, §4.2, §4.2, Table 1, §4, §5.
- Attacks which do not kill training make adversarial learning stronger. In ICML. Cited by: §D.2.3, §D.2.3, §1.1, 2nd item, §2.3, §3.1, §5.
- Geometry-aware instance-reweighted adversarial training. In ICLR. Cited by: Table 13, Table 9, Table 18, Table 20, §1.1, 1st item, §2.3, §3.1, §4.1, Table 1, Table 2, §4, §5.
Appendix A Implementation Details
All experiments are performed on NVIDIA TITAN RTX and NVIDIA Quadro RTX 4000 GPUs. In the training phase, the automatic mixed precision package in PyTorch is used to speed up training.
Appendix B Theoretical Results
In this section, we provide the proofs of Theorem 1 and the alternative upper bound.
Note that and . It suffices to show that , which holds since
Theorem 2 (Alternative Upper Bound).
For any score function , we have
Note that and . It suffices to show that , which holds since
Appendix C Adversarial Training Algorithms
In this section, we explain the CoW algorithm and extra adversarial training algorithms included in our experiments but not discussed in the manuscript.
C.1 Confidence Weighted Regularization (CoW)
For notational simplicity, we define as
The CoW algorithm is summarized in Algorithm 2.
C.2 Helper Based Adversarial Training (HAT)
HAT Rade and Moosavi-Dezfolli is a variation of TRADES with an additional regularization term using helper examples. The role of the helper examples is to restrain the decision boundary from having excessive margins. HAT minimizes the following regularized empirical risk:
where is the parameter of a pre-trained model only with clean samples, and is an adversarial example generated by PGD with the KL-divergence.
C.3 Helper Based Adversarial Training - Anti-Robust Weighted Regularization (HAT-ARoW)
We consider the combination of HAT and ARoW for improving the performance when SWA is not used. We call it HAT-ARoW, which minimizes the following regularized empirical risk:
In Appendix D.1.3, we investigate the performance of HAT-ARoW.
Appendix D Additional Experiments - CIFAR10
In this section, we present the results of additional experiments and ablation studies for CIFAR10.
D.1 CIFAR10 - WideResNet34-10
In this subsection, we present the results for analyzing CIFAR10 with the WideResNet34-10 architecture.
D.1.1 Hyperparameter Selection
The regularization parameters are set to the ones given in the corresponding articles if available, while the regularization parameter in MART, ARoW and CoW is set to 6, which is the value used in TRADES. The other regularization parameters and in ARoW and CoW are selected so that the robust accuracy against is maximized for fixed . Also, the regularization parameters and in HAT-ARoW (C.3) are selected so that the robust accuracy against PGD is maximized. In HAT (12), the regularization parameters and are set to 4 and 0.25, which are the values used in Rade and Moosavi-Dezfolli for CIFAR10. When SWA is not used, the regularization parameter in ARoW and CoW is selected so that the robust accuracy against is similar to that of HAT. In HAT-ARoW, the regularization parameter is set to 8, which is the value used in ARoW and CoW. The weight decay parameter for MART is set to the value used in Wang et al. , while a different value is used for the other methods since MART works poorly with that value.
D.1.2 Comparison of TRADES, HAT and ARoW for various values of the regularization parameter
In this subsection, we present the generalization and robustness of TRADES, HAT and ARoW against and AutoAttack with varying . The experiments are implemented with SWA, and in (12) used in HAT is set to be 0.25 which is the value used in Rade and Moosavi-Dezfolli .
D.1.3 Effect of Stochastic Weight Averaging (SWA)
We conduct the experiments with and without SWA to identify the effectiveness of SWA.
|SWA Izmailov et al. ||PGD-Training Madry et al. ||87.02(0.20)||57.50(0.12)||53.98(0.14)|
|TRADES Zhang et al. ||85.86(0.09)||56.79(0.08)||54.31(0.08)|
|HAT Rade and Moosavi-Dezfolli ||86.98(0.10)||56.81(0.17)||54.63(0.07)|
|GAIR-AT Zhang et al. ||85.44(0.10)||67.27(0.07)||46.41(0.07)|
|w/o-SWA||PGD-Training Madry et al. ||86.88(0.09)||54.15(0.16)||51.35(0.14)|
|TRADES Zhang et al. ||85.48(0.12)||56.06(0.08)||53.16(0.17)|
|MART Wang et al. ||84.69(0.18)||55.67(0.13)||50.95(0.09)|
|HAT Rade and Moosavi-Dezfolli ||87.53(0.02)||56.41(0.09)||53.38(0.10)|
|GAIR-AT Zhang et al. ||84.49(0.06)||62.11(0.12)||38.48(0.36)|
In Table 9, we observe that SWA is effective for most methods. Note that ARoW performs well even without SWA. However, HAT, which is known to be the SOTA method, has the best robust accuracy against AutoAttack without SWA. In this case, by combining ARoW and HAT (C.3), we can improve the accuracies further. We omit the results for MART with SWA since SWA degrades its performance.
D.1.4 Per-class accuracies of ARoW and TRADES
In Table 10, we present the per-class robust and standard accuracies of the prediction models trained by ARoW and TRADES.
|Class||Rob.ARoW (%)||Rob.TRADES (%)||ARoW (%)||TRADES (%)|
In Table 10, we can see that ARoW is highly effective for classes that are difficult to classify, such as Bird, Cat and Deer. For such classes, ARoW improves not only the standard accuracies but also the robust accuracies considerably. For example, for the class 'Cat', which is the most difficult class (the lowest standard accuracy for TRADES), the robustness and generalization are improved by and , respectively, by ARoW compared with TRADES. This phenomenon would be partly due to the data-adaptive regularization used in ARoW. Usually, difficult classes are less robust to adversarial attacks. In turn, ARoW puts more regularization on non-robust classes and thus improves the accuracies of non-robust classes more.
D.1.5 Additional Unlabeled Data
In this subsection, we present the results on CIFAR10 with unlabeled data used in Carmon et al. . In the training phase, SWA is not used and the cosine annealing learning rate scheduler Loshchilov and Hutter  is used. The final model is set to be the best model against PGD on the test data among those obtained until 400 epochs.
|PGD-Training Madry et al. ||91.97||61.42||58.90|
|TRADES Zhang et al. ||90.59||62.72||59.99|
|MART Wang et al. ||91.04||64.10||59.33|
|HAT Rade and Moosavi-Dezfolli ||91.54||63.45||60.15|
In Table 11, we observe that ARoW outperforms the other competitors in terms of generalization while maintaining good robustness. MART has the best robustness against but has poor robustness against AutoAttack.
D.2 CIFAR10 - ResNet18
In this subsection, we summarize the results for analyzing CIFAR10 with the ResNet18 architecture which is smaller than WideResNet34-10.
D.2.1 Hyperparameter Selection and Performance
The regularization parameter in TRADES, MART, HAT, ARoW and CoW is set to the values used in Table 7. For HAT, the regularization parameters and are set to be 4 and 0.5 which are the values used in Rade and Moosavi-Dezfolli .
|PGD-Training Madry et al. ||82.42(0.05)||53.48(0.11)||49.30(0.07)|
|TRADES Zhang et al. ||82.41(0.07)||52.68(0.22)||49.63(0.25)|
|MART Wang et al. ||74.87(0.95)||53.68(0.30)||46.61(0.24)|
|HAT Rade and Moosavi-Dezfolli ||83.05(0.03)||52.91(0.08)||49.60(0.02)|
|GAIR-AT Zhang et al. ||81.09(0.12)||64.89(0.04)||41.35(0.16)|
Table 13 shows that ARoW outperforms the other competitors in terms of the generalization and robustness against while maintaining comparable robustness against AutoAttack.
D.2.2 Ablation Studies on and in (3.2)
We present the results of the ablation studies on and in ARoW algorithm with being fixed at 6.
Table 14 suggests that the choice of and does not much affect the standard and robust accuracies. However, the results with equal to either 0.2 or 0.3 and slightly larger than 0 compare favorably with those of other choices. That is, ARoW is not very sensitive to the choice of and , but fine tuning of and can be helpful.
D.2.3 Combination of FAT and ARoW
FAT-TRADES Zhang et al. is an adversarial training algorithm that uses early-stopped PGD to generate adversarial examples in TRADES. Similarly, we can apply early-stopped PGD to ARoW to improve performance. We propose FAT-ARoW, an adversarial training algorithm combining ARoW and early-stopped PGD, which minimizes the following regularized empirical risk: