Deep neural networks (DNNs) have been successfully used in a wide range of applications such as image and speech recognition. Although DNNs achieve high generalization performance, they are vulnerable to adversarial examples: imperceptibly perturbed inputs crafted to make DNNs misclassify. For real-world applications of deep learning, DNNs need to be made secure against this vulnerability.
Adversarial training trains models on adversarial examples. wu2020adversarial experimentally show that models obtain high generalization performance in adversarial training when they have a flat weight loss landscape, i.e., when the loss changes little with respect to the weights. However, prabhu2019understanding experimentally find that adversarial training sharpens the weight loss landscape more than standard training (training without adversarial noise). Thus, the following question is important to answer: Does Adversarial Training Always Sharpen the Weight Loss Landscape? If the answer is yes, adversarial training always has a larger generalization gap than standard training, because a sharp weight loss landscape degrades generalization performance.
In this paper, to answer this question theoretically, we focus on the logistic regression model with adversarial training as a first step of theoretical analysis. First, we adopt a definition of the sharpness of the weight loss landscape based on the eigenvalues of the Hessian matrix. Next, to simplify the analysis, we decompose the adversarial example into magnitude and direction. Finally, we show that the eigenvalues of the Hessian matrix of the weight loss landscape are proportional to the norm of the adversarial noise. As a result, we theoretically show that the weight loss landscape becomes sharper as the noise in adversarial training becomes larger, as stated in Theorem 1. We experimentally confirmed Theorem 1 on the dataset MNIST2, which restricts MNIST to two classes. Moreover, we experimentally show that in multi-class classification with a nonlinear model (ResNet18), as a more general case, the weight loss landscape becomes sharper as the noise in adversarial training becomes larger. Finally, to check whether the sharpness of the weight loss landscape is a problem specific to adversarial examples, we compare the weight loss landscape of training on random noise with that of training on adversarially perturbed data in logistic regression. As a result, we confirmed theoretically and experimentally that adversarial noise sharpens the weight loss landscape much more than random noise. This is because adversarial noise is always added in the direction that increases the loss. We conclude that the sharpness of the weight loss landscape needs to be reduced to shrink the generalization gap in adversarial training, because the weight loss landscape becomes sharp in adversarial training and the generalization gap becomes large.
Our contributions are as follows:
We show theoretically and experimentally that in logistic regression with norm-constrained noise, the weight loss landscape becomes sharper as the norm of the adversarial training noise increases.
We show theoretically and experimentally that adversarial noise in the data space sharpens the weight loss landscape in logistic regression much more than random noise (random noise barely sharpens it).
We experimentally show that a larger norm of adversarial training noise makes the loss landscape sharper in a nonlinear model (ResNet18) with softmax. As a result, the generalization gap becomes larger as the norm of the adversarial noise becomes larger.
2.1 Logistic Regression
We consider a binary classification task with inputs $x \in \mathbb{R}^d$ and labels $y \in \{-1, +1\}$. A data point is denoted $x_i$, where $i$ is the data index, and its true label is $y_i$. The loss function of logistic regression is defined as
$$L(w) = \frac{1}{n} \sum_{i=1}^{n} \log\left(1 + \exp(-y_i w^\top x_i)\right),$$
where $n$ is the total number of data points and $w$ is the training parameter of the model.
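As a concrete reference, the loss above can be computed as follows. This is a minimal NumPy sketch of the standard logistic loss; the names `logistic_loss`, `X`, `y`, and `w` are ours, not from the paper:

```python
import numpy as np

def logistic_loss(w, X, y):
    """Average logistic loss L(w) = (1/n) * sum_i log(1 + exp(-y_i w^T x_i)).

    X: (n, d) data matrix, y: (n,) labels in {-1, +1}, w: (d,) weights.
    """
    margins = y * (X @ w)
    # log(1 + exp(-m)) computed stably as logaddexp(0, -m)
    return np.mean(np.logaddexp(0.0, -margins))

# Toy usage: a well-separating weight vector gives a small loss
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
print(logistic_loss(np.array([5.0, 0.0]), X, y))  # small, about 0.0067
```

At $w = 0$ the loss is exactly $\log 2$, which is a convenient sanity check.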
2.2 Adversarial Example
An adversarial example is defined as
$$x^{\mathrm{adv}} = \mathop{\arg\max}_{x' \in B_\epsilon(x)} L(x', y; w),$$
where $B_\epsilon(x)$ is the region around $x$ in which the norm of the perturbation is less than $\epsilon$. Projected gradient descent (PGD), a powerful adversarial attack, uses a multi-step search (here for the $\ell_\infty$ norm)
$$x^{t+1} = \Pi_{B_\epsilon(x)}\left(x^t + \alpha\,\mathrm{sign}\!\left(\nabla_{x^t} L(x^t, y; w)\right)\right),$$
where $\alpha$ is the step size and $\Pi$ is the projection onto the constrained space. Recently, since PGD can be fooled by gradient obfuscation, AutoAttack, which tries various attacks, is used to evaluate robustness more reliably.
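The multi-step search above can be sketched as follows for the $\ell_\infty$ case. This is an illustrative implementation, not the paper's code; the helper `logreg_grad_x` (the gradient of the logistic loss with respect to the input) is our own naming:

```python
import numpy as np

def pgd_linf(x, y, w, eps, alpha, steps, loss_grad):
    """Multi-step PGD under an l_inf constraint.

    Each step moves in the sign of the input gradient, then projects
    back onto the eps-ball around the clean input x.
    """
    x_adv = x.copy()
    for _ in range(steps):
        g = loss_grad(x_adv, y, w)
        x_adv = x_adv + alpha * np.sign(g)
        # projection onto the l_inf ball of radius eps around x
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

def logreg_grad_x(x, y, w):
    # d/dx log(1 + exp(-y w^T x)) = -y * sigmoid(-y w^T x) * w
    m = y * (w @ x)
    return -y * w / (1.0 + np.exp(m))

x = np.array([1.0, 1.0]); y = 1.0; w = np.array([1.0, -0.5])
x_adv = pgd_linf(x, y, w, eps=0.3, alpha=0.1, steps=10, loss_grad=logreg_grad_x)
# x_adv moves against w for a positive label, shrinking the margin w^T x
```

For this linear model the attack saturates the $\epsilon$-ball, and the margin $w^\top x$ strictly decreases.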
2.3 Adversarial Training
Since adversarial training uses information from adversarial examples, it can improve the robustness of the model against the adversarial attack used for training. In this paper, we refer to models obtained by standard training and adversarial training as clean and robust models, respectively.
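For the linear model analyzed in this paper, adversarial training can be sketched compactly, because the loss-maximizing $\ell_\infty$ perturbation of a linear logistic model has the closed form $\delta_i = -\epsilon\, y_i\, \mathrm{sign}(w)$. The function below is our illustrative sketch under that assumption, not the authors' implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_train(X, y, eps, lr=0.5, epochs=500):
    """Adversarial training for linear logistic regression.

    For a linear model the inner maximization is exact:
    delta_i = -eps * y_i * sign(w), so no PGD loop is needed.
    X: (n, d) inputs, y: (n,) labels in {-1, +1}.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # worst-case perturbed inputs under the current weights
        X_adv = X - eps * y[:, None] * np.sign(w)
        margins = y * (X_adv @ w)
        # gradient of the mean logistic loss w.r.t. w
        grad = (-(y * sigmoid(-margins))[:, None] * X_adv).mean(axis=0)
        w -= lr * grad
    return w

# Toy usage: separable data stays correctly classified after training
X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, -1.0])
w = adversarial_train(X, y, eps=0.5)
```

On this toy set the trained model keeps a positive margin even on the worst-case perturbed inputs, which is the goal of adversarial training.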
2.4 Visualizing Loss Landscape
li2018visualizing presented filter normalization for visualizing the weight loss landscape. Filter normalization works not only for standard training but also for adversarial training. With filter normalization, the change of loss caused by adding noise to the weights is visualized as $L(w + \alpha d)$, where $\alpha$ is the magnitude of the noise and $d$ is its direction. The direction $d$ is sampled from a Gaussian distribution and filter-wise normalized by
$$d_{i,j} \leftarrow \frac{d_{i,j}}{\|d_{i,j}\|_F}\,\|w_{i,j}\|_F,$$
where $d_{i,j}$ is the $j$-th filter at the $i$-th layer of $d$ and $\|\cdot\|_F$ is the Frobenius norm. We note that the Frobenius norm is equal to the $\ell_2$ norm in logistic regression because $w$ can be regarded as a single filter. This normalization removes the scaling freedom of the weights, which would otherwise produce misleading loss landscapes. For instance, dinh2017sharp exploit this scaling freedom of weights to build pairs of equivalent networks that have different apparent sharpness. Filter normalization absorbs the scaling freedom of the weights and enables the loss landscape to be visualized faithfully. The sharpness of the weight loss landscape strongly correlates with the generalization gap when this normalization is used. Moreover, the sharpness of the weight loss landscape in adversarial training strongly correlates with the robust generalization gap when this normalization is used.
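The filter-wise normalization above can be sketched as follows (our illustrative code; for logistic regression the whole weight vector is treated as one filter):

```python
import numpy as np

def filter_normalize(d, w):
    """Rescale each filter of a random direction d so that its Frobenius
    norm matches the corresponding filter of the weights w.

    d, w: lists of per-layer arrays whose first axis indexes filters.
    For logistic regression, pass a single layer with a single filter.
    """
    out = []
    for d_layer, w_layer in zip(d, w):
        d_new = np.empty_like(d_layer)
        for j in range(d_layer.shape[0]):
            d_new[j] = (d_layer[j] / np.linalg.norm(d_layer[j])
                        * np.linalg.norm(w_layer[j]))
        out.append(d_new)
    return out

# The 1-D weight loss landscape is then traced by evaluating
# L(w + alpha * d) over a range of magnitudes alpha.
```

After normalization, each filter of $d$ has exactly the Frobenius norm of the corresponding filter of $w$, which is what removes the scaling freedom.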
3 Theoretical Analysis
Since the robust model is trained with added adversarial noise, the loss landscape in data space becomes flat. On the other hand, wu2020adversarial experimentally confirmed that the weight loss landscape becomes sharp and the generalization gap becomes large in adversarial training. In this section, we theoretically show that the weight loss landscape becomes sharp in adversarial training.
3.1 Definition of Weight Loss Landscape Sharpness
Several definitions of weight loss landscape sharpness/flatness have been presented in studies of generalization performance and loss landscape sharpness in deep learning [9, 13, 1]. In this paper, we use a simple definition that measures sharpness by the eigenvalues of the Hessian matrix. This definition was presented in  and is also used in . To clarify the relationship between the weight loss landscape and the generalization gap, we apply filter normalization to the weights. In other words, we define the sharpness of the weight loss landscape as the eigenvalues of the Hessian matrix evaluated at the normalized weights. The larger the eigenvalues, the sharper the weight loss landscape.
3.2 Main Results
This section presents Theorem 1, which relates the weight loss landscape to the norm of the adversarial noise.
When the loss of the linear logistic regression model converges to the minimum for each data point, the weight loss landscape becomes sharper for robust models trained with a larger norm of norm-constrained adversarial noise.
where and is the norm due to the inner product. Since an adversarial example must increase the loss, Lemma 1 shows .
is a monotonically increasing function of when , while is a monotonically decreasing function when .
All the proofs of lemmas are provided in the supplementary material.
The gradient of loss with respect to weight is
An optimal weight for each data point satisfies . Next, we consider the Hessian matrix of the loss at an optimal weight. The -th element of the Hessian matrix is obtained as
See the supplementary material for derivation. We consider the eigenvalues of the Hessian matrix. This matrix has a trivial eigenvector as
where is the element of
which is an arbitrary vector orthogonal to. This matrix has eigenvalues and . Since is positive-semidefinite, determines the sharpness of the weight loss landscape. Let be an adversarial example for perturbation strength . The optimal weights in adversarial training with and are and . The relation between the eigenvalues is as in
since filter normalization makes the scale of the weights the same (in the case of logistic regression this is natural because ). Therefore, Theorem 1 is proved from Eq. (12). Moreover, since the eigenvalues are never negative, the loss around the optimum of adversarial training is always convex, which is a natural result.
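The eigenvalue computation underlying Theorem 1 can be checked numerically. The sketch below is our own code using the standard logistic-regression Hessian; it builds the Hessian of the average logistic loss at a given weight and extracts its top eigenvalue:

```python
import numpy as np

def hessian_logreg(w, X, y):
    """Hessian of the mean logistic loss w.r.t. w:
    (1/n) * sum_i s_i (1 - s_i) x_i x_i^T, with s_i = sigmoid(y_i w^T x_i).
    X may hold clean or adversarially perturbed inputs.
    """
    s = 1.0 / (1.0 + np.exp(-(y * (X @ w))))
    coef = s * (1.0 - s)
    return (X * coef[:, None]).T @ X / len(y)

def top_eigenvalue(H):
    # symmetric matrix: eigvalsh returns eigenvalues in ascending order
    return float(np.linalg.eigvalsh(H)[-1])
```

Evaluating `top_eigenvalue(hessian_logreg(w_eps, X_adv, y))` for models trained with different noise magnitudes reproduces the kind of eigenvalue-versus-epsilon comparison discussed in the experiments.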
To clarify whether the sharpening of the weight loss landscape by noise in data space is unique to adversarial training, we consider the weight loss landscape of a model trained on random noise. Let us assume uniformly distributed noise. We have the following theorem:
Consider the logistic regression model trained with uniformly distributed, norm-constrained noise. The eigenvalues of the Hessian matrix of this model converge to zero as the loss converges to the minimum for each data point. Furthermore, when the loss deviates slightly from the minimum, the eigenvalues of the Hessian matrix grow along with the norm of the random noise.
This theorem shows that the weight loss landscape of the model trained with random noise is not as sharp as the weight loss landscape of the adversarial training model.
Let us prove Theorem 2. The derivative of the loss under arbitrary noise is
where . The -th element of the Hessian matrix is obtained as
We used . Since the derivative of the loss is zero at the optimal weight, the numerator of Eq. (13) becomes zero. Thus, the eigenvalues of the Hessian matrix vanish at the optimum. In other words, the weight loss landscape of the model trained with random noise approaches flatness as the loss converges to the minimum. Let us consider the case where the loss is sufficiently small but takes a finite value. Since we assume uniformly distributed noise, the noise components are uncorrelated, the mean of the noise is zero, and its variance grows with its norm. Thus, in this regime the Hessian can be written as
Thus, when the loss is sufficiently small but finite, the weight loss landscape becomes sharper along with the norm of the random noise. We note that since this term is an element of the unit matrix, it increases the eigenvalues of the Hessian matrix. Considering Theorems 1 and 2 together, adversarial noise sharpens the weight loss landscape much more than training with random noise does. We have considered random noise constrained in norm, but the statement holds whenever the mean of the random noise is zero and its variance increases with its norm. For example, for Gaussian noise, the same conclusion holds when the noise is norm-constrained.
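The mechanism behind Theorems 1 and 2 can be illustrated numerically: adversarial noise is fully aligned against the margin $y\, w^\top x$, while zero-mean random noise is nearly orthogonal to $w$ in high dimensions and leaves the margin unchanged on average. The setup below is our own toy construction, not an experiment from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d_dim, eps, y = 100, 0.5, 1.0

w = np.ones(d_dim) / np.sqrt(d_dim)   # unit-norm weight vector
x = 2.0 * w                           # clean margin y * w^T x = 2

clean_margin = y * (w @ x)

# Adversarial l2 noise: fully aligned against the margin
x_adv = x - eps * y * w / np.linalg.norm(w)
adv_margin = y * (w @ x_adv)          # = clean_margin - eps

# Random l2 noise of the same norm: nearly orthogonal to w in high d
rand_margins = []
for _ in range(2000):
    delta = rng.standard_normal(d_dim)
    delta *= eps / np.linalg.norm(delta)
    rand_margins.append(y * (w @ (x + delta)))
mean_rand_margin = float(np.mean(rand_margins))

# adv_margin is strictly smaller (loss strictly larger), while the
# random-noise margin matches the clean margin on average.
```

Since the loss curvature factor $\sigma'(y\,w^\top x)$ grows as the margin shrinks toward zero, every adversarial perturbation increases curvature, while random perturbations do not on average.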
4 Related Work
4.1 Weight Loss Landscape in Adversarial Training
prabhu2019understanding and wu2020adversarial experimentally studied the weight loss landscape in adversarial training. wu2020adversarial demonstrated that a model with a flatter weight loss landscape has a smaller generalization gap and presented Adversarial Weight Perturbation (AWP), which consequently achieves higher robust accuracy. prabhu2019understanding reported that the robust model has a sharper loss landscape than the clean model. However, these studies did not theoretically analyze the weight loss landscape. Recently, liu2020loss theoretically analyzed the loss landscape of the robust model. The main topic of liu2020loss is Lipschitzian smoothness in adversarial training, and their supplementary material contains a theoretical analysis of the weight loss landscape in logistic regression. The difference between their analysis and ours is that liu2020loss compared the weight loss landscape at different positions within a single model, while we compare the weight loss landscapes of different models trained with different magnitudes of adversarial noise, each at its own optimal weights. Also, liu2020loss used an approximation of the loss function to simplify the problem. In contrast, we do not approximate the loss function of Eq. (7). liu2020loss derived that since the logistic loss is a monotonically decreasing convex function, adding noise in the direction of increasing the loss must increase the gradient, which leads to a sharp weight loss landscape.
4.2 Weight Loss Landscape in Standard Training
The relationship between the weight loss landscape and generalization performance in deep learning has been investigated theoretically and experimentally [6, 12, 13, 5]. In a large experiment evaluating 40 complexity measures by , measures based on the weight loss landscape had the highest correlation with the generalization error, and generalization performance improves as the weight loss landscape becomes flatter. To improve generalization performance by flattening the weight loss landscape, several methods have been presented, such as operating on a diffused loss landscape , local entropy regularization , and the optimizer Sharpness-Aware Minimization (SAM) , which searches for a flat weight loss landscape. In particular, since SAM is an improvement of the optimizer, it can be combined with various methods, and it achieved strong experimental results that updated the state of the art on many datasets including CIFAR10, CIFAR100, and ImageNet. Since the weight loss landscape becomes sharper and the generalization gap larger in adversarial training than in standard training, we believe that finding a flat solution is even more important in adversarial training than in standard training.
To verify the validity of Theorem 1, we visualize the sharpness of the weight loss landscape with various noise magnitudes in logistic regression in section 5.1. Next, to investigate whether the sharpness of the weight loss landscape is a problem peculiar to adversarial training, we compare the training on random noise with the training on adversarial noise in section 5.1. Finally, we visualize the weight loss landscape in a more general case (multi-class classification by softmax with deep learning) in section 5.2. We also show the relationship between the weight loss landscape and the generalization gap.
We provide details of the experimental conditions in the supplementary material. We used three datasets: MNIST2, CIFAR10, and SVHN. MNIST2 is explained in the subsection below. We used PGD for adversarial training and for evaluating the robust generalization gap. For robustness evaluation, AutoAttack should ideally be used; however, we did not use AutoAttack in our experiments since we focus on the generalization gap. For visualizing the weight loss landscape, we used the filter normalization introduced in section 2.4. The hyper-parameter settings for PGD were based on  for MNIST2 and CIFAR10. Since there were no experiments using SVHN in , we based the hyper-parameter settings for SVHN on . The norm of the perturbation was set to  for MNIST2. The norm of the perturbation was set to  for MNIST2 and  for CIFAR10 and SVHN. For PGD on MNIST2, we updated the perturbation for  iterations at training time and  iterations at evaluation time; the step size of  PGD is , while that of  PGD is . For CIFAR10 and SVHN, we updated the perturbation for  iterations at training time and  iterations at evaluation time; the step size of PGD is  in CIFAR10 and  in SVHN. For random noise, we used uniformly distributed noise constrained in the  or  norm.
[Tables: training accuracy, test accuracy, and generalization gap for each noise magnitude]
5.1 Binary Logistic Regression
We conducted experiments on the image dataset MNIST, which is well known in adversarial example settings. Since the linear logistic regression model did not perform well in classifying two-class CIFAR10, we evaluated on MNIST. To experiment with logistic regression for binary classification, we created the two-class dataset MNIST2 from MNIST, using only class 0 and class 1. Figures 2 and 2 show the weight loss landscape for various noise magnitudes under the  and  norms in logistic regression. We can confirm that the weight loss landscape becomes sharp as the noise magnitude increases. For clarity, we excluded the ranges with large noise from Fig. 2. The results for large noise in the  norm, which is often used in 10-class classification on MNIST, are included in the supplementary material; the results are similar: the larger the noise magnitude, the sharper the weight loss landscape. Table 1 also shows that the absolute value of the generalization gap is larger in adversarial training than in standard training. The test robust accuracy is larger than the training accuracy because of early stopping , but we emphasize that the training and test accuracies diverge. In Table 3, the relationship between the generalization gap and the noise magnitude is difficult to see because the experiment was designed with small noise so that the training loss reaches zero. However, in the range where the noise magnitude is large in Table 3, the absolute value of the generalization gap increases with the noise magnitude.
[Tables: training accuracy, test accuracy, and generalization gap for each noise magnitude]
Eigenvalues of the Hessian Matrix
We analyze the eigenvalues of the Hessian matrix to confirm Eq. (9). We show the results of the linear logistic regression model on MNIST2. Figure 3 shows the top three eigenvalues of models trained with PGD under the two norm constraints with different epsilons. These figures show that the eigenvalues are nearly linear in epsilon, consistent with our theoretical analysis. We estimated the eigenvalues without filter normalization, but we checked that the norm of the weights does not change significantly across epsilons.
Comparison with Random Noise
To clarify whether the sharpening of the weight loss landscape by noise in data space is unique to adversarial training, we compare the weight loss landscape of training with random noise against training with adversarial noise (adversarial training with PGD). We used a logistic regression model on MNIST2; random noise was generated from a uniform distribution constrained in the  and  norms, and then clipped to fit the range of the normalized image. Figures 2 and 2 compare the weight loss landscapes under adversarial and random noise for the two norms, respectively. As in our theoretical analysis, these figures show that adversarial noise sharpens the weight loss landscape much more than random noise for both norms. As a result, the generalization gap under random noise is smaller than under adversarial noise, as shown in Tables 4 and 2.
5.2 Multi-class Classification
In the more general case of multi-class classification using softmax with a residual network, we confirm that the weight loss landscape becomes sharper in adversarial training, as in logistic regression. We use ResNet18 with softmax on CIFAR10 and SVHN. Figure 4 shows the weight loss landscape. We confirmed that the weight loss landscape becomes sharper as the noise magnitude of adversarial training becomes larger. Tables 6 and 5 also show that the generalization gap becomes larger in most cases as the magnitude of the noise grows.
6 Conclusion and Future Work
In this paper, we showed theoretically and experimentally that the weight loss landscape of the linear logistic regression model becomes sharper when the noise in adversarial training is strong. In linear logistic regression, we also showed, both theoretically and experimentally, that not all data-space noise makes the landscape extremely sharp: adversarial examples do, while random noise does not. Theoretical analysis of more general nonlinear models (such as residual networks) with softmax is future work. To motivate such work, we experimentally showed that the weight loss landscape becomes sharper as the noise of adversarial training becomes larger. We conclude that the sharpness of the weight loss landscape needs to be reduced to shrink the generalization gap in adversarial training, because the weight loss landscape becomes sharp in adversarial training and the generalization gap becomes large.
Appendix A Proof of Lemma 1
We show that loss is monotonically increasing for when and monotonically decreasing when .
Appendix B Hessian matrix on the optimal weight
We compute the Hessian matrix of the adversarially trained logistic regression model. The loss function is
The Hessian matrix of loss along weight is
The optimal weight condition eliminates the second term. The first term is
Appendix C Experimental Details
In this section, we describe the experimental details. We use five image datasets: MNIST , MNIST2, CIFAR10 , CIFAR2, and SVHN . MNIST2 and CIFAR2 are binary classification datasets that we created: MNIST2 uses only MNIST classes 0 and 1, and CIFAR2 uses only the frog and ship classes of CIFAR10. We chose the frog and ship classes because they have the highest classification accuracy. For CIFAR10 and CIFAR2, the model architecture, data standardization, and the parameters of the PGD attack were set the same as in . For SVHN, they were set the same as in .
Appendix D Additional Experiments
d.1 MNIST2 with the larger magnitude of noise
Figure 5 shows the results of the experiment using the noise magnitude range , which is commonly used in MNIST for 10-class classification.
-  (2017) Entropy-sgd: biasing gradient descent into wide valleys. In ICLR, Cited by: §3.1, §4.2.
-  (2019) Certified adversarial robustness via randomized smoothing. In ICML, pp. 1310–1320. Cited by: §1.
-  (2020) Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. ICML. Cited by: §2.2.
-  (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In NAACL, Cited by: §1.
-  (2017) Sharp minima can generalize for deep nets. In ICML, Cited by: §4.2.
-  (2020) Sharpness-aware minimization for efficiently improving generalization. arXiv preprint arXiv:2010.01412. Cited by: §4.2.
-  (2015) Explaining and harnessing adversarial examples. In ICLR, Cited by: §1, §2.3.
-  (2016) Deep residual learning for image recognition. In CVPR, pp. 770–778. Cited by: §1, §5.2.
-  (1997) Flat minima. Neural Computation 9 (1), pp. 1–42. Cited by: §3.1.
-  (2019) A new defense against adversarial images: turning a weakness into a strength. In NeurIPS, Vol. 32, pp. 1635–1646. Cited by: §1.
-  (2019) Model-agnostic adversarial detection by random perturbations.. In IJCAI, pp. 4689–4696. Cited by: §1.
-  (2020) Fantastic generalization measures and where to find them. In ICLR, Cited by: §4.2.
-  (2017) On large-batch training for deep learning: generalization gap and sharp minima. In ICLR, Cited by: §1, §3.1, §4.2.
-  (2009) Learning multiple layers of features from tiny images. Technical report. Cited by: Appendix C, §5.
-  (2017) Adversarial machine learning at scale. In ICLR, Cited by: §1, §2.3.
-  (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE. Cited by: Appendix C, §5.1.
-  (2018) Visualizing the loss landscape of neural nets. In NeurIPS, pp. 6389–6399. Cited by: §2.4, §3.1.
-  (2020) On the loss landscape of adversarial training: identifying challenges and how to overcome them. NeurIPS 33. Cited by: §4.1.
-  (2019) Detection based defense against adversarial examples from the steganalysis point of view. In CVPR, pp. 4825–4834. Cited by: §1.
-  (2018) Towards deep learning models resistant to adversarial attacks. In ICLR, Cited by: Appendix C, §1, §2.2, §2.3, §5.
-  (2017) On detecting adversarial perturbations. ICLR. Cited by: §1.
-  (2016) Training recurrent neural networks by diffusion. CoRR abs/1601.04114. Cited by: §4.2.
-  (2011) Reading digits in natural images with unsupervised feature learning. In NeurIPS Workshop on Deep Learning and Unsupervised Feature Learning. Cited by: Appendix C, §5.
-  (2016) Distillation as a defense to adversarial perturbations against deep neural networks. In S&P (Oakland), pp. 582–597. Cited by: §1.
-  (2020) Overfitting in adversarially robust deep learning. In ICML, pp. 8093–8104. Cited by: §5.1.
-  (2019) Provably robust deep learning via adversarially trained smoothed classifiers. In NeurIPS, pp. 11292–11303. Cited by: §1.
-  (2014) Intriguing properties of neural networks. In ICLR, Cited by: §1.
-  (2017) Residual convolutional CTC networks for automatic speech recognition. arXiv preprint arXiv:1702.07793. Cited by: §1.
-  (2020) Improving adversarial robustness requires revisiting misclassified examples. In ICLR, Cited by: §1.
-  (2020) Adversarial weight perturbation helps robust generalization. NeurIPS 33. Cited by: Appendix C, §1, §2.4, §4.1, §5.
-  (2020) Randomized smoothing of all shapes and sizes. In ICML, Vol. 119, pp. 10693–10705. Cited by: §1.
-  (2019) Theoretically principled trade-off between robustness and accuracy. In ICML, Cited by: §1.