1 Introduction
Despite their impressive performance on many supervised learning tasks, deep neural networks (DNNs) are often highly susceptible to adversarial perturbations imperceptible to the human eye (Szegedy et al., 2013; Goodfellow et al., 2014b). These "adversarial attacks" have received enormous attention in the machine learning literature over recent years
(Goodfellow et al., 2014b; Moosavi Dezfooli et al., 2016; Carlini & Wagner, 2016; Kurakin et al., 2016; Papernot et al., 2016; Carlini & Wagner, 2017; Papernot et al., 2017; Madry et al., 2018; Tramèr et al., 2018). Adversarial attack studies have mainly focused on developing effective attack and defense schemes. While attack schemes attempt to mislead a trained classifier via additive perturbations to the input, defense mechanisms aim to train classifiers robust to these perturbations. Although existing defense methods result in considerably better performance compared to standard training methods, the improved performance can still be far below the performance in non-adversarial settings
(Athalye et al., 2018; Schmidt et al., 2018).

A standard adversarial training scheme involves fitting a classifier using adversarially-perturbed samples (Szegedy et al., 2013; Goodfellow et al., 2014b) with the intention of producing a trained classifier with better robustness to attacks on future (i.e. test) samples. Madry et al. (2018) provides a robust optimization interpretation of the adversarial training approach, demonstrating that this strategy finds the optimal classifier minimizing the average worst-case loss over an adversarial ball centered at each training sample. This minimax interpretation can also be extended to distributionally-robust training methods (Sinha et al., 2018), where the offered robustness is over a Wasserstein ball around the empirical distribution of the training data.
Recently, Schmidt et al. (2018) have shown that standard adversarial training produces networks that generalize poorly. The performance of adversarially-trained DNNs over test samples can be significantly worse than their training performance, and this gap can be far greater than the generalization gap achieved using standard empirical risk minimization (ERM). This discrepancy suggests that the overall adversarial test performance can be improved by applying effective regularization schemes during adversarial training.

In this work, we propose using spectral normalization (SN) (Miyato et al., 2018) as a computationally efficient and statistically powerful regularization scheme for adversarial training of DNNs. SN has been successfully applied to DNNs in the context of generative adversarial networks (GANs) (Goodfellow et al., 2014a), resulting in state-of-the-art deep generative models for several benchmark tasks (Miyato et al., 2018). Moreover, SN (Tsuzuku et al., 2018) and other similar Lipschitz regularization techniques (Cisse et al., 2017) have been successfully applied in non-adversarial training settings to improve the robustness of ERM-trained networks to adversarial attacks. The theoretical results in (Bartlett et al., 2017; Neyshabur et al., 2017a) and empirical results in (Yoshida & Miyato, 2017) also suggest that SN can close the generalization gap for DNNs in the non-adversarial ERM setting.
On the theoretical side, we first extend the standard notion of margin loss to adversarial settings. We then leverage the PAC-Bayes generalization framework (McAllester, 1999) to prove generalization bounds for spectrally-normalized DNNs in terms of our defined adversarial margin loss. Our approach parallels the one used by Neyshabur et al. (2017a) to derive generalization bounds in non-adversarial settings. We obtain adversarial generalization error bounds for three well-known gradient-based attack schemes: the fast gradient method (FGM) (Goodfellow et al., 2014b), the projected gradient method (PGM) (Kurakin et al., 2016; Madry et al., 2018), and Wasserstein risk minimization (WRM) (Sinha et al., 2018). Our theoretical analysis shows that the adversarial generalization component vanishes when SN is applied to all layers with sufficiently small spectral norm values.
On the empirical side, we show that SN can significantly improve test performance after adversarial training. We perform numerical experiments on three standard datasets (MNIST, CIFAR-10, SVHN) and various standard DNN architectures (including AlexNet (Krizhevsky et al., 2012), Inception (Szegedy et al., 2015), and ResNet (He et al., 2016)); in almost all of the experiments we obtain better test performance after applying SN. Figure 1 shows the training and validation performance for AlexNet fit on the CIFAR10 dataset using FGM, PGM, and WRM, resulting in adversarial test accuracy improvements of , , and percent, respectively. Furthermore, we numerically validate the correlation between the spectral-norm capacity term in our bounds and the actual generalization performance. To perform our numerical experiments, we develop a computationally-efficient approach for normalizing the spectral norm of convolution layers with arbitrary stride and padding schemes. We provide our TensorFlow code, as spectral normalization of convolutional layers can also be useful for other deep learning tasks. To summarize, the main contributions of this work are:
- Proposing SN as a regularization scheme for adversarial training of DNNs,
- Extending concepts of margin-based generalization analysis to adversarial settings and proving margin-based generalization bounds for three gradient-based adversarial attack schemes,
- Developing an efficient method for normalizing the spectral norm of convolutional layers in deep convolution networks,
- Numerically demonstrating the improved test and generalization performance of DNNs trained with SN.
2 Preliminaries
In this section, we first review some standard concepts of margin-based generalization analysis in learning theory. We then extend these notions to adversarial training settings.
2.1 Supervised learning, Deep neural networks, Generalization error
Consider $n$ samples $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$ drawn i.i.d. from an underlying distribution $P$. We suppose $\mathbf{x}\in\mathcal{X}$ and $y\in\{1,\dots,m\}$, where $m$ represents the number of different labels. Given a loss function $\ell$ and a function class $\mathcal{F}=\{f_{\mathbf{w}}:\ \mathbf{w}\in\mathcal{W}\}$ parameterized by $\mathbf{w}$, a supervised learner aims to find the optimal function in $\mathcal{F}$ minimizing the expected loss (risk) averaged over the underlying distribution $P$.

We consider $\mathcal{F}$ as the class of $d$-layer neural networks with $h$ hidden units per layer and activation function $\sigma$. Each $f_{\mathbf{w}}$ in $\mathcal{F}$ maps a data point $\mathbf{x}$ to an $m$-dimensional vector. Specifically, we can express each $f_{\mathbf{w}}$ as $f_{\mathbf{w}}(\mathbf{x}) = W_d\,\sigma\big(W_{d-1}\cdots\sigma(W_1\mathbf{x})\cdots\big)$. We use $\|W_i\|_2$ to denote the spectral norm of matrix $W_i$, defined as the largest singular value of $W_i$, and $\|W_i\|_F$ to denote $W_i$'s Frobenius norm.

A classifier $f_{\mathbf{w}}$'s performance over the true distribution of the data can be different from its training performance over the empirical distribution of the $n$ training samples. The difference between the empirical and true averaged losses, evaluated on training and test samples respectively, is called the generalization error. Similar to Neyshabur et al. (2017a), we evaluate a DNN's generalization performance using its expected margin loss, defined for a margin parameter $\gamma>0$ as
$$L_\gamma(f_{\mathbf{w}}) \;:=\; \Pr_{(\mathbf{x},y)\sim P}\!\Big[\, f_{\mathbf{w}}(\mathbf{x})[y] \,\le\, \gamma + \max_{j\ne y}\, f_{\mathbf{w}}(\mathbf{x})[j] \,\Big], \tag{1}$$
where $f_{\mathbf{w}}(\mathbf{x})[j]$ denotes the $j$th entry of $f_{\mathbf{w}}(\mathbf{x})$. For a given data point $\mathbf{x}$, we predict the label corresponding to the maximum entry of $f_{\mathbf{w}}(\mathbf{x})$. Also, we use $\hat{L}_\gamma(f_{\mathbf{w}})$ to denote the empirical margin loss averaged over the $n$ training samples. The goal of margin-based generalization analysis is to provide a theoretical comparison between the true and empirical margin risks.
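For concreteness, the empirical version of the margin loss in (1) can be computed directly from the network outputs. The snippet below is a minimal sketch with illustrative names, not the code released with the paper.

```python
import numpy as np

def margin_loss(scores, labels, gamma):
    """Empirical margin loss of Eq. (1): fraction of samples whose true-class
    score does not exceed the best other-class score by more than gamma.

    scores: [n, m] array of network outputs f_w(x_i); labels: [n] true labels.
    """
    n = scores.shape[0]
    true_scores = scores[np.arange(n), labels]
    others = scores.copy()
    others[np.arange(n), labels] = -np.inf        # mask out the true class
    runner_up = others.max(axis=1)                # best score among the other classes
    return np.mean(true_scores <= gamma + runner_up)

# gamma = 0 recovers the usual 0-1 classification error.
```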
2.2 Adversarial attacks, Adversarial training
A supervised learner observes only the training samples and hence does not know the true distribution of the data. A standard approach to training a classifier is therefore to minimize the empirical expected loss over the function class $\mathcal{F}$:
$$\min_{f_{\mathbf{w}}\in\mathcal{F}}\ \frac{1}{n}\sum_{i=1}^{n}\ell\big(f_{\mathbf{w}}(\mathbf{x}_i),\,y_i\big). \tag{2}$$
This approach is called empirical risk minimization (ERM). For better optimization performance, the loss function is commonly chosen to be smooth; hence, the 0-1 and margin losses are replaced by smooth surrogate loss functions such as the cross-entropy loss. However, we still use the margin loss as defined in (1) for evaluating the test and generalization performance of DNN classifiers.
While ERM training usually achieves good performance for DNNs, several recent observations reveal that adding an adversarially-chosen perturbation to each sample can significantly degrade the trained DNN's performance. Given a norm function $\|\cdot\|$ and adversarial noise power $\epsilon$, the adversarial additive noise for a sample $(\mathbf{x},y)$ and classifier $f_{\mathbf{w}}$ is defined to be
$$\boldsymbol{\delta}^{\mathrm{adv}}_{\mathbf{w}}(\mathbf{x}) \;:=\; \operatorname*{argmax}_{\|\boldsymbol{\delta}\|\le\epsilon}\ \ell\big(f_{\mathbf{w}}(\mathbf{x}+\boldsymbol{\delta}),\,y\big). \tag{3}$$
To provide adversarial robustness against the above attack scheme, a standard technique called adversarial training applies ERM to the adversarially-perturbed samples by solving
$$\min_{f_{\mathbf{w}}\in\mathcal{F}}\ \frac{1}{n}\sum_{i=1}^{n}\ell\Big(f_{\mathbf{w}}\big(\mathbf{x}_i+\boldsymbol{\delta}^{\mathrm{adv}}_{\mathbf{w}}(\mathbf{x}_i)\big),\,y_i\Big). \tag{4}$$
Nevertheless, (3) and hence (4) are generally non-convex and intractable optimization problems. Therefore, several schemes have been proposed in the literature to approximate the optimal solution of (3). In this work, we analyze the generalization performance of the following three gradient-based methods for approximating the solution to (3). We note that several other attack schemes, such as DeepFool (Moosavi Dezfooli et al., 2016), CW attacks (Carlini & Wagner, 2017), and targeted and least-likely-class attacks (Kurakin et al., 2016), have been introduced and examined in the literature; these can lead to interesting future directions for this work.
- Fast Gradient Method (FGM) (Goodfellow et al., 2014b): FGM approximates the solution to (3) by linearizing the DNN loss around a given data point. Hence, FGM perturbs $\mathbf{x}$ by adding the following noise vector:
$$\boldsymbol{\delta}^{\mathrm{fgm}}_{\mathbf{w}}(\mathbf{x}) \;:=\; \operatorname*{argmax}_{\|\boldsymbol{\delta}\|\le\epsilon}\ \boldsymbol{\delta}^\top \nabla_{\mathbf{x}}\,\ell\big(f_{\mathbf{w}}(\mathbf{x}),y\big). \tag{5}$$
For the special case of the $\ell_\infty$-norm, the above representation of FGM recovers the fast gradient sign method (FGSM), where each data point is perturbed by the $\epsilon$-scaled sign vector of the loss's gradient. For the $\ell_2$-norm, we similarly normalize the loss's gradient vector to have Euclidean norm $\epsilon$.
- Projected Gradient Method (PGM) (Kurakin et al., 2016): PGM is the iterative version of FGM and applies projected gradient descent to solve (3). PGM follows the update rule below for a given number of steps $r$:
$$\boldsymbol{\delta}^{(t+1)}_{\mathbf{w}}(\mathbf{x}) \;:=\; \Pi_{\|\boldsymbol{\delta}\|\le\epsilon}\Big(\boldsymbol{\delta}^{(t)}_{\mathbf{w}}(\mathbf{x}) + \alpha\,\operatorname*{argmax}_{\|\boldsymbol{\nu}\|\le 1}\ \boldsymbol{\nu}^\top\nabla_{\mathbf{x}}\,\ell\big(f_{\mathbf{w}}(\mathbf{x}+\boldsymbol{\delta}^{(t)}_{\mathbf{w}}(\mathbf{x})),y\big)\Big). \tag{6}$$
Here, we first find the direction along which the loss at the $t$th perturbed point changes the most, and then we move the perturbed point along this direction by stepsize $\alpha$, followed by projecting the resulting perturbation onto the set of perturbations with norm bounded by $\epsilon$.
- Wasserstein Risk Minimization (WRM) (Sinha et al., 2018): WRM solves the following variant of (3) for a data point $(\mathbf{x},y)$, where the norm constraint in (3) is replaced by a norm-squared Lagrangian penalty term:
$$\boldsymbol{\delta}^{\mathrm{wrm}}_{\mathbf{w}}(\mathbf{x}) \;:=\; \operatorname*{argmax}_{\boldsymbol{\delta}}\ \ell\big(f_{\mathbf{w}}(\mathbf{x}+\boldsymbol{\delta}),y\big) - \frac{\lambda}{2}\|\boldsymbol{\delta}\|^2. \tag{7}$$
As discussed earlier, the optimization problem (3) is generally intractable. However, in the case of the Euclidean norm $\|\cdot\|_2$, if we assume the Lipschitz constant of $\nabla_{\mathbf{x}}\ell(f_{\mathbf{w}}(\mathbf{x}),y)$ is upper-bounded by $\lambda$, then the WRM optimization (7) results in solving a convex optimization problem and can be efficiently solved using gradient methods.
To obtain efficient adversarial defense schemes, we can substitute $\boldsymbol{\delta}^{\mathrm{fgm}}_{\mathbf{w}}$, $\boldsymbol{\delta}^{\mathrm{pgm}}_{\mathbf{w}}$, or $\boldsymbol{\delta}^{\mathrm{wrm}}_{\mathbf{w}}$ for $\boldsymbol{\delta}^{\mathrm{adv}}_{\mathbf{w}}$ in (4). Instead of fitting the classifier over true adversarial examples, which are NP-hard to obtain, we can instead train the DNN over FGM-, PGM-, or WRM-adversarially perturbed samples; a minimal sketch of these three attack approximations is given below.
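For concreteness, the following is a minimal sketch of the three attack approximations under the Euclidean norm, written for a Keras-style `model` and a cross-entropy `loss_fn`; the helper names, step sizes, and iteration counts are illustrative assumptions rather than the paper's exact implementation.

```python
import tensorflow as tf

# Assumes image inputs of shape [batch, height, width, channels].

def fgm_l2(model, loss_fn, x, y, eps):
    """Eq. (5) with the Euclidean norm: eps times the normalized loss gradient."""
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(y, model(x))
    g = tape.gradient(loss, x)
    g_norm = tf.norm(tf.reshape(g, [tf.shape(x)[0], -1]), axis=1)
    return eps * g / (g_norm[:, None, None, None] + 1e-12)

def pgm_l2(model, loss_fn, x, y, eps, alpha, steps):
    """Eq. (6): repeated FGM steps of size alpha, projected onto the eps-ball."""
    delta = tf.zeros_like(x)
    for _ in range(steps):
        step = (alpha / eps) * fgm_l2(model, loss_fn, x + delta, y, eps)
        delta = delta + step
        d_norm = tf.norm(tf.reshape(delta, [tf.shape(x)[0], -1]), axis=1)
        scale = tf.minimum(1.0, eps / (d_norm[:, None, None, None] + 1e-12))
        delta = delta * scale                     # projection onto the l2 ball
    return delta

def wrm(model, loss_fn, x, y, lam, lr=0.1, steps=15):
    """Eq. (7): gradient ascent on the Lagrangian-penalized objective."""
    delta = tf.zeros_like(x)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            tape.watch(delta)
            obj = loss_fn(y, model(x + delta)) - 0.5 * lam * tf.reduce_sum(delta ** 2)
        delta = delta + lr * tape.gradient(obj, delta)
    return delta
```

Substituting any of these perturbations into (4) gives the corresponding adversarial training scheme: at each training step, the current network is attacked on the minibatch and the parameters are then updated on the perturbed inputs.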
2.3 Adversarial generalization error
The goal of adversarial training is to improve robustness against adversarial attacks not only on the training samples but also on the test samples; however, the adversarial training problem (4) focuses only on the training samples. To evaluate the adversarial generalization performance, we extend the notion of margin loss defined earlier in (1) to adversarial training settings by defining the adversarial margin loss as
$$L^{\mathrm{adv}}_\gamma(f_{\mathbf{w}}) \;:=\; \Pr_{(\mathbf{x},y)\sim P}\!\Big[\, f_{\mathbf{w}}\big(\mathbf{x}+\boldsymbol{\delta}^{\mathrm{adv}}_{\mathbf{w}}(\mathbf{x})\big)[y] \,\le\, \gamma + \max_{j\ne y}\, f_{\mathbf{w}}\big(\mathbf{x}+\boldsymbol{\delta}^{\mathrm{adv}}_{\mathbf{w}}(\mathbf{x})\big)[j] \,\Big]. \tag{8}$$
Here, we measure the margin loss over adversarially-perturbed samples, and we use $\hat{L}^{\mathrm{adv}}_\gamma(f_{\mathbf{w}})$ to denote the empirical adversarial margin loss. We also use $L^{\mathrm{fgm}}_\gamma$, $L^{\mathrm{pgm}}_\gamma$, and $L^{\mathrm{wrm}}_\gamma$ to denote the adversarial margin losses under the FGM (5), PGM (6), and WRM (7) attacks, respectively.
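Evaluating the empirical adversarial margin loss then simply chains an attack with the margin computation. A short illustrative snippet, reusing the hypothetical helpers sketched above with made-up hyper-parameter values:

```python
# Empirical adversarial margin loss: attack each test sample, then measure the
# margin loss of Eq. (1) on the perturbed inputs (here with the PGM attack).
delta = pgm_l2(model, loss_fn, x_test, y_test, eps=0.5, alpha=0.1, steps=15)
scores_adv = model(x_test + delta).numpy()
pgm_margin = margin_loss(scores_adv, y_test, gamma=1.0)
```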
3 Margin-based adversarial generalization bounds
As previously discussed, generalization performance can be different between adversarial and non-adversarial settings. In this section, we provide generalization bounds for DNN classifiers under adversarial attacks in terms of the spectral norms of the trained DNN’s weight matrices. The bounds motivate regularizing these spectral norms in order to limit the DNN’s capacity and improve its generalization performance under adversarial attacks.
We use the PAC-Bayes framework (McAllester, 1999; 2003) to prove our main results. To derive adversarial generalization error bounds for DNNs with smooth activation functions, we first extend a recent margin-based generalization bound for the ReLU activation function (Neyshabur et al., 2017a) to general Lipschitz activation functions.

Theorem 1.
Consider the class $\mathcal{F}$ of $d$-layer neural networks with $h$ units per hidden layer and Lipschitz activation $\sigma$ satisfying $\sigma(0)=0$. Suppose that $\mathcal{X}$, the input distribution's support set, is norm-bounded as $\|\mathbf{x}\|_2\le B$ for every $\mathbf{x}\in\mathcal{X}$. Also assume that for a constant $\kappa>0$ any $f_{\mathbf{w}}\in\mathcal{F}$ satisfies
Here $\bar{s}$ denotes the geometric mean of the layers' spectral norms $\|W_1\|_2,\dots,\|W_d\|_2$. Then, with probability at least $1-\delta$,
where we define the complexity score $\Phi(f_{\mathbf{w}})$.
Proof.
We defer the proof to the Appendix. The proof is a slight modification of Neyshabur et al. (2017a)’s proof of the same result for ReLU activation. ∎
We now generalize this result to adversarial settings where the DNN’s performance is evaluated under adversarial attacks. We prove three separate adversarial generalization error bounds for FGM, PGM, and WRM attacks.
For the following results, we consider $\mathcal{F}$, the class of neural nets defined in Theorem 1. Moreover, we assume that the training loss and its first-order derivative are Lipschitz. Similar to Sinha et al. (2018), we assume the activation $\sigma$ is smooth and its derivative is Lipschitz. This class of activations includes the ELU (Clevert et al., 2015) and tanh functions, but not the ReLU function. However, our numerical results in Table 1 in the Appendix suggest similar generalization performance between ELU and ReLU activations.
Theorem 2.
Consider $\mathcal{F}$ from Theorem 1 and a training loss function satisfying the assumptions stated above. We consider an FGM attack with noise power $\epsilon$ under the Euclidean norm $\|\cdot\|_2$. Assume that the loss gradient satisfies $\|\nabla_{\mathbf{x}}\ell(f_{\mathbf{w}}(\mathbf{x}),y)\|_2 \ge c$ for a constant $c>0$, any $f_{\mathbf{w}}\in\mathcal{F}$, and any $\mathbf{x}$ that is $\epsilon$-close to the input distribution's support set. Then, for any $\delta,\gamma>0$, with probability at least $1-\delta$ the following bound holds for the FGM margin loss of any $f_{\mathbf{w}}\in\mathcal{F}$:
where .
Proof.
We defer the proof to the Appendix. ∎
Note that the above theorem assumes that the rate of change of the loss function around test samples is at least $c$, which gives a baseline for measuring the attack power $\epsilon$. In our numerical experiments, we validate this assumption on standard image recognition tasks. Next, we generalize this result to adversarial settings with the PGM attack, i.e. the iterative version of the FGM attack.
Theorem 3.
Consider $\mathcal{F}$ and a training loss function for which the assumptions in Theorem 2 hold. We consider a PGM attack with noise power $\epsilon$ under the Euclidean norm $\|\cdot\|_2$, $r$ attack iterations, and stepsize $\alpha$. Then, for any $\delta,\gamma>0$, with probability at least $1-\delta$ the following bound applies to the PGM margin loss of any $f_{\mathbf{w}}\in\mathcal{F}$:
Here we define as the following expression
where provides an upper-bound on the Lipschitz constant of .
Proof.
We defer the proof to the Appendix. ∎
In the above result, notice that if then for any number of gradient steps the PGM margin-based generalization bound will grow the FGM generalization error bound in Theorem 2 by factor . We next extend our adversarial generalization analysis to WRM attacks.
Theorem 4.
For the neural net class $\mathcal{F}$ and training loss satisfying Theorem 2's assumptions, consider a WRM attack with Lagrangian coefficient $\lambda$ and the Euclidean norm $\|\cdot\|_2$. Given a parameter $\tau$, assume the quantity defined in Theorem 3 is upper-bounded by $\tau$ for any $f_{\mathbf{w}}\in\mathcal{F}$. For any $\delta,\gamma>0$, the following WRM margin-based generalization bound holds with probability at least $1-\delta$ for any $f_{\mathbf{w}}\in\mathcal{F}$:
where we define
Proof.
We defer the proof to the Appendix. ∎
4 Spectral normalization of convolutional layers
To control the Lipschitz constant of our trained network, we need to ensure that the spectral norm associated with each linear operation in the network does not exceed some pre-specified bound $\beta$. For fully-connected layers (i.e. regular matrix multiplication), please see Appendix B. For a general class of linear operations including convolution, Tsuzuku et al. (2018) propose to compute the operation's spectral norm through the gradient of the Euclidean norm of the operation's output. Here, we leverage the deconvolution (convolution-transpose) operation to further simplify and accelerate computing the spectral norm of the convolution operation. Additionally, Sedghi et al. (2018) develop a method for computing all the singular values, including the largest one, i.e. the spectral norm. While elegant, that method only applies to convolution filters with stride 1 and zero-padding. However, in practice the normalization factor depends on the stride size and padding scheme governing the convolution operation. Here we develop an efficient approach for computing the maximum singular value, i.e. the spectral norm, of convolutional layers with arbitrary stride and padding schemes. Note that, as also discussed by Gouk et al. (2018), the $l$th convolutional layer's output feature map is a linear function of the input $\mathbf{x}_l$:
$$\mathbf{y}_l = K_l * \mathbf{x}_l,$$
where $\mathbf{x}_l$ has $c$ feature maps, $K_l$ is a filter, and $*$ denotes the convolution operation (which also encapsulates the stride size and padding scheme). For simplicity, we ignore the additive bias terms here. By vectorizing $\mathbf{x}_l$ and letting $W_l$ represent the overall linear operation associated with $K_l$, we see that
$$\mathrm{vec}(\mathbf{y}_l) = W_l\,\mathrm{vec}(\mathbf{x}_l),$$
and therefore the overall convolution operation can be described using the matrix $W_l$. While explicitly constructing $W_l$ is expensive, we can still compute $\sigma(W_l)$, the spectral norm of $W_l$, by leveraging the convolution-transpose operation implemented by several modern deep learning packages. This allows us to efficiently perform matrix multiplication with $W_l^\top$ without explicitly constructing $W_l$. Therefore we can approximate $\sigma(W_l)$ using a modified version of power iteration (Algorithm 1), wrapping the appropriate stride size and padding arguments into the convolution and convolution-transpose operations. After obtaining $\sigma(W_l)$, we compute the normalized weights in the same manner as for the fully-connected layers. Like Miyato et al., we exploit the fact that SGD only makes small updates to $W_l$ from training step to training step, reusing the same singular-vector estimate and running only one power iteration per step. Unlike Miyato et al., rather than enforcing $\sigma(W_l)=\beta$, we instead enforce the looser constraint $\sigma(W_l)\le\beta$:
$$W^{\mathrm{SN}}_l \;:=\; \frac{W_l}{\max\big(1,\ \sigma(W_l)/\beta\big)}, \tag{9}$$
which we observe to result in faster training for supervised learning tasks.
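The following is a minimal TensorFlow sketch of this procedure. It estimates $\sigma(W_l)$ by power iteration, applying `tf.nn.conv2d` for the operator $W_l$ and `tf.nn.conv2d_transpose` for its adjoint $W_l^\top$ with the layer's own stride and padding arguments, and then rescales the kernel according to (9). The helper names and single-iteration-per-step usage are illustrative; the paper's Algorithm 1 may differ in its details.

```python
import tensorflow as tf

def conv_spectral_norm(kernel, in_shape, strides, padding, u, n_iters=1):
    """Approximate the spectral norm of the linear map defined by a convolution
    with the given stride/padding, via power iteration.

    kernel:   [k_h, k_w, c_in, c_out] convolution filter (the operator W_l)
    in_shape: static shape of the layer input, e.g. [1, H, W, c_in]
    u:        persistent tf.Variable with the shape of the convolution output,
              holding the current left-singular-vector estimate
    """
    for _ in range(n_iters):
        # v <- W^T u : the convolution transpose applies the adjoint operator
        v = tf.nn.conv2d_transpose(u, kernel, output_shape=in_shape,
                                   strides=strides, padding=padding)
        v = v / (tf.norm(v) + 1e-12)
        # u <- W v
        u_new = tf.nn.conv2d(v, kernel, strides=strides, padding=padding)
        u_new = u_new / (tf.norm(u_new) + 1e-12)
        u.assign(u_new)
    # sigma ~ u^T W v for the normalized u and v
    sigma = tf.reduce_sum(u * tf.nn.conv2d(v, kernel, strides=strides,
                                           padding=padding))
    return sigma

def spectrally_normalized_kernel(kernel, sigma, beta):
    """Looser constraint of Eq. (9): only scale the kernel down when sigma > beta."""
    return kernel / tf.maximum(1.0, sigma / beta)
```

In this sketch, `u` would be a non-trainable `tf.Variable` shaped like the convolution output, reused across training steps, and the rescaled kernel returned by `spectrally_normalized_kernel` would be the one used in the layer's forward pass.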
5 Numerical Experiments
In this section, we provide an array of empirical experiments to validate both the bounds derived in Section 3 and our implementation of spectral normalization described in Section 4. We show that spectral normalization improves both test accuracy and generalization for a variety of adversarial training schemes, datasets, and network architectures.
All experiments are implemented in TensorFlow (Abadi et al., 2016). For each experiment, we cross-validate 4 to 6 values of $\beta$ (see (9)) using a fixed validation set of 500 samples. For PGM, we used iterations and . Additionally, for FGM and PGM we used -type attacks (unless specified) with magnitude (this value was approximately 2.44 for CIFAR10). For WRM, we implemented gradient ascent as discussed by Sinha et al. (2018). Additionally, for WRM training we used a Lagrangian coefficient of for CIFAR10 and SVHN and a Lagrangian coefficient of for MNIST, in a similar manner to Sinha et al. (2018). The code will be made readily available.
5.1 Validation of spectral normalization implementation and bounds
We first demonstrate the effect of the proposed spectral normalization approach on the final DNN weights by comparing the norm of the network's input to that of its output. As shown in Figure 2(a), without spectral normalization ($\beta=\infty$ in (9)), the norm gain can be large. Additionally, because we are using the cross-entropy loss, the weights (and therefore the norm gain) can grow arbitrarily high if we continue training, as reported by Neyshabur et al. (2017b). As we decrease $\beta$, however, we produce more constrained networks, resulting in a decrease in norm gain. At $\beta=1$, the gain of the network cannot be greater than 1, which is consistent with what we observe. Additionally, we provide a comparison of our method to that of Miyato et al. (2018) in Appendix A.1, empirically demonstrating that Miyato et al.'s method does not properly control the spectral norm of convolutional layers, resulting in worse generalization performance.
Figure 2(b) shows that the $\ell_2$-norms of the loss gradients with respect to the training samples are nicely distributed after spectral normalization. Additionally, this figure suggests that the minimum gradient-norm assumption (the condition in Theorems 2 and 3) holds for spectrally-normalized networks.
The first column of Figure 3 shows that, as observed by Bartlett et al. (2017), AlexNet trained using ERM generates similar margin distributions for both random and true labels on CIFAR10 unless we normalize the margins appropriately. We see that even without further correction, ERM training with SN allows AlexNet to have distinguishable performance between the two datasets. This observation suggests that SN as a regularization scheme enforces the generalization error bounds shown for spectrally-normalized DNNs by Bartlett et al. (2017) and Neyshabur et al. (2017a). Additionally, the margin normalization factor (the capacity norm in Theorems 1-4) is much smaller for networks trained with SN. As demonstrated by the other columns in Figure 3, a smaller normalization factor results in larger normalized margin values and much tighter margin-based generalization bounds (a factor of for ERM and a factor of for FGM and PGM) (see Theorems 1-4).
5.2 Spectral normalization improves generalization and adversarial robustness
The phenomenon of overfitting random labels described by Zhang et al. (2016) can be observed even for adversarial training methods. Figure 4 shows how the FGM, PGM, or WRM adversarial training schemes only slightly delay the rate at which AlexNet fits random labels on CIFAR10, and therefore the generalization gap can be quite large without proper regularization. After introducing spectral normalization, however, we see that the network has a much harder time fitting both the random and true labels. With the proper amount of SN (chosen via cross validation), we can obtain networks that struggle to fit random labels while still obtaining the same or better test performance on true labels.
We also observe that training schemes regularized with SN result in networks more robust to adversarial attacks. Figure 5 shows that even without adversarial training, AlexNet with SN becomes more robust to FGM, PGM, and WRM attacks. Adversarial training improves adversarial robustness more than SN by itself; however we see that we can further improve the robustness of the trained networks significantly by combining SN with adversarial training.
5.3 Other datasets and architectures
We demonstrate the power of regularization via SN on several combinations of datasets, network architectures, and adversarial training schemes. The datasets we evaluate are CIFAR10, MNIST, and SVHN. We fit CIFAR10 using the AlexNet and Inception networks described by Zhang et al. (2016), 1-hidden-layer and 2-hidden-layer multi-layer perceptrons (MLPs) with ELU activation and 512 hidden nodes in each layer, and the ResNet architecture (He et al., 2016) provided in TensorFlow for fitting CIFAR10. We fit MNIST using the ELU network described by Sinha et al. (2018) and the 1-hidden-layer and 2-hidden-layer MLPs. Finally, we fit SVHN using the same AlexNet architecture we used to fit CIFAR10. Our implementations do not use any additional regularization schemes, including weight decay, dropout (Srivastava et al., 2014), and batch normalization (Ioffe & Szegedy, 2015), as these approaches are not motivated by the theory developed in this work; however, we provide numerical experiments comparing the proposed approach with weight decay, dropout, and batch normalization in Appendix A.2.

Table 1 in the Appendix reports the pre- and post-SN test accuracies for all 42 combinations evaluated. Figure 1 in the Introduction and Figures 8-9 in the Appendix show examples of training and validation curves for some of these combinations. We see that the validation curve generally improves after regularization with SN, and the observed improvements in validation accuracy are confirmed by the test accuracies reported in Table 1. Figure 6 visually summarizes Table 1, showing how SN can often significantly improve the test accuracy (and therefore decrease the generalization gap) for several of the combinations. We also provide Table 2 in the Appendix, which shows the proportional increase in training time after introducing SN with our TensorFlow implementation.
6 Related Works
The adversarial robustness of various classifiers has been studied theoretically in multiple works. Wang et al. (2017) analyzes the adversarial robustness of the nearest-neighbor approach. Gilmer et al. (2018) studies the effect of the complexity of the data-generating manifold on the adversarial robustness of a specific trained model. Fawzi et al. (2018) proves lower bounds on the complexity of robust learning in adversarial settings, targeting the population distribution of the data. Xu et al. (2009) shows that the regularized support vector machine (SVM) can be interpreted via robust optimization. Fawzi et al. (2016) analyzes the robustness of a fixed classifier to random and adversarial perturbations of the input data. While all of these works seek to understand the robustness properties of different classification function classes, unlike our work they do not focus on the generalization aspects of learning over DNNs under adversarial attacks.

Concerning the generalization aspect of adversarial training, Sinha et al. (2018) provides optimization and generalization guarantees for WRM under the assumptions discussed after Theorem 4. However, their generalization guarantee only applies to the Wasserstein cost function, which is different from the 0-1 or margin loss and does not explicitly suggest a regularization scheme. In a recent related work, Schmidt et al. (2018) numerically shows the wide generalization gap in PGM adversarial training and theoretically establishes lower bounds on the sample complexity of linear classifiers in Gaussian settings. While our work does not provide sample-complexity lower bounds, we study the broader function class of DNNs, for which we provide upper bounds on the adversarial generalization error and suggest an explicit regularization scheme for adversarial training over DNNs.
Generalization in deep learning has been a topic of great interest in machine learning (Zhang et al., 2016). In addition to margin-based bounds (Bartlett et al., 2017; Neyshabur et al., 2017a), various other tools including VC dimension (Anthony & Bartlett, 2009), norm-based capacity scores (Bartlett & Mendelson, 2002; Neyshabur et al., 2015), and flatness of local minima (Keskar et al., 2016; Neyshabur et al., 2017b) have been used to analyze generalization properties of DNNs. Recently, Arora et al. (2018) introduced a compression approach to further improve the margin-based bounds presented by Bartlett et al. (2017); Neyshabur et al. (2017a). The PAC-Bayes bound has also been considered and computed by Dziugaite & Roy (2017), resulting in non-vacuous bounds for MNIST.
References
- Abadi et al. [2016] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.
- Anthony & Bartlett [2009] Martin Anthony and Peter L Bartlett. Neural network learning: Theoretical foundations. cambridge university press, 2009.
- Arora et al. [2018] Sanjeev Arora, Rong Ge, Behnam Neyshabur, and Yi Zhang. Stronger generalization bounds for deep nets via a compression approach. arXiv preprint arXiv:1802.05296, 2018.
- Athalye et al. [2018] Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In Proceedings of the 35th International Conference on Machine Learning, pp. 274–283, 2018.
- Bartlett & Mendelson [2002] Peter L Bartlett and Shahar Mendelson. Rademacher and gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3(Nov):463–482, 2002.
- Bartlett et al. [2017] Peter L Bartlett, Dylan J Foster, and Matus J Telgarsky. Spectrally-normalized margin bounds for neural networks. In Advances in Neural Information Processing Systems, pp. 6241–6250, 2017.
- Carlini & Wagner [2016] Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016.
- Carlini & Wagner [2017] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. IEEE, 2017.
- Cisse et al. [2017] Moustapha Cisse, Piotr Bojanowski, Edouard Grave, Yann Dauphin, and Nicolas Usunier. Parseval networks: Improving robustness to adversarial examples. arXiv preprint arXiv:1704.08847, 2017.
- Clevert et al. [2015] Djork-Arné Clevert, Thomas Unterthiner, and Sepp Hochreiter. Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289, 2015.
- Dziugaite & Roy [2017] Gintare Karolina Dziugaite and Daniel M Roy. Entropy-sgd optimizes the prior of a pac-bayes bound: Data-dependent pac-bayes priors via differential privacy. arXiv preprint arXiv:1712.09376, 2017.
- Fawzi et al. [2016] Alhussein Fawzi, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. Robustness of classifiers: from adversarial to random noise. In Advances in Neural Information Processing Systems, pp. 1632–1640, 2016.
- Fawzi et al. [2018] Alhussein Fawzi, Hamza Fawzi, and Omar Fawzi. Adversarial vulnerability for any classifier. arXiv preprint arXiv:1802.08686, 2018.
- Gilmer et al. [2018] Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S Schoenholz, Maithra Raghu, Martin Wattenberg, and Ian Goodfellow. Adversarial spheres. arXiv preprint arXiv:1801.02774, 2018.
- Goodfellow et al. [2014a] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680, 2014a.
- Goodfellow et al. [2014b] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014b.
- Gouk et al. [2018] Henry Gouk, Eibe Frank, Bernhard Pfahringer, and Michael Cree. Regularisation of neural networks by enforcing lipschitz continuity. arXiv preprint arXiv:1804.04368, 2018.
- He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European conference on computer vision, pp. 630–645. Springer, 2016.
- Ioffe & Szegedy [2015] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
- Keskar et al. [2016] Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, and Ping Tak Peter Tang. On large-batch training for deep learning: Generalization gap and sharp minima. arXiv preprint arXiv:1609.04836, 2016.
- Krizhevsky et al. [2012] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105, 2012.
- Kurakin et al. [2016] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.
- Madry et al. [2018] Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations, 2018.
- McAllester [2003] David McAllester. Simplified pac-bayesian margin bounds. In Learning theory and Kernel machines, pp. 203–215. Springer, 2003.
- McAllester [1999] David A McAllester. Pac-bayesian model averaging. In Proceedings of the twelfth annual conference on Computational learning theory, pp. 164–170. ACM, 1999.
- Miyato et al. [2018] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018.
- Moosavi Dezfooli et al. [2016] Seyed Mohsen Moosavi Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- Neyshabur et al. [2015] Behnam Neyshabur, Ryota Tomioka, and Nathan Srebro. Norm-based capacity control in neural networks. In COLT, pp. 1376–1401, 2015.
- Neyshabur et al. [2017a] Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, and Nathan Srebro. A pac-bayesian approach to spectrally-normalized margin bounds for neural networks. arXiv preprint arXiv:1707.09564, 2017a.
- Neyshabur et al. [2017b] Behnam Neyshabur, Srinadh Bhojanapalli, David McAllester, and Nati Srebro. Exploring generalization in deep learning. In Advances in Neural Information Processing Systems, pp. 5949–5958, 2017b.
- Papernot et al. [2016] Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pp. 582–597. IEEE, 2016.
- Papernot et al. [2017] Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z Berkay Celik, and Ananthram Swami. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, pp. 506–519. ACM, 2017.
- Schmidt et al. [2018] Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Mądry. Adversarially robust generalization requires more data. arXiv preprint arXiv:1804.11285, 2018.
- Sedghi et al. [2018] Hanie Sedghi, Vineet Gupta, and Philip M Long. The singular values of convolutional layers. arXiv preprint arXiv:1805.10408, 2018.
- Sinha et al. [2018] Aman Sinha, Hongseok Namkoong, and John Duchi. Certifiable distributional robustness with principled adversarial training. In International Conference on Learning Representations, 2018.
- Srivastava et al. [2014] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958, 2014.
- Szegedy et al. [2013] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
- Szegedy et al. [2015] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1–9, 2015.
- Tramèr et al. [2018] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. In International Conference on Learning Representations, 2018.
- Tropp [2012] Joel A Tropp. User-friendly tail bounds for sums of random matrices. Foundations of computational mathematics, 12(4):389–434, 2012.
- Tsuzuku et al. [2018] Yusuke Tsuzuku, Issei Sato, and Masashi Sugiyama. Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks. arXiv preprint arXiv:1802.04034, 2018.
- Wang et al. [2017] Yizhen Wang, Somesh Jha, and Kamalika Chaudhuri. Analyzing the robustness of nearest neighbors to adversarial examples. arXiv preprint arXiv:1706.03922, 2017.
- Xu et al. [2009] Huan Xu, Constantine Caramanis, and Shie Mannor. Robustness and regularization of support vector machines. Journal of Machine Learning Research, 10(Jul):1485–1510, 2009.
- Yoshida & Miyato [2017] Yuichi Yoshida and Takeru Miyato. Spectral norm regularization for improving the generalizability of deep learning. arXiv preprint arXiv:1705.10941, 2017.
- Zhang et al. [2016] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. arXiv preprint arXiv:1611.03530, 2016.
Appendix A Further experimental results
Dataset | Architecture | Training | Train acc | Test acc | Train acc (SN) | Test acc (SN) |
---|---|---|---|---|---|---|
CIFAR10 | AlexNet | ERM | 1.00 | 0.79 | 1.00 | 0.79 |
CIFAR10 | AlexNet | FGM | 0.98 | 0.54 | 0.93 | 0.63 |
CIFAR10 | AlexNet | FGM | 1.00 | 0.51 | 0.67 | 0.56 |
CIFAR10 | AlexNet | PGM | 0.99 | 0.50 | 0.92 | 0.62 |
CIFAR10 | AlexNet | PGM | 0.99 | 0.44 | 0.86 | 0.54 |
CIFAR10 | AlexNet | WRM | 1.00 | 0.61 | 0.76 | 0.65 |
CIFAR10 | ELU-AlexNet | ERM | 1.00 | 0.79 | 1.00 | 0.79 |
CIFAR10 | ELU-AlexNet | FGM | 0.97 | 0.52 | 0.68 | 0.60 |
CIFAR10 | ELU-AlexNet | PGM | 0.98 | 0.53 | 0.88 | 0.61 |
CIFAR10 | ELU-AlexNet | WRM | 1.00 | 0.60 | 1.00 | 0.60 |
CIFAR10 | Inception | ERM | 1.00 | 0.85 | 1.00 | 0.86 |
CIFAR10 | Inception | PGM | 0.99 | 0.53 | 1.00 | 0.58 |
CIFAR10 | Inception | PGM | 0.98 | 0.48 | 0.62 | 0.56 |
CIFAR10 | Inception | WRM | 1.00 | 0.66 | 1.00 | 0.67 |
CIFAR10 | 1-layer MLP | ERM | 0.98 | 0.49 | 0.68 | 0.53 |
CIFAR10 | 1-layer MLP | FGM | 0.60 | 0.36 | 0.60 | 0.46 |
CIFAR10 | 1-layer MLP | PGM | 0.57 | 0.36 | 0.55 | 0.46 |
CIFAR10 | 1-layer MLP | WRM | 0.60 | 0.41 | 0.62 | 0.50 |
CIFAR10 | 2-layer MLP | ERM | 0.99 | 0.51 | 0.79 | 0.56 |
CIFAR10 | 2-layer MLP | FGM | 0.57 | 0.36 | 0.66 | 0.49 |
CIFAR10 | 2-layer MLP | PGM | 0.93 | 0.35 | 0.66 | 0.48 |
CIFAR10 | 2-layer MLP | WRM | 0.87 | 0.35 | 0.73 | 0.52 |
CIFAR10 | ResNet | ERM | 1.00 | 0.80 | 1.00 | 0.83 |
CIFAR10 | ResNet | PGM | 0.99 | 0.49 | 1.00 | 0.55 |
CIFAR10 | ResNet | PGM | 0.98 | 0.44 | 0.72 | 0.53 |
CIFAR10 | ResNet | WRM | 1.00 | 0.63 | 1.00 | 0.66 |
MNIST | ELU-Net | ERM | 1.00 | 0.99 | 1.00 | 0.99* |
MNIST | ELU-Net | FGM | 0.98 | 0.97 | 1.00 | 0.97 |
MNIST | ELU-Net | PGM | 0.99 | 0.97 | 1.00 | 0.97 |
MNIST | ELU-Net | WRM | 0.95 | 0.92 | 0.95 | 0.93 |
MNIST | 1-layer MLP | ERM | 1.00 | 0.98 | 1.00 | 0.98* |
MNIST | 1-layer MLP | FGM | 0.88 | 0.88 | 1.00 | 0.96 |
MNIST | 1-layer MLP | PGM | 1.00 | 0.96 | 1.00 | 0.96 |
MNIST | 1-layer MLP | WRM | 0.92 | 0.88 | 0.92 | 0.88 |
MNIST | 2-layer MLP | ERM | 1.00 | 0.98 | 1.00 | 0.98 |
MNIST | 2-layer MLP | FGM | 0.97 | 0.91 | 1.00 | 0.96 |
MNIST | 2-layer MLP | PGM | 1.00 | 0.96 | 1.00 | 0.97 |
MNIST | 2-layer MLP | WRM | 0.97 | 0.88 | 0.98 | 0.90 |
SVHN | AlexNet | ERM | 1.00 | 0.93 | 1.00 | 0.93* |
SVHN | AlexNet | FGM | 0.97 | 0.76 | 0.95 | 0.83 |
SVHN | AlexNet | PGM | 1.00 | 0.78 | 0.85 | 0.81 |
SVHN | AlexNet | WRM | 1.00 | 0.83 | 0.87 | 0.84 |
* $\beta=\infty$ (i.e. no spectral normalization) achieved the highest validation accuracy.
Dataset | Architecture | Training | no SN runtime | SN runtime | ratio |
---|---|---|---|---|---|
CIFAR10 | AlexNet | ERM | 229 s | 283 s | 1.24 |
CIFAR10 | AlexNet | FGM | 407 s | 463 s | 1.14 |
CIFAR10 | AlexNet | FGM | 408 s | 465 s | 1.14 |
CIFAR10 | AlexNet | PGM | 2917 s | 3077 s | 1.05 |
CIFAR10 | AlexNet | PGM | 2896 s | 3048 s | 1.05 |
CIFAR10 | AlexNet | WRM | 3076 s | 3151 s | 1.02 |
CIFAR10 | ELU-AlexNet | ERM | 231 s | 283 s | 1.23 |
CIFAR10 | ELU-AlexNet | FGM | 410 s | 466 s | 1.14 |
CIFAR10 | ELU-AlexNet | PGM | 2939 s | 3093 s | 1.05 |
CIFAR10 | ELU-AlexNet | WRM | 3094 s | 3150 s | 1.02 |
CIFAR10 | Inception | ERM | 632 s | 734 s | 1.16 |
CIFAR10 | Inception | PGM | 9994 s | 6082 s | 0.61 |
CIFAR10 | Inception | PGM | 9948 s | 6063 s | 0.61 |
CIFAR10 | Inception | WRM | 10247 s | 6356 s | 0.62 |
CIFAR10 | 1-layer MLP | ERM | 22 s | 31 s | 1.42 |
CIFAR10 | 1-layer MLP | FGM | 25 s | 35 s | 1.43 |
CIFAR10 | 1-layer MLP | PGM | 79 s | 93 s | 1.18 |
CIFAR10 | 1-layer MLP | WRM | 73 s | 86 s | 1.18 |
CIFAR10 | 2-layer MLP | ERM | 23 s | 37 s | 1.59 |
CIFAR10 | 2-layer MLP | FGM | 27 s | 41 s | 1.51 |
CIFAR10 | 2-layer MLP | PGM | 91 s | 108 s | 1.19 |
CIFAR10 | 2-layer MLP | WRM | 85 s | 103 s | 1.21 |
CIFAR10 | ResNet | ERM | 315 s | 547 s | 1.73 |
CIFAR10 | ResNet | PGM | 2994 s | 3300 s | 1.10 |
CIFAR10 | ResNet | PGM | 2980 s | 3300 s | 1.11 |
CIFAR10 | ResNet | WRM | 3187 s | 3457 s | 1.08 |
MNIST | ELU-Net | ERM | 55 s | 97 s | 1.76 |
MNIST | ELU-Net | FGM | 91 s | 136 s | 1.49 |
MNIST | ELU-Net | PGM | 614 s | 676 s | 1.10 |
MNIST | ELU-Net | WRM | 635 s | 670 s | 1.06 |
MNIST | 1-layer MLP | ERM | 15 s | 24 s | 1.60 |
MNIST | 1-layer MLP | FGM | 17 s | 27 s | 1.57 |
MNIST | 1-layer MLP | PGM | 57 s | 71 s | 1.24 |
MNIST | 1-layer MLP | WRM | 51 s | 63 s | 1.24 |
MNIST | 2-layer MLP | ERM | 17 s | 31 s | 1.84 |
MNIST | 2-layer MLP | FGM | 20 s | 35 s | 1.77 |
MNIST | 2-layer MLP | PGM | 67 s | 89 s | 1.32 |
MNIST | 2-layer MLP | WRM | 62 s | 81 s | 1.30 |
SVHN | AlexNet | ERM | 334 s | 412 s | 1.23 |
SVHN | AlexNet | FGM | 596 s | 676 s | 1.13 |
SVHN | AlexNet | PGM | 4270 s | 4495 s | 1.05 |
SVHN | AlexNet | WRM | 4501 s | 4572 s | 1.02 |
Runtime increase after introducing spectral normalization for various datasets, network architectures, and training schemes. These ratios were obtained by running the experiments on one NVIDIA Titan Xp GPU for 40 epochs.
A.1 Comparison of the proposed method to [26]'s method
For the optimal $\beta$ chosen when fitting AlexNet to CIFAR10 with PGM, we repeat the experiment using the spectral normalization approach suggested by [26]. This approach performs spectral normalization of convolutional layers by scaling the convolution kernel by the spectral norm of the kernel rather than the spectral norm of the overall convolution operation. Because it does not account for how the kernel can amplify perturbations in a single pixel multiple times (see Section 4), it does not properly control the spectral norm.

In Figure 10, we see that for the optimal $\beta$ reported in the main text, using [26]'s SN method results in worse generalization performance. This is because although we specified that $\sigma(W_l)\le\beta$, the actual spectral norm obtained using [26]'s method can be much greater for convolutional layers, resulting in overfitting (hence the training curve quickly approaches 1.0 accuracy). The AlexNet architecture used has two convolutional layers. For the proposed method, the final spectral norms of the convolutional layers were both 1.60; for [26]'s method, the final spectral norms of the convolutional layers were 7.72 and 7.45, despite the corresponding convolution kernels having spectral norms of 1.60.
Our proposed method is less computationally efficient in comparison to [26]’s approach because each power iteration step requires a convolution operation rather than a division operation. As shown in Table 3, the proposed approach is not significantly less efficient with our TensorFlow implementation.
Dataset | Architecture | Training | Runtime ratio |
---|---|---|---|
CIFAR10 | AlexNet | ERM | 1.11 |
CIFAR10 | AlexNet | FGM | 1.06 |
CIFAR10 | AlexNet | FGM | 1.11 |
CIFAR10 | AlexNet | PGM | 1.01 |
CIFAR10 | AlexNet | PGM | 1.11 |
CIFAR10 | AlexNet | WRM | 1.02 |
CIFAR10 | Inception | ERM | 0.98 |
CIFAR10 | Inception | PGM | 1.04 |
CIFAR10 | Inception | PGM | 1.06 |
CIFAR10 | Inception | WRM | 1.03 |
A.2 Comparison of the proposed method to weight decay, dropout, and batch normalization
Appendix B Spectral normalization of fully-connected layers
For fully-connected layers, we approximate the spectral norm of a given weight matrix $W$ using the approach described by [26]: the power iteration method. For each $W$, we randomly initialize a vector $\mathbf{u}$ and approximate both the left and right singular vectors by iterating the update rules
$$\mathbf{v} \leftarrow \frac{W^\top\mathbf{u}}{\|W^\top\mathbf{u}\|_2}, \qquad \mathbf{u} \leftarrow \frac{W\mathbf{v}}{\|W\mathbf{v}\|_2}.$$
The final singular value can be approximated with $\sigma(W)\approx\mathbf{u}^\top W\mathbf{v}$. Like Miyato et al., we exploit the fact that SGD only makes small updates to $W$ from training step to training step, reusing the same $\mathbf{u}$ and running only one iteration per step. Unlike Miyato et al., rather than enforcing $\sigma(W)=\beta$, we instead enforce the looser constraint $\sigma(W)\le\beta$ as described by [17]:
$$W^{\mathrm{SN}} := \frac{W}{\max\big(1,\ \sigma(W)/\beta\big)},$$
which we observe to result in faster training in practice for supervised learning tasks.
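A minimal sketch of this procedure for a dense layer, with illustrative names and one power iteration per call, is shown below.

```python
import tensorflow as tf

def dense_spectral_norm(W, u, n_iters=1):
    """Power-iteration estimate of the spectral norm of a dense weight matrix.

    W: [n_in, n_out] weight matrix; u: persistent [1, n_out] tf.Variable.
    """
    for _ in range(n_iters):
        v = tf.math.l2_normalize(tf.matmul(u, W, transpose_b=True))   # [1, n_in]
        u_new = tf.math.l2_normalize(tf.matmul(v, W))                  # [1, n_out]
        u.assign(u_new)
    sigma = tf.squeeze(tf.matmul(tf.matmul(v, W), u, transpose_b=True))
    return sigma

def normalized_weight(W, sigma, beta):
    """Enforce sigma(W) <= beta rather than sigma(W) = beta."""
    return W / tf.maximum(1.0, sigma / beta)
```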
Appendix C Proofs
C.1 Proof of Theorem 1
First let us quote the following two lemmas from [29].
Lemma 1 ([29]).
Consider as the class of neural nets parameterized by where each maps input to . Let be a distribution on parameter vector chosen independently from the training samples. Then, for each with probability at least for any and any random perturbation satisfying we have
(10) |
Lemma 2 ([29]).
Consider a -layer neural net with -Lipschitz activation function where . Then for any norm-bounded input and weight perturbation , we have the following perturbation bound:
(11) |
To prove Theorem 1, consider with weights . Since and , for any weight vector such that for every we have:
(12) |
We apply Lemma 1, choosing
to be a zero-mean multivariate Gaussian distribution with diagonal covariance matrix, where each entry of the
th layer with chosen later in the proof. Note that defined earlier in the theorem is the geometric average of spectral norms across all layers. Then for the th layer’s random perturbation vector , we get the following bound from [40] with representing the width of the th hidden layer:(13) |
We now use a union bound over all layers for a maximum union probability of , which implies the normalized for each layer is upper-bounded by . Then for any satisfying for all ’s
(14) |
Here (a) holds, since is true for each . Hence we choose for which the perturbation vector satisfies the assumptions of Lemma 2. Then, we bound the KL-divergence term in Lemma 1 as
Note that (b) holds, because we assume implying for each . Therefore, Lemma 1 implies with probability we have the following bound hold for any satisfying for all ’s,
(15) |
Then, we can give an upper-bound over all the functions in by finding the covering number of the set of ’s where for each feasible we have the mentioned condition satisfied for at least one of ’s. We only need to form the bound for which can be covered using a cover of size as discussed in [29]. Then, from the theorem’s assumption we know each will be in the interval which we want to cover such that for any in the interval there exists a satisfying . For this purpose we can use a cover of size ,111Note that implying and hence . which combined for all ’s gives a cover with size whose logarithm is growing as . This together with (15) completes the proof.
C.2 Proof of Theorem 2
We start by proving the following lemmas providing perturbation bound for FGM attacks.
Lemma 3.
Consider a -layer neural net with -Lipschitz and -smooth (-Lipschitz derivative) activation where . Let training loss also be -Lipschitz and -smooth for any fixed label . Then, for any input , label , and perturbation vector satisfying we have
(16) | ||||
Proof.
Since for a fixed satisfies the same Lipschitzness and smoothness properties as , then
and applying the chain rule implies: