# Generalizable Adversarial Training via Spectral Normalization

Deep neural networks (DNNs) have set benchmarks on a wide array of supervised learning tasks. Trained DNNs, however, often lack robustness to minor adversarial perturbations to the input, which undermines their true practicality. Recent works have increased the robustness of DNNs by fitting networks using adversarially-perturbed training samples, but the improved performance can still be far below the performance seen in non-adversarial settings. A significant portion of this gap can be attributed to the decrease in generalization performance due to adversarial training. In this work, we extend the notion of margin loss to adversarial settings and bound the generalization error for DNNs trained under several well-known gradient-based attack schemes, motivating an effective regularization scheme based on spectral normalization of the DNN's weight matrices. We also provide a computationally-efficient method for normalizing the spectral norm of convolutional layers with arbitrary stride and padding schemes in deep convolutional networks. We evaluate the power of spectral normalization extensively on combinations of datasets, network architectures, and adversarial training schemes. The code is available at https://github.com/jessemzhang/dl_spectral_normalization.

## 1 Introduction

Despite their impressive performance on many supervised learning tasks, deep neural networks (DNNs) are often highly susceptible to adversarial perturbations imperceptible to the human eye (Szegedy et al., 2013; Goodfellow et al., 2014b). These “adversarial attacks” have received enormous attention in the machine learning literature over recent years (Goodfellow et al., 2014b; Moosavi Dezfooli et al., 2016; Carlini & Wagner, 2016; Kurakin et al., 2016; Papernot et al., 2016; Carlini & Wagner, 2017; Papernot et al., 2017; Madry et al., 2018; Tramèr et al., 2018). Adversarial attack studies have mainly focused on developing effective attack and defense schemes. While attack schemes attempt to mislead a trained classifier via additive perturbations to the input, defense mechanisms aim to train classifiers robust to these perturbations. Although existing defense methods result in considerably better performance compared to standard training methods, the improved performance can still be far below the performance in non-adversarial settings (Athalye et al., 2018; Schmidt et al., 2018).

A standard adversarial training scheme involves fitting a classifier using adversarially-perturbed samples (Szegedy et al., 2013; Goodfellow et al., 2014b) with the intention of producing a trained classifier with better robustness to attacks on future (i.e. test) samples. Madry et al. (2018) provides a robust optimization interpretation of the adversarial training approach, demonstrating that this strategy finds the optimal classifier minimizing the average worst-case loss over an adversarial ball centered at each training sample. This minimax interpretation can also be extended to distributionally-robust training methods (Sinha et al., 2018) where the offered robustness is over a Wasserstein-ball around the empirical distribution of training data.

Recently, Schmidt et al. (2018) have shown that standard adversarial training produces networks that generalize poorly. The performance of adversarially-trained DNNs over test samples can be significantly worse than their training performance, and this gap can be far greater than the generalization gap achieved using standard empirical risk minimization (ERM). This discrepancy suggests that the overall adversarial test performance can be improved by applying effective regularization schemes during adversarial training.

In this work, we propose using spectral normalization (SN) (Miyato et al., 2018) as a computationally-efficient and statistically-powerful regularization scheme for adversarial training of DNNs. SN has been successfully implemented and applied for DNNs in the context of generative adversarial networks (GANs) (Goodfellow et al., 2014a), resulting in state-of-the-art deep generative models for several benchmark tasks (Miyato et al., 2018). Moreover, SN (Tsuzuku et al., 2018) and other similar Lipschitz regularization techniques (Cisse et al., 2017) have been successfully applied in non-adversarial training settings to improve the robustness of ERM-trained networks to adversarial attacks. The theoretical results in (Bartlett et al., 2017; Neyshabur et al., 2017a) and empirical results in (Yoshida & Miyato, 2017) also suggest that SN can close the generalization gap for DNNs in non-adversarial ERM setting.

On the theoretical side, we first extend the standard notion of margin loss to adversarial settings. We then leverage the PAC-Bayes generalization framework (McAllester, 1999) to prove generalization bounds for spectrally-normalized DNNs in terms of our defined adversarial margin loss. Our approach parallels the approach used by Neyshabur et al. (2017a) to derive generalization bounds in non-adversarial settings. We obtain adversarial generalization error bounds for three well-known gradient-based attack schemes: fast gradient method (FGM) (Goodfellow et al., 2014b), projected gradient method (PGM) (Kurakin et al., 2016; Madry et al., 2018), and Wasserstein risk minimization (WRM) (Sinha et al., 2018). Our theoretical analysis shows that the adversarial generalization component will vanish by applying SN to all layers for sufficiently small spectral norm values.

On the empirical side, we show that SN can significantly improve test performance after adversarial training. We perform numerical experiments for three standard datasets (MNIST, CIFAR10, SVHN) and various standard DNN architectures (including AlexNet (Krizhevsky et al., 2012), Inception (Szegedy et al., 2015), and ResNet (He et al., 2016)); in almost all of the experiments we obtain better test performance after applying SN. Figure 1 shows the training and validation performance for AlexNet fit on the CIFAR10 dataset using FGM, PGM, and WRM, with SN improving adversarial test accuracy under each of the three schemes. Furthermore, we numerically validate the correlation between the spectral-norm capacity term in our bounds and the actual generalization performance. To perform our numerical experiments, we develop a computationally-efficient approach for normalizing the spectral norm of convolution layers with arbitrary stride and padding schemes. We provide the TensorFlow code, as spectral normalization of convolutional layers can also be useful for other deep learning tasks. To summarize, the main contributions of this work are:

1. Proposing SN as a regularization scheme for adversarial training of DNNs,

2. Extending concepts of margin-based generalization analysis to adversarial settings and proving margin-based generalization bounds for three gradient-based adversarial attack schemes,

3. Developing an efficient method for normalizing the spectral norm of convolutional layers in deep convolutional networks,

4. Numerically demonstrating the improved test and generalization performance of DNNs trained with SN.

## 2 Preliminaries

In this section, we first review some standard concepts of margin-based generalization analysis in learning theory. We then extend these notions to adversarial training settings.

### 2.1 Supervised learning, Deep neural networks, Generalization error

Consider $n$ samples $(x_1, y_1), \ldots, (x_n, y_n)$ drawn i.i.d. from an underlying distribution over input-label pairs $(X, Y)$, where $Y$ takes one of finitely many labels. Given a loss function $\ell$ and a function class $\{f_w : w \in \mathcal{W}\}$ parameterized by $w$, a supervised learner aims to find the function in this class minimizing the expected loss (risk) averaged over the underlying distribution.

We consider the class of $d$-layer neural networks with $h$ hidden units per layer and a fixed activation function. Each $f_w$ in this class maps a data point $x$ to a vector with one entry per label. Specifically, each $f_w$ composes the linear maps given by the weight matrices $W_1, \ldots, W_d$ with the activation function applied between consecutive layers. We use $\|W_i\|_2$ to denote the spectral norm of matrix $W_i$, defined as the largest singular value of $W_i$, and $\|W_i\|_F$ to denote $W_i$'s Frobenius norm.

A classifier $f_w$'s performance over the true distribution of data can be different from the training performance over the empirical distribution of training samples. The difference between the empirical and true averaged losses, evaluated on training and test samples respectively, is called the generalization error. Similar to Neyshabur et al. (2017a), we evaluate a DNN's generalization performance using its expected margin loss, defined for margin parameter $\gamma$ as

$$L_\gamma(f_w) := P\Big(f_w(X)[Y] \le \gamma + \max_{j \neq Y} f_w(X)[j]\Big), \tag{1}$$

where $f_w(X)[j]$ denotes the $j$th entry of $f_w(X)$. For a given data point $X$, we predict the label corresponding to the maximum entry of $f_w(X)$. Also, we use $\hat{L}_\gamma(f_w)$ to denote the empirical margin loss averaged over the training samples. The goal of margin-based generalization analysis is to provide a theoretical comparison between the true and empirical margin risks.
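As a concrete reading of definition (1), the following minimal Python sketch computes the empirical margin loss from a list of score vectors. The function name `empirical_margin_loss` and the toy scores are illustrative assumptions, not part of the paper's code.

```python
def empirical_margin_loss(scores, labels, gamma=0.0):
    """Fraction of samples whose true-class score does not exceed the
    best competing score by more than gamma, as in definition (1)."""
    violations = 0
    for row, y in zip(scores, labels):
        # Largest score among all labels other than the true one.
        best_other = max(s for j, s in enumerate(row) if j != y)
        if row[y] <= gamma + best_other:
            violations += 1
    return violations / len(labels)


# Toy scores for three samples and three labels (hypothetical values).
scores = [[2.0, 0.5, 0.1], [0.2, 0.3, 0.1], [1.0, 3.0, 0.0]]
labels = [0, 1, 0]

print(empirical_margin_loss(scores, labels, gamma=0.0))
print(empirical_margin_loss(scores, labels, gamma=0.5))
```

With `gamma=0` the quantity is the usual classification error; raising `gamma` also counts correctly classified samples whose margin is too small, so the loss can only grow.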

A supervised learner observes only the training samples and hence does not know the true distribution of data. A standard approach to training a classifier is therefore to minimize the empirical expected loss over the function class, which is

$$\min_{w \in \mathcal{W}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_w(x_i), y_i\big). \tag{2}$$

This approach is called empirical risk minimization (ERM). For better optimization performance, the loss function is commonly chosen to be smooth. Hence, 0-1 and margin losses are replaced by smooth surrogate loss functions such as the cross-entropy loss. However, we still use the margin loss as defined in (1) for evaluating the test and generalization performance of DNN classifiers.

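While ERM training usually achieves good performance over DNNs, several recent observations reveal that adding an adversarially-chosen perturbation to each sample can significantly degrade the trained DNN's performance. Given norm function $\|\cdot\|$ and adversarial noise power $\epsilon$, the adversarial additive noise for sample $(x, y)$ and classifier $f_w$ is defined to be

$$\delta^{\mathrm{adv}}_w(x) := \operatorname*{argmax}_{\|\delta\| \le \epsilon} \; \ell\big(f_w(x + \delta), y\big). \tag{3}$$

To provide adversarial robustness against the above attack scheme, a standard technique, called adversarial training, follows ERM training over the adversarially-perturbed samples by solving

$$\min_{w \in \mathcal{W}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_w(x_i + \delta^{\mathrm{adv}}_w(x_i)), y_i\big). \tag{4}$$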

Nevertheless, (3) and hence (4) are generally non-convex and intractable optimization problems. Therefore, several schemes have been proposed in the literature to approximate the optimal solution of (3). In this work, we analyze the generalization performance of the following three gradient-based methods for approximating the solution to (3). We note that several other attack schemes such as DeepFool (Moosavi Dezfooli et al., 2016), CW attacks (Carlini & Wagner, 2017), target and least-likely attacks (Kurakin et al., 2016) have been introduced and examined in the literature, which can lead to interesting future directions for this work.

1. Fast Gradient Method (FGM) (Goodfellow et al., 2014b): FGM approximates the solution to (3) by considering a linearized DNN loss around a given data point. Hence, FGM perturbs $(x, y)$ by adding the following noise vector:

$$\delta^{\mathrm{fgm}}_w(x) := \operatorname*{argmax}_{\|\delta\| \le \epsilon} \; \delta^T \nabla_x \ell\big(f_w(x), y\big). \tag{5}$$

For the special case of the $\ell_\infty$-norm $\|\cdot\|_\infty$, the above representation of FGM recovers the fast gradient sign method (FGSM), where each data point is perturbed by the $\epsilon$-normalized sign vector of the loss's gradient. For the $\ell_2$-norm $\|\cdot\|_2$, we similarly normalize the loss's gradient vector to have Euclidean norm $\epsilon$.

2. Projected Gradient Method (PGM) (Kurakin et al., 2016): PGM is the iterative version of FGM and applies projected gradient descent to solve (3). PGM follows the following update rules for a given number of steps $r$:

$$\forall\, 1 \le i \le r: \quad \delta^{\mathrm{pgm},i+1}_w(x) := \prod_{B_{\epsilon,\|\cdot\|}(0)} \Big\{ \delta^{\mathrm{pgm},i}_w(x) + \alpha\, \nu^{(i)}_w \Big\}, \qquad \nu^{(i)}_w := \operatorname*{argmax}_{\|\delta\| \le 1} \; \delta^T \nabla_x \ell\big(f_w(x + \delta^{\mathrm{pgm},i}_w(x)), y\big). \tag{6}$$

Here, we first find the direction $\nu^{(i)}_w$ along which the loss at the $i$th perturbed point changes the most, and then we move the perturbed point along this direction by stepsize $\alpha$, followed by projecting the resulting perturbation onto the set $B_{\epsilon,\|\cdot\|}(0)$ with $\epsilon$-bounded norm.

3. Wasserstein Risk Minimization (WRM) (Sinha et al., 2018): WRM solves the following variant of (3) for data point $(x, y)$, where the norm constraint in (3) is replaced by a norm-squared Lagrangian penalty term:

$$\delta^{\mathrm{wrm}}_w(x) := \operatorname*{argmax}_{\delta} \; \ell\big(f_w(x + \delta), y\big) - \frac{\lambda}{2} \|\delta\|^2. \tag{7}$$

As discussed earlier, the optimization problem (3) is generally intractable. However, in the case of the Euclidean norm $\|\cdot\|_2$, if we assume that the Lipschitz constant of $\nabla_x \ell(f_w(x), y)$ is upper-bounded by $\lambda$, then the WRM optimization (7) results in solving a convex optimization problem and can be efficiently solved using gradient methods.

To obtain efficient adversarial defense schemes, we can substitute $\delta^{\mathrm{fgm}}_w$, $\delta^{\mathrm{pgm}}_w$, or $\delta^{\mathrm{wrm}}_w$ for the exact adversarial perturbation in (4). Instead of fitting the classifier over true adversarial examples, which are NP-hard to obtain, we can train the DNN over FGM, PGM, or WRM adversarially-perturbed samples.
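The FGM and PGM updates described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the gradient of the loss with respect to the input is supplied by a caller-provided function `grad_fn` (hypothetical), the PGM variant uses the Euclidean norm, and `toy_grad` is a made-up gradient used purely for the usage example.

```python
import math


def fgm_perturbation(grad, eps, norm="l2"):
    """Maximizer of delta^T grad over the eps-ball of the chosen norm."""
    if norm == "linf":
        # Sign of each gradient coordinate, scaled by eps (FGSM).
        return [eps * ((g > 0) - (g < 0)) for g in grad]
    length = math.sqrt(sum(g * g for g in grad))
    if length == 0.0:
        return [0.0 for _ in grad]
    # Gradient rescaled to Euclidean norm eps.
    return [eps * g / length for g in grad]


def project_l2_ball(delta, eps):
    """Euclidean projection onto the ball of radius eps around the origin."""
    length = math.sqrt(sum(d * d for d in delta))
    if length <= eps:
        return list(delta)
    return [eps * d / length for d in delta]


def pgm_perturbation(grad_fn, x, eps, alpha, steps):
    """Iterated FGM step of size alpha followed by projection."""
    delta = [0.0] * len(x)
    for _ in range(steps):
        point = [xi + di for xi, di in zip(x, delta)]
        step = fgm_perturbation(grad_fn(point), alpha, "l2")
        delta = project_l2_ball([d + s for d, s in zip(delta, step)], eps)
    return delta


def toy_grad(point):
    """Input gradient of the toy loss 0.5 * ||point||^2 (an assumption)."""
    return list(point)


print(fgm_perturbation([3.0, 4.0], eps=1.0))
print(pgm_perturbation(toy_grad, [1.0, 0.0], eps=0.5, alpha=0.2, steps=5))
```

In the toy run the PGM perturbation moves along the first coordinate in steps of 0.2 until the projection caps its norm at `eps`.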

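The goal of adversarial training is to improve the robustness against adversarial attacks on not only the training samples but also on test samples; however, the adversarial training problem (4) focuses only on the training samples. To evaluate the adversarial generalization performance, we extend the notion of margin loss defined earlier in (1) to adversarial training settings by defining the adversarial margin loss as

$$L^{\mathrm{adv}}_\gamma(f_w) := P\Big(f_w\big(X + \delta^{\mathrm{adv}}_w(X)\big)[Y] \le \gamma + \max_{j \neq Y} f_w\big(X + \delta^{\mathrm{adv}}_w(X)\big)[j]\Big), \tag{8}$$

where $\delta^{\mathrm{adv}}_w(X)$ is the adversarial additive noise defined in (3). Here, we measure the margin loss over adversarially-perturbed samples, and we use $\hat{L}^{\mathrm{adv}}_\gamma(f_w)$ to denote the empirical adversarial margin loss. We also use $L^{\mathrm{fgm}}_\gamma(f_w)$, $L^{\mathrm{pgm}}_\gamma(f_w)$, and $L^{\mathrm{wrm}}_\gamma(f_w)$ to denote the adversarial margin losses with FGM (5), PGM (6), and WRM (7) attacks, respectively.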

## 3 Margin-based adversarial Generalization bounds

As previously discussed, generalization performance can be different between adversarial and non-adversarial settings. In this section, we provide generalization bounds for DNN classifiers under adversarial attacks in terms of the spectral norms of the trained DNN’s weight matrices. The bounds motivate regularizing these spectral norms in order to limit the DNN’s capacity and improve its generalization performance under adversarial attacks.

We use the PAC-Bayes framework (McAllester, 1999, 2003) to prove our main results. To derive adversarial generalization error bounds for DNNs with smooth activation functions, we first extend a recent result on the margin-based generalization bound for the ReLU activation function (Neyshabur et al., 2017a) to general 1-Lipschitz activation functions.

###### Theorem 1.

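Consider the class of $d$ hidden-layer neural networks with $h$ units per hidden layer and a 1-Lipschitz activation function that vanishes at zero. Suppose that the support set of $X$ is norm-bounded by $B$ in Euclidean norm. Also assume that for a constant $M \ge 1$ any $f_w$ in this class satisfies

$$\forall i: \; \frac{1}{M} \le \frac{\|W_i\|_2}{\beta_w} \le M, \qquad \beta_w := \Big(\prod_{i=1}^{d} \|W_i\|_2\Big)^{1/d}.$$

Here $\beta_w$ denotes the geometric mean of $f_w$'s spectral norms across all layers. Then, for any $\eta, \gamma > 0$, with probability at least $1 - \eta$, for any $f_w$ in this class we have:

$$L_0(f_w) \le \hat{L}_\gamma(f_w) + O\left(\sqrt{\frac{B^2 d^2 h \log(dh)\, \Phi^{\mathrm{erm}}(f_w) + d \log \frac{dn \log M}{\eta}}{\gamma^2 n}}\right),$$

where we define the complexity score $\Phi^{\mathrm{erm}}(f_w) := \prod_{i=1}^{d} \|W_i\|_2^2 \, \sum_{i=1}^{d} \frac{\|W_i\|_F^2}{\|W_i\|_2^2}$.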

###### Proof.

We defer the proof to the Appendix. The proof is a slight modification of Neyshabur et al. (2017a)’s proof of the same result for ReLU activation. ∎
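To make the role of the spectral norms in this bound tangible, here is a small Python sketch of the capacity term. It assumes the complexity score has the form used by Neyshabur et al. (2017a), namely the product of squared spectral norms multiplied by the sum of squared Frobenius-to-spectral norm ratios; the function name `erm_capacity` and the numbers are hypothetical.

```python
def erm_capacity(spectral_norms, frobenius_norms):
    """Product of squared spectral norms times the sum over layers of
    (Frobenius norm / spectral norm)^2, one entry per layer."""
    product_sq = 1.0
    for s in spectral_norms:
        product_sq *= s * s
    ratio_sum = sum((f / s) ** 2
                    for f, s in zip(frobenius_norms, spectral_norms))
    return product_sq * ratio_sum


# Two-layer example with made-up norms.
before = erm_capacity([2.0, 1.0], [2.0, 2.0])

# Rescaling the first layer by 1/2 halves both of its norms, so its
# Frobenius-to-spectral ratio is unchanged and only the product shrinks.
after = erm_capacity([1.0, 1.0], [1.0, 2.0])

print(before, after)
```

Because rescaling a weight matrix leaves its Frobenius-to-spectral ratio unchanged, capping each layer's spectral norm reduces this score only through the product term, which is the quantity spectral normalization controls directly.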

We now generalize this result to adversarial settings where the DNN’s performance is evaluated under adversarial attacks. We prove three separate adversarial generalization error bounds for FGM, PGM, and WRM attacks.

For the following results, we consider the class of neural nets defined in Theorem 1. Moreover, we assume that the training loss $\ell$ and its first-order derivative are 1-Lipschitz. Similar to Sinha et al. (2018), we assume the activation is smooth and its derivative is 1-Lipschitz. This class of activations includes ELU (Clevert et al., 2015) and tanh functions but not the ReLU function. However, our numerical results in Table 1 from the Appendix suggest similar generalization performance between ELU and ReLU activations.

###### Theorem 2.

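Consider the class of neural nets in Theorem 1 and a training loss function $\ell$ satisfying the assumptions stated above. We consider an FGM attack with noise power $\epsilon$ according to the Euclidean norm $\|\cdot\|_2$. For any $f_w$ in the class, assume $\kappa \le \|\nabla_x \ell(f_w(x), y)\|_2$ holds for a constant $\kappa > 0$, any label $y$, and any $x$ that is $\epsilon$-close to the support set of $X$. Then, for any $\eta, \gamma > 0$, with probability $1 - \eta$ the following bound holds for the FGM margin loss of any $f_w$ in the class:

$$L^{\mathrm{fgm}}_0(f_w) \le \hat{L}^{\mathrm{fgm}}_\gamma(f_w) + O\left(\sqrt{\frac{(B+\epsilon)^2 d^2 h \log(dh)\, \Phi^{\mathrm{fgm}}_{\epsilon,\kappa}(f_w) + d \log \frac{dn \log M}{\eta}}{\gamma^2 n}}\right),$$

where $\Phi^{\mathrm{fgm}}_{\epsilon,\kappa}(f_w) := \Big\{ \prod_{i=1}^{d} \|W_i\|_2 \Big( 1 + \frac{\epsilon}{\kappa} \Big( \prod_{i=1}^{d} \|W_i\|_2 \Big) \sum_{i=1}^{d} \prod_{j=1}^{i} \|W_j\|_2 \Big) \Big\}^2 \sum_{i=1}^{d} \frac{\|W_i\|_F^2}{\|W_i\|_2^2}$.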

###### Proof.

We defer the proof to the Appendix. ∎

Note that the above theorem assumes that the change rate of the loss function around test samples is at least $\kappa$, which gives a baseline for measuring the attack power $\epsilon$. In our numerical experiments, we validate this assumption over standard image recognition tasks. Next, we generalize this result to adversarial settings with the PGM attack, i.e. the iterative version of the FGM attack.

###### Theorem 3.

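Consider the class of neural nets and the training loss function $\ell$ for which the assumptions in Theorem 2 hold. We consider a PGM attack with noise power $\epsilon$ given the Euclidean norm $\|\cdot\|_2$, $r$ iterations for the attack, and stepsize $\alpha$. Then, for any $\eta, \gamma > 0$, with probability $1 - \eta$ the following bound applies to the PGM margin loss of any $f_w$ in the class:

$$L^{\mathrm{pgm}}_0(f_w) \le \hat{L}^{\mathrm{pgm}}_\gamma(f_w) + O\left(\sqrt{\frac{(B+\epsilon)^2 d^2 h \log(dh)\, \Phi^{\mathrm{pgm}}_{\epsilon,\kappa,r,\alpha}(f_w) + d \log \frac{rdn \log M}{\eta}}{\gamma^2 n}}\right).$$

Here we define $\Phi^{\mathrm{pgm}}_{\epsilon,\kappa,r,\alpha}(f_w)$ as the following expression

$$\Big\{ \prod_{i=1}^{d} \|W_i\|_2 \Big( 1 + \frac{\alpha}{\kappa} \cdot \frac{1 - (2\alpha/\kappa)^r\, \overline{\mathrm{lip}}(\nabla \ell \circ f_w)^r}{1 - (2\alpha/\kappa)\, \overline{\mathrm{lip}}(\nabla \ell \circ f_w)} \Big( \prod_{i=1}^{d} \|W_i\|_2 \Big) \sum_{i=1}^{d} \prod_{j=1}^{i} \|W_j\|_2 \Big) \Big\}^2 \sum_{i=1}^{d} \frac{\|W_i\|_F^2}{\|W_i\|_2^2},$$

where $\overline{\mathrm{lip}}(\nabla \ell \circ f_w)$ provides an upper bound on the Lipschitz constant of $\nabla_x \ell(f_w(x), y)$.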

###### Proof.

We defer the proof to the Appendix. ∎

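In the above result, notice that if $(2\alpha/\kappa)\, \overline{\mathrm{lip}}(\nabla \ell \circ f_w) < 1$, then for any number of gradient steps $r$ the PGM margin-based generalization bound will grow the FGM generalization error bound in Theorem 2 by a factor of at most $1 / \big(1 - (2\alpha/\kappa)\, \overline{\mathrm{lip}}(\nabla \ell \circ f_w)\big)$, since the geometric sum in the PGM complexity score is bounded by this quantity. We next extend our adversarial generalization analysis to WRM attacks.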

###### Theorem 4.

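For the neural net class and training loss $\ell$ satisfying Theorem 2's assumptions, consider a WRM attack with Lagrangian coefficient $\lambda$ and Euclidean norm $\|\cdot\|_2$. Given parameter $0 < \tau < 1$, assume $\overline{\mathrm{lip}}(\nabla \ell \circ f_w)$ defined in Theorem 3 is upper-bounded by $\lambda(1 - \tau)$ for any $f_w$ in the class. For any $\eta, \gamma > 0$, the following WRM margin-based generalization bound holds with probability $1 - \eta$ for any $f_w$ in the class:

$$L^{\mathrm{wrm}}_0(f_w) \le \hat{L}^{\mathrm{wrm}}_\gamma(f_w) + O\left(\sqrt{\frac{\big(B + \frac{1}{\lambda} \prod_{i=1}^{d} \|W_i\|_2\big)^2 d^2 h \log(dh)\, \Phi^{\mathrm{wrm}}_{\lambda}(f_w) + d \log \frac{dn \log M}{\tau \eta}}{\gamma^2 n}}\right),$$

where we define

$$\Phi^{\mathrm{wrm}}_{\lambda}(f_w) := \Big\{ \prod_{i=1}^{d} \|W_i\|_2 \Big( 1 + \frac{1}{\lambda - \overline{\mathrm{lip}}(\nabla \ell \circ f_w)} \Big( \prod_{i=1}^{d} \|W_i\|_2 \Big) \sum_{i=1}^{d} \prod_{j=1}^{i} \|W_j\|_2 \Big) \Big\}^2 \sum_{i=1}^{d} \frac{\|W_i\|_F^2}{\|W_i\|_2^2}.$$

###### Proof.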

We defer the proof to the Appendix. ∎

As discussed by Sinha et al. (2018), the condition that the actual Lipschitz constant of $\nabla \ell \circ f_w$ is smaller than $\lambda$ is in fact required to guarantee WRM's convergence to the global solution. Notice that the WRM generalization error bound in Theorem 4 is bounded by the product of $1/\tau$ and the FGM generalization bound in Theorem 2.

## 4 Spectral normalization of convolutional layers

To control the Lipschitz constant of our trained network, we need to ensure that the spectral norm associated with each linear operation in the network does not exceed some pre-specified $\beta$. For fully-connected layers (i.e. regular matrix multiplication), please see Appendix B. For a general class of linear operations including convolution, Tsuzuku et al. (2018) propose to compute the operation's spectral norm through computing the gradient of the Euclidean norm of the operation's output. Here, we leverage the deconvolution operation to further simplify and accelerate computing the spectral norm of the convolution operation. Additionally, Sedghi et al. (2018) develop a method for computing all the singular values including the largest one, i.e. the spectral norm. While elegant, the method only applies to convolution filters with stride 1 and zero-padding. However, in practice the normalization factor depends on the stride size and padding scheme governing the convolution operation. Here we develop an efficient approach for computing the maximum singular value, i.e. spectral norm, of convolutional layers with arbitrary stride and padding schemes.

Note that, as also discussed by Gouk et al. (2018), the $i$th convolutional layer output feature map $\psi_i(X)$ is a linear operation of the input $X$:

$$\psi_i(X) = \sum_{j=1}^{M} F_{i,j} \star X_j,$$

where $X$ has $M$ feature maps, $F_{i,j}$ is a filter, and $\star$ denotes the convolution operation (which also encapsulates stride size and padding scheme). For simplicity, we ignore the additive bias terms here. By vectorizing $X$ and letting $V_{i,j}$ represent the overall linear operation associated with $F_{i,j}$, we see that

$$\psi_i(X) = [V_{i,1} \; \ldots \; V_{i,M}]\, X,$$

and therefore the overall convolution operation can be described using

$$\psi(X) = \begin{bmatrix} V_{1,1} & \ldots & V_{1,M} \\ \vdots & \ddots & \vdots \\ V_{N,1} & \ldots & V_{N,M} \end{bmatrix} X = WX.$$

While explicitly reconstructing $W$ is expensive, we can still compute $\sigma(W)$, the spectral norm of $W$, by leveraging the convolution transpose operation implemented by several modern-day deep learning packages. This allows us to efficiently perform matrix multiplication with $W^T$ without explicitly constructing $W$. Therefore we can approximate $\sigma(W)$ using a modified version of power iteration (Algorithm 1), wrapping the appropriate stride size and padding arguments into the convolution and convolution transpose operations. After obtaining $\sigma(W)$, we compute $W_{\mathrm{SN}}$ in the same manner as for the fully-connected layers. Like Miyato et al., we exploit the fact that SGD only makes small updates to $W$ from training step to training step, reusing the same $\tilde{u}$ and running only one iteration per step. Unlike Miyato et al., rather than enforcing $\sigma(W) = \beta$, we instead enforce the looser constraint $\sigma(W) \le \beta$:

$$W_{\mathrm{SN}} = W / \max\big(1, \sigma(W)/\beta\big), \tag{9}$$

which we observe to result in faster training for supervised learning tasks.
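Algorithm 1 is not reproduced in this section, so the following is only a minimal sketch of the idea rather than the authors' TensorFlow implementation. It assumes a single-channel 1-D convolution with zero padding, written in plain Python, and the helper names `conv1d`, `conv1d_transpose`, `conv_spectral_norm`, and `normalize_kernel` are hypothetical. Power iteration alternates the convolution and its transpose, so the matrix $W$ is never built.

```python
import math
import random


def conv1d(x, kernel, stride, pad):
    """Strided cross-correlation with zero padding (the map x -> W x)."""
    padded = [0.0] * pad + list(x) + [0.0] * pad
    n_out = (len(padded) - len(kernel)) // stride + 1
    return [sum(kernel[t] * padded[o * stride + t]
                for t in range(len(kernel)))
            for o in range(n_out)]


def conv1d_transpose(y, kernel, stride, pad, n_in):
    """Adjoint of conv1d (the map y -> W^T y)."""
    padded = [0.0] * (n_in + 2 * pad)
    for o, y_o in enumerate(y):
        for t, k_t in enumerate(kernel):
            padded[o * stride + t] += k_t * y_o
    return padded[pad:pad + n_in]


def l2_norm(v):
    return math.sqrt(sum(a * a for a in v))


def conv_spectral_norm(kernel, n_in, stride, pad, iters=200, seed=0):
    """Power iteration on W^T W using only conv and conv-transpose."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n_in)]
    for _ in range(iters):
        v = conv1d(u, kernel, stride, pad)
        v_norm = l2_norm(v)
        v = [a / v_norm for a in v]
        u = conv1d_transpose(v, kernel, stride, pad, n_in)
        u_norm = l2_norm(u)
        u = [a / u_norm for a in u]
    return l2_norm(conv1d(u, kernel, stride, pad))


def normalize_kernel(kernel, sigma, beta):
    """Rescaling rule (9): shrink only if sigma exceeds beta."""
    scale = max(1.0, sigma / beta)
    return [k / scale for k in kernel]


kernel = [1.0, 1.0, 1.0]
sigma = conv_spectral_norm(kernel, n_in=16, stride=1, pad=1)
print("norm of the kernel itself:", l2_norm(kernel))
print("norm of the convolution operator:", sigma)
print("normalized kernel:", normalize_kernel(kernel, sigma, beta=1.0))
```

In this toy case the operator norm is noticeably larger than the norm of the kernel entries, which illustrates why scaling by the kernel's own norm does not bound the spectral norm of the full convolution. Running many iterations from scratch is a simplification for the example; the text above instead reuses the previous vector and runs one iteration per training step.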

## 5 Numerical Experiments

In this section we provide an array of empirical experiments to validate both the bounds we derived in Section 3 and our implementation of spectral normalization described in Section 4. We show that spectral normalization improves both test accuracy and generalization for a variety of adversarial training schemes, datasets, and network architectures.

All experiments are implemented in TensorFlow (Abadi et al., 2016). For each experiment, we cross validate 4 to 6 values of $\beta$ (see (9)) using a fixed validation set of 500 samples. For PGM, we used iterations and . Additionally, for FGM and PGM we used -type attacks (unless specified) with magnitude (this value was approximately 2.44 for CIFAR10). For WRM, we implemented gradient ascent as discussed by Sinha et al. (2018). Additionally, for WRM training we used a Lagrangian coefficient of for CIFAR10 and SVHN and a Lagrangian coefficient of for MNIST in a similar manner to Sinha et al. (2018). The code will be made readily available.

### 5.1 Validation of spectral normalization implementation and bounds

We first demonstrate the effect of the proposed spectral normalization approach on the final DNN weights by comparing the norm of the input $x$ to that of the output $f_w(x)$. As shown in Figure 2(a), without spectral normalization ($\beta = \infty$ in (9)), the norm gain can be large. Additionally, because we are using cross-entropy loss, the weights (and therefore the norm gain) can grow arbitrarily high if we continue training, as reported by Neyshabur et al. (2017b). As we decrease $\beta$, however, we produce more constrained networks, resulting in a decrease in norm gain. At $\beta = 1$, the gain of the network cannot be greater than 1, which is consistent with what we observe. Additionally, we provide a comparison of our method to that of Miyato et al. (2018) in Appendix A.1, empirically demonstrating that Miyato et al.'s method does not properly control the spectral norm of convolutional layers, resulting in worse generalization performance.

Figure 2(b) shows that the norms of the gradients with respect to the training samples are nicely distributed after spectral normalization. Additionally, this figure suggests that the minimum gradient $\ell_2$-norm assumption (the $\kappa$ condition in Theorems 2 and 3) holds for spectrally-normalized networks.

The first column of Figure 3 shows that, as observed by Bartlett et al. (2017), AlexNet trained using ERM generates similar margin distributions for both random and true labels on CIFAR10 unless we normalize the margins appropriately. We see that even without further correction, ERM training with SN allows AlexNet to have distinguishable performance between the two datasets. This observation suggests that SN as a regularization scheme enforces the generalization error bounds shown for spectrally-normalized DNNs by Bartlett et al. (2017) and Neyshabur et al. (2017a). Additionally, the margin normalization factor (the capacity norm in Theorems 1-4) is much smaller for networks trained with SN. As demonstrated by the other columns in Figure 3, a smaller normalization factor results in larger normalized margin values and much tighter margin-based generalization bounds (a factor of for ERM and a factor of for FGM and PGM) (see Theorems 1-4).

### 5.2 Spectral normalization improves generalization and adversarial robustness

The phenomenon of overfitting random labels described by Zhang et al. (2016) can be observed even for adversarial training methods. Figure 4 shows how the FGM, PGM, or WRM adversarial training schemes only slightly delay the rate at which AlexNet fits random labels on CIFAR10, and therefore the generalization gap can be quite large without proper regularization. After introducing spectral normalization, however, we see that the network has a much harder time fitting both the random and true labels. With the proper amount of SN (chosen via cross validation), we can obtain networks that struggle to fit random labels while still obtaining the same or better test performance on true labels.

We also observe that training schemes regularized with SN result in networks more robust to adversarial attacks. Figure 5 shows that even without adversarial training, AlexNet with SN becomes more robust to FGM, PGM, and WRM attacks. Adversarial training improves adversarial robustness more than SN by itself; however we see that we can further improve the robustness of the trained networks significantly by combining SN with adversarial training.

### 5.3 Other datasets and architectures

We demonstrate the power of regularization via SN on several combinations of datasets, network architectures, and adversarial training schemes. The datasets we evaluate are CIFAR10, MNIST, and SVHN. We fit CIFAR10 using the AlexNet and Inception networks described by Zhang et al. (2016), 1-hidden-layer and 2-hidden-layer multilayer perceptrons (MLPs) with ELU activation and 512 hidden nodes in each layer, and the ResNet architecture (He et al., 2016) provided in TensorFlow for fitting CIFAR10. We fit MNIST using the ELU network described by Sinha et al. (2018) and the 1-hidden-layer and 2-hidden-layer MLPs. Finally, we fit SVHN using the same AlexNet architecture we used to fit CIFAR10. Our implementations do not use any additional regularization schemes including weight decay, dropout (Srivastava et al., 2014), and batch normalization (Ioffe & Szegedy, 2015), as these approaches are not motivated by the theory developed in this work; however, we provide numerical experiments comparing the proposed approach with weight decay, dropout, and batch normalization in Appendix A.2.

Table 1 in the Appendix reports the pre- and post-SN test accuracies for all 42 combinations evaluated. Figure 1 in the Introduction and Figures 9-8 in the Appendix show examples of training and validation curves on some of these combinations. We see that the validation curve generally improves after regularization with SN, and the observed improvements in validation accuracy are confirmed by the test accuracies reported in Table 1. Figure 6 visually summarizes Table 1, showing how SN can often significantly improve the test accuracy (and therefore decrease the generalization gap) for several of the combinations. We also provide Table 2 in the Appendix, which shows the proportional increase in training time after introducing SN with our TensorFlow implementation.

## 6 Related Works

Providing theoretical guarantees for adversarial robustness of various classifiers has been studied in multiple works. Wang et al. (2017) targets analyzing the adversarial robustness of the nearest neighbor approach. Gilmer et al. (2018) studies the effect of the complexity of the data-generating manifold on the final adversarial robustness for a specific trained model. Fawzi et al. (2018) proves lower-bounds for the complexity of robust learning in adversarial settings, targeting the population distribution of data. Xu et al. (2009) shows that the regularized support vector machine (SVM) can be interpreted via robust optimization. Fawzi et al. (2016) analyzes the robustness of a fixed classifier to random and adversarial perturbations of the input data. While all of these works seek to understand the robustness properties of different classification function classes, unlike our work they do not focus on the generalization aspects of learning over DNNs under adversarial attacks.

Concerning the generalization aspect of adversarial training, Sinha et al. (2018) provides optimization and generalization guarantees for WRM under the assumptions discussed after Theorem 4. However, their generalization guarantee only applies to the Wasserstein cost function, which is different from the 0-1 or margin loss and does not explicitly suggest a regularization scheme. In a recent related work, Schmidt et al. (2018) numerically shows the wide generalization gap in PGM adversarial training and theoretically establishes lower-bounds on the sample complexity of linear classifiers in Gaussian settings. While our work does not provide sample complexity lower-bounds, we study the broader function class of DNNs where we provide upper-bounds on adversarial generalization error and suggest an explicit regularization scheme for adversarial training over DNNs.

Generalization in deep learning has been a topic of great interest in machine learning (Zhang et al., 2016). In addition to margin-based bounds (Bartlett et al., 2017; Neyshabur et al., 2017a), various other tools including VC dimension (Anthony & Bartlett, 2009), norm-based capacity scores (Bartlett & Mendelson, 2002; Neyshabur et al., 2015), and flatness of local minima (Keskar et al., 2016; Neyshabur et al., 2017b) have been used to analyze generalization properties of DNNs. Recently, Arora et al. (2018) introduced a compression approach to further improve the margin-based bounds presented by Bartlett et al. (2017); Neyshabur et al. (2017a). The PAC-Bayes bound has also been considered and computed by Dziugaite & Roy (2017), resulting in non-vacuous bounds for MNIST.

## Appendix A Further experimental results

### a.1 Comparison of proposed method to [26]’s method

For the optimal chosen when fitting AlexNet to CIFAR10 with PGM, we repeat the experiment using the spectral normalization approach suggested by [26]. This approach performs spectral normalization on convolutional layers by scaling the convolution kernel by the spectral norm of the kernel rather than the spectral norm of the overall convolution operation. Because it does not account for how the kernel can amplify perturbations in a single pixel multiple times (see Section 4), it does not properly control the spectral norm.

In Figure 10, we see that for the optimal $\beta$ reported in the main text, using [26]’s SN method results in worse generalization performance. This is because, although we specified a target $\beta$, the actual spectral norm obtained using [26]’s method can be much greater for convolutional layers, resulting in overfitting (hence the training curve quickly approaches 1.0 accuracy). The AlexNet architecture used has two convolutional layers. For the proposed method, the final spectral norms of the convolutional layers were both 1.60; for [26]’s method, the final spectral norms of the convolutional layers were 7.72 and 7.45, despite the corresponding convolution kernels having spectral norms of 1.60.
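The gap between the two notions of spectral norm can be illustrated with a minimal NumPy sketch. It assumes a simplified setting that is not the one used in the experiments: stride 1 and circular padding, where the convolution operator is block-diagonalized by the 2-D DFT. The function names `kernel_matrix_norm` and `circular_conv_operator_norm` and the toy kernel are hypothetical and chosen only for illustration.

```python
import numpy as np


def kernel_matrix_norm(kernel):
    """Spectral norm of the kernel reshaped to (out_ch, in_ch * kh * kw)."""
    out_ch = kernel.shape[0]
    return np.linalg.norm(kernel.reshape(out_ch, -1), ord=2)


def circular_conv_operator_norm(kernel, n):
    """Spectral norm of the stride-1 circular convolution on n x n inputs.

    With circular padding the operator is block-diagonalized by the 2-D DFT,
    so its largest singular value is the largest singular value over all
    per-frequency (out_ch x in_ch) blocks.
    """
    freq = np.fft.fft2(kernel, s=(n, n), axes=(2, 3))  # (out, in, n, n)
    blocks = np.transpose(freq, (2, 3, 0, 1))          # (n, n, out, in)
    return np.linalg.svd(blocks, compute_uv=False).max()


# Hypothetical toy example: a single-channel 3x3 kernel with constant entries.
kernel = np.full((1, 1, 3, 3), 1.0 / 3.0)
print(kernel_matrix_norm(kernel))              # norm of the reshaped kernel
print(circular_conv_operator_norm(kernel, 8))  # norm of the full operator
```

In this toy case the reshaped kernel has spectral norm 1 while the convolution operator has spectral norm 3, since each of the nine kernel entries contributes to the response at the zero frequency. Normalizing by the kernel norm alone would therefore leave the operator norm well above the target.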

Our proposed method is less computationally efficient than [26]’s approach because each power iteration step requires a convolution operation rather than a division operation. However, as shown in Table 3, the proposed approach is not significantly less efficient with our TensorFlow implementation.
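To make the cost of each step concrete, the following is a rough NumPy sketch of power iteration applied directly to a convolution operator, where every iteration applies the convolution and its transpose once. It is not the TensorFlow implementation referred to above: it assumes a single channel, stride 1, and circular padding, and the helper names `conv`, `conv_transpose`, and `conv_spectral_norm` are hypothetical.

```python
import numpy as np


def conv(x, k):
    """Stride-1 circular cross-correlation: y[i, j] = sum_ab k[a, b] x[i+a, j+b]."""
    y = np.zeros_like(x)
    for a in range(k.shape[0]):
        for b in range(k.shape[1]):
            y += k[a, b] * np.roll(x, shift=(-a, -b), axis=(0, 1))
    return y


def conv_transpose(y, k):
    """Adjoint of conv: x[m, n] = sum_ab k[a, b] y[m-a, n-b]."""
    x = np.zeros_like(y)
    for a in range(k.shape[0]):
        for b in range(k.shape[1]):
            x += k[a, b] * np.roll(y, shift=(a, b), axis=(0, 1))
    return x


def conv_spectral_norm(k, n, num_iters=100, seed=0):
    """Estimate the operator norm of conv on n x n inputs by power iteration."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((n, n))
    u /= np.linalg.norm(u)
    for _ in range(num_iters):
        v = conv(u, k)               # one convolution per iteration
        v /= np.linalg.norm(v)
        u = conv_transpose(v, k)     # one transposed convolution per iteration
        u /= np.linalg.norm(u)
    return np.linalg.norm(conv(u, k))


k = np.full((3, 3), 1.0 / 3.0)       # hypothetical toy kernel
sigma = conv_spectral_norm(k, n=8)
```

In training one would typically reuse `u` across steps and run a single iteration per step, as described for fully-connected layers in Appendix B, so the extra cost per step is on the order of one forward and one transposed convolution per layer.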

## Appendix B Spectral normalization of fully-connected layers

For fully-connected layers, we approximate the spectral norm of a given matrix using the approach described by [26]: the power iteration method. For each weight matrix $W$, we randomly initialize a vector $\tilde{u}$ and approximate both the left and right singular vectors by iterating the update rules

$$\tilde{v} \leftarrow \frac{W\tilde{u}}{\|W\tilde{u}\|_2}, \qquad \tilde{u} \leftarrow \frac{W^T\tilde{v}}{\|W^T\tilde{v}\|_2}.$$

The final singular value can be approximated with $\sigma(W) \approx \tilde{v}^T W \tilde{u}$. Like Miyato et al., we exploit the fact that SGD only makes small updates to $W$ from training step to training step, reusing the same $\tilde{u}$ and running only one iteration per step. Unlike Miyato et al., rather than enforcing $\sigma(W) = \beta$, we instead enforce the looser constraint $\sigma(W) \le \beta$ as described by [17]:

$$W_{\mathrm{SN}} = \frac{W}{\max\left(1, \sigma(W)/\beta\right)},$$

which we observe to result in faster training in practice for supervised learning tasks.
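The update rules and the clipped normalization above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the released implementation; the function names `power_iteration_step` and `spectral_normalize` are hypothetical, and the diagonal test matrix is chosen only so that the true spectral norm (3) is known.

```python
import numpy as np


def power_iteration_step(W, u):
    """One power iteration step; returns the estimate of sigma(W) and the new u."""
    v = W @ u
    v = v / np.linalg.norm(v)
    u = W.T @ v
    u = u / np.linalg.norm(u)
    sigma = v @ W @ u  # v^T W u approximates the largest singular value
    return sigma, u


def spectral_normalize(W, u, beta):
    """Rescale W only when its estimated spectral norm exceeds beta."""
    sigma, u = power_iteration_step(W, u)
    return W / max(1.0, sigma / beta), sigma, u


# Hypothetical usage: u persists across training steps, one iteration per step.
W = np.diag([3.0, 1.0, 0.5])
u = np.ones(3) / np.sqrt(3.0)
for _ in range(50):
    sigma, u = power_iteration_step(W, u)
W_sn, sigma, u = spectral_normalize(W, u, beta=1.5)
```

Because of the `max(1, ·)` term, a matrix whose spectral norm is already below $\beta$ is left unchanged, which is what makes the constraint looser than rescaling every layer to a fixed norm.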

## Appendix C Proofs

### C.1 Proof of Theorem 1

First let us quote the following two lemmas from [29].

###### Lemma 1 ([29]).

Consider as the class of neural nets parameterized by where each maps input to . Let be a distribution on parameter vector chosen independently from the training samples. Then, for each with probability at least for any and any random perturbation satisfying we have

$$L_0(f_w) \le \hat{L}_\gamma(f_w) + 4\sqrt{\frac{KL(P_{w+u}\,\|\,Q) + \log\frac{6n}{\eta}}{n-1}}. \tag{10}$$

###### Lemma 2 ([29]).

Consider a -layer neural net with -Lipschitz activation function where . Then for any norm-bounded input and weight perturbation , we have the following perturbation bound:

$$\|f_{w+u}(x) - f_w(x)\|_2 \le eB\Big(\prod_{i=1}^{d}\|W_i\|_2\Big)\sum_{i=1}^{d}\frac{\|U_i\|_2}{\|W_i\|_2}. \tag{11}$$
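As a sanity check on (11), the bound can be evaluated numerically on a small random network. The sketch below makes several assumptions that are not taken from the lemma statement: a bias-free network with ReLU activations (which are 1-Lipschitz and vanish at zero), an input of norm exactly $B$, and perturbations scaled so that $\|U_i\|_2 = \|W_i\|_2/(2d)$. The names `forward` and `perturbation_bound` are hypothetical.

```python
import numpy as np


def forward(weights, x):
    """Bias-free feed-forward net with ReLU on every layer except the last."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(W @ h, 0.0)
    return weights[-1] @ h


def perturbation_bound(weights, perturbations, B):
    """Right-hand side of (11)."""
    norms = [np.linalg.norm(W, 2) for W in weights]
    ratios = [np.linalg.norm(U, 2) / s for U, s in zip(perturbations, norms)]
    return np.e * B * np.prod(norms) * sum(ratios)


rng = np.random.default_rng(0)
d, h, B = 3, 10, 1.0
weights = [rng.standard_normal((h, h)) for _ in range(d)]

# Assumed perturbation size: spectral norm of U_i is ||W_i||_2 / (2d).
perturbations = []
for W in weights:
    U = rng.standard_normal((h, h))
    U *= np.linalg.norm(W, 2) / (2 * d * np.linalg.norm(U, 2))
    perturbations.append(U)

x = rng.standard_normal(h)
x *= B / np.linalg.norm(x)

perturbed = [W + U for W, U in zip(weights, perturbations)]
gap = np.linalg.norm(forward(perturbed, x) - forward(weights, x))
bound = perturbation_bound(weights, perturbations, B)
```

Comparing `gap` with `bound` gives a sense of how loose the worst-case product of spectral norms is for a typical random network.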

To prove Theorem 1, consider with weights . Since and , for any weight vector such that for every we have:

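$$\Big(\frac{1}{e}\Big)^{\frac{d}{d-1}}\prod_{i=1}^{d}\|\tilde{W}_i\|_2 \le \prod_{i=1}^{d}\|W_i\|_2 \le e\prod_{i=1}^{d}\|\tilde{W}_i\|_2. \tag{12}$$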
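One way to see why (12) holds is sketched below, assuming the closeness condition has the per-layer form $\big|\|W_i\|_2 - \|\tilde{W}_i\|_2\big| \le \frac{1}{d}\|\tilde{W}_i\|_2$ and $d \ge 2$. This form is an assumption made for illustration. Under it, each layer satisfies

$$\Big(1 - \frac{1}{d}\Big)\|\tilde{W}_i\|_2 \le \|W_i\|_2 \le \Big(1 + \frac{1}{d}\Big)\|\tilde{W}_i\|_2,$$

and taking the product over the $d$ layers gives

$$\Big(1 - \frac{1}{d}\Big)^{d}\prod_{i=1}^{d}\|\tilde{W}_i\|_2 \le \prod_{i=1}^{d}\|W_i\|_2 \le \Big(1 + \frac{1}{d}\Big)^{d}\prod_{i=1}^{d}\|\tilde{W}_i\|_2.$$

The upper bound follows from $(1 + 1/d)^d \le e$. For the lower bound, $(1 - 1/d)^{d-1} = \big(1 + \tfrac{1}{d-1}\big)^{-(d-1)} \ge 1/e$, and raising both sides to the power $\tfrac{d}{d-1}$ yields $(1 - 1/d)^{d} \ge (1/e)^{\frac{d}{d-1}}$.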

We apply Lemma 1, choosing $Q$ to be a zero-mean multivariate Gaussian distribution with diagonal covariance matrix, where each entry of the $i$th layer $U_i$ has standard deviation $\xi_i = \frac{\|\tilde{W}_i\|_2}{\beta_{\tilde{w}}}\xi$ with $\xi$ chosen later in the proof. Note that $\beta_{\tilde{w}}$ defined earlier in the theorem is the geometric average of spectral norms across all layers. Then for the $i$th layer’s random perturbation $U_i$, we get the following bound from [40] with $h$ representing the width of the $i$th hidden layer:

$$\Pr\Big(\beta_{\tilde{w}}\frac{\|U_i\|_2}{\|\tilde{W}_i\|_2} > t\Big) \le 2h\exp\Big(-\frac{t^2}{2h\xi^2}\Big). \tag{13}$$
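The tail bound (13) can be compared against simulation with a short Monte Carlo sketch. It assumes that the normalized perturbation is an $h \times h$ matrix with i.i.d. Gaussian entries of standard deviation $\xi$; the square shape, the values of $h$, $d$, and $\xi$, and the function names `tail_bound` and `empirical_tail` are all illustrative assumptions.

```python
import numpy as np


def tail_bound(t, h, xi):
    """Right-hand side of (13)."""
    return 2 * h * np.exp(-t**2 / (2 * h * xi**2))


def empirical_tail(t, h, xi, trials=2000, seed=0):
    """Fraction of sampled h x h Gaussian matrices with spectral norm above t."""
    rng = np.random.default_rng(seed)
    U = xi * rng.standard_normal((trials, h, h))
    norms = np.linalg.norm(U, ord=2, axis=(1, 2))
    return np.mean(norms > t)


h, d, xi = 16, 3, 0.1
# Threshold at which the bound equals 1 / (2d).
t = xi * np.sqrt(2 * h * np.log(4 * h * d))
bound = tail_bound(t, h, xi)
freq = empirical_tail(t, h, xi)
```

At the threshold $t = \xi\sqrt{2h\log(4hd)}$ the right-hand side of (13) equals $2h \cdot \frac{1}{4hd} = \frac{1}{2d}$, so a union bound over the $d$ layers gives a total failure probability of at most $1/2$.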

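We now use a union bound over all layers for a maximum union probability of $1/2$, which implies the normalized $\beta_{\tilde{w}}\|U_i\|_2/\|\tilde{W}_i\|_2$ for each layer is upper-bounded by $\xi\sqrt{2h\log(4hd)}$. Then for any $w$ satisfying $\big|\|W_i\|_2 - \|\tilde{W}_i\|_2\big| \le \frac{1}{d}\|\tilde{W}_i\|_2$ for all $i$’s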

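$$
\begin{aligned}
\max_{\|x\|_2 \le B}\|f_{w+u}(x) - f_w(x)\|_2
&\le eB\Big(\prod_{i=1}^{d}\|W_i\|_2\Big)\sum_{i=1}^{d}\frac{\|U_i\|_2}{\|W_i\|_2} \\
&\overset{(a)}{\le} e^2B\Big(\prod_{i=1}^{d}\|\tilde{W}_i\|_2\Big)\sum_{i=1}^{d}\frac{\|U_i\|_2}{\|\tilde{W}_i\|_2} \\
&= e^2B\,\beta_{\tilde{w}}^{d-1}\sum_{i=1}^{d}\beta_{\tilde{w}}\frac{\|U_i\|_2}{\|\tilde{W}_i\|_2} \\
&\le e^2dB\,\beta_{\tilde{w}}^{d-1}\,\xi\sqrt{2h\log(4hd)}.
\end{aligned} \tag{14}
$$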

Here (a) holds, since is true for each . Hence we choose for which the perturbation vector satisfies the assumptions of Lemma 2. Then, we bound the KL-divergence term in Lemma 1 as

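$$
\begin{aligned}
KL(P_{w+u}\,\|\,Q)
&\le \sum_{i=1}^{d}\frac{\|W_i\|_F^2}{2\xi_i^2} \\
&= \frac{30^2 d^2B^2\beta_{\tilde{w}}^{2d}\,h\log(4hd)}{2\gamma^2}\sum_{i=1}^{d}\frac{\|W_i\|_F^2}{\|\tilde{W}_i\|_2^2} \\
&\overset{(b)}{\le} \frac{30^2e^2d^2B^2\prod_{i=1}^{d}\|W_i\|_2^2\,h\log(4hd)}{2\gamma^2}\sum_{i=1}^{d}\frac{\|W_i\|_F^2}{\|W_i\|_2^2} \\
&= O\Big(\frac{d^2B^2h\log(hd)\prod_{i=1}^{d}\|W_i\|_2^2}{\gamma^2}\sum_{i=1}^{d}\frac{\|W_i\|_F^2}{\|W_i\|_2^2}\Big).
\end{aligned}
$$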

Note that (b) holds, because we assume implying for each . Therefore, Lemma 1 implies with probability we have the following bound hold for any satisfying for all ’s,

$$L_0(f_w) \le \hat{L}_\gamma(f_w) + O\Bigg(\sqrt{\frac{B^2d^2h\log(dh)\,\Phi^{\mathrm{erm}}(f_w) + \log\frac{n}{\tilde{\eta}}}{\gamma^2 n}}\Bigg). \tag{15}$$

Then, we can give an upper-bound over all the functions in by finding the covering number of the set of ’s where for each feasible we have the mentioned condition satisfied for at least one of ’s. We only need to form the bound for which can be covered using a cover of size as discussed in [29]. Then, from the theorem’s assumption we know each will be in the interval which we want to cover such that for any in the interval there exists a satisfying . For this purpose we can use a cover of size ,111Note that implying and hence . which combined for all ’s gives a cover with size whose logarithm is growing as . This together with (15) completes the proof.

### C.2 Proof of Theorem 2

We start by proving the following lemmas providing perturbation bound for FGM attacks.

###### Lemma 3.

Consider a -layer neural net with -Lipschitz and -smooth (-Lipschitz derivative) activation where . Let training loss also be -Lipschitz and -smooth for any fixed label . Then, for any input , label , and perturbation vector satisfying we have

 ∥∥∇xℓ(fw+u(x),y)−∇xℓ(fw(x),y)∥∥2 (16) ≤
###### Proof.

Since for a fixed satisfies the same Lipschitzness and smoothness properties as , then

and applying the chain rule implies:

 ∥∥∇xℓ(fw+u(x),y)−∇xℓ(fw(x),y)∥∥2 = ∥∥(∇xfw+u(x))(∇ℓ)(fw+u(x),y)−(∇xfw(x))(∇ℓ)(fw(x),y)∥∥2 ≤ ∥∥(∇xfw+u(x))(∇ℓ)(fw+u(x),y)−(∇xfw(x))(∇ℓ)(fw+u(x),y)