Does Network Width Really Help Adversarial Robustness?

10/03/2020 ∙ by Boxi Wu, et al. ∙ Zhejiang University

Adversarial training is currently the most powerful defense against adversarial examples. Previous empirical results suggest that adversarial training requires wider networks for better performance. Yet, it remains elusive how neural network width affects model robustness. In this paper, we carefully examine the relation between network width and model robustness. We present the intriguing phenomenon that increased network width may not help robustness. Specifically, we show that the model robustness is closely related to both natural accuracy and perturbation stability, a new metric proposed in our paper to characterize the model's stability under adversarial perturbations. While better natural accuracy can be achieved on wider neural networks, the perturbation stability actually becomes worse, leading to a potentially worse overall model robustness. To understand the origin of this phenomenon, we further relate the perturbation stability with the network's local Lipschitzness. By leveraging recent results on neural tangent kernels, we show that larger network width naturally leads to worse perturbation stability. This suggests that to fully unleash the power of wide model architectures, practitioners should adopt a larger regularization parameter for training wider networks. Experiments on benchmark datasets confirm that this strategy can indeed alleviate the perturbation stability issue and improve state-of-the-art robust models.


1 Introduction

Figure 1: Plots of (a) natural risk and (b) robust regularization in (1). Models are trained using TRADES (Zhang et al., 2019) on CIFAR10 (Krizhevsky et al., 2009) with 34-layer WideResNet models (Zagoruyko and Komodakis, 2016) of two different widen factors.

Researchers have found that Deep Neural Networks (DNNs) suffer badly from adversarial examples (Szegedy et al., 2014). By perturbing the original inputs with an intentionally computed, undetectable noise, one can deceive DNNs and even arbitrarily modify their predictions on purpose. To defend against adversarial examples and further improve model robustness, various defense approaches have been proposed (Papernot et al., 2016b; Meng and Chen, 2017; Dhillon et al., 2018; Liao et al., 2018; Xie et al., 2018; Guo et al., 2018; Song et al., 2018; Samangouei et al., 2018). Among them, adversarial training (Goodfellow et al., 2015; Madry et al., 2018) has been shown to be the most effective type of defense (Athalye et al., 2018). Adversarial training can be seen as a form of data augmentation that first finds adversarial examples and then trains DNN models on them. Specifically, given a DNN classifier $f_{\boldsymbol{\theta}}$ parameterized by $\boldsymbol{\theta}$, a general form of adversarial training with loss function $\ell$ can be defined as:

$$\min_{\boldsymbol{\theta}} \frac{1}{n}\sum_{i=1}^{n}\Big[\underbrace{\ell\big(f_{\boldsymbol{\theta}}(\mathbf{x}_i), y_i\big)}_{\text{natural risk}} + \lambda\cdot \underbrace{\max_{\mathbf{x}_i' \in \mathcal{B}_{\epsilon}(\mathbf{x}_i)}\Big(\ell\big(f_{\boldsymbol{\theta}}(\mathbf{x}_i'), y_i\big) - \ell\big(f_{\boldsymbol{\theta}}(\mathbf{x}_i), y_i\big)\Big)}_{\text{robust regularization}}\Big], \qquad (1)$$

where $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$ are the training data, $\mathcal{B}_{\epsilon}(\mathbf{x}_i)$ denotes the $\ell_p$ norm ball with radius $\epsilon$ centered at $\mathbf{x}_i$, and $\lambda \ge 0$ is the regularization parameter. Compared with standard empirical risk minimization, the extra robust regularization term encourages the data points within $\mathcal{B}_{\epsilon}(\mathbf{x}_i)$ to be classified as the same class. The regularization parameter $\lambda$ adjusts the strength of the robust regularization. When $\lambda = 1$, it recovers the formulation in Madry et al. (2018), and when $\lambda = 1/2$, it recovers the formulation in Goodfellow et al. (2015). Furthermore, replacing the loss difference in the robust regularization term with a KL-divergence based regularization recovers the formulation in Zhang et al. (2019).
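To make the objective in (1) concrete, the following is a minimal PyTorch-style sketch of one training step under this formulation. It is an illustration only, not the code used in our experiments; the `pgd_attack` helper, which approximately solves the inner maximization, is assumed to be provided externally.

```python
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, lam, epsilon, pgd_attack):
    """One step of the generalized adversarial training objective in (1).

    `pgd_attack` is assumed to return an approximate maximizer x' inside the
    epsilon-ball around x (e.g., a few steps of projected gradient ascent).
    """
    model.train()
    x_adv = pgd_attack(model, x, y, epsilon)          # inner maximization

    optimizer.zero_grad()
    natural_loss = F.cross_entropy(model(x), y)       # natural risk term
    adv_loss = F.cross_entropy(model(x_adv), y)       # loss on adversarial examples
    robust_reg = adv_loss - natural_loss              # robust regularization term
    loss = natural_loss + lam * robust_reg            # objective (1)
    loss.backward()
    optimizer.step()
    return loss.item()
```

With `lam = 1` the natural-risk term cancels and only the adversarial loss is minimized, as in Madry et al. (2018); with `lam = 1/2` the natural and adversarial losses are weighted equally, as in Goodfellow et al. (2015).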

One common belief in the practice of adversarial training is that, compared with standard empirical risk minimization, adversarial training requires much wider neural networks to achieve better robustness. Madry et al. (2018) provided an intuitive explanation: robust classification requires a much more complicated decision boundary, as it needs to handle the presence of possible adversarial examples. Yet it remains elusive how the network width affects the model robustness. To answer this question, we first examine whether larger network width contributes to both the natural risk term and the robust regularization term in (1). Interestingly, when tracing the value changes in (1) during adversarial training, we observe that the value of the robust regularization part actually gets worse on wider models, suggesting that larger network width does not lead to better stability. In Figure 1, we show the loss value comparison of two different wide models trained using TRADES (Zhang et al., 2019). We can see that the wider model (i.e., WideResNet-34-10) achieves a lower natural risk but incurs a larger robust regularization value. This motivates us to find out the cause of this phenomenon.

In this paper, we carefully study the relationship between neural network width and model robustness, with the counter-intuitive conclusion that increased network width may not help robustness. We summarize our main contributions as follows:

  1. We show that the model robustness is closely related to both natural accuracy and perturbation stability, a new metric we propose to characterize the strength of robust regularization. While the natural accuracy is improved on wider models, the perturbation stability often gets worse. This suggests that the deteriorated perturbation stability is the main reason for the marginally improved or even degraded model robustness of wider models.

  2. Unlike previous understandings that there exists a trade-off between natural accuracy and robust accuracy, we show that the real trade-off is between natural accuracy and perturbation stability, and that the robust accuracy is actually the consequence of this trade-off.

  3. To understand the origin of this problem, we further relate perturbation stability with the network’s local Lipschitzness. By leveraging recent results on neural tangent kernels (Jacot et al., 2018; Allen-Zhu et al., 2019; Zou et al., 2020; Cao and Gu, 2019; Gao et al., 2019), we show that larger network width naturally leads to worse perturbation stability, which explains our empirical findings.

  4. Our analyses suggest that to fully unleash the potential of wider model architectures, one should mitigate the perturbation stability deterioration on wider models. One natural strategy is to enlarge the corresponding robust regularization parameter. We experimentally verified this strategy with adversarial training methods on benchmark datasets and found that it clearly boosts the robustness of wider models.

The remainder of this paper is organized as follows: in Section 2, we briefly review existing literature on adversarial attacks and defenses as well as robustness and generalization. We empirically study the network width and adversarial robustness for adversarial training methods in Section 3. In Section 4, we theoretically show that larger network width leads to worse perturbation stability. In Section 5, we show that improving the perturbation stability on wide models leads to better robustness on benchmark datasets. Finally, we conclude this paper in Section 6.

Notation. For a $d$-dimensional vector $\mathbf{x} = (x_1, \ldots, x_d)^{\top}$, we use $\|\mathbf{x}\|_p$ with $p \ge 1$ to denote its $\ell_p$ norm. $\mathbb{1}(\cdot)$ represents the indicator function and $\forall$ represents the universal quantifier.

2 Related work

There is a huge body of literature on adversarial machine learning. Here we briefly review representative works that are most relevant to our paper.

Adversarial attacks.

Adversarial examples and their intriguing properties were first found in Szegedy et al. (2014). Since then, a tremendous amount of work has been done exploring the origins or inevitability of this intriguing property of deep learning (Gu and Rigazio, 2015; Kurakin et al., 2017; Fawzi et al., 2018; Tramèr et al., 2017; Gilmer et al., 2018; Zhang et al., 2020b), as well as designing more powerful attacks (Goodfellow et al., 2015; Papernot et al., 2016a; Moosavi-Dezfooli et al., 2016; Madry et al., 2018; Carlini and Wagner, 2017; Chen and Gu, 2020) under various attack settings. Athalye et al. (2018) identified the gradient masking problem and showed that many defense methods could be broken with a few changes to the attacker. Chen et al. (2017) proposed gradient-free black-box attacks, and Ilyas et al. (2018, 2019a); Chen et al. (2020) further improved their efficiency. Recently, Ilyas et al. (2019b); Jacobsen et al. (2019) pointed out that adversarial examples are generated from the non-robust or invariant features hidden in the training data.

Defensive adversarial learning.

Many defense approaches have been proposed aiming to directly learn a robust model that is able to defend against adversarial attacks. Madry et al. (2018) proposed a general framework of robust training by solving a min-max optimization problem. Wang et al. (2019) proposed a new criterion to quantitatively evaluate the convergence quality. Zhang et al. (2019) theoretically studied the trade-off between natural accuracy and robust accuracy for adversarially trained models. Wang et al. (2020) followed this framework and further improved its robustness by differentiating correctly classified and misclassified examples. Cissé et al. (2017); Ross and Doshi-Velez (2018) addressed the problem by restricting the variation of the outputs with respect to changes in the inputs. Cohen et al. (2019); Salman et al. (2019); Lécuyer et al. (2019) developed provably robust adversarial learning methods with theoretical guarantees on robustness. Recent works by Wong et al. (2020); Qin et al. (2019) focus on creating adversarially robust networks with faster training protocols. Another line of work focuses on increasing the effective size of the training data, either by pre-trained models (Hendrycks et al., 2019) or by semi-supervised learning methods (Carmon et al., 2019; Alayrac et al., 2019; Najafi et al., 2019).

Robustness and generalization.

Earlier works like Goodfellow et al. (2015) found that adversarial learning can reduce overfitting and help generalization. However, as the arms race between attackers and defenders keeps going, it has been observed that strong adversarial attacks can cause severe damage to the model’s natural accuracy (Madry et al., 2018; Zhang et al., 2019). Many works (Zhang et al., 2019; Tsipras et al., 2019; Raghunathan et al., 2019) attempt to explain this trade-off between robustness and natural generalization, while other works propose different perspectives. Schmidt et al. (2018) confirmed that more training data has the potential to close this gap. Bubeck et al. (2019) suggested that a robust model is computationally difficult to learn and optimize. Zhang et al. (2020b) showed that there is still a large gap between the currently achieved model robustness and the theoretically achievable robustness limit on real image distributions. In Nakkiran (2019), the existence of robust models with high natural accuracy was proved in a classification setting. However, the origin of this trade-off is not crystal clear, and its relation with the model complexity remains elusive.

3 Empirical study on network width and adversarial robustness

In this section, we empirically study the relation between network width and adversarial robustness in a more thorough way.

3.1 Characterization of robust examples

Robust accuracy is the standard evaluation metric of robustness, which measures the ratio of robust examples, i.e., examples that can still be correctly classified after adversarial attacks.

Figure 2: An illustration of the robust examples.

Previous empirical results suggest that wide models enjoy both better generalization ability and better model robustness. Specifically, Madry et al. (2018) proposed to extend the ResNet (He et al., 2016) architecture to WideResNet (Zagoruyko and Komodakis, 2016) with a widen factor of 10 for adversarial training on the CIFAR10 dataset and found that the increased model capacity significantly improves both robust accuracy and natural accuracy. Later works such as Zhang et al. (2019); Wang et al. (2020) follow this finding and report their best results using WideResNet (Zagoruyko and Komodakis, 2016) with widen factor 10.

However, as shown by our findings in Figure 1, wider models actually lead to a worse robust regularization effect, suggesting that wider models are not better in all aspects and that the relation between model robustness and network width may be more intricate than previously understood. To understand the intrinsic relation between model robustness and network width, let us first take a closer look at the robust examples. Mathematically, robust examples are the examples that can still be correctly classified after adversarial attacks, i.e., those $\mathbf{x}$ whose predicted label remains the correct label $y$ for every perturbed input $\mathbf{x}' \in \mathcal{B}_{\epsilon}(\mathbf{x})$. Note that a robust example should meet the following two conditions at the same time:

$$f_{\boldsymbol{\theta}}(\mathbf{x}) = y \ \wedge\ f_{\boldsymbol{\theta}}(\mathbf{x}') = f_{\boldsymbol{\theta}}(\mathbf{x}),\ \ \forall \mathbf{x}' \in \mathcal{B}_{\epsilon}(\mathbf{x}), \qquad (2)$$

where $\wedge$ is the logical conjunction operator. By (2), we notice that the robust examples are the intersection of two other sets: correctly classified examples and stable examples. A more direct illustration of this relationship can be found in Figure 2. While natural accuracy measures the ratio of correctly classified examples against the whole sample set, to our knowledge, there does not exist a metric measuring the ratio of stable examples against the whole sample set. Here we formally define this as perturbation stability, which measures the fraction of examples whose output labels cannot be adversarially perturbed, as reflected in the robust regularization term in (1).
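As a concrete illustration of the decomposition in (2), the sketch below computes all three quantities for a batch from the clean inputs, their adversarial counterparts, and the labels. It assumes a generic PyTorch `model` and that the adversarial examples `x_adv` have already been generated by some attack; it follows the definition in (2) rather than any particular released implementation.

```python
import torch

@torch.no_grad()
def robustness_metrics(model, x, x_adv, y):
    """Natural accuracy, perturbation stability, and robust accuracy for a batch.
    Following (2), a robust example must be correctly classified AND stable."""
    pred_clean = model(x).argmax(dim=1)
    pred_adv = model(x_adv).argmax(dim=1)

    correct = pred_clean.eq(y)          # correctly classified examples
    stable = pred_adv.eq(pred_clean)    # examples whose prediction the attack cannot flip
    robust = correct & stable           # intersection: robust examples

    return {
        "natural_accuracy": correct.float().mean().item(),
        "perturbation_stability": stable.float().mean().item(),
        "robust_accuracy": robust.float().mean().item(),
    }
```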

3.2 Evaluation of perturbation stability

Figure 3: Plots of (a) robust accuracy, (b) natural accuracy, and (c) perturbation stability against training epochs for networks of different widths. Results are obtained on CIFAR10 with the adversarial training method TRADES and WideResNet-34 architectures; the training schedule is the same as in the original work (Zhang et al., 2019). We record all three metrics when the robust accuracy reaches its highest point and plot them against network width in (d).

We apply the state-of-the-art adversarial training algorithm TRADES (Zhang et al., 2019) on CIFAR10 and plot the robust accuracy, natural accuracy, and perturbation stability against the training epochs in Figure 3. Experiments are conducted on WideResNet-34 with various widen factors. For each network, when the robust accuracy reaches its highest point, we record all three metrics and show how they change with network width in Figure 3(d). From Figure 3(d), we can observe that the perturbation stability decreases monotonically as the network width increases. This suggests that wider models are actually more vulnerable to adversarial perturbations. In this sense, the increased network width could hurt the overall model robustness to a certain extent. This can be seen from Figure 3(d), where the robust accuracy of a smaller widen factor is actually slightly better than that of a larger one.

Aside from the relation with model width, we also gain some other insights from the newly proposed perturbation stability:

  1. Unlike robust accuracy and natural accuracy, perturbation stability gradually gets worse during the training process. This makes sense since an unlearned model that always outputs the same label will have perfect stability, and the training process tends to break this perfect stability. From another perspective, the role of robust regularization in (1) is to encourage perturbation stability, such that the learned models cannot be easily perturbed for the sake of model robustness.

  2. Previous works (Zhang et al., 2019; Tsipras et al., 2019; Raghunathan et al., 2019) have argued that there exists a trade-off between natural accuracy and robust accuracy. However, from (2), we can see that robust accuracy and natural accuracy are coupled with each other, as a robust example must first be correctly classified. When the natural accuracy goes to zero, the robust accuracy will also become zero. On the other hand, higher natural accuracy also implies that more examples will likely become robust examples. Therefore, we argue that the real trade-off here is between natural accuracy and perturbation stability, and the robust accuracy is actually the consequence of this trade-off (a short derivation follows this list).

  3. Rice et al. (2020) have recently shown that adversarial training suffers from over-fitting, as the robust accuracy may get worse as training proceeds, which can be seen in Figure 3(a). We found that this over-fitting is largely attributed to the degenerate perturbation stability (see Figure 3(c)) rather than to the natural accuracy (see Figure 3(b)).
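As a short derivation supporting item 2 above, let $A$ be the set of correctly classified examples and $S$ the set of stable examples; by (2), the robust examples are exactly $A \cap S$, so elementary set inclusion gives
$$\text{natural accuracy} + \text{perturbation stability} - 1 \;\le\; \text{robust accuracy} \;\le\; \min\{\text{natural accuracy},\ \text{perturbation stability}\}.$$
In particular, robust accuracy vanishes whenever natural accuracy does, and it can only be improved by raising natural accuracy or perturbation stability (or both), which is why we view robust accuracy as the consequence of the trade-off between the two.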

4 Why does larger network width lead to worse perturbation stability?

Our empirical findings in Section 3 motivate us to find the underlying reason for the decrease of the perturbation stability during the training process. In this section, we show in theory that larger network width naturally leads to worse perturbation stability, by relating perturbation stability with the network’s local Lipschitzness and leveraging recent studies on neural tangent kernels (Jacot et al., 2018; Allen-Zhu et al., 2019; Cao and Gu, 2019; Zou et al., 2020; Gao et al., 2019) to illustrate it.

4.1 Perturbation stability and local Lipschitzness

Previous works (Hein and Andriushchenko, 2017; Weng et al., 2018) usually relate local Lipschitzness with network robustness, suggesting that smaller local Lipschitzness leads to more robust models. Here we show that local Lipschitzness is more directly linked to perturbation stability, through which it further influences model robustness.

To get started, let us first recall the definition of Lipschitz continuity and its relation with gradient norms.

Definition (Lipschitz continuity and gradient norm, Paulavičius and Žilinskas (2006)). Let $\mathcal{X} \subseteq \mathbb{R}^d$ denote a convex compact set. A function $g: \mathcal{X} \rightarrow \mathbb{R}$ is $L_q$-Lipschitz if for all $\mathbf{x}_1, \mathbf{x}_2 \in \mathcal{X}$, it satisfies
$$|g(\mathbf{x}_1) - g(\mathbf{x}_2)| \le L_q \|\mathbf{x}_1 - \mathbf{x}_2\|_p,$$
where $L_q = \max_{\mathbf{x} \in \mathcal{X}} \|\nabla g(\mathbf{x})\|_q$ and $1/p + 1/q = 1$.

Intuitively speaking, Lipschitz continuity guarantees that a small perturbation of the input will not lead to a large change in the function output. In the adversarial training setting, where the perturbation can only be chosen within the $\epsilon$-neighborhood $\mathcal{B}_{\epsilon}(\mathbf{x})$ of an input $\mathbf{x}$, we focus on the local Lipschitz constant, for which we restrict $\mathbf{x}_1, \mathbf{x}_2 \in \mathcal{B}_{\epsilon}(\mathbf{x})$ and $L_q = \max_{\mathbf{x}' \in \mathcal{B}_{\epsilon}(\mathbf{x})} \|\nabla g(\mathbf{x}')\|_q$.

Now suppose our neural network loss function $\ell(f_{\boldsymbol{\theta}}(\cdot), y)$ is locally $L_q$-Lipschitz over $\mathcal{B}_{\epsilon}(\mathbf{x})$. Setting $\mathbf{x}_1 = \hat{\mathbf{x}}'$, our computed adversarial example, and $\mathbf{x}_2 = \mathbf{x}$, the original example, we have
$$\max_{\mathbf{x}' \in \mathcal{B}_{\epsilon}(\mathbf{x})}\Big(\ell\big(f_{\boldsymbol{\theta}}(\mathbf{x}'), y\big) - \ell\big(f_{\boldsymbol{\theta}}(\mathbf{x}), y\big)\Big) = \ell\big(f_{\boldsymbol{\theta}}(\hat{\mathbf{x}}'), y\big) - \ell\big(f_{\boldsymbol{\theta}}(\mathbf{x}), y\big) \le L_q \|\hat{\mathbf{x}}' - \mathbf{x}\|_p \le L_q \cdot \epsilon, \qquad (3)$$
where the equality holds since $\hat{\mathbf{x}}'$ is the maximizer of the robust regularization term, the first inequality is due to local Lipschitz continuity, and the second inequality is due to $\|\hat{\mathbf{x}}' - \mathbf{x}\|_p \le \epsilon$. (3) shows that the local Lipschitz constant directly bounds the robust regularization term, and can therefore be used as a surrogate loss for the perturbation stability.
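As a concrete instance, consider the commonly used $\ell_\infty$ threat model, so that $p = \infty$ and the dual exponent is $q = 1$. The bound in (3) then reads
$$\max_{\mathbf{x}' \in \mathcal{B}_{\epsilon}(\mathbf{x})}\Big(\ell\big(f_{\boldsymbol{\theta}}(\mathbf{x}'), y\big) - \ell\big(f_{\boldsymbol{\theta}}(\mathbf{x}), y\big)\Big) \;\le\; \epsilon \cdot \max_{\mathbf{x}' \in \mathcal{B}_{\epsilon}(\mathbf{x})}\big\|\nabla_{\mathbf{x}'}\ell\big(f_{\boldsymbol{\theta}}(\mathbf{x}'), y\big)\big\|_{1},$$
which is precisely the gradient-norm quantity we estimate empirically in Section 4.2.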

4.2 Local Lipschitzness and network width

Now we study how the network width affects the perturbation stability via studying the local Lipschitz constant.

Recently, a line of research has emerged that tries to theoretically understand the optimization and generalization behaviors of over-parameterized deep neural networks through the lens of the neural tangent kernel (NTK) (Jacot et al., 2018; Allen-Zhu et al., 2019; Cao and Gu, 2019; Zou et al., 2020). By showing the equivalence between over-parameterized neural networks and NTK in the finite-width setting, this type of analysis characterizes the optimization and generalization performance of deep learning in terms of the network architecture (e.g., network width, which we are particularly interested in). Recently, Gao et al. (2019) also analyzed the convergence of adversarial training for over-parameterized neural networks using NTK. Here, we will show that the local Lipschitz constant increases with the network width.

Figure 4: Plot of the approximated local Lipschitz constant along the adversarial training trajectory. Models are trained by TRADES (Zhang et al., 2019) on the CIFAR10 dataset using WideResNet models. Wider networks in general have larger local Lipschitz constants.

Specifically, let $m$ be the network width and $H$ be the network depth. Define an $H$-layer fully connected neural network as follows:
$$f_{\mathbf{W}}(\mathbf{x}) = \mathbf{a}^{\top}\sigma\big(\mathbf{W}_{H}\sigma\big(\mathbf{W}_{H-1}\cdots\sigma(\mathbf{W}_{1}\mathbf{x})\cdots\big)\big),$$
where $\mathbf{W}_1, \ldots, \mathbf{W}_H$ are the weight matrices, $\mathbf{a}$ is the output layer weight vector, and $\sigma(\cdot) = \max\{\cdot, 0\}$ is the entry-wise ReLU activation function. For notational simplicity, we denote by $\mathbf{W} = \{\mathbf{W}_1, \ldots, \mathbf{W}_H\}$ the collection of weight matrices and by $\mathbf{W}^{(0)}$ the collection of initial weight matrices. Following Gao et al. (2019), we assume the first layer and the last layer’s weights are fixed, and $\mathbf{W}$ is updated via projected gradient descent with a projection set centered at the initialization $\mathbf{W}^{(0)}$. We have the following lemma upper bounding the input gradient norm.

Lemma 4.2. For any given input $\mathbf{x}$ and $\ell_p$ norm perturbation limit $\epsilon$, if the radius of the projection set is sufficiently small, then with high probability over the random initialization, for any $\mathbf{x}' \in \mathcal{B}_{\epsilon}(\mathbf{x})$ and any Lipschitz loss $\ell$, the input gradient norm satisfies
$$\big\|\nabla_{\mathbf{x}'}\ell\big(f_{\mathbf{W}}(\mathbf{x}'), y\big)\big\|_{2} = O(\sqrt{m}).$$

The proof of Lemma 4.2 can be found in Appendix A. Note that Lemma 4.2 holds for any $\mathbf{x}' \in \mathcal{B}_{\epsilon}(\mathbf{x})$; therefore, the maximum input gradient norm in the $\epsilon$-ball is also of order $\sqrt{m}$. Lemma 4.2 suggests that the local Lipschitz constant is closely related to the neural network width $m$. In particular, the local Lipschitz constant scales as the square root of the network width. This in theory explains why wider networks are more vulnerable to adversarial perturbations.
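As a rough back-of-the-envelope consequence of this scaling (treating all width-independent factors as fixed), multiplying the width by a factor $k$ multiplies the upper bound on the local Lipschitz constant by $\sqrt{k}$; for example, going from widen factor $1$ to widen factor $10$ suggests roughly
$$\sqrt{10} \approx 3.16$$
times larger local Lipschitz constant, and hence a correspondingly looser perturbation stability guarantee.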

In order to further verify the above theoretical result, we empirically estimate the local Lipschitz constant. In detail, for the commonly used $\ell_\infty$ norm threat model, we evaluate the quantity $\max_{\mathbf{x}' \in \mathcal{B}_{\epsilon}(\mathbf{x})}\|\nabla_{\mathbf{x}'}\ell(f_{\boldsymbol{\theta}}(\mathbf{x}'), y)\|_1$ along the adversarial training trajectory for networks with different widths. Note that solving this maximization problem along the entire training trajectory is computationally expensive or even intractable. Therefore, we approximate this quantity by taking the maximum input gradient $\ell_1$-norm over the attack steps in each iteration. We plot the results in Figure 4, where we can see that larger network width indeed leads to larger local Lipschitz constant values. This backs up the theoretical result in Lemma 4.2.
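For readers who want to probe this effect themselves, the following is a small, self-contained sketch (not our exact measurement protocol): it builds fully connected ReLU networks of increasing width and measures the input-gradient $\ell_1$ norm of the cross-entropy loss at random inputs, a simple proxy for the local Lipschitz constant under $\ell_\infty$ perturbations.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_mlp(width, depth=4, in_dim=3 * 32 * 32, num_classes=10):
    """A fully connected ReLU network with a given hidden width."""
    layers = [nn.Linear(in_dim, width), nn.ReLU()]
    for _ in range(depth - 2):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, num_classes))
    return nn.Sequential(*layers)

def input_grad_l1_norm(model, x, y):
    """Average l1 norm of the input gradient of the loss: a proxy for the
    local Lipschitz constant under l_inf perturbations."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    return grad.abs().sum(dim=1).mean().item()

if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(64, 3 * 32 * 32)       # random CIFAR10-sized inputs
    y = torch.randint(0, 10, (64,))
    for width in [64, 256, 1024, 4096]:
        print(width, input_grad_l1_norm(make_mlp(width), x, y))
```

How fast the measured norm grows with width depends on the initialization scheme; the $\sqrt{m}$ rate in Lemma 4.2 is derived under the NTK-style setting described above.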

λ   | Robust Accuracy (%)        | Natural Accuracy (%)       | Perturbation Stability (%)
    | width-1  width-5  width-10 | width-1  width-5  width-10 | width-1  width-5  width-10
6   | 47.70    55.88    57.19    | 75.43    84.45    86.52    | 70.01    69.84    69.18
9   | 47.69    56.33    57.55    | 72.99    82.24    84.79    | 72.44    71.58    70.99
12  | 47.67    56.38    57.54    | 71.38    81.23    83.68    | 74.09    73.39    71.97
15  | 47.30    56.14    57.68    | 70.29    80.41    83.01    | 74.79    73.96    72.78
18  | 46.80    55.97    57.77    | 69.27    79.19    82.11    | 75.43    74.99    73.71
21  | 46.19    55.88    58.11    | 67.76    78.15    81.15    | 76.24    75.86    75.27
Table 1: The three metrics under PGD attack with different regularization parameters λ on the CIFAR10 dataset using the WideResNet-34 model.
Dataset  | Architecture   | widen-factor / growing-rate | regularization | PGD () | PGD () | C&W () | PGD ()
CIFAR10  | WideResNet-34  | 1  | 6  | 48.72 | 47.70 | 44.32 | 47.79
         |                | 1  | 12 | 48.36 | 47.67 | 43.90 | 47.60
         |                | 1  | 21 | 47.11 | 46.19 | 42.84 | 46.59
         |                | 5  | 6  | 57.20 | 55.88 | 54.04 | 55.76
         |                | 5  | 12 | 57.39 | 56.38 | 54.15 | 56.28
         |                | 5  | 21 | 56.62 | 55.88 | 53.11 | 55.78
         |                | 10 | 6  | 58.84 | 57.19 | 55.42 | 56.96
         |                | 10 | 12 | 59.01 | 57.54 | 55.53 | 57.70
         |                | 10 | 21 | 59.13 | 58.11 | 55.81 | 57.96
         | DenseNet-BC-40 |    |    | 45.40 | 44.49 | 41.33 | 44.60
         |                |    |    | 44.29 | 43.52 | 40.11 | 43.70
         |                |    |    | 43.40 | 42.08 | 38.81 | 43.01
         |                |    |    | 54.42 | 53.11 | 49.83 | 52.61
         |                |    |    | 53.38 | 52.86 | 50.12 | 52.72
         |                |    |    | 52.19 | 51.03 | 48.18 | 51.60
         |                |    |    | 56.17 | 54.91 | 52.30 | 55.07
         |                |    |    | 56.22 | 55.09 | 52.63 | 55.31
         |                |    |    | 55.12 | 54.56 | 51.87 | 54.46
CIFAR100 | WideResNet-34  |    |    | 24.04 | 24.06 | 24.47 | 23.95
         |                |    |    | 23.99 | 23.89 | 24.46 | 23.91
         |                |    |    | 22.99 | 22.84 | 23.32 | 23.02
         |                |    |    | 31.38 | 30.60 | 27.32 | 30.88
         |                |    |    | 33.20 | 32.78 | 28.88 | 32.69
         |                |    |    | 33.15 | 32.35 | 28.51 | 32.68
         |                |    |    | 32.34 | 31.67 | 28.82 | 31.75
         |                |    |    | 34.31 | 33.74 | 30.17 | 33.63
         |                |    |    | 34.53 | 34.18 | 30.22 | 34.04
         | DenseNet-BC-40 |    |    | 22.52 | 22.24 | 18.38 | 22.12
         |                |    |    | 22.41 | 22.10 | 17.96 | 22.11
         |                |    |    | 22.05 | 21.78 | 17.46 | 21.80
         |                |    |    | 29.10 | 28.63 | 24.27 | 28.57
         |                |    |    | 29.61 | 29.28 | 24.54 | 29.26
         |                |    |    | 29.21 | 28.89 | 24.41 | 28.76
         |                |    |    | 31.61 | 31.18 | 26.99 | 31.11
         |                |    |    | 32.79 | 32.31 | 27.30 | 32.26
         |                |    |    | 32.80 | 32.36 | 27.40 | 32.29
Table 2: Robust accuracy (%) for different datasets, architectures, and regularization parameters under various attacks.
Method                           | AutoAttack
TRADES (Zhang et al., 2019)      | 53.08
Ours (TRADES/WideResNet-34-10/)  | 54.73
Ours (RST/WideResNet-34-15/)     | 60.34
Early-Stop (Rice et al., 2020)   | 53.42
FAT (Zhang et al., 2020a)        | 53.51
HE (Pang et al., 2020)           | 53.74
Ours (TRADES/WideResNet-34-10/)  | 54.28
MART (Wang et al., 2020)         | 56.29
HYDRA (Sehwag et al., 2020)      | 57.14
RST (Carmon et al., 2019)        | 59.53
Ours (RST/WideResNet-34-15//)    | 59.78
Table 3: Robust accuracy (%) comparison on CIFAR10 under AutoAttack. Entries based on RST (Carmon et al., 2019) are trained with the support of unlabeled data.

5 Experiments

From Section 4, we know that wider networks have worse perturbation stability. This suggests that to fully unleash the potential of wide model architectures, we need to instead improve the perturbation stability of wide models. One natural strategy to do this is to adopt a larger robust regularization parameter $\lambda$ in (1). In this section, we conduct thorough experiments to see whether this strategy can mitigate the negative effect on perturbation stability and achieve better performance for wider networks.

5.1 Experimental settings

We conduct our experiments on the CIFAR10 (Krizhevsky et al., 2009) dataset, which is the most popular dataset in the adversarial training literature. It contains 60,000 images from 10 different categories, with 50,000 images for training and 10,000 for testing. Note that standard adversarial training does not include the regularization parameter $\lambda$, so we conduct our experiments using TRADES (Zhang et al., 2019). Networks are chosen from WideResNet (Zagoruyko and Komodakis, 2016) with widen factors in {1, 5, 10}. The batch size is set to , and we train each model for epochs; the initial learning rate is set to by default. We adopt a slightly different learning rate decay schedule: instead of dividing the learning rate by after the -th and -th epochs as in Madry et al. (2018); Zhang et al. (2019); Wang et al. (2020), we halve the learning rate every epoch after the 75-th epoch, for the purpose of obtaining better perturbation stability.

For evaluating the model robustness, we perform the standard white-box PGD attack (Madry et al., 2018) using steps with step size and perturbation limit $\epsilon$. Note that previous works (Zhang et al., 2019; Wang et al., 2020) report their results using a -step PGD attack with step size , which we found to be less effective and which may not reveal the true robustness of the trained networks.
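For reference, a minimal sketch of a white-box $\ell_\infty$ PGD attack of the kind used for evaluation is shown below; the default radius, step size, and number of steps are illustrative placeholders, not necessarily the exact values used in our experiments.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=8 / 255, step_size=2 / 255, num_steps=20):
    """White-box l_inf PGD attack in the style of Madry et al. (2018).
    epsilon / step_size / num_steps are illustrative defaults."""
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0.0, 1.0)
    for _ in range(num_steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + step_size * grad.sign()               # ascent step
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)  # project onto the epsilon-ball
        x_adv = x_adv.clamp(0.0, 1.0)                                  # stay in the valid image range
    return x_adv.detach()
```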

5.2 Model robustness with larger robust regularization parameter

We first compare the robustness performance of models with different network widths, using robust regularization parameters $\lambda$ chosen from {6, 9, 12, 15, 18, 21} for TRADES (Zhang et al., 2019). Results for the different evaluation metrics, including the robust accuracy, natural accuracy, and perturbation stability, are presented in Table 1.

From Table 1, we can observe that the best robust accuracy for the width-1 network is achieved when $\lambda = 6$, yet for the width-5 network, the best robust accuracy is achieved when $\lambda = 12$, and for the width-10 network, the best is achieved when $\lambda = 21$. This suggests that wider networks indeed need a larger robust regularization parameter to fully unleash the power of the wide model architecture. It is also worth noting that enlarging $\lambda$ indeed leads to improved perturbation stability, and that under the same $\lambda$, wider networks have worse perturbation stability. These observations are consistent with our empirical and theoretical findings in Sections 3 and 4.

5.3 Experiments on different datasets and architectures

To show that our theory is universal and applicable to various datasets and architectures, we conduct extra experiments on the CIFAR100 dataset and the DenseNet model (Huang et al., 2017). Note that adversarial training is computationally very expensive and is so far not scalable to large datasets like ImageNet. For the DenseNet models, the growing rate $k$ denotes how fast the number of channels grows and thus serves as a suitable measure of network width. Following the original paper (Huang et al., 2017), we choose DenseNet-BC with depth equal to 40 and use models with different growing rates to verify our theory.

Experimental results are shown in Table 2. For completeness, we also report the results under different attack methods and settings, including the C&W (Carlini and Wagner, 2017) attack, the standard 20-step PGD attack with the step size used in Madry et al. (2018); Zhang et al. (2019), and a more time-consuming but more powerful PGD attack with a larger number of steps. We adopt the best regularization parameters from Table 1 and show the corresponding performance on models with different widths. It can be seen that our strategy of using a larger robust regularization parameter works very well across different datasets and networks. On the WideResNet model, we observe the same clear patterns as in Section 5.2. On the DenseNet model, although the best regularization parameter is different from that of WideResNet, wider models in general still require larger robust regularization for better robustness.

5.4 Comparison of robustness on wide models

Previous experiments in Sections 5.2 and 5.3 have shown the effectiveness of our proposed strategy of using a larger robust regularization parameter for wider models. In order to ensure that this strategy does not lead to any obfuscated gradient problem (Athalye et al., 2018) and give a false sense of robustness, we further conduct experiments using stronger attacks. In particular, we evaluate our best models against the newly proposed AutoAttack algorithm (Croce and Hein, 2020), an ensemble attack method that combines four different white-box and black-box attacks for the best attack performance. We evaluate our trained models from Section 5.3 under AutoAttack and report the robust accuracy in Table 3. Note that the results of the other baselines are directly obtained from AutoAttack's public leaderboard (https://github.com/fra31/auto-attack).

From Table 3, we can see that our trained model with a larger robust regularization parameter significantly improves over the baseline TRADES models (both our reproduced one and the official model on the AutoAttack leaderboard) on WideResNet. This experiment further verifies the effectiveness of our proposed strategy.

6 Conclusions

In this paper, we studied the relation between network width and adversarial robustness in adversarial training. We showed that the model robustness is closely related to both natural accuracy and perturbation stability. While the natural accuracy is better on wider models, the perturbation stability actually becomes worse, leading to a possible decrease in the overall model robustness. We also studied the origin of this problem by relating perturbation stability with local Lipschitzness and leveraging recent studies on neural tangent kernels to prove that larger network width leads to worse perturbation stability. Our analyses suggest that practitioners should adopt a larger robust regularization parameter when training wider networks. Extensive experiments verified the effectiveness of this strategy.

Appendix A Proof of Lemma 4.2

[Restatement of Lemma 4.2] For any given input $\mathbf{x}$ and $\ell_p$ norm perturbation limit $\epsilon$, if the radius of the projection set is sufficiently small, then with high probability over the random initialization, for any $\mathbf{x}' \in \mathcal{B}_{\epsilon}(\mathbf{x})$ and any Lipschitz loss $\ell$, the input gradient norm satisfies
$$\big\|\nabla_{\mathbf{x}'}\ell\big(f_{\mathbf{W}}(\mathbf{x}'), y\big)\big\|_{2} = O(\sqrt{m}).$$

Proof.

The major part of this proof is inspired by Gao et al. (2019). For each layer $h \in [H]$, let $\mathbf{D}_h$ denote the diagonal sign matrix whose diagonal entries indicate whether the corresponding pre-activations of layer $h$ are positive. Then the neural network function can be rewritten as follows:
$$f_{\mathbf{W}}(\mathbf{x}') = \mathbf{a}^{\top}\mathbf{D}_{H}\mathbf{W}_{H}\cdots\mathbf{D}_{1}\mathbf{W}_{1}\mathbf{x}'.$$

By the chain rule, the input gradient norm can be further written as
$$\big\|\nabla_{\mathbf{x}'}\ell\big(f_{\mathbf{W}}(\mathbf{x}'), y\big)\big\|_{2} = \big|\ell'\big(f_{\mathbf{W}}(\mathbf{x}'), y\big)\big| \cdot \big\|\mathbf{a}^{\top}\mathbf{D}_{H}\mathbf{W}_{H}\cdots\mathbf{D}_{1}\mathbf{W}_{1}\big\|_{2}. \qquad (4)$$

Now let us focus on the term $\|\mathbf{a}^{\top}\mathbf{D}_{H}\mathbf{W}_{H}\cdots\mathbf{D}_{1}\mathbf{W}_{1}\|_{2}$. By the triangle inequality, it can be decomposed as
$$\big\|\mathbf{a}^{\top}\mathbf{D}_{H}\mathbf{W}_{H}\cdots\mathbf{D}_{1}\mathbf{W}_{1}\big\|_{2} \le \big\|\mathbf{a}^{\top}\mathbf{D}_{H}\mathbf{W}^{(0)}_{H}\cdots\mathbf{D}_{1}\mathbf{W}^{(0)}_{1}\big\|_{2} + \big\|\mathbf{a}^{\top}\big(\mathbf{D}_{H}\mathbf{W}_{H}\cdots\mathbf{D}_{1}\mathbf{W}_{1} - \mathbf{D}_{H}\mathbf{W}^{(0)}_{H}\cdots\mathbf{D}_{1}\mathbf{W}^{(0)}_{1}\big)\big\|_{2}. \qquad (5)$$
Note that $\mathbf{W}$ is updated via projected gradient descent with a projection set centered at $\mathbf{W}^{(0)}$. Therefore, the first term on the right-hand side of (5), which only involves the initial weights, can be bounded by Equation (12) in Lemma A.5 of Gao et al. (2019), and the second term, which measures the deviation from the initialization, can be bounded by Lemma A.3 in Gao et al. (2019) as long as the weights stay within the projection set. Combining these bounds with (5), when the radius of the projection set is sufficiently small, we have
$$\big\|\mathbf{a}^{\top}\mathbf{D}_{H}\mathbf{W}_{H}\cdots\mathbf{D}_{1}\mathbf{W}_{1}\big\|_{2} = O(\sqrt{m}). \qquad (8)$$

By substituting (8) into (4), we obtain
$$\big\|\nabla_{\mathbf{x}'}\ell\big(f_{\mathbf{W}}(\mathbf{x}'), y\big)\big\|_{2} \le \big|\ell'\big(f_{\mathbf{W}}(\mathbf{x}'), y\big)\big| \cdot O(\sqrt{m}) = O(\sqrt{m}),$$
where the last equality holds since $|\ell'(f_{\mathbf{W}}(\mathbf{x}'), y)|$ is bounded by a constant due to the Lipschitz condition on the loss $\ell$. This concludes the proof. ∎

Appendix B The Experimental Detail for Reproducibility

All experiments are conducted on a single NVIDIA TITAN RTX GPU with 24190MB of memory, running the GNU Linux Debian 4.9 operating system. The experiments are implemented with PyTorch 1.2.0. We adopt the publicly released code of TRADES (Zhang et al., 2019) and adapt it to our own settings, including inspecting the loss value of the robust regularization and the local Lipschitzness.

CIFAR100 contains 50k images for 100 classes, which means it has much fewer images per class compared with CIFAR10. This makes the learning problem on CIFAR100 much harder. For the DenseNet architecture, we adopt the 40-layer model with the bottleneck design, i.e., DenseNet-BC-40. It has three building blocks, each having the same number of layers. This is the same architecture tested on CIFAR10 in the original DenseNet paper. For simplicity, we keep the training schedule the same as the one used for WideResNet, i.e., the same learning rate decay schedule. As DenseNet gets deeper, its channel number (width) is multiplied by the growing rate k; thus, as k gets larger, so does the width of DenseNet. This mechanism slightly differs from the widen factor of WideResNet, which amplifies all layers by the same ratio.

To demonstrate that the over-fitting problem comes entirely from the perturbation stability, as discussed in item 3 of Section 3.2, we use the training schedule of the original work for Figure 3. Aside from that, all the other experiments and plots are obtained under our proposed learning rate schedule, which halves the learning rate every epoch after the 75-th epoch and can prevent over-fitting.

Appendix C Boosting generalized adversarial training

Note that the original adversarial training method (Madry et al., 2018) does not consider the balance between natural generalization and robust regularization. That is also the main reason we instead choose the TRADES (Zhang et al., 2019) model to test the strategy of boosting the robust regularization in Section 5. Yet, under our generalized adversarial training framework (1), it is also possible to boost the robust regularization for the original (generalized) adversarial training. In this section, we intend to verify that our experimental observations in Section 5 also apply to the original (generalized) adversarial training.

Note that for TRADES, the KL-divergence term is, by definition, guaranteed to be non-negative. For generalized adversarial training, the robust regularization term in (1) is also guaranteed to be non-negative in theory because of the max operation; however, when we approximate this maximization with projected gradient descent, it might fail to find a good adversarial example. We conduct analytical experiments and find that this exception has very little chance to happen (but still could happen), most likely at the beginning of the training procedure. To avoid this problem, we manually enforce the robust regularization term in (1) to be non-negative by clipping the term. Denoting by $\hat{\mathbf{x}}'_i$ the empirical maximization solution, the final loss function becomes:

$$\min_{\boldsymbol{\theta}} \frac{1}{n}\sum_{i=1}^{n}\Big[\ell\big(f_{\boldsymbol{\theta}}(\mathbf{x}_i), y_i\big) + \lambda\cdot \max\Big(\ell\big(f_{\boldsymbol{\theta}}(\hat{\mathbf{x}}'_i), y_i\big) - \ell\big(f_{\boldsymbol{\theta}}(\mathbf{x}_i), y_i\big),\ 0\Big)\Big]. \qquad (9)$$
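A minimal sketch of the clipped loss (9) is given below, assuming a generic PyTorch model, cross-entropy as the loss $\ell$, and adversarial examples `x_adv` produced by an external attack; the `clamp` call implements the non-negativity clipping described above.

```python
import torch
import torch.nn.functional as F

def clipped_adversarial_loss(model, x, x_adv, y, lam):
    """Generalized adversarial training loss (9): the robust regularization
    term is clipped to be non-negative before being weighted by lam."""
    natural_loss = F.cross_entropy(model(x), y)
    adv_loss = F.cross_entropy(model(x_adv), y)
    robust_reg = torch.clamp(adv_loss - natural_loss, min=0.0)  # clip at zero
    return natural_loss + lam * robust_reg
```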

Table 4 shows the experimental results for boosting the robust regularization parameter in generalized adversarial training models. We can observe that the boosting strategy still works for this generalized adversarial training method, and that a larger $\lambda$ indeed leads to better robust accuracy in the final results.

λ    | Robust Accuracy (%)        | Natural Accuracy (%)       | Perturbation Stability (%)
     | width-1  width-5  width-10 | width-1  width-5  width-10 | width-1  width-5  width-10
1.00 | 48.12    51.96    51.59    | 78.08    85.64    86.86    | 65.78    63.58    62.36
1.25 | 49.53    53.14    51.87    | 73.23    84.26    85.97    | 71.85    65.88    63.19
1.50 | 48.88    54.44    52.87    | 71.77    84.14    85.70    | 72.08    67.09    64.37
1.75 | 47.83    54.74    54.08    | 70.30    83.28    85.11    | 72.28    68.24    66.14
2.00 | 47.70    53.24    54.75    | 69.29    82.23    84.23    | 73.55    67.19    67.58
Table 4: The robust accuracy, natural accuracy, and perturbation stability under PGD attack with different λ.

References

  • Alayrac et al. (2019) Alayrac, J., Uesato, J., Huang, P., Fawzi, A., Stanforth, R. and Kohli, P. (2019). Are labels required for improving adversarial robustness? In NeurIPS.
  • Allen-Zhu et al. (2019) Allen-Zhu, Z., Li, Y. and Song, Z. (2019). A convergence theory for deep learning via over-parameterization. In International Conference on Machine Learning. PMLR.
  • Athalye et al. (2018) Athalye, A., Carlini, N. and Wagner, D. A. (2018). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. In ICML, vol. 80 of Proceedings of Machine Learning Research. PMLR.
  • Bubeck et al. (2019) Bubeck, S., Lee, Y. T., Price, E. and Razenshteyn, I. P. (2019). Adversarial examples from computational constraints. In ICML, vol. 97 of Proceedings of Machine Learning Research. PMLR.
  • Cao and Gu (2019) Cao, Y. and Gu, Q. (2019). Generalization bounds of stochastic gradient descent for wide and deep neural networks. In Advances in Neural Information Processing Systems.
  • Carlini and Wagner (2017) Carlini, N. and Wagner, D. A. (2017). Towards evaluating the robustness of neural networks. In SP. IEEE Computer Society.
  • Carmon et al. (2019) Carmon, Y., Raghunathan, A., Schmidt, L., Duchi, J. C. and Liang, P. (2019). Unlabeled data improves adversarial robustness. In NeurIPS.
  • Chen and Gu (2020) Chen, J. and Gu, Q. (2020). RayS: A ray searching method for hard-label adversarial attack. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
  • Chen et al. (2020) Chen, J., Zhou, D., Yi, J. and Gu, Q. (2020). A frank-wolfe framework for efficient and effective adversarial attacks. In AAAI.
  • Chen et al. (2017) Chen, P., Zhang, H., Sharma, Y., Yi, J. and Hsieh, C. (2017). ZOO: zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In AISec@CCS. ACM.
  • Cissé et al. (2017) Cissé, M., Bojanowski, P., Grave, E., Dauphin, Y. N. and Usunier, N. (2017). Parseval networks: Improving robustness to adversarial examples. In ICML, vol. 70 of Proceedings of Machine Learning Research. PMLR.
  • Cohen et al. (2019) Cohen, J. M., Rosenfeld, E. and Kolter, J. Z. (2019). Certified adversarial robustness via randomized smoothing. In ICML, vol. 97 of Proceedings of Machine Learning Research. PMLR.
  • Croce and Hein (2020) Croce, F. and Hein, M. (2020). Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In ICML.
  • Dhillon et al. (2018) Dhillon, G. S., Azizzadenesheli, K., Lipton, Z. C., Bernstein, J., Kossaifi, J., Khanna, A. and Anandkumar, A. (2018). Stochastic activation pruning for robust adversarial defense. ICLR .
  • Fawzi et al. (2018) Fawzi, A., Fawzi, O. and Frossard, P. (2018). Analysis of classifiers’ robustness to adversarial perturbations. Mach. Learn. 107 481–508.
  • Gao et al. (2019) Gao, R., Cai, T., Li, H., Hsieh, C.-J., Wang, L. and Lee, J. D. (2019). Convergence of adversarial training in overparametrized neural networks. In Advances in Neural Information Processing Systems.
  • Gilmer et al. (2018) Gilmer, J., Metz, L., Faghri, F., Schoenholz, S. S., Raghu, M., Wattenberg, M. and Goodfellow, I. (2018). Adversarial spheres. arXiv preprint arXiv:1801.02774 .
  • Goodfellow et al. (2015) Goodfellow, I. J., Shlens, J. and Szegedy, C. (2015). Explaining and harnessing adversarial examples. In ICLR (Y. Bengio and Y. LeCun, eds.).
  • Gu and Rigazio (2015) Gu, S. and Rigazio, L. (2015). Towards deep neural network architectures robust to adversarial examples. In ICLR (Y. Bengio and Y. LeCun, eds.).
  • Guo et al. (2018) Guo, C., Rana, M., Cisse, M. and Van Der Maaten, L. (2018). Countering adversarial images using input transformations. ICLR .
  • He et al. (2016) He, K., Zhang, X., Ren, S. and Sun, J. (2016). Identity mappings in deep residual networks. In ECCV.
  • Hein and Andriushchenko (2017) Hein, M. and Andriushchenko, M. (2017). Formal guarantees on the robustness of a classifier against adversarial manipulation. In Advances in Neural Information Processing Systems.
  • Hendrycks et al. (2019) Hendrycks, D., Lee, K. and Mazeika, M. (2019). Using pre-training can improve model robustness and uncertainty. In ICML, vol. 97 of Proceedings of Machine Learning Research. PMLR.
  • Huang et al. (2017) Huang, G., Liu, Z., van der Maaten, L. and Weinberger, K. Q. (2017). Densely connected convolutional networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition.
  • Ilyas et al. (2018) Ilyas, A., Engstrom, L., Athalye, A. and Lin, J. (2018). Black-box adversarial attacks with limited queries and information. In Proceedings of the 35th International Conference on Machine Learning.
  • Ilyas et al. (2019a) Ilyas, A., Engstrom, L. and Madry, A. (2019a). Prior convictions: Black-box adversarial attacks with bandits and priors. ICLR .
  • Ilyas et al. (2019b) Ilyas, A., Santurkar, S., Tsipras, D., Engstrom, L., Tran, B. and Madry, A. (2019b). Adversarial examples are not bugs, they are features. In NeurIPS.
  • Jacobsen et al. (2019) Jacobsen, J., Behrmann, J., Zemel, R. S. and Bethge, M. (2019). Excessive invariance causes adversarial vulnerability. In ICLR. OpenReview.net.
  • Jacot et al. (2018) Jacot, A., Hongler, C. and Gabriel, F. (2018). Neural tangent kernel: Convergence and generalization in neural networks. In NeurIPS.
  • Krizhevsky et al. (2009) Krizhevsky, A., Hinton, G. et al. (2009). Learning multiple layers of features from tiny images .
  • Kurakin et al. (2017) Kurakin, A., Goodfellow, I. J. and Bengio, S. (2017). Adversarial machine learning at scale. In ICLR. OpenReview.net.
  • Lécuyer et al. (2019) Lécuyer, M., Atlidakis, V., Geambasu, R., Hsu, D. and Jana, S. (2019). Certified robustness to adversarial examples with differential privacy. In SP. IEEE.
  • Liao et al. (2018) Liao, F., Liang, M., Dong, Y., Pang, T., Hu, X. and Zhu, J. (2018). Defense against adversarial attacks using high-level representation guided denoiser. In CVPR.
  • Madry et al. (2018) Madry, A., Makelov, A., Schmidt, L., Tsipras, D. and Vladu, A. (2018). Towards deep learning models resistant to adversarial attacks. In ICLR. OpenReview.net.
  • Meng and Chen (2017) Meng, D. and Chen, H. (2017). Magnet: a two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security.
  • Moosavi-Dezfooli et al. (2016) Moosavi-Dezfooli, S., Fawzi, A. and Frossard, P. (2016). Deepfool: A simple and accurate method to fool deep neural networks. In CVPR. IEEE Computer Society.
  • Najafi et al. (2019) Najafi, A., Maeda, S., Koyama, M. and Miyato, T. (2019). Robustness to adversarial perturbations in learning from incomplete data. In NeurIPS.
  • Nakkiran (2019) Nakkiran, P. (2019). Adversarial robustness may be at odds with simplicity. CoRR abs/1901.00532.
  • Pang et al. (2020) Pang, T., Yang, X., Dong, Y., Xu, K., Su, H. and Zhu, J. (2020). Boosting adversarial training with hypersphere embedding. CoRR abs/2002.08619.
  • Papernot et al. (2016a) Papernot, N., McDaniel, P. D., Jha, S., Fredrikson, M., Celik, Z. B. and Swami, A. (2016a). The limitations of deep learning in adversarial settings. In EuroS&P. IEEE.
  • Papernot et al. (2016b) Papernot, N., McDaniel, P. D., Wu, X., Jha, S. and Swami, A. (2016b). Distillation as a defense to adversarial perturbations against deep neural networks. In SP. IEEE Computer Society.
  • Paulavičius and Žilinskas (2006) Paulavičius, R. and Žilinskas, J. (2006). Analysis of different norms and corresponding lipschitz constants for global optimization. Technological and Economic Development of Economy 12 301–306.
  • Qin et al. (2019) Qin, C., Martens, J., Gowal, S., Krishnan, D., Dvijotham, K., Fawzi, A., De, S., Stanforth, R. and Kohli, P. (2019). Adversarial robustness through local linearization. In NeurIPS.
  • Raghunathan et al. (2019) Raghunathan, A., Xie, S. M., Yang, F., Duchi, J. C. and Liang, P. (2019). Adversarial training can hurt generalization. CoRR abs/1906.06032.
  • Rice et al. (2020) Rice, L., Wong, E. and Kolter, J. Z. (2020). Overfitting in adversarially robust deep learning. CoRR abs/2002.11569.
  • Ross and Doshi-Velez (2018) Ross, A. S. and Doshi-Velez, F. (2018). Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In AAAI.
  • Salman et al. (2019) Salman, H., Li, J., Razenshteyn, I. P., Zhang, P., Zhang, H., Bubeck, S. and Yang, G. (2019). Provably robust deep learning via adversarially trained smoothed classifiers. In NeurIPS.
  • Samangouei et al. (2018) Samangouei, P., Kabkab, M. and Chellappa, R. (2018). Defense-gan: Protecting classifiers against adversarial attacks using generative models. ICLR .
  • Schmidt et al. (2018) Schmidt, L., Santurkar, S., Tsipras, D., Talwar, K. and Madry, A. (2018). Adversarially robust generalization requires more data. In NeurIPS.
  • Sehwag et al. (2020) Sehwag, V., Wang, S., Mittal, P. and Jana, S. (2020). On pruning adversarially robust neural networks. CoRR abs/2002.10509.
  • Song et al. (2018) Song, Y., Kim, T., Nowozin, S., Ermon, S. and Kushman, N. (2018). Pixeldefend: Leveraging generative models to understand and defend against adversarial examples. ICLR .
  • Szegedy et al. (2014) Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I. J. and Fergus, R. (2014). Intriguing properties of neural networks. In ICLR (Y. Bengio and Y. LeCun, eds.).
  • Tramèr et al. (2017) Tramèr, F., Papernot, N., Goodfellow, I. J., Boneh, D. and McDaniel, P. D. (2017). The space of transferable adversarial examples. CoRR abs/1704.03453.
  • Tsipras et al. (2019) Tsipras, D., Santurkar, S., Engstrom, L., Turner, A. and Madry, A. (2019). Robustness may be at odds with accuracy. In ICLR. OpenReview.net.
  • Wang et al. (2019) Wang, Y., Ma, X., Bailey, J., Yi, J., Zhou, B. and Gu, Q. (2019). On the convergence and robustness of adversarial training. In ICML.
  • Wang et al. (2020) Wang, Y., Zou, D., Yi, J., Bailey, J., Ma, X. and Gu, Q. (2020). Improving adversarial robustness requires revisiting misclassified examples. In ICLR. OpenReview.net.
  • Weng et al. (2018) Weng, T.-W., Zhang, H., Chen, P.-Y., Yi, J., Su, D., Gao, Y., Hsieh, C.-J. and Daniel, L. (2018). Evaluating the robustness of neural networks: An extreme value theory approach. In International Conference on Learning Representations.
  • Wong et al. (2020) Wong, E., Rice, L. and Kolter, J. Z. (2020). Fast is better than free: Revisiting adversarial training. CoRR abs/2001.03994.
  • Xie et al. (2018) Xie, C., Wang, J., Zhang, Z., Ren, Z. and Yuille, A. (2018). Mitigating adversarial effects through randomization. ICLR .
  • Zagoruyko and Komodakis (2016) Zagoruyko, S. and Komodakis, N. (2016). Wide residual networks. In BMVC. BMVA Press.
  • Zhang et al. (2019) Zhang, H., Yu, Y., Jiao, J., Xing, E. P., Ghaoui, L. E. and Jordan, M. I. (2019). Theoretically principled trade-off between robustness and accuracy. In ICML, vol. 97 of Proceedings of Machine Learning Research. PMLR.
  • Zhang et al. (2020a) Zhang, J., Xu, X., Han, B., Niu, G., Cui, L., Sugiyama, M. and Kankanhalli, M. S. (2020a). Attacks which do not kill training make adversarial learning stronger. CoRR abs/2002.11242.
  • Zhang et al. (2020b) Zhang, X., Chen, J., Gu, Q. and Evans, D. (2020b). Understanding the intrinsic robustness of image distributions using conditional generative models. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics.
  • Zou et al. (2020) Zou, D., Cao, Y., Zhou, D. and Gu, Q. (2020). Gradient descent optimizes over-parameterized deep ReLU networks. Machine Learning 109 467–492.