Towards Noise-Robust Neural Networks via Progressive Adversarial Training

09/11/2019 · Hang Yu, et al. (Beihang University)

Adversarial examples, inputs intentionally designed to mislead deep neural networks, have attracted great attention in the past few years. Although a series of defense strategies have been developed and achieve encouraging model robustness, most of them remain vulnerable to the corruptions more commonly encountered in the real world, e.g., Gaussian noise, blur, etc. In this paper, we theoretically and empirically show that there exists an inherent connection between adversarial robustness and corruption robustness. Based on this finding, we further propose a more powerful training method named Progressive Adversarial Training (PAT), which adds diversified adversarial noise progressively during training and thus, through higher training data complexity, obtains models robust to both adversarial examples and corruptions. Meanwhile, we also theoretically show that PAT promises better generalization ability. Experimental evaluations on MNIST, CIFAR-10 and SVHN show that PAT enhances the robustness and generalization of state-of-the-art network architectures, performing comprehensively well compared to various augmentation methods. Moreover, we also propose Mixed Test to evaluate model generalization ability more fairly.


Introduction

State-of-the-art deep learning models have shown significant success in many tasks, including computer vision [11], natural language processing [2] and speech [9]. Their performance is usually guaranteed by training on sufficient clean data. However, in real-world environments it is usually impractical to acquire entirely clean data free of noise such as adversarial examples and corruptions [8], which have been shown to pose potential threats to deep learning systems [16, 12], especially those deployed in safety- and security-critical environments. Therefore, it is crucial to understand the noise robustness of deep neural networks.

In the past few years, great effort has been devoted to exploring model robustness to adversarial noise (or adversarial examples), maliciously constructed imperceptible perturbations that fool deep learning models, from the perspectives of both attack [6, 1] and defense [28, 13, 17]. Most existing defense methods attempt to build adversarially robust models by supplying adversaries with non-computable gradients. Although they successfully make deep models predict stably when encountering adversarial examples, they are still easily defeated by attacks that circumvent obfuscated gradients [1]. Recently, adversarial training, a strong defense based on adversarial data augmentation, has shown a strong capability to produce models robust against adversarial examples.

Besides the progress in robustness to adversarial examples, recent studies have also paid attention to improving model robustness to another common type of noise, namely corruptions [31, 24]. In real-world deep learning systems, corruptions such as Gaussian blur and snow are more likely to be encountered than adversarial examples. dodge2017study found that deep learning models perform distinctly below human level on images with Gaussian noise. Likewise, deep learning models show weak performance on various corruptions including blur, pixelation and other types of noise [8].

Although the existence of noise, especially in real-world environments, has drawn intense concern about the robustness of deep models, the literature has mainly focused on either corruptions [31, 24] or adversarial noise [13, 17]. Only a few studies have investigated the problem from the view of generalized noise robustness. For example, fawzi2016robustness studied the robustness of classifiers from adversarial examples to random noise and tried to build a robust model from the view of curvature constraints. fawzi2018adversarial drew the relationship between in-distribution robustness and unconstrained robustness. More recently, ford2019adversarial found that adversarial robustness is closely related to robustness under certain kinds of distributional shift, i.e., additive Gaussian noise. Despite this promising progress, open questions remain for a deeper understanding of model robustness:

Is there any relationship between different types of noise, such as adversarial noise and corruptions? Can we build models that are robust against both?

To answer these questions, this paper first provides theoretical and empirical analyses demonstrating that there exists an inherent correlation between adversarial robustness and corruption robustness, which potentially explains why existing adversarial defense methods may somewhat improve model robustness against corruptions. Based on this theoretical finding, we further devise a powerful training method named Progressive Adversarial Training (PAT) to improve both adversarial and corruption robustness. Different from conventional methods that achieve model robustness by employing more training data [20, 23], PAT aggregates, augments and injects diversified adversarial noise progressively, which is shown to benefit both adversarial and corruption robustness. We also theoretically prove that PAT promises models with strong generalization ability. Extensive experiments on MNIST, CIFAR-10 and SVHN indicate that PAT achieves comprehensively strong results on both adversarial and corruption noise compared to various augmentation methods. Moreover, in order to conduct unbiased evaluation, we also propose Mixed Test to evaluate model generalization ability more fairly.

Related Work

Adversarial robustness. To improve model robustness against adversarial examples, many adversarial training based methods have been proposed and have proven comparatively competitive in experiments. goodfellow2014explaining first tried to improve model robustness against adversarial attacks through adversarial training, which intuitively adds a regularization term and directly feeds adversarial examples during training. madry2017towards adversarially trained arguably the most robust models to date using PGD-augmented adversarial examples, although this consumes considerable time and fails to generalize well on clean examples. Meanwhile, sinha2017certifiable proposed an adversarial training algorithm with a theoretically analyzed surrogate loss, but it is confined to small datasets, e.g., MNIST. Besides, adversarial noise is also calculated and added to each hidden layer during training in order to tackle the overfitting problem in [19]. However, adversarial training still struggles against more aggressive iterative attacks, and its corruption robustness remains questionable.

Corruption robustness.

When it comes to robustness against corruptions, few defense methods have been proposed. To improve model robustness against JPEG compression, zheng2016improving proposed stability training to stabilize model behavior against small input distortions. By investigating feature distribution shifts within convolutional neural networks, sun2018feature employed feature quantization techniques to deal with distorted images, including motion blur and salt-and-pepper noise. More recently, metz2019using drew insights from meta-learning and used a learned optimizer to build a model robust against input noise, e.g., translations.

Noise Robustness

In this section, we introduce the formal definition of noise robustness, taking widely used convolutional neural networks for image classification as the basic deep models.

Basic Deep Models

Given a training set S and a test set T, with a feature vector x and a label y, the deep supervised learning model tries to learn a mapping or classification function f: X → Y from input examples to output labels, where X ⊆ R^d is the d-dimensional input space and Y is the set of output labels. More precisely, f(x) represents the prediction result for input x after the model is trained on the training set S. We use the log-loss for the image classification problem as follows:

$\ell(f(x), y) = -\sum_{j} y_j \log f(x)_j,$

where f(x)_j denotes the value at position j of the prediction result. For the single-label classification problem, let y_k = 1; then the above loss can be expressed as:

$\ell(f(x), y) = -\log f(x)_k.$

Model Robustness

Now we introduce the definitions and formulations of model robustness. Assuming that the test set T can be divided into disjoint subsets, the number of these partitions is denoted as the covering number of the set T.

ε-cover. Given a specified metric ρ, for subsets Â and A of T, the set Â is called an ε-cover of the set A if it satisfies: for any a ∈ A, there exists â ∈ Â such that ρ(a, â) ≤ ε.

ε-covering number. Following the concept in [27], the ε-covering number of a set A is defined as the smallest cardinality among all ε-covers of the set A, denoted N(ε, A, ρ).
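As a simple worked example (not from the paper): for the unit interval under the absolute-value metric, centers placed at ε, 3ε, 5ε, … cover [0, 1] with balls of radius ε, so

```latex
N\big(\epsilon, [0,1], |\cdot|\big) \;=\; \Big\lceil \tfrac{1}{2\epsilon} \Big\rceil,
\qquad \text{e.g.}\quad \epsilon = 0.1 \;\Rightarrow\; N = 5 .
```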

For each class, consider two partitions: the set of data samples that objectively should be classified into the k-th class, and the set of data samples that actually are classified into the k-th class. The points of the k-th class in the sample set can then be regarded as a subset of the intersection of these two sets. Accordingly, the set of samples that are supposed to be classified into the k-th class but actually are not is close to the error set of the k-th class.

Adversarial robustness. The adversarial robustness set can be defined as:

(1)

Corruption robustness. We choose Gaussian noise as the representative corruption in our analysis, and we measure the corresponding probability mass by integrals of the Gaussian probability density function. The ε-extension of a set is defined as the original set "extended" by all samples within a distance constraint ε. The corruption robustness set is then given as follows:

(2)

Therefore, these definitions yield paired quantities denoting adversarial vulnerability and, conversely, adversarial robustness; likewise, the analogous quantities stand for corruption vulnerability and corruption robustness. More details can be found in Theorem 1 and the supplementary material.
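For intuition, the two notions can be illustrated in a simplified error-rate form in the spirit of ford2019adversarial; this is only an illustration and not the exact set-based definitions of Eqs. (1) and (2):

```latex
% Illustrative error-rate views: worst-case error within an eps-ball vs.
% average-case error under additive Gaussian noise.
\mathrm{AdvErr}_{\epsilon}(f) = \Pr_{(x,y)\sim T}\!\big[\exists\,\delta,\ \|\delta\|\le\epsilon:\ f(x+\delta)\neq y\big],
\qquad
\mathrm{CorrErr}_{\sigma}(f) = \Pr_{(x,y)\sim T,\ \delta\sim\mathcal{N}(0,\sigma^{2}I)}\!\big[f(x+\delta)\neq y\big].
```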

Theoretical Connections Between Adversarial and Corruption Robustness

From the above, ε-extensions are defined for two sets: the set of points that should be and actually are classified into the k-th class, and the set of points that should be but actually are not classified into the k-th class. In the following, we also consider the surrounding band induced by the ε-extension of the former set. Further illustration can be found in the supplementary material. Then the following theorem holds:

Theorem 1

For class k in the single-label classification problem, we define the learning function f; an inequality relating the adversarial and corruption robustness sets then holds, in which one term is fixed while the other is increased by augmentation, and in which Φ represents the cumulative distribution function of the normal distribution. For a given probability, the right-hand side of the inequality is a monotonically increasing function of its argument.

For the proof and explanation, please refer to the supplementary material. According to this theorem, we make the following observation: if we enhance adversarial robustness by reducing the adversarial error term, the intersection of the two partitions increases while the other term stays the same, which raises the corresponding quantity and leads to a smaller left-hand side, i.e., a tighter upper bound on the right-hand side. Thus, within the proved interval, stronger corruption robustness is provable as well. We therefore draw a key conclusion: there exists a positive correlation between adversarial and corruption robustness under certain constraints.

Based on this theoretical finding, we can further explain why the existing solutions, which usually focus on either adversarial examples or corruptions, can hardly obtain generalized noise robustness in practice. In the literature, an intuitive way to improve corruption robustness is to add random Gaussian noise throughout training, i.e., Gaussian data augmentation (GDA) [5]. Although proven effective against corruptions [5], it contributes little to actual adversarial robustness: improving corruption robustness directly increases the lower bound on the left side of the inequality, because the right side is a monotonically increasing function, and this may even somewhat reduce adversarial robustness. Meanwhile, most adversarial defense methods, e.g., PGD-based adversarial training (PGD-AT) [17], aim to improve adversarial robustness by searching for worst-case adversarial noise at considerable computational cost, which indeed yields good adversarial robustness. However, due to the limited amount of adversarial noise the model witnesses, the corresponding term on the left side of the inequality is reduced only within a restricted range, which in turn translates into merely moderate robustness to corruptions on the right side.
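For concreteness, minimal PyTorch-style sketches of these two baselines are shown below; the noise scale sigma, the budget eps, the step size alpha, the step count and the [0, 1] pixel range are illustrative assumptions rather than the settings used in this paper.

```python
import torch
import torch.nn.functional as F

def gda_step(model, optimizer, x, y, sigma=0.1):
    """Gaussian data augmentation (GDA): one update on noise-corrupted inputs."""
    x_noisy = (x + sigma * torch.randn_like(x)).clamp(0.0, 1.0)
    loss = F.cross_entropy(model(x_noisy), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def pgd_at_step(model, optimizer, x, y, eps=8 / 255, alpha=2 / 255, steps=7):
    """PGD adversarial training (PGD-AT): one update on worst-case perturbed inputs."""
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        # Gradient ascent on the perturbation within the eps-ball.
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    # Single descent step on the adversarially augmented batch.
    loss = F.cross_entropy(model((x + delta).clamp(0.0, 1.0)), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```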

Progressive Adversarial Training

Inspired by the theoretical connection between adversarial and corruption robustness, we propose our Progressive Adversarial Training (PAT) strategy, which adds adversarial noise of high complexity during training so as to shrink the error-related term significantly; this yields a larger increase of the intersection term with the other term fixed, and hence a tighter bound. Thus, both adversarial and corruption robustness are guaranteed.

Formulation

sinha2017certifiable first proposed a surrogate loss that adds a Lagrangian constraint to the original loss function in order to approximate the worst-case perturbation, involving the robust objective distribution, the original data distribution of the training set, and the distance between the two distributions. Optimizing the surrogate loss indirectly constrains this distance, while the augmented distribution approximates the robust distribution.

Analogously, letting d denote the metric, we consider the surrogate loss with a distance constraint as:

(3)

where the perturbation term represents the worst-case perturbation that approximates the objective robust distribution given the current distribution. In this sense, we can update the model parameters in the direction of robustness by optimizing the surrogate loss. Our Progressive Adversarial Training method is developed to minimize this robust surrogate loss.
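As a reference point, the penalized (Lagrangian) robust objective popularized by sinha2017certifiable, which the constrained surrogate loss in Eq. (3) builds on, can be sketched as follows; here P_0 is the data distribution, W_c a Wasserstein distance with transport cost c, and γ the Lagrange multiplier, in the notation of that paper rather than the notation above.

```latex
% Distributionally robust risk over a Wasserstein ball (left) and its Lagrangian
% surrogate used for training (middle and right), following Sinha et al. (2017):
\sup_{P:\, W_c(P, P_0)\le \rho} \mathbb{E}_{P}\big[\ell(f_\theta(x), y)\big],
\qquad
\phi_\gamma(\theta; x_0, y_0) = \sup_{x}\big\{\ell(f_\theta(x), y_0) - \gamma\, c(x, x_0)\big\},
\qquad
\min_\theta\ \mathbb{E}_{(x_0, y_0)\sim P_0}\big[\phi_\gamma(\theta; x_0, y_0)\big].
```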

Prior studies [20, 23] have shown that sample complexity plays a critical role in training a robust deep model. schmidt2018adversarially concluded that the sample complexity of robust learning can be significantly larger than that of standard learning in the adversarial setting. charles2019convergence argued that adversarial training may need exponentially more iterations than standard training to obtain large margins. Moreover, based on a risk decomposition theorem, [30] theoretically and empirically showed that with more unlabeled data one can learn a model with better adversarially robust generalization. From the view of data complexity, we introduce Progressive Adversarial Training, which injects adversarial noise progressively over several iterations within each training step, thereby increasing data complexity and diversity and leading to robust models.

Specifically, given a training example, we regard the perturbation obtained through gradient ascent as the local optimal perturbation. To obtain it, we calculate the gradient of the loss with respect to the perturbation as:

(4)

Since data complexity contributes significantly to adversarially robust models, a progressive iteration process is proposed to introduce diversified adversarial examples through successive adversarial ascent steps. Thus, the perturbation is updated at every step of the progressive process as:

(5)

where the step size is normalized by the number of steps k, so that the overall magnitude of the update equals the perturbation threshold; a decay factor controls the decay of the perturbation as the progressive iterations increase, i.e., the contribution of the previous perturbation.

At each perturbation computation step via gradient ascent, another batch of adversarial examples is obtained by adding the current perturbation to the inputs, and another one-step gradient descent is executed to update the model parameters. With the surrogate loss being optimized, we have

(6)

where the corresponding terms follow from equations (5) and (6). In this way, progressive adversarial training can be regarded as an iterative optimization of the robust surrogate loss.

(a) PGD-AT and GDA
(b) PAT
Figure 1: The effect of data augmentation on classification, where the dashed circles denote the partition of the empirical risk minimization. (a) The solid circles denote the partitions of PGD-AT and GDA. (b) The solid circle represents the partition of our proposed PAT.
1: Input: training set, hyper-parameters: perturbation threshold, number of progressive steps k, decay factor
2: Output: updated model parameters
3: for each mini-batch of training examples do
4:     for each of the k progressive iterative steps do
5:         update the perturbation by one gradient-ascent step as in Eq. (5)
6:         form the augmented batch by adding the current perturbation
7:         update the model parameters with the augmented batch
8:     end for
9: end for
Algorithm 1 Progressive Adversarial Training
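A minimal PyTorch-style sketch of this loop is given below; since the exact update rule of Eq. (5) is not reproduced here, the sign-gradient step, the decay factor lam and the [0, 1] pixel range are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def pat_epoch(model, optimizer, loader, eps=4 / 255, k=3, lam=0.5):
    """One epoch of Progressive Adversarial Training (illustrative sketch).

    For every mini-batch, the perturbation is refined for k progressive steps with
    step size eps / k, and the model is updated on the augmented batch after every
    step, so each mini-batch contributes k augmented batches.
    """
    model.train()
    for x, y in loader:
        delta = torch.zeros_like(x)
        for _ in range(k):
            # Gradient ascent on the perturbation (analogue of Eqs. (4)-(5)).
            delta = delta.detach().requires_grad_(True)
            loss = F.cross_entropy(model((x + delta).clamp(0.0, 1.0)), y)
            grad = torch.autograd.grad(loss, delta)[0]
            # Decay the previous perturbation and add a normalized ascent step.
            delta = (lam * delta + (eps / k) * grad.sign()).detach()
            # One gradient-descent step on the freshly augmented batch (Eq. (6)).
            x_aug = (x + delta).clamp(0.0, 1.0)
            loss = F.cross_entropy(model(x_aug), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```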

In the case of PGD-AT, the model parameters are updated, for one step only, with the approximated worst-case perturbations computed within the current data. Meanwhile, the GDA-based approach tries to improve model robustness by adding average-case perturbations. Different from them, our method augments the current data and updates the model parameters for k steps, which brings significantly more data with higher complexity and diversity. Furthermore, these perturbations are much more targeted and flexible, since they are recalculated at each step during training. Therefore, models trained by PAT are expected to be robust against more types of noise, including both adversarial examples and corruptions (see the proof of Theorem 1).

Intuitively, as illustrated in Figure 1(a), the data partition for the k-th class becomes moderately narrow after traditional data augmentation (GDA and PGD-AT), which indicates that the partition space is "stretched" to cover more adversarial examples or Gaussian noise in some specific directions [21]. However, this directly degrades generalization on benign examples, as the partition fails to cover some portions of benign instances that used to be classified correctly by empirical risk minimization (ERM). On the contrary, after PAT, as shown in Figure 1(b), the capacity of the data partition becomes larger as more data with higher diversity is witnessed by the model during training. Thus, PAT yields a model with excellent generalization ability on benign examples as well as polluted ones.

Generalization and Robustness

Now, we further explain and theoretically prove the robustness of our algorithm and the upper bound on its expected generalization error. For the loss function and the surrogate loss, there obviously exist finite upper bounds.

Assumption 1

For the test set T and a metric ρ, assume that the loss function is smooth. Then there exist constants such that both the loss function and the surrogate loss satisfy a Lipschitz condition.

Lemma 1

([29], Theorem 14). For a set and a metric, if the loss function satisfies a Lipschitz condition, then for the augmented training set the learning function is robust in the sense of [29]: the sample space can be partitioned into disjoint sets such that, whenever a training sample and a test sample fall into the same set, the deviation between their losses is bounded.

Theorem 2

If the loss function and the learning function both satisfy Lipschitz conditions, then, for the existing upper bounds, the surrogate loss also satisfies a Lipschitz condition. Consequently, for the augmented training set, the learning function is robust in the sense of Lemma 1, with constants determined by the Lipschitz constants and the covering number.

The robustness property in Lemma 1 reveals that, for the given number of partitions, if a sample falls into the same partition as a training point, then the deviation between their losses is controlled by the corresponding constant. Here we obtain this robustness through the Lipschitz property in Lemma 1. If the radius parameter increases, the covering number decreases while the loss-deviation term increases for a fixed Lipschitz constant, which balances the two parts of the robustness property. In Theorem 2 we derive the robustness property of the learning function for the surrogate loss. The proof is given in the supplementary material.
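For reference, the robustness notion and generalization bound of [29] that Lemma 1 and Theorem 3 build on can be written as follows, in the notation of that paper (which may differ from the symbols used here):

```latex
% (K, eps(S))-robustness (Xu & Mannor, 2012): the sample space Z admits a partition
% {C_1, ..., C_K} such that, for the learner f trained on a set S of size n,
z \in S,\ s \in \mathcal{Z},\ z, s \in C_i \;\Longrightarrow\; \big|\ell(f, z) - \ell(f, s)\big| \;\le\; \epsilon(S).
% If in addition the loss is bounded by M, then with probability at least 1 - delta,
\Big|\,\mathbb{E}_{z}\big[\ell(f, z)\big] - \tfrac{1}{n}\sum_{z \in S} \ell(f, z)\,\Big|
\;\le\; \epsilon(S) + M\sqrt{\tfrac{2K\ln 2 + 2\ln(1/\delta)}{n}}.
```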

Theorem 3

Given an augmented training set for which the learning function is robust in the sense of Theorem 2, then for any probability δ, with probability at least 1 − δ the surrogate loss satisfies a generalization bound whose constants involve the covering number and the volume of the sample set.

In Theorem 3, we analyze the upper bound on the generalization error for the surrogate loss of our proposed PAT; the proof is in the supplementary material. The difference between losses can be constrained for similar input examples. There is a trade-off between the two terms of the bound: if the radius parameter increases, the former term decreases while the latter increases, which yields a reasonable generalization bound together with the robustness property. Thus, according to the theoretical analysis above, we conclude that PAT builds a robust model with a proven upper bound on the generalization error.

Experiment and Evaluation

In this section, we evaluate our Progressive Adversarial Training against the compared methods on the image classification task, using both adversarial and corrupted examples.

We adopt the popular MNIST [15], CIFAR-10 [10] and SVHN [18] datasets for evaluation. As for the deep models, we choose widely used architectures: LeNet-5 [14], with a simple architecture, for MNIST; VGG-16 [22] for CIFAR-10; and standard ResNet-18 [7] for SVHN. For comparison, different data augmentation methods are employed, including PGD-AT and GDA.

Evaluation Protocol

We first evaluate model robustness with top-1 classification accuracy on benign, adversarial and corrupted datasets, following the guidelines of [3]. Meanwhile, we also use our proposed criterion, Mixed Test, to evaluate the generalization ability of a model, where each test set mixes benign, adversarial and corrupted examples in equal proportion.

Adversarial examples. Following [13, 1, 3], we construct adversarial examples with the strongest attacks under different norms, drawn from both gradient-based and optimization-based attacks: the Projected Gradient Descent (PGD) attack [17] and the C&W attack [4]. For CIFAR-10 and SVHN, we set the perturbation size of PGD to 2/255 and 4/255, and the perturbation size of C&W to 0.2 and 0.5, similar to [26].
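For reference, a minimal sketch of a PGD attack of the kind used for evaluation is given below; the sign-based update, random start, step count and [0, 1] pixel range are illustrative assumptions rather than the exact attack configuration of this paper.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=2 / 255, alpha=None, steps=10):
    """Generate PGD adversarial examples for evaluation (illustrative sketch)."""
    alpha = alpha if alpha is not None else eps / 4
    # Random start inside the eps-ball around the clean inputs.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss and project back into the eps-ball and pixel range.
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()
```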

Corruptions. Following [8], we construct corrupted versions of CIFAR-10 and SVHN consisting of various corruption types, e.g., noise, blur, weather and digital corruptions. More precisely, we divide all 15 corruptions into 3 main groups, noise, blur and other, each with 6 severity levels. Due to the simplicity of MNIST, we do not consider corruptions on this dataset.
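As an illustration of how such corruptions are typically parameterized by severity, a minimal Gaussian-noise example is shown below; the severity-to-sigma mapping is a hypothetical choice, not the benchmark's exact values.

```python
import numpy as np

def gaussian_noise_corruption(image, severity=1):
    """Apply Gaussian-noise corruption at a given severity (illustrative sketch).

    `image` is an HxWxC float array in [0, 1]; higher severity means stronger noise.
    """
    # Hypothetical severity-to-standard-deviation mapping for illustration only.
    sigmas = [0.04, 0.06, 0.08, 0.09, 0.10, 0.12]
    sigma = sigmas[min(severity, len(sigmas)) - 1]
    noisy = image + np.random.normal(loc=0.0, scale=sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)
```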

Experiment settings. For CIFAR-10 we set the size of the sample set to 10,000, and for SVHN to 26,032, the same as the official datasets. We use SGD with momentum as the optimizer for the ERM baseline, the compared methods and our method. The training details of the compared methods and PAT are reported in the supplementary material.

Corruption and Adversarial Robustness

To comprehensively evaluate model robustness, we conduct experiments on clean, adversarial and corrupted datasets, and we compare our PAT (iteration step k=3) with ERM (Naive), PGD-AT and GDA.

(a) VGG16 on CIFAR-10
(b) ResNet18 on SVHN
Figure 2: Model robustness evaluation on adversarial and corrupted examples of CIFAR-10 and SVHN. We evaluate corruption robustness on the clean and corrupted datasets, and adversarial robustness under PGD and C&W attacks. The parameter settings and the experimental results on MNIST are reported in the supplementary material.
VGG16 (PGD) Cln+(ε=2/255)+Blur Cln+(ε=4/255)+Blur Cln+(ε=2/255)+Noise Cln+(ε=4/255)+Noise Cln+(ε=2/255)+Other Cln+(ε=4/255)+Other Cln+(ε=2/255)+All Cln+(ε=4/255)+All
Naive 69.04% 60.53% 65.78% 57.27% 72.20% 63.69% 69.01% 60.50%
PGD-AT 80.75% 75.40% 81.06% 75.71% 81.04% 75.69% 80.95% 75.60%
GDA 79.45% 71.00% 81.40% 72.95% 81.32% 72.87% 80.72% 72.27%
PAT 84.80% 79.75% 85.48% 80.43% 85.53% 80.48% 85.27% 80.22%
VGG16 (C&W) Cln+(c=0.2)+Blur Cln+(c=0.5)+Blur Cln+(c=0.2)+Noise Cln+(c=0.5)+Noise Cln+(c=0.2)+Other Cln+(c=0.5)+Other Cln+(c=0.2)+All Cln+(c=0.5)+All
Naive 60.86% 60.84% 57.60% 57.58% 64.02% 64.00% 60.83% 60.81%
PGD-AT 69.55% 67.40% 69.86% 67.72% 69.84% 67.69% 69.75% 67.60%
GDA 62.48% 61.03% 64.44% 62.99% 64.35% 62.90% 63.76% 62.31%
PAT 74.09% 71.55% 74.77% 72.22% 74.82% 72.27% 74.56% 72.01%
ResNet18 (PGD) Cln+(ε=2/255)+Blur Cln+(ε=4/255)+Blur Cln+(ε=2/255)+Noise Cln+(ε=4/255)+Noise Cln+(ε=2/255)+Other Cln+(ε=4/255)+Other Cln+(ε=2/255)+All Cln+(ε=4/255)+All
Naive 86.42% 77.53% 83.84% 74.94% 86.50% 77.61% 85.59% 76.69%
PGD-AT 89.51% 82.28% 89.17% 81.94% 89.37% 82.15% 89.35% 82.12%
GDA 89.49% 82.08% 89.05% 81.65% 88.77% 81.36% 89.10% 81.70%
PAT 92.62% 87.68% 92.20% 87.26% 91.84% 86.89% 92.22% 87.28%
ResNet18 (C&W) Cln+(c=0.2)+Blur Cln+(c=0.5)+Blur Cln+(c=0.2)+Noise Cln+(c=0.5)+Noise Cln+(c=0.2)+Other Cln+(c=0.5)+Other Cln+(c=0.2)+All Cln+(c=0.5)+All
Naive 65.62% 65.99% 63.03% 63.40% 65.70% 66.07% 64.78% 65.15%
PGD-AT 71.58% 67.33% 71.24% 66.99% 71.44% 67.19% 71.42% 67.17%
GDA 72.14% 67.58% 71.70% 67.14% 71.42% 66.86% 71.75% 67.19%
PAT 83.28% 70.76% 82.86% 70.34% 82.49% 69.97% 82.88% 70.36%
Table 1: Experiment results using Mixed Test on CIFAR-10 and SVHN with VGG-16 and ResNet18, respectively. We combine benign, adversarial and corrupted examples into the test set in the same proportion.

Figure 2 shows the accuracy of the different methods, from which we make the following observations. (1) GDA achieves good performance on all types of corruptions, which meets part of the assumption in Theorem 1; however, its corresponding robustness term can be smaller than PAT's due to the lack of adversarially robust augmentation, leading to weak adversarial robustness. (2) PGD-AT performs poorly on corruptions and clean examples, which might derive from the decrease of the corresponding term in Theorem 1, leading to a looser upper bound and weaker corruption robustness compared to PAT. (3) In all cases, models trained by PAT consistently achieve the most robust performance under both adversarial examples and corruptions, which indicates that PAT brings more data with higher complexity; the error-related term shrinks significantly while the intersection term increases, leading to a tighter upper bound and stronger adversarial as well as corruption robustness.

Generalization Evaluation with Mixed Test

xu2012robustness first explored a deep connection between model robustness and generalization, proving that a weak notion of robustness is both sufficient and necessary for generalization. Intuitively, if the test data are similar to the training data, then the test error should also be close to the empirical training error. Previous adversarial studies [1, 28] usually evaluate model accuracy on an individual type of example, i.e., a test set containing only adversarial or only clean examples, and observe the gap between test and training error to assess generalization. However, for models trained on datasets augmented with noise, e.g., adversarial examples, it is biased to evaluate generalization on any single specified type of test data, which can differ greatly from the training set. Therefore, we believe it is reasonable to evaluate model generalization and robustness with mixed types of samples under the noise setting, especially for models trained with noisy data augmentation. Thus, we propose Mixed Test to evaluate model generalization ability fairly, combining clean, adversarial and corrupted examples in equal proportion.
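A minimal sketch of how such a mixed test set could be assembled is shown below; attack_fn and corrupt_fn stand for any attack and corruption routines (e.g., adaptations of the sketches above that return tensors), and the equal three-way split is the only property taken from the paper.

```python
import torch

def build_mixed_test(model, x_test, y_test, attack_fn, corrupt_fn):
    """Assemble a Mixed Test set: equal parts clean, adversarial and corrupted examples."""
    n = x_test.shape[0] // 3
    x_clean = x_test[:n]
    x_adv = attack_fn(model, x_test[n:2 * n], y_test[n:2 * n])
    # corrupt_fn maps one image tensor to a corrupted image tensor of the same shape.
    x_corr = torch.stack([corrupt_fn(img) for img in x_test[2 * n:3 * n]])
    x_mixed = torch.cat([x_clean, x_adv, x_corr], dim=0)
    y_mixed = y_test[:3 * n]
    return x_mixed, y_mixed
```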

Figure 3: The corruption and adversarial robustness evaluation of Progressive Adversarial Training on CIFAR-10, with the iteration step ranging from 1 to 6.

The experimental results using Mixed Test on CIFAR-10 and SVHN are shown in Table 1. To conduct a fair experiment, we keep all adversarial attack methods and corruption types the same as in the previous section. PAT clearly outperforms the other data augmentation methods, i.e., PGD-AT and GDA, by large margins, showing the strongest generalization ability. As discussed for Figure 1, although PGD-AT and GDA can increase model robustness against specific noise, e.g., obtaining higher adversarial accuracy, they fail to cover some portions of benign examples that used to be classified correctly by ERM, leading to weaker generalization ability. It is also interesting that the accuracy of GDA and ERM (Naive) drops more drastically than that of PGD-AT and PAT when the adversarial perturbation (e.g., ε or c) increases. This phenomenon further demonstrates the strong adversarial defense ability of PGD-AT and PAT.

Effects of Progressive Iteration

Our PAT relies on multiple progressive iterations to greatly improve the diversity of the noise injected into deep models. To further investigate the effect of progressive iteration on model robustness and generalization, we conduct an experiment on PAT with different iterative steps k. In fact, the iteration step in our method can be regarded as the number of data augmentation rounds. From the results in Figure 3, we can draw two conclusions. First, as accuracy on clean examples increases, model robustness drops, indicating a trade-off between accuracy and robustness in practice. Second, for a fixed perturbation threshold, a larger k, which brings greater data complexity and diversity during training, indeed supplies models with stronger adversarial and corruption robustness. Although model robustness improves with more iteration steps, training time rises sharply at the same time, and robustness gradually saturates for larger k, e.g., 5 or 6. Therefore, in practice we can simply choose a moderate value of k (e.g., 3) to balance robustness, accuracy and training time.

Figure 4: Visualization of the gradients with respect to the input images via different training strategies on CIFAR-10.

Feature Alignment with Human Vision

Beyond measuring model robustness and generalization with quantitative indicators (e.g., accuracy), we also examine model robustness more intuitively and perceptually using visualization, from the view of feature alignment with human vision. Since stronger models generate gradients that are better aligned with human visual perception [25], we normalize and visualize the gradient of the loss with respect to the input images on CIFAR-10. As shown in Figure 4, the gradients of the Naive model appear to be noise and look meaningless to humans, and those of PGD-AT and GDA are less perceptually aligned with human vision, e.g., too faint, with fewer outlines. Among all methods, the gradients of PAT clearly align best with perceptually relevant features. This is mainly because, in order to make consistent predictions, salient objects and critical features must be recognized more easily, which results in robust and stable representations in the robust model itself. Thus, more broadly, robustness is likely to be a desirable property that offers a path toward more human-aligned applications.
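A minimal sketch of this visualization is given below; the per-image min-max normalization is an assumed choice, as the paper does not specify the exact normalization used.

```python
import torch
import torch.nn.functional as F

def input_gradient_image(model, x, y):
    """Compute and normalize the loss gradient w.r.t. the input for visualization.

    Assumes x has shape (batch, channels, height, width); returns values in [0, 1].
    """
    x = x.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    # Per-image min-max normalization so the gradient can be displayed as an image.
    flat = grad.reshape(grad.shape[0], -1)
    g_min = flat.min(dim=1).values.view(-1, 1, 1, 1)
    g_max = flat.max(dim=1).values.view(-1, 1, 1, 1)
    return (grad - g_min) / (g_max - g_min + 1e-12)
```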

Conclusions

In this paper, we first theoretically established the connection between adversarial robustness and corruption robustness by introducing a formal and uniform definition of model robustness. Based on this finding, we proposed a powerful training method named Progressive Adversarial Training (PAT), in which diversified adversarial noise is aggregated, augmented and injected progressively to simultaneously guarantee both adversarial and corruption robustness. Theoretical analyses prove an upper bound on the generalization error together with the robustness property, which in turn promise models with strong generalization ability and robustness against noise. Extensive experiments on MNIST, CIFAR-10 and SVHN show that PAT performs comprehensively well compared to various augmentation methods.

References

  • [1] A. Athalye, N. Carlini, and D. Wagner (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420. Cited by: Introduction, Evaluation Protocol, Generalization Evaluation with Mixed Test.
  • [2] D. Bahdanau, K. Cho, and Y. Bengio (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. Cited by: Introduction.
  • [3] N. Carlini, A. Athalye, N. Papernot, W. Brendel, J. Rauber, D. Tsipras, I. Goodfellow, and A. Madry (2019) On evaluating adversarial robustness. arXiv preprint arXiv:1902.06705. Cited by: Evaluation Protocol, Evaluation Protocol.
  • [4] N. Carlini and D. Wagner (2017) Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57. Cited by: Evaluation Protocol.
  • [5] N. Ford, J. Gilmer, N. Carlini, and D. Cubuk (2019) Adversarial examples are a natural consequence of test error in noise. arXiv preprint arXiv:1901.10513. Cited by: Theoretical Connections Between Adversarial and Corruption Robustness.
  • [6] I. J. Goodfellow, J. Shlens, and C. Szegedy (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Cited by: Introduction.
  • [7] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: Experiment and Evaluation.
  • [8] D. Hendrycks and T. Dietterich (2019) Benchmarking neural network robustness to common corruptions and perturbations. In International Conference on Learning Representations, Cited by: Introduction, Introduction, Evaluation Protocol.
  • [9] G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, and T. N. Sainath (2012) Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Processing Magazine 29 (6), pp. 82–97. Cited by: Introduction.
  • [10] A. Krizhevsky and G. Hinton (2009) Learning multiple layers of features from tiny images. Technical report Citeseer. Cited by: Experiment and Evaluation.
  • [11] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. In International Conference on Neural Information Processing Systems, pp. 1097–1105. Cited by: Introduction.
  • [12] A. Kurakin, I. Goodfellow, and S. Bengio (2016) Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533. Cited by: Introduction.
  • [13] A. Kurakin, I. Goodfellow, and S. Bengio (2017) Adversarial machine learning at scale. In International Conference on Learning Representations, Cited by: Introduction, Introduction, Evaluation Protocol.
  • [14] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, et al. (1998) Gradient-based learning applied to document recognition. Proceedings of the IEEE 86 (11), pp. 2278–2324. Cited by: Experiment and Evaluation.
  • [15] Y. LeCun (1998) The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/. Cited by: Experiment and Evaluation.
  • [16] A. Liu, X. Liu, J. Fan, A. Zhang, H. Xie, and D. Tao (2019) Perceptual-sensitive GAN for generating adversarial patches. In 33rd AAAI Conference on Artificial Intelligence. Cited by: Introduction.
  • [17] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2017) Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083. Cited by: Introduction, Introduction, Theoretical Connections Between Adversarial and Corruption Robustness, Evaluation Protocol.
  • [18] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng (2011) Reading digits in natural images with unsupervised feature learning. Cited by: Experiment and Evaluation.
  • [19] S. Sankaranarayanan, A. Jain, R. Chellappa, and S. N. Lim (2018) Regularizing deep networks using efficient layerwise adversarial training. In Thirty-Second AAAI Conference on Artificial Intelligence, Cited by: Related Work.
  • [20] L. Schmidt, S. Santurkar, D. Tsipras, K. Talwar, and A. Madry (2018) Adversarially robust generalization requires more data. In Advances in Neural Information Processing Systems, pp. 5014–5026. Cited by: Introduction, Formulation.
  • [21] U. Shaham, Y. Yamada, and S. Negahban (2018) Understanding adversarial training: increasing local stability of supervised models through robust optimization. Neurocomputing 307, pp. 195–204. Cited by: Formulation.
  • [22] K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: Experiment and Evaluation.
  • [23] K. Sun, Z. Zhu, and Z. Lin (2019) Towards understanding adversarial examples systematically: exploring data size, task and model factors. arXiv preprint arXiv:1902.11019. Cited by: Introduction, Formulation.
  • [24] Z. Sun, M. Ozay, Y. Zhang, X. Liu, and T. Okatani (2018) Feature quantization for defending against distortion of images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7957–7966. Cited by: Introduction, Introduction.
  • [25] D. Tsipras, S. Santurkar, L. Engstrom, A. Turner, and A. Madry (2018) Robustness may be at odds with accuracy. stat 1050, pp. 11. Cited by: Feature Alignment with Human Vision.
  • [26] Y. Tsuzuku, I. Sato, and M. Sugiyama (2018) Lipschitz-margin training: scalable certification of perturbation invariance for deep neural networks. In Advances in Neural Information Processing Systems, pp. 6541–6550. Cited by: Evaluation Protocol.
  • [27] J. Wellner et al. (2013) Weak convergence and empirical processes: with applications to statistics. Springer Science & Business Media. Cited by: Model Robustness.
  • [28] C. Xie, J. Wang, Z. Zhang, Z. Ren, and A. Yuille (2018) Mitigating adversarial effects through randomization. In International Conference on Learning Representations, Cited by: Introduction, Generalization Evaluation with Mixed Test.
  • [29] H. Xu and S. Mannor (2012) Robustness and generalization. Machine learning 86 (3), pp. 391–423. Cited by: Lemma 1.
  • [30] R. Zhai, T. Cai, D. He, C. Dan, K. He, J. Hopcroft, and L. Wang (2019) Adversarially robust generalization just requires more unlabeled data. arXiv preprint arXiv:1906.00555. Cited by: Formulation.
  • [31] S. Zheng, Y. Song, T. Leung, and I. Goodfellow (2016) Improving the robustness of deep neural networks via stability training. In IEEE conference on computer vision and pattern recognition, Cited by: Introduction, Introduction.