There has been a flurry of recent papers proposing adversarial attacks (Moosavi-Dezfooli et al., 2016; Kurakin et al., 2016; Carlini and Wagner, 2017; Madry et al., 2017; Athalye et al., 2018) and defenses (Buckman et al., 2018; Tramèr et al., 2017; Samangouei et al., 2018; Madry et al., 2017; Papernot et al., 2016; Liao et al., 2017) that address this issue. As a result, increasingly sophisticated attacks and defenses have turned into an arms race (Wang et al., 2018). Apart from these, many hypotheses have been suggested in the literature to explain the existence of adversarial examples from different perspectives. The linearity hypothesis (Goodfellow et al., 2014) was the first proposed explanation and gained wide acceptance. Later work (Tanay and Griffin, 2016) studied the linearity hypothesis further and argued that adversarial examples exist when the classification boundaries lie close to the manifold of sampled data. Su et al. (2018) empirically identified a trade-off between accuracy and robustness and suggested that robustness may be the cost of accuracy.
Most of the aforementioned explanations are proposed from specific perspectives, and there is hardly any work that provides a systematic understanding of this phenomenon. In addition, Schmidt et al. (2018) stated that adversarially robust generalization for adversarial training requires more data, and that existing data sets may not be large enough for adversarial training to achieve high robust generalization. A natural question arises: does robust generalization for standard training also require more data? If so, there seems to be a contradiction, since other works (Tsipras et al., 2018; Su et al., 2018) proposed that robustness may be at odds with accuracy. Since it is well known that more data improves generalization, another natural question needs to be answered: will robustness improve or worsen as the data size increases, especially when the data is large enough?
Considering the issues above, we conduct an empirical exploration towards a comprehensive understanding of adversarial examples from three aspects: the generalization and robustness from limited data to the “infinite”, task-dependent factors, and model-specific factors. We attempt to unify previous research and provide new insights within our explanatory framework. In particular, to answer the aforementioned questions about robustness and generalization with regard to data size, we investigate the variation of robustness for standard training by changing the size of the training data, using data augmentation based on the Auxiliary Classifier GAN (ACGAN) (Odena et al., 2016) to go beyond the original dataset. It turns out that, when the training data are limited, there is indeed a trade-off: robustness deteriorates as generalization performance increases with more training data. However, robustness starts to improve once the data size is large enough, and robust generalization finally tends to converge to standard generalization, as shown in Figure 2. This experimental result demonstrates that, in the limited data regime, adversarially robust generalization for standard training also requires more data. This finding for standard training aligns with the observations in the adversarial training scenario reported by Tsipras et al. (2018) and Su et al. (2018). However, we further show that the trade-off between generalization and robustness only exists when the training data is restricted. When the size of the training data is large enough, the trade-off disappears and the classifier can achieve both good generalization and robustness. To the best of our knowledge, we are the first to reveal the full spectrum of the relationship between generalization, robustness and data size for standard training.
As for the task-dependent factors, we investigate how the robustness correlates with the input dimension and with the number of categories to classify. An interesting finding of our analysis is that the robustness first increases and then decreases as the input dimension expands, while it shows an apparent downtrend as the number of categories increases. This reveals a correlation between the complexity of decision boundaries and the vulnerability. For model-specific factors, we validate that current convolutional neural networks actually have better robustness than other machine learning methods, and that expanding network capacity cannot in essence provide real robustness, though it can contribute to defense against gradient-based attacks and mitigate transferability. In summary, the contributions of the paper are listed below:
We present the global relationship between standard generalization and robust generalization for standard training, showing a trade-off when data is limited and consistency when the data size is large enough.
We validate the influence of task-dependent factors. Increasing the complexity of decision boundaries, by increasing the input dimension or the number of categories, can make the classifier more susceptible to adversarial attacks.
We demonstrate that current convolutional neural networks have better robustness than traditional ML approaches, and reveal that increasing model capacity cannot actually bring real robustness, despite its better robustness against a limited set of attacks and its mitigation of transferability.
2 A Closer Look at Adversarial Examples
The existence of adversarial examples in various machine learning systems suggests that the robustness problem is an inherent property of the statistical setup. Here we refine the understanding of adversarial examples. We briefly recap their definition: adversarial examples are inputs crafted by adding maliciously constructed perturbations to legitimate data, such that they remain indistinguishable from the originals yet cause the classifier to produce wrong predictions.
Define a probability space whose probability measure is called the population. The data set is viewed as a realization of a random element of this probability space. Because adversarial perturbations are imperceptible to human vision, we assume that both adversarial examples and legitimate examples are sampled from the same underlying population, albeit with a low probability of occurrence for adversarial examples (Nguyen et al., 2015; Gal and Smith, 2018). Due to the randomness of the data set, there is a discrepancy between the decision boundary of a classifier trained on limited samples and the oracle boundary based on the whole population. This discrepancy causes some legitimate data, including crafted adversarial examples, to be misclassified by the current imperfect classifier.
More specifically, consider all adversarial examples with restricted perturbations around a correctly classified example, as shown in Figure 1. Due to the discrepancy between the decision boundary based on training samples and the one based on the population, there exist some legitimate examples of a given class in the vicinity of an original example that are misclassified by the existing classifier, even though visually they should belong to the population data of that class. Based on this understanding, we argue that it is the shortage of effective training data that prevents the classifier from including all adjacent examples, particularly those in the blue region in Figure 1, resulting in the existence of adversarial examples.
This raises the core question: will the robustness be enhanced if we offer sufficient training data to the classifier? Our natural expectation is that, when the size of data is large enough, it will suffice to learn robust models. In other words, generalization and robustness are expected to be consistent with respect to the amount of data. In addition, validating this hypothesis would also resolve the apparent inconsistency between generalization and robustness raised in the introduction.
Besides that, what task and model factors affect the robustness? Does the complexity of decision boundaries affect the vulnerability of deep neural networks? Are deep neural networks themselves more susceptible to adversarial attacks than other machine learning approaches? These are the issues we explore in the following sections. In summary, we investigate these problems through three experimental parts in this work.
Data size analysis.
Try to uncover the global relationship between standard generalization and robust generalization for standard training with respect to the data size.
Task-dependent factors analysis.
Attempt to explore the influence of input dimension and number of categories on the robustness of classifier.
Model-specific factors analysis.
Compare with traditional ML methods to inspect the vulnerability of convolutional neural networks and then investigate the effect of network capacity (Madry et al., 2017) on the robustness.
3 Experimental Settings
We demonstrate our explanatory framework by performing experiments on several commonly used datasets: MNIST, SVHN and CIFAR10. In the first part (data size analysis), Fashion-MNIST is used in place of SVHN because of the limited generalization of ACGAN (Odena et al., 2016) on SVHN. The experimental setup is as follows.
To provide a thorough evaluation of the robustness, various well-known attacks are considered: Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2014), PGD attack (Madry et al., 2017), Randomized Fast Gradient Sign Method (RAND+FGSM) (Tramèr et al., 2017) and Carlini-Wagner (CW) attack (Carlini and Wagner, 2017) with -norm.
For the classifier on MNIST and Fashion-MNIST, we adopt a simple architecture with two convolutional layers and three fully-connected layers; for SVHN and CIFAR10, we use the standard ResNet18 model. All of our models are trained with an identical optimizer setting for fair comparison and achieve state-of-the-art test accuracy on clean data for the corresponding datasets.
Evaluation of robustness.
We only consider original images that are correctly classified, in order to eliminate the influence of standard generalization. We then evaluate the classification accuracy on the adversarial examples generated from these correctly classified images and denote it as Success Defense Rate.
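The metric described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name `success_defense_rate` and the toy label arrays are assumptions introduced here, and the predictions would in practice come from a trained classifier evaluated on clean and attacked inputs.

```python
import numpy as np

def success_defense_rate(labels, clean_pred, adv_pred):
    """Accuracy on adversarial examples, restricted to the inputs that the
    classifier already classifies correctly on clean data."""
    labels = np.asarray(labels)
    clean_pred = np.asarray(clean_pred)
    adv_pred = np.asarray(adv_pred)
    correct_clean = clean_pred == labels          # mask of clean successes
    if not correct_clean.any():
        return float("nan")                       # metric undefined in this case
    return float(np.mean(adv_pred[correct_clean] == labels[correct_clean]))

# Toy data (hypothetical): 4 of 5 clean predictions are correct,
# and 2 of those 4 survive the attack.
labels     = [0, 1, 2, 1, 0]
clean_pred = [0, 1, 2, 0, 0]
adv_pred   = [0, 2, 2, 1, 1]
rate = success_defense_rate(labels, clean_pred, adv_pred)
print(rate)
```

Restricting the denominator to clean successes is what keeps the metric from simply tracking test accuracy: the fourth toy example is predicted correctly after the attack but is ignored because it was already wrong on clean data.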
4 Data Size Analysis
In this section, we show that the generalization and robustness of a classifier under standard training behave differently in the limited data regime and in the data augmentation regime. When training data is limited, generalization and robustness exhibit a trade-off, while in the setting of nearly infinite data, robustness tends to be consistent with generalization.
More specifically, we follow the definitions of standard generalization and robust generalization given by Schmidt et al. (2018). In general, standard generalization measures the generalization over the clean test data, while robust generalization evaluates the generalization in the adversarial setting, where the classification should hold for all examples in a perturbation set around the original examples. Here we aim to investigate whether adversarially robust generalization for standard training also requires more data than standard generalization. If so, we also need to examine the apparent contradiction with the proposals (Su et al., 2018; Tsipras et al., 2018) that robustness may be at odds with generalization. We test this hypothesis and resolve the contradiction by exploring the global relationship between robust generalization and standard generalization, especially through data augmentation. We sketch this global relationship in Figure 2.
To verify the relationship between standard generalization and robust generalization shown in Figure 2, we investigate how generalization and robustness change with the size of the training data by varying the number of training samples from relatively small to large enough. Generalization is measured by the accuracy on the test set and robustness is evaluated by Success Defense Rate. When the data size is not larger than the existing dataset, we partition its training set into sub-datasets with a strict inclusion relationship, so that each sub-dataset is contained in the next larger one, and then train a model on each sub-dataset. When the data size expands further, we use ACGAN (Odena et al., 2016), a version of conditional GAN (Mirza and Osindero, 2014), to first model the conditional distribution of the data and then generate new images to augment the training set. Due to the limited precision of ACGAN, we inject clean examples during the training iterations to maintain the accuracy of the original classifier. We conduct experiments on MNIST, Fashion-MNIST and CIFAR10, and the results are shown in Figure 3.
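One simple way to obtain sub-datasets with a strict inclusion relationship is to take prefixes of a single random permutation of the training indices. The sketch below assumes this construction; the function name `nested_subsets`, the seed and the example sizes are hypothetical, and the source does not state how its sub-datasets were actually drawn.

```python
import numpy as np

def nested_subsets(num_samples, sizes, seed=0):
    """Return index arrays S_1, S_2, ... with S_1 inside S_2 inside ...,
    built as prefixes of one fixed random permutation."""
    sizes = sorted(set(sizes))                    # strictly increasing sizes
    if sizes[0] <= 0 or sizes[-1] > num_samples:
        raise ValueError("sizes must lie in 1..num_samples")
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_samples)          # one shuffle shared by all subsets
    return [order[:size] for size in sizes]

# Hypothetical sizes for a training set of 1000 examples.
subsets = nested_subsets(num_samples=1000, sizes=[100, 300, 1000])
for idx in subsets:
    print(len(idx))
```

Because every subset is a prefix of the same permutation, a model trained on a larger subset has seen every example available to the models trained on smaller ones, which isolates the effect of data size from the effect of which examples were sampled.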
Figure 3 consistently shows, across the three datasets, that the robustness measured by Success Defense Rate first decreases and then increases as the training data expands. Nevertheless, the turning point of robustness differs between datasets: it is observed earliest for MNIST, then for Fashion-MNIST and last for CIFAR10, which matches the ascending order of dataset complexity. In addition, the results suggest that Success Defense Rate under the CW attack turns upward later than under the other attacks, indicating that CW is a much stronger attack that requires more data to defend against, which is consistent with recent experimental conclusions (Carlini and Wagner, 2017). As the size of the data is enlarged, Success Defense Rate quickly increases to a high level and then saturates. Although there is a slight decline in the data augmentation phase due to the limited precision of ACGAN, the apparent uptrend can still be easily observed, providing strong evidence for our previous statement.
Through the data size analysis via data augmentation based on generative models, we have revealed a global relationship between standard generalization and robust generalization for standard training. The experimental result extends and unifies the viewpoints proposed in (Su et al., 2018; Schmidt et al., 2018; Tsipras et al., 2018):
On the one hand, there is indeed a trade-off between generalization and robustness for standard training before the turning point occurs. As for the underlying reason, we conjecture that, with a limited number of samples, training helps the classifier find a better and clearer decision boundary, resulting in high robustness in the initial stage, as proposed in (Anonymous, 2019b). As the data size increases, more training data may lead to higher accuracy, but the decision boundary tends to become more complicated and delicate, which makes adversarial examples easier to forge. In this case, generalization and robustness for standard training gradually develop a trade-off.
On the other hand, when the training data is large enough with respect to the complexity of the classifier, Success Defense Rate rises to nearly 100%, where almost no adversarial attack can be successfully crafted, implying that robust generalization converges to standard generalization for standard training. To further explore the reason for the change in robustness, we calculate the magnitude of the gradients with respect to the input images under two different norms and plot the relationship between the median of those gradients and the data size, as shown in Figure 4. An interesting observation is that the magnitude of the gradients gradually vanishes as the data size increases. To probe the cause of this vanishing, we also track the median confidence, measured by the maximum softmax probability of the network outputs, and find that it is the saturation of the prediction probability that makes the gradients vanish. More specifically, as the size of data increases, the classifier is capable of including more examples in input space, especially those in the blue region in Figure 1. In this case, the gap between the decision boundary learned from samples and the oracle one is narrowed, thus improving the robustness as argued in Section 2. We argue that this type of gradient vanishing is not the phenomenon of obfuscated gradients described by Athalye et al. (2018), since our method does not involve any complex defensive mechanism; on the contrary, it reflects real robustness when the data size is large enough.
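The link between saturated softmax outputs and vanishing input gradients can be illustrated on a toy model. The sketch below is an assumption-laden stand-in for the networks in the paper: it uses a linear softmax classifier with logits `W @ x + b`, for which the cross-entropy gradient with respect to the input is `W.T @ (p - onehot(y))`. The weights, the input, the `margin` values, and the choice of the L2 and L-infinity norms are all hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def input_gradient(W, b, x, y):
    """Gradient of the cross-entropy loss w.r.t. the input x
    for a linear softmax classifier with logits W @ x + b."""
    p = softmax(W @ x + b)
    residual = p.copy()
    residual[y] -= 1.0                   # p - onehot(y)
    return W.T @ residual, p[y]

rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(3, 5))        # toy weights (hypothetical)
x = rng.normal(size=5)                   # toy input (hypothetical)
y = 1                                    # true class

confidences, l2_norms, linf_norms = [], [], []
for margin in [0.0, 2.0, 4.0, 8.0]:
    b = np.zeros(3)
    b[y] = margin                        # larger margin -> more confident prediction
    grad, conf = input_gradient(W, b, x, y)
    confidences.append(conf)
    l2_norms.append(np.linalg.norm(grad, 2))
    linf_norms.append(np.linalg.norm(grad, np.inf))
    print(f"margin={margin:4.1f}  confidence={conf:.4f}  "
          f"L2={l2_norms[-1]:.5f}  Linf={linf_norms[-1]:.5f}")
```

In this toy setting the residual `p - onehot(y)` shrinks in proportion to `1 - p[y]`, so both gradient norms decrease as the true-class probability approaches one. A deep network adds a Jacobian in front of the same residual, so the same saturation effect is plausible there, but the sketch does not demonstrate it.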
Recommendation on defense
Considering the global relationship between robustness and generalization, we speculate that there may exist a phenomenon we call the robustness trap, in which robustness is likely to keep deteriorating as we augment the training data starting from the existing dataset, although generalization may improve during this process. Only when the augmented data is large enough can the classifier escape this trap and its robustness start to improve, which suggests that a more precise generative model should be established in the future to overcome this phenomenon.
5 Task-Dependent Factors
For the classification task-dependent factors, we demonstrate that it is the complexity of decision boundaries that aggravates the problem of adversarial examples. Here we explore the task-dependent factors from two explicit aspects: input dimension and number of categories to classify.
5.1 Input Dimension
For the factor of input dimension, in general we propose that classification tasks with more input dimensions are more vulnerable to adversarial attacks. We verify this proposal in the following.
Previous work (Anonymous, 2019a) states that, in high dimensional data spaces, correctly classified data are very close to misclassified examples, especially adversarial examples. This agrees with our intuition that the distance between examples becomes subtler in high dimensional space, thus aggravating the vulnerability of the classifier. Simon-Gabriel et al. (2018) have proved this influence of input dimension on robustness in the adversarial regularization scenario. In contrast, we conduct our exploration in a more general explanatory framework that does not rely on gradient-based regularization (Simon-Gabriel et al., 2018), and perform experiments on more datasets with various types of attacks. The observations from our experiments differ in some respects from previous work (Simon-Gabriel et al., 2018), showing that the overall correlation between the input dimension and robustness is not simply monotonic.
We implement the experiments by resizing the input images to different sizes with bilinear interpolation. For convenience of design and to adapt to each dataset, we apply different networks with similar architectures on the three datasets. For MNIST, we adopt a simple network with one convolutional layer and two fully-connected layers. For SVHN, we apply a network with three convolutional layers and three fully-connected layers, and for CIFAR10 a network with seven convolutional layers and three fully-connected layers. The experimental results are shown in Figure 5.
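For reference, bilinear resizing of a single-channel image can be written directly in NumPy. This is a minimal sketch under stated assumptions: it uses a corner-aligned sampling grid (the first and last output pixels coincide with the first and last input pixels), whereas image libraries often use other conventions, and the source does not say which library or convention was used. The function name `resize_bilinear` and the ramp image are hypothetical.

```python
import numpy as np

def resize_bilinear(img, out_h, out_w):
    """Resize a 2-D array with bilinear interpolation on a corner-aligned grid."""
    in_h, in_w = img.shape
    ys = np.linspace(0, in_h - 1, out_h)          # sample rows in input coordinates
    xs = np.linspace(0, in_w - 1, out_w)          # sample columns in input coordinates
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)             # clamp neighbours at the border
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]                       # vertical interpolation weights
    wx = (xs - x0)[None, :]                       # horizontal interpolation weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

# A linear ramp is reproduced exactly by bilinear interpolation,
# which makes it a convenient sanity check.
image = np.add.outer(np.arange(4.0), 2.0 * np.arange(4.0))
resized = resize_bilinear(image, 7, 7)
print(resized.shape)
```

Upsampling in this way adds input dimensions without adding information, so any change in robustness is attributable to the dimension itself and to how the network uses it, not to new image content.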
In general, the robustness presents a decreasing tendency as the input dimension expands. However, except under the CW attack, the robustness also becomes worse when the input dimension is too small. We hypothesize that small input dimensions make the classifier under standard training suffer from overfitting, which is supported by the accompanying decline in generalization.
5.2 Number of Categories to Classify
For the factor of number of categories to classify, it is intuitive to expect that, as the number of categories grows, the decision boundary will become more complicated, making the classifier more susceptible to attacks. Thus, we select the training data with labels from 0 up to a chosen maximum label and construct several subsets with a strict inclusion relationship by enlarging this maximum. Networks with different numbers of categories to classify are then trained on these datasets and evaluated for robustness under various attacks.
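The subset construction described above amounts to filtering by label. The following sketch assumes integer labels starting at 0; the helper name `subset_by_max_label` and the toy arrays are hypothetical stand-ins for a real image dataset.

```python
import numpy as np

def subset_by_max_label(images, labels, max_label):
    """Keep only the examples whose label lies in {0, ..., max_label}."""
    labels = np.asarray(labels)
    mask = labels <= max_label
    return np.asarray(images)[mask], labels[mask]

# Toy stand-in for a 4-class dataset: 8 examples with 4 features each.
labels = np.array([0, 3, 1, 2, 1, 0, 3, 2])
images = np.arange(len(labels) * 4).reshape(len(labels), 4)

# Tasks with 2, 3 and 4 categories; each task contains the previous one.
tasks = {k: subset_by_max_label(images, labels, k) for k in (1, 2, 3)}
for k, (x_k, y_k) in tasks.items():
    print(k + 1, "classes:", len(y_k), "examples")
```

Since every subset with a larger maximum label contains all examples of the smaller ones, the resulting tasks are nested, and each additional category only adds decision boundaries on top of those already present.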
As illustrated in Figure 6, there is an apparent downtrend of robustness as the number of categories to classify increases, though we have to admit that some stochastic effects still exist. For all three datasets, deep neural networks that are required to classify more categories under standard training are more vulnerable to adversarial attacks, which is consistent with our intuition about the complexity of decision boundaries mentioned before.
Recommendation on defense
It is discouraging that deep neural networks in classification tasks with higher input dimensions and more categories are more liable to suffer from adversarial attacks, since the trend in practice is to use deep neural networks to tackle ever more complicated tasks. Nevertheless, we might consider combining models that each deal with a relatively simple task in order to solve a complex task, which would probably maintain high robustness. In addition, resizing the image to a proper size is beneficial to robustness as well.
6 Model-Specific Factors
For the model-specific factors, on the one hand, we reveal that the robustness of deep neural networks is better than that of other machine learning models under standard training. On the other hand, we demonstrate that increasing model capacity can help to defend against gradient-based attacks, but it cannot actually yield real robustness, since the larger models remain fragile when faced with the alternative, optimization-based CW attack.
6.1 Comparison with Other Machine Learning Approaches
We conduct experiments by comparing the robustness among CNN, LinearSVM and Logistic regression on the three datasets. The CNN has two convolutional layers and two fully-connected layers. Logistic regression adopts standard cross entropy loss and LinearSVM employs a variation of multiclass hinge loss.
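To make the difference between the two linear baselines concrete, the sketch below evaluates both losses on one vector of class scores. The source only says that LinearSVM uses "a variation of multiclass hinge loss", so the Crammer-Singer form shown here, the margin of 1, and the example scores are assumptions made purely for illustration.

```python
import numpy as np

def cross_entropy(scores, y):
    """Softmax cross-entropy loss for class scores and true label y."""
    z = scores - scores.max()                     # shift for numerical stability
    return float(np.log(np.exp(z).sum()) - z[y])

def multiclass_hinge(scores, y, margin=1.0):
    """Crammer-Singer style hinge: penalise the strongest wrong class
    if it comes within `margin` of the true class score."""
    others = np.delete(scores, y)
    return float(max(0.0, margin + others.max() - scores[y]))

scores = np.array([2.0, 0.5, -1.0])               # hypothetical class scores
for y in (0, 1):
    print(y, round(cross_entropy(scores, y), 4), multiclass_hinge(scores, y))
```

With label 0 the true class already beats every other class by more than the margin, so the hinge loss is exactly zero while the cross-entropy is still positive. The hinge loss stops pushing once the margin is met, whereas cross-entropy keeps increasing confidence.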
A common preconception is that deep neural networks are more susceptible to adversarial examples than traditional machine learning models due to their uncontrollable Lipschitz constants (Zantedeschi et al., 2017). However, our experimental results show that the Success Defense Rates of the CNN under each attack are consistently superior to those of the other models, suggesting that the robustness of deep neural networks is much better than that of other typical machine learning systems, as shown in Figure 7. It should be mentioned that, although the evaluation of robustness is based on Success Defense Rate, which only examines the examples correctly classified by the given classifier, this index is inevitably influenced by generalization in this comparison, because LinearSVM and logistic regression cannot attain generalization performance similar to that of the CNN, especially on SVHN and CIFAR10. We can approximately conclude that the CNN has both better generalization and better robustness than the other machine learning approaches.
6.2 Model Capacity
Madry et al. (2017) demonstrated that larger model capacity can decrease the transferability of adversarial examples. However, by additionally testing the CW attack, we find that increasing model capacity cannot actually bring real robustness.
We follow the definition of model capacity in (Madry et al., 2017), namely the number of filters, and adopt a network architecture with four convolutional layers and three fully-connected layers for convenience of implementation. To increase the network capacity, we modify the network by incorporating wider layers with different widening factors, which enlarges the number of filters by the corresponding magnification. Figure 8 depicts the trend of robustness with respect to the number of filters. We can observe that Success Defense Rate exhibits an apparent uptrend against gradient-based attacks, i.e. the FGSM and PGD attacks. Madry et al. (2017) stated that increasing the network capacity improves the resistance against transfer attacks, which is in accordance with our result, since gradient-based attacks are more transferable than the CW attack, as proposed by Su et al. (2018). Furthermore, we reach the stronger conclusion that increasing network capacity is unable to improve the real robustness of deep neural networks, since the networks become more vulnerable to the alternative, optimization-based CW attack (Carlini and Wagner, 2017). This also raises an open problem: what are the differences between CW attacks and the others?
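The effect of a widening factor on capacity can be made concrete by counting convolutional parameters. The sketch below is illustrative only: the base filter counts `[16, 16, 32, 32]`, the 3x3 kernels, the three input channels and the helper names are hypothetical, since the source does not list the actual layer widths or factors.

```python
def widen(base_filters, factor):
    """Scale the number of filters in every convolutional layer by `factor`."""
    return [max(1, int(round(f * factor))) for f in base_filters]

def conv_param_count(filters, in_channels=3, kernel=3):
    """Total weights and biases of a stack of kernel x kernel conv layers."""
    total = 0
    for out_channels in filters:
        total += kernel * kernel * in_channels * out_channels + out_channels
        in_channels = out_channels                # next layer consumes these maps
    return total

base = [16, 16, 32, 32]                           # hypothetical four-layer base network
for factor in (1, 2, 4):
    wide = widen(base, factor)
    print(factor, wide, conv_param_count(wide))
```

Because each layer's weight count depends on both its input and output widths, doubling the number of filters roughly quadruples the convolutional parameter count, so capacity grows much faster than the widening factor itself.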
Furthermore, we explore the correlation between the magnitude of gradients and model capacity by calculating the median of the gradients under different norms. As illustrated in Figure 9, the gradient magnitude displays an apparent downtrend on all datasets as the model capacity increases. This implies that networks with larger capacity have smaller gradients and therefore better robustness against gradient-based attacks. However, the reduction of gradients brings no improvement against CW attacks, which are not directly constructed from the gradients of the original loss. The differences between gradient-based attacks and CW attacks are well worth exploring in the future.
Recommendation on defense
Model capacity plays an important role in robustness, but it cannot bring real robustness. Since network architecture is a more crucial factor for robustness, designing robust network architectures is a promising direction for future work.
In this work, we present a systematic empirical study of adversarial examples from three aspects: the amount of data, task-dependent factors and model-specific factors. In particular, we demonstrate that adversarially robust generalization of deep neural networks under standard training also requires more data, and reveal the global relationship between generalization and robustness, especially through data augmentation. We then demonstrate, through the task-dependent factors, that increasing the complexity of decision boundaries aggravates the vulnerability of deep neural networks, and we clarify the relationship between model-specific factors and robustness. Our analysis sheds light on the systematic understanding of adversarial examples.
- Anonymous [2019a] Anonymous. Adversarial examples are a natural consequence of test error in noise. In Submitted to International Conference on Learning Representations, 2019. under review.
- Anonymous [2019b] Anonymous. How training data affect the accuracy and robustness of neural networks for image classification. In Submitted to International Conference on Learning Representations, 2019. under review.
- Athalye et al.  Anish Athalye, Nicholas Carlini, and David Wagner. Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420, 2018.
- Buckman et al.  Jacob Buckman, Aurko Roy, Colin Raffel, and Ian Goodfellow. Thermometer encoding: One hot way to resist adversarial examples. 2018.
- Carlini and Wagner  Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
- Tsipras et al.  Dimitris Tsipras, Shibani Santurkar, Logan Engstrom, Alexander Turner, and Aleksander Madry. Robustness may be at odds with accuracy. arXiv preprint arXiv:1805.12152, 2018.
- Goodfellow et al.  Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
- Kurakin et al.  Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.
- Liao et al.  Fangzhou Liao, Ming Liang, Yinpeng Dong, Tianyu Pang, Jun Zhu, and Xiaolin Hu. Defense against adversarial attacks using high-level representation guided denoiser. arXiv preprint arXiv:1712.02976, 2017.
- Madry et al.  Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017.
- Mirza and Osindero  Mehdi Mirza and Simon Osindero. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784, 2014.
- Moosavi-Dezfooli et al.  Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, and Pascal Frossard. Deepfool: a simple and accurate method to fool deep neural networks. In , pages 2574–2582, 2016.
- Nguyen et al.  Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 427–436, 2015.
- Odena et al.  Augustus Odena, Christopher Olah, and Jonathon Shlens. Conditional image synthesis with auxiliary classifier gans. arXiv preprint arXiv:1610.09585, 2016.
- Papernot et al.  Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pages 582–597. IEEE, 2016.
- Samangouei et al.  Pouya Samangouei, Maya Kabkab, and Rama Chellappa. Defense-gan: Protecting classifiers against adversarial attacks using generative models. arXiv preprint arXiv:1805.06605, 2018.
- Schmidt et al.  Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, and Aleksander Madry. Adversarially robust generalization requires more data. arXiv preprint arXiv:1804.11285, 2018.
- Simon-Gabriel et al.  Carl-Johann Simon-Gabriel, Yann Ollivier, Bernhard Schölkopf, Léon Bottou, and David Lopez-Paz. Adversarial vulnerability of neural networks increases with input dimension. arXiv preprint arXiv:1802.01421, 2018.
- Su et al.  Dong Su, Huan Zhang, Hongge Chen, Jinfeng Yi, Pin-Yu Chen, and Yupeng Gao. Is robustness the cost of accuracy?–a comprehensive study on the robustness of 18 deep image classification models. In Proceedings of the European Conference on Computer Vision (ECCV), pages 631–648, 2018.
- Szegedy et al.  Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
- Tanay and Griffin  Thomas Tanay and Lewis Griffin. A boundary tilting perspective on the phenomenon of adversarial examples. arXiv preprint arXiv:1608.07690, 2016.
- Tramèr et al.  Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.
- Wang et al.  Jingkang Wang, Ruoxi Jia, Gerald Friedland, Bo Li, and Costas Spanos. One bit matters: Understanding adversarial examples as the abuse of redundancy. arXiv preprint arXiv:1810.09650, 2018.
- Gal and Smith  Yarin Gal and Lewis Smith. Sufficient conditions for idealised models to have no adversarial examples: a theoretical and empirical study with bayesian neural networks. arXiv preprint arXiv:1806.00667, 2018.
- Zantedeschi et al.  Valentina Zantedeschi, Maria-Irina Nicolae, and Ambrish Rawat. Efficient defenses against adversarial attacks. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pages 39–49. ACM, 2017.