Towards Understanding Adversarial Examples Systematically: Exploring Data Size, Task and Model Factors

by Ke Sun, et al.
Peking University

Most previous works explain adversarial examples from a few specific perspectives and lack an integrated understanding of the problem. In this paper, we present a systematic study of adversarial examples from three aspects: the amount of training data, task-dependent factors and model-specific factors. In particular, we show that adversarial generalization (i.e. test accuracy on adversarial examples) under standard training requires more data than standard generalization (i.e. test accuracy on clean examples), and we uncover the global relationship between generalization and robustness with respect to the data size, especially when the data is augmented by generative models. This reveals a trade-off between standard generalization and robustness in the limited training data regime and their consistency when the data size is large enough. Furthermore, we explore how different task-dependent and model-specific factors influence the vulnerability of deep neural networks through extensive empirical analysis. Relevant recommendations on defense against adversarial attacks are provided as well. Our results outline a potential path towards a clear and systematic understanding of adversarial examples.




1 Introduction

Although deep learning has achieved impressive performance in a wide range of machine learning tasks, recent research (Szegedy et al., 2013; Goodfellow et al., 2014) has discovered that existing deep neural networks are susceptible to imperceptible perturbations of the input data, making erroneous but highly confident predictions. Furthermore, this phenomenon, known as adversarial examples, has been shown to be ubiquitous in machine learning systems, raising serious real-world security concerns.

There has been a flurry of recent papers proposing adversarial attacks (Moosavi-Dezfooli et al., 2016; Kurakin et al., 2016; Carlini and Wagner, 2017; Madry et al., 2017; Athalye et al., 2018) and defenses (Buckman et al., 2018; Tramèr et al., 2017; Samangouei et al., 2018; Madry et al., 2017; Papernot et al., 2016; Liao et al., 2017) for this issue. As a result, intelligent attacks against intelligent defenses have become an arms race (Wang et al., 2018). Apart from these, many hypotheses have been suggested in the literature to explain the existence of adversarial examples from different perspectives. The linearity hypothesis (Goodfellow et al., 2014) was the first to be proposed and gained wide acceptance. Later work (Tanay and Griffin, 2016) studied the linearity hypothesis further and argued that adversarial examples exist when the classification boundaries lie close to the manifold of sampled data. Su et al. (2018) empirically found a trade-off between accuracy and robustness and suggested that robustness may come at the cost of accuracy.

The aforementioned explanations are mostly proposed from specific perspectives, and there is hardly any work that provides a systematic understanding of this phenomenon. In addition, Schmidt et al. (2018) stated that adversarially robust generalization for adversarial training requires more data, and that the data set may not be large enough for adversarial training to obtain high robust generalization. A natural question can be raised: does robust generalization for standard training also require more data? If so, there seems to be a contradiction, since other works (Dimitris Tsipras, 2018; Su et al., 2018) proposed that robustness may be at odds with accuracy. Since it is well known that more data can improve generalization, another natural question needs to be answered: will robustness improve or worsen as the data size increases, especially when the data is large enough?

Considering the issues above, we conduct an empirical exploration towards a comprehensive understanding of adversarial examples from three aspects: the generalization and robustness from limited data to "infinite" data, task-dependent factors and model-specific factors, attempting to unify previous research and provide new insights within our explanatory framework. In particular, to answer the aforementioned questions about robustness and generalization with regard to the data size, we investigate the variation of robustness for standard training by changing the size of the training data, with data augmentation based on the Auxiliary Classifier GAN (ACGAN) (Odena et al., 2016). It turns out that, as the training data increases, there is indeed a trade-off in which robustness deteriorates as the generalization performance increases when the training data are limited. However, robustness starts to improve when the size of the data is large enough, and robust generalization finally tends to converge to standard generalization, as shown in Figure 2. This experimental result demonstrates that in the limited data regime, adversarially robust generalization for standard training also requires more data. This finding for standard training aligns with the observation in the adversarial training scenario shown by Dimitris Tsipras (2018); Su et al. (2018). However, we further show that the trade-off between generalization and robustness only exists when the training data is restricted. When the size of the training data is large enough, the trade-off disappears and the classifier can achieve both good generalization and robustness. To the best of our knowledge, we are the first to reveal the full spectrum of the relationship between generalization, robustness and data size for standard training.

As for the task-dependent factors, we investigate the correlation between robustness and, respectively, the input dimension and the number of categories to classify. An interesting finding of our analysis is that robustness first increases and then decreases as the input dimension expands, while it shows an apparent downtrend as the number of categories increases. This discloses the correlation between the complexity of decision boundaries and vulnerability. For model-specific factors, we validate that current convolutional neural networks actually have better robustness than other machine learning methods, and that expanding network capacity cannot in essence provide real robustness, although it can contribute to defense against gradient-based attacks and mitigate transferability. In summary, the contributions of the paper are listed below:

  • We provide a systematic analysis on adversarial examples for standard training and unify relevant previous works (Dimitris Tsipras, 2018; Su et al., 2018; Schmidt et al., 2018), paving a way towards better understanding about adversarial examples.

  • We present the global relationship between standard generalization and robust generalization for standard training, showing the trade-off relationship in limited data and consistency when data size is large enough.

  • We validate the influence of task-dependent factors. Increasing the complexity of decision boundaries, by increasing the input dimension and the number of categories, can make the classifier more susceptible to adversarial attacks.

  • We demonstrate that current convolutional neural networks have better robustness than traditional ML approaches, and reveal that increasing model capacity cannot bring real robustness, despite its better robustness against a limited set of attacks and its mitigation of transferability.

2 A Closer Look at Adversarial Examples

The existence of adversarial examples in various machine learning systems demonstrates that the robustness problem is an inherent property of the statistical setup. Here we refine the understanding of adversarial examples. We briefly recap their definition: adversarial examples are inputs crafted by adding maliciously constructed perturbations to the input data such that they are indistinguishable from the originals yet cause the classifier to produce incorrect predictions.

Define a probability space whose probability measure is called the population. The data set is viewed as a realization of a random element of this probability space. Because adversarial perturbations are imperceptible to human vision, we assume that both adversarial examples and legitimate examples are sampled from the same underlying population, albeit with a low probability of occurrence for adversarial examples (Nguyen et al., 2015; Yarin Gal, 2018). Due to the randomness of the sampled data set, the decision boundary of a classifier trained on limited samples differs from the oracle boundary based on the whole population, so some legitimate data, including crafted adversarial examples, are misclassified by the current imperfect classifier.

Figure 1: Location of adversarial examples in the proximity of an input example.

More specifically, consider all adversarial examples with restricted perturbations around a correctly classified example, as shown in Figure 1. Due to the discrepancy between the decision boundary based on the training samples and the one based on the population, there exist legitimate examples of a given class in the vicinity of an original example that are misclassified by the existing classifier, even though visually they should belong to the population data of that class. Based on this understanding, we argue that it is the shortage of effective training data that prevents the classifier from covering all adjacent examples, particularly those in the blue region in Figure 1, resulting in the existence of adversarial examples.

This leads to the core question: will robustness be enhanced if we offer sufficient training data to the classifier? Our natural expectation is that when the size of the data is large enough, it is sufficient to learn robust models. In other words, generalization and robustness are expected to be consistent with respect to the amount of data. Validating this hypothesis would also resolve the inconsistency between generalization and robustness raised in the introduction.

Besides that, what task and model factors affect robustness? Does the complexity of decision boundaries affect the vulnerability of deep neural networks? Are deep neural networks themselves more susceptible to adversarial attacks than other machine learning approaches? We explore all of these issues in the following sections. In summary, we investigate these problems through three experimental parts in this work.

Data size analysis.

Try to uncover the global relationship between standard generalization and robust generalization for standard training with respect to the data size.

Task-dependent factors analysis.

Attempt to explore the influence of input dimension and number of categories on the robustness of classifier.

Model-specific factors analysis.

Compare with traditional ML methods to inspect the vulnerability of convolutional neural networks and then investigate the effect of network capacity (Madry et al., 2017) on the robustness.

3 Experimental Settings

We demonstrate our explanatory framework by performing experiments on several commonly used datasets: MNIST, SVHN and CIFAR10. Fashion-MNIST is used in place of SVHN in the first part due to the limited generalization of ACGAN (Odena et al., 2016) on SVHN. The experimental setup is as follows.

Adversarial attacks.

To provide a thorough evaluation of robustness, various well-known attacks are considered: Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2014), PGD attack (Madry et al., 2017), Randomized Fast Gradient Sign Method (RAND+FGSM) (Tramèr et al., 2017) and Carlini-Wagner (CW) attack (Carlini and Wagner, 2017) with an $\ell_p$-norm constraint.
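As an illustration of the simplest attack in this list, FGSM moves an input by a step of size ε along the sign of the loss gradient with respect to that input. The following is a minimal NumPy sketch on a softmax linear classifier, chosen because its input gradient has a closed form. The model, the helper names (`softmax`, `input_gradient`, `fgsm`), the random data and the clipping to [0, 1] are assumptions made for illustration, not the implementation used in the experiments.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def input_gradient(W, b, x, y):
    """Gradient of the summed cross-entropy loss w.r.t. the inputs.

    W: (d, c) weights, b: (c,) biases, x: (n, d) inputs, y: (n,) int labels.
    """
    p = softmax(x @ W + b)
    p[np.arange(len(y)), y] -= 1.0  # dL/dlogits = p - onehot(y)
    return p @ W.T                  # chain rule back to the input

def fgsm(W, b, x, y, eps):
    """One-step FGSM: x_adv = clip(x + eps * sign(grad_x L), 0, 1)."""
    g = input_gradient(W, b, x, y)
    return np.clip(x + eps * np.sign(g), 0.0, 1.0)

# Hypothetical toy data: 5 inputs with 8 features in [0, 1), 3 classes.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 3))
b = np.zeros(3)
x = rng.uniform(size=(5, 8))
y = rng.integers(0, 3, size=5)
x_adv = fgsm(W, b, x, y, eps=0.1)
```

By construction every coordinate of `x_adv` differs from `x` by at most `eps`. For a deep network the same update applies, with the input gradient obtained by automatic differentiation instead of the closed form.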

Model architectures.

For the classifiers on MNIST and Fashion-MNIST, we adopt a simple architecture with two convolutional layers and three fully-connected layers; for SVHN and CIFAR10, we use the standard ResNet18 model. All of our models are trained with an identical optimizer setting for fair comparison and achieve state-of-the-art test accuracy on clean data for the corresponding datasets.

Evaluation of robustness.

We consider the original images that are correctly classified to eliminate the influence of standard generalization. Then we evaluate the classification accuracy on the adversarial examples for these correctly classified images and denote it as Success Defense Rate.
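The metric described above can be sketched as follows: restrict attention to the examples the model classifies correctly on clean inputs, then measure accuracy on their adversarial counterparts. The function name `success_defense_rate` and the toy predictions are hypothetical; returning NaN when no clean example is correct is a choice made here, not something the text specifies.

```python
import numpy as np

def success_defense_rate(y_true, pred_clean, pred_adv):
    """Accuracy on adversarial inputs, restricted to clean-correct examples."""
    y_true, pred_clean, pred_adv = map(np.asarray, (y_true, pred_clean, pred_adv))
    correct = pred_clean == y_true          # examples eligible for the metric
    if not correct.any():
        return float("nan")                 # undefined without eligible examples
    return float(np.mean(pred_adv[correct] == y_true[correct]))

# Toy example: four of five clean predictions are correct (index 3 is not),
# and the attack flips two of those four.
y_true     = [0, 1, 2, 1, 0]
pred_clean = [0, 1, 2, 0, 0]
pred_adv   = [0, 2, 2, 1, 1]
sdr = success_defense_rate(y_true, pred_clean, pred_adv)  # 0.5
```

Index 3 is excluded even though its adversarial prediction happens to be right, because the clean prediction was already wrong; this is what removes the influence of standard generalization from the metric.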

4 Data Size Analysis

In this section, we show that the generalization and robustness of a classifier under standard training relate differently in the limited data regime and in the data augmentation regime. With limited training data, generalization and robustness exhibit a trade-off, while with nearly infinite data, robustness tends to be consistent with generalization.

Figure 2: Relationship between standard generalization and robust generalization with respect to the amount of data.

Figure 3: Relationship between generalization and robustness with respect to the data size under different adversarial attacks on MNIST, Fashion-MNIST and CIFAR10. The black dashed line represents the test accuracy, and the decimals after FGSM denote the perturbation size $\epsilon$.

More specifically, we follow the definition of standard generalization and robust generalization from Schmidt et al. (2018). In general, standard generalization measures the generalization over the clean test data, while robust generalization evaluates the generalization in the adversarial setting, where the classification should consider all examples in a perturbation set around the original examples. Here we aim to investigate whether adversarially robust generalization for standard training also requires more data than standard generalization. If so, we also need to explore the apparent contradiction with the proposals (Su et al., 2018; Dimitris Tsipras, 2018) that robustness may be at odds with generalization. We test the hypothesis above and resolve the contradiction by exploring the global relationship between robust generalization and standard generalization, especially through data augmentation. We sketch this global relationship in Figure 2.

To verify the relationship between standard generalization and robust generalization shown in Figure 2, we investigate the change of generalization and robustness with respect to the size of the training data by varying the number of training samples from relatively small to large enough. Generalization is measured by the accuracy on the test set and robustness is evaluated by Success Defense Rate. When the data size is not larger than the existing dataset, we divide its training set into sub-datasets with a strict inclusion relationship (each sub-dataset is contained in the next larger one), and then we train a model on each sub-dataset. When the data size expands further, we use ACGAN (Odena et al., 2016), a version of conditional GAN (Mirza and Osindero, 2014), to first model the conditional distribution of the data and then generate new images to augment the training set. Due to the limited precision of ACGAN, we inject clean examples during the training iterations to maintain the accuracy of the original classifier. We conduct experiments on MNIST, Fashion-MNIST and CIFAR10, and the results are shown in Figure 3.
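One simple way to obtain sub-datasets with a strict inclusion relationship is to shuffle the training indices once and take prefixes of increasing length. The sketch below assumes this prefix construction; the text does not say how the subsets were actually drawn, and the function name `nested_subsets`, the sizes and the seed are hypothetical.

```python
import numpy as np

def nested_subsets(n_samples, sizes, seed=0):
    """Return index arrays S_1, S_2, ... where each one contains the previous.

    Prefixes of a single random permutation are nested by construction.
    """
    sizes = sorted(sizes)
    if sizes[-1] > n_samples:
        raise ValueError("a requested subset is larger than the dataset")
    perm = np.random.default_rng(seed).permutation(n_samples)
    return [perm[:s] for s in sizes]

# Hypothetical sizes for a training set of 1000 examples.
subsets = nested_subsets(1000, [100, 300, 1000])
```

A separate model would then be trained on `x_train[idx], y_train[idx]` for each `idx` in `subsets`, so that any difference between models is attributable to added data rather than to a different sample.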

Figure 3 consistently shows that robustness, evaluated by Success Defense Rate, first decreases and then increases as the training data expands across the three datasets. Nevertheless, the turning point of robustness differs between datasets: it is observed earliest for MNIST, then for Fashion-MNIST and last for CIFAR10, following the ascending order of dataset complexity. In addition, the result suggests that Success Defense Rate under the CW attack turns upward later than under the other attacks, indicating that CW is a much stronger attack that requires more data to defend against, which is consistent with recent experimental conclusions (Carlini and Wagner, 2017). As the size of the data is enlarged, Success Defense Rate quickly increases to a high level and then saturates. Although there is a slight decline in the data augmentation phase due to the limited precision of ACGAN, the apparent uptrend can still be easily observed, providing strong evidence for our previous statement.

Through the data size analysis via data augmentation based on generative models, we have revealed a global relationship between standard generalization and robust generalization for standard training. The experimental result extends and unifies the viewpoints proposed in (Su et al., 2018; Schmidt et al., 2018; Dimitris Tsipras, 2018):

On the one hand, it indeed shows a trade-off between generalization and robustness for standard training before the turning point occurs. As for the underlying reason, we conjecture that with a limited number of samples, training on these data helps the classifier find a better and clearer decision boundary, resulting in high robustness in the initial stage, as proposed in (Anonymous, 2019b). As the data size increases, the decision boundary tends to become more complicated and delicate while more training data may lead to higher accuracy, which makes adversarial examples easier to forge. In this case, generalization and robustness for standard training gradually develop a trade-off.

On the other hand, when the training data is large enough with respect to the complexity of the classifier, Success Defense Rate rises to nearly 100%, where no adversarial attacks can be successfully crafted, implying that robust generalization converges to standard generalization for standard training. To further explore the reason for the change in robustness, we calculate the magnitude of the gradients with respect to the input images under different norms, and plot the relationship between the median of those gradients and the data size, as shown in Figure 4. An interesting observation is that the magnitude of the gradients gradually vanishes as the data size increases. To further probe the cause of the vanishing gradients, we show the change in the median confidence, measured by the maximum softmax probability of the network outputs, and demonstrate that it is the saturation of the prediction probability that makes the gradients vanish. More specifically, as the size of the data increases, the classifier is capable of covering more examples in the input space, especially those in the blue region in Figure 1. In this case, the gap between the decision boundary on the samples and the oracle one narrows, thus improving robustness as claimed in Section 2. We argue that this type of gradient vanishing is not the phenomenon of obfuscated gradients proposed in (Athalye et al., 2018), since our method does not involve any complex defensive mechanism. On the contrary, it is a manifestation of real robustness when the data size is large enough.

Figure 4: Relationship between median of gradients (the first row), confidence (the second row) and the data size on MNIST, Fashion-MNIST and CIFAR10. Median gradients instead of average ones are adopted to avoid the influence of extreme values.
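The two statistics plotted in Figure 4 can be computed as sketched below, assuming the per-example input gradients and softmax outputs have already been collected into arrays. The function names and toy arrays are hypothetical, and the norm order is left as a parameter because the specific norms are not recoverable here. For softmax cross-entropy the gradient with respect to the logits equals the predicted probabilities minus the one-hot label, which is why it shrinks as the confidence on the true class saturates.

```python
import numpy as np

def median_gradient_norm(grads, ord=2):
    """Median over examples of the norm of each flattened input gradient."""
    flat = np.asarray(grads, dtype=float).reshape(len(grads), -1)
    return float(np.median(np.linalg.norm(flat, ord=ord, axis=1)))

def median_confidence(probs):
    """Median over examples of the maximum softmax probability."""
    return float(np.median(np.max(np.asarray(probs, dtype=float), axis=1)))

# Toy gradients for three examples: L2 norms 5, 0 and 10.
grads = np.array([[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]])
probs = np.array([[0.7, 0.3], [0.9, 0.1], [0.5, 0.5]])

g_l2 = median_gradient_norm(grads, ord=2)         # 5.0
g_inf = median_gradient_norm(grads, ord=np.inf)   # 4.0
conf = median_confidence(probs)                   # 0.7

# Saturation: the logit gradient p - onehot(y) shrinks as p[y] approaches 1.
y_onehot = np.array([1.0, 0.0])
mild = np.linalg.norm(np.array([0.6, 0.4]) - y_onehot)
saturated = np.linalg.norm(np.array([0.999, 0.001]) - y_onehot)
```

Using the median, as the caption notes, keeps a handful of examples with very large gradients from dominating the statistic.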

Recommendation on defense

Considering the global relationship between robustness and generalization, we speculate that there may exist a phenomenon we call the robustness trap, in which robustness is likely to keep deteriorating as we augment the training data of an existing dataset, even though generalization may improve during this process. Only when the augmented data is large enough can the classifier escape this trap and its robustness start to improve, which suggests that more precise generative models should be established in the future to overcome this phenomenon.

5 Task-Dependent Factors

For the classification task-dependent factors, we demonstrate that it is the complexity of decision boundaries that aggravates the problem of adversarial examples. Here we explore the task-dependent factors from two explicit aspects: input dimension and number of categories to classify.

5.1 Input Dimension

For the factor of input dimension, in general we propose that classification tasks with more input dimensions are more vulnerable to adversarial attacks. We verify this proposal in the following.

Previous work (Anonymous, 2019a) states that in a high-dimensional data space, correctly classified data are very close to misclassified examples, especially adversarial examples. This agrees with our intuition that the distance between examples becomes subtler in a high-dimensional space, thus aggravating the vulnerability of the classifier.

Simon-Gabriel et al. (2018) proved this influence of the input dimension on robustness in the adversarial regularization scenario. In contrast, we conduct our exploration in a more general explanatory framework, without the gradient-based regularization (Simon-Gabriel et al., 2018), and perform experiments on more datasets with various types of attacks. The observations from our experiments differ in some respects from the previous work (Simon-Gabriel et al., 2018), showing that the overall correlation between the input dimension and robustness is not simply monotonic.

We implement the experiments by resizing the input images to different sizes using bilinear interpolation. For convenience of design and to suit each dataset, we apply different networks with similar architectures on the three datasets. For MNIST, we adopt a simple network with one convolutional layer and two fully-connected layers. For SVHN and CIFAR10, we apply networks with three convolutional layers and three fully-connected layers, and seven convolutional layers and three fully-connected layers, respectively. The experimental results are shown in Figure 5.


Figure 5: Relationship between the robustness and the input size on MNIST, SVHN and CIFAR10. The input dimension is proportional to the square of input size.
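For readers unfamiliar with the resizing step, the following is a small NumPy sketch of bilinear interpolation for a single-channel image. It assumes an "align corners" sampling grid, in which the corner pixels of input and output coincide; image libraries differ on this convention and the one used in the experiments is not stated. The function name `bilinear_resize` and the toy image are hypothetical.

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D image with bilinear interpolation (align-corners grid)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    ys = np.linspace(0, h - 1, out_h)        # source row for each output row
    xs = np.linspace(0, w - 1, out_w)        # source column for each output column
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)           # clamp neighbours at the border
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]                  # vertical interpolation weights
    wx = (xs - x0)[None, :]                  # horizontal interpolation weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

# Toy 4x4 image whose pixel value is linear in the coordinates: 4 * row + col.
img = np.arange(16, dtype=float).reshape(4, 4)
up = bilinear_resize(img, 7, 7)    # 16 input dimensions -> 49
down = bilinear_resize(img, 2, 2)  # 16 input dimensions -> 4
```

The number of input dimensions seen by the classifier is `out_h * out_w` per channel, which is the quadratic relation between input size and input dimension noted in the caption of Figure 5.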

In general, robustness shows a decreasing tendency as the input dimension expands. However, it can easily be observed that, except under the CW attack, robustness also becomes worse when the input dimension is too small. We hypothesize that a small input dimension makes the classifier under standard training suffer from overfitting, which is supported by the accompanying decline in generalization.

5.2 Number of Categories to Classify

For the number of categories to classify, it is intuitive to expect that as the number of categories grows, the decision boundary becomes more complicated, making the classifier more susceptible to attacks. Thus, we select training data with labels from 0 to $k$ and construct several subsets with a strict inclusion relationship by enlarging $k$. Networks with different numbers of categories to classify are then trained on these datasets, and their robustness is evaluated under various attacks.
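The subset construction can be sketched as a label filter. Here k is a symbol introduced for illustration to denote the largest retained label, so labels 0 through k give k + 1 categories; the function name `first_k_classes` and the toy arrays are hypothetical.

```python
import numpy as np

def first_k_classes(x, y, k):
    """Keep only the examples whose label lies in {0, ..., k}."""
    mask = y <= k
    return x[mask], y[mask]

# Toy data: seven examples with labels in {0, 1, 2, 3}.
x = np.arange(7)
y = np.array([0, 1, 2, 3, 1, 0, 2])

x1, y1 = first_k_classes(x, y, k=1)   # two categories
x2, y2 = first_k_classes(x, y, k=2)   # three categories
```

Because the filter only relaxes as k grows, the subset for a smaller k is always contained in the subset for a larger k, which gives the strict inclusion relationship described above.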

Figure 6: Relationship between robustness and number of categories to classify on MNIST, SVHN and CIFAR10.

As illustrated in Figure 6, there is an apparent downtrend of robustness as the number of categories to classify increases, although some stochastic effects remain. For all three datasets, deep neural networks that are required to classify more categories under standard training are more vulnerable to adversarial attacks, which is consistent with our intuition about the complexity of decision boundaries mentioned above.

Recommendation on defense

It is discouraging that deep neural networks in classification tasks with higher input dimensions and more categories are more liable to suffer from adversarial attacks, since in practice the trend is to use deep neural networks for increasingly complicated tasks. Nevertheless, we might consider combining models that each deal with a relatively simple task to solve the complex task, which would probably maintain high robustness. In addition, resizing the image to a proper size is beneficial to robustness as well.

6 Model-Specific Factors

For the model-specific factors, on the one hand, we reveal that the robustness of deep neural networks is better than that of other machine learning models under standard training. On the other hand, we demonstrate that increasing model capacity can help to defend against gradient-based attacks but cannot yield real robustness, since the networks remain fragile when faced with the alternative optimization-based CW attack.

6.1 Comparison with Other Machine Learning Approaches

We conduct experiments by comparing the robustness among CNN, LinearSVM and Logistic regression on the three datasets. The CNN has two convolutional layers and two fully-connected layers. Logistic regression adopts standard cross entropy loss and LinearSVM employs a variation of multiclass hinge loss.
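To make the difference between the two linear baselines concrete, the sketch below computes both losses from a matrix of class scores. The exact hinge variant used for LinearSVM is not specified; the Crammer-Singer form shown here, with a margin of 1, is an assumption chosen for illustration, and the function names and toy scores are hypothetical.

```python
import numpy as np

def cross_entropy(scores, y):
    """Mean softmax cross-entropy, as used for multiclass logistic regression."""
    scores = np.asarray(scores, dtype=float)
    z = scores - scores.max(axis=1, keepdims=True)          # numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_p[np.arange(len(y)), y].mean())

def multiclass_hinge(scores, y, margin=1.0):
    """Crammer-Singer hinge: max(0, margin + max_{j != y} s_j - s_y)."""
    scores = np.array(scores, dtype=float)                  # copy, will be edited
    rows = np.arange(len(y))
    true = scores[rows, y].copy()
    scores[rows, y] = -np.inf                               # exclude the true class
    return float(np.maximum(0.0, margin + scores.max(axis=1) - true).mean())

# Toy scores: the first example clears the margin, the second does not.
scores = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.5]]
y = np.array([0, 1])
hinge = multiclass_hinge(scores, y)   # (0 + 0.5) / 2 = 0.25
ce = cross_entropy(scores, y)
```

The hinge loss is exactly zero once the true class leads by the margin, whereas cross-entropy keeps pushing scores apart; both models are nonetheless linear in the input, so their input gradients are constant per class pair.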

Figure 7: Comparison of robustness among CNN, Logistic regression and LinearSVM on MNIST, SVHN and CIFAR10.

A common preconception is that deep neural networks are more susceptible to adversarial examples than traditional machine learning models due to their uncontrollable Lipschitz constants (Zantedeschi et al., 2017). However, our experimental results show that the Success Defense Rates of the CNN under each attack are consistently superior to those of the others, suggesting that the robustness of deep neural networks is much better than that of other typical machine learning systems, as shown in Figure 7. It should be mentioned that although the evaluation of robustness is based on Success Defense Rate, which only considers examples correctly classified by the given classifier, this index is still inevitably influenced by generalization in this comparison, because LinearSVM and Logistic regression cannot attain generalization performance similar to that of the CNN, especially on SVHN and CIFAR10. We can approximately conclude that the CNN has both better generalization and better robustness than the other machine learning approaches.

6.2 Model Capacity

Madry et al. (2017) demonstrated that larger model capacity can decrease the transferability of adversarial examples. However, by additionally testing the CW attack, we find that increasing model capacity cannot bring real robustness.

Figure 8: Relationship between the robustness and model capacity. X axis is in log transformation.

We follow the definition of model capacity in (Madry et al., 2017), namely the number of filters, and adopt a network architecture with four convolutional layers and three fully-connected layers for convenience of implementation. To increase the network capacity, we modify the network by making its layers wider by different widening factors, which enlarges the number of filters proportionally. Figure 8 depicts the trend of robustness with respect to the number of filters. We can observe that Success Defense Rate exhibits an apparent uptrend against gradient-based attacks, i.e. the FGSM and PGD attacks. Madry et al. (2017) stated that increasing the network capacity improves the resistance against transfer attacks, which is in accordance with our result, since gradient-based attacks are more transferable than the CW attack, as proposed by Su et al. (2018). Furthermore, we draw the stronger conclusion that increasing network capacity is unable to improve the real robustness of deep neural networks, since the networks remain vulnerable to the alternative optimization-based CW attack (Carlini and Wagner, 2017). This also raises an open problem: what is the difference between CW attacks and the others?
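As a rough guide to how capacity grows under widening, the sketch below scales a list of per-layer filter counts by a factor and counts the resulting convolutional parameters. The base filter counts, the 3x3 kernel, the three input channels and the function names are all hypothetical; the actual layer widths are not given here.

```python
def widened_filters(base_filters, factor):
    """Scale every layer's filter count by `factor`, keeping at least one filter."""
    return [max(1, int(round(f * factor))) for f in base_filters]

def conv_param_count(filters, in_channels=3, kernel=3):
    """Weights plus biases of a stack of square-kernel convolutional layers."""
    total, c_in = 0, in_channels
    for c_out in filters:
        total += kernel * kernel * c_in * c_out + c_out
        c_in = c_out  # the next layer consumes this layer's output channels
    return total

# Hypothetical four-layer base network and two widening factors.
base = [16, 32, 64, 128]
half = widened_filters(base, 0.5)    # [8, 16, 32, 64]
double = widened_filters(base, 2)    # [32, 64, 128, 256]
params = {w: conv_param_count(widened_filters(base, w)) for w in (0.5, 1, 2)}
```

Because each inner layer's weight count depends on the product of its input and output channels, doubling the width roughly quadruples the convolutional parameter count, which helps explain why a log-scaled x axis suits Figure 8.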

Figure 9: Relationship between model capacity and the median magnitude of gradients under different norms.

Furthermore, we explore the correlation between the magnitude of the gradients and model capacity by calculating the median of the gradients under different norms. As illustrated in Figure 9, the median gradient shows an apparent downtrend on all datasets as the model capacity increases. This implies that networks with larger capacity have smaller gradients and therefore better robustness against gradient-based attacks. However, the reduction of gradients brings no improvement against CW attacks, which are not directly constructed from the gradients of the original loss. More differences between gradient-based attacks and CW attacks are well worth exploring in the future.

Recommendation on defense

Model capacity plays an important role in robustness, but it cannot bring real robustness. Since network architecture is a more crucial factor for robustness, designing robust network architectures is a promising direction for future work.

7 Conclusion

In this work, we present an empirical, systematic study of adversarial examples from three aspects: the amount of data, task-dependent factors and model-specific factors. In particular, we demonstrate that adversarially robust generalization of deep neural networks under standard training also requires more data, and reveal the global relationship between generalization and robustness, especially through data augmentation. We then demonstrate, from the task-dependent factors, that increasing the complexity of decision boundaries aggravates the vulnerability of deep neural networks, and we clarify the relationship between model-specific factors and robustness. Our analysis sheds light on the systematic understanding of adversarial examples.