Adversarial Example Decomposition

12/04/2018 ∙ by Horace He, et al. ∙ Cornell University

Research has shown that widely used deep neural networks are vulnerable to carefully crafted adversarial perturbations. Moreover, these adversarial perturbations often transfer across models. We hypothesize that adversarial weakness is composed of three sources of bias: architecture, dataset, and random initialization. We show that one can decompose adversarial examples into an architecture-dependent component, data-dependent component, and noise-dependent component and that these components behave intuitively. For example, noise-dependent components transfer poorly to all other models, while architecture-dependent components transfer better to retrained models with the same architecture. In addition, we demonstrate that these components can be recombined to improve transferability without sacrificing efficacy on the original model.




1 Introduction

Due to the recent successes of neural networks on a wide variety of tasks, they are now being widely applied in the real world. However, despite these successes, recent works [9, 1] have shown that they fail catastrophically in the presence of adversarially perturbed input. Moreover, [9, 1] showed that inputs adversarially generated for one model often cause other models to misclassify images as well, a phenomenon commonly called transferability.

Our understanding of the causes of transferability is fairly limited. [11] analyzes local similarity of decision boundaries to define a local decision boundary metric that determines how transferable adversarial examples between two models are likely to be. However, many questions are still open. A recent work [12] hypothesized that adversarial perturbations could be decomposed into initialization-specific and data-dependent components. It is also hypothesized that the data-dependent component is what primarily contributes to transfer. However, the paper provides neither theoretical nor empirical evidence to justify this hypothesis.

Our work aims to examine this hypothesis in greater detail. We first augment the previous hypothesis to provide decomposition into three parts: architecture-dependent, data-dependent, and noise-dependent components. In addition, we propose a method for decomposing adversarial perturbations into noise-dependent and noise-reduced components, as well as decomposing the noise-reduced component into architecture-dependent and data-dependent components. We also conduct ablation studies to show the significance of the choices made in our methodology.

2 Motivation and Approach

Motivated by the reviewers’ comments on [12], we seek to provide further evidence that an adversarial example can be decomposed into model-dependent and data-dependent portions. First, we augment the hypothesis to claim that an adversarial perturbation can be decomposed into architecture-dependent, data-dependent, and noise-dependent components. These are the only sources that could contribute to the adversarial example, since a trained model is determined by its architecture, its training data, and its random initialization. An intuition for why noise-dependent components exist and would not transfer despite working on the original dataset is shown in Figure 2. Not drawn in the figure is the architecture-dependent component. As neural networks induce biases in the decision boundary, and specific network architectures induce specific biases, we would expect that an adversarial example could exploit these biases across all models with the same architecture.

Figure 1: Noise Vector Decomposition. The vectors shown are as defined in Section 2.1. Note the orthogonality of the noise-reduced and noise-dependent components, justified in the ablation study of Section 3.3.
Figure 2: Varying Decision Boundaries. In the above figure, is the adversarial perturbation, and , are as defined in Section 2.1.

2.1 Notation

We denote by the set of model architectures. Let be a set of fully trained models of architecture , each initialized with random noise. The superscript will be omitted when the architecture is clear.

We define an attack , where is an image, its corresponding label, is a neural network model as defined above, is a loss function, the initial perturbation of , and a perturbation of such that is maximal.

For fixed architecture , model , and attack , we denote to be the three components of introduced in previous sections. Let ; we will use the short hand .

Let denote the projection of vector onto vector . Let be the unit vector with the same direction as .
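The two vector operations defined above can be written out concretely. The following is a minimal sketch using plain Python lists; the function names `proj` and `unit` are illustrative choices, not the paper's notation.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def proj(u, w):
    """Projection of vector u onto vector w: (u.w / w.w) * w."""
    scale = dot(u, w) / dot(w, w)
    return [scale * x for x in w]

def unit(u):
    """Unit vector with the same direction as u."""
    norm = math.sqrt(dot(u, u))
    return [x / norm for x in u]

u = [3.0, 4.0]
w = [1.0, 0.0]
print(proj(u, w))  # [3.0, 0.0]
print(unit(u))     # [0.6, 0.8]
```

The residual `u - proj(u, w)` is orthogonal to `w`, which is the property the decompositions in the following sections rely on.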

2.2 Noise-Reduced and Noise-Dependent Decomposition

Description: We fix our architecture and have as our set of trained models. Set to be the cross-entropy loss and let be the generated adversarial perturbation for .

Proposition: can be decomposed into such that the attack is effective on but transfers poorly to , while transfers well on all models.

The values for and are given by the below equations (see Appendix C for justification). The technique is illustrated in Figure 1.
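As a rough illustration of the projection step, the sketch below splits a perturbation into a part along a noise-reduced direction and an orthogonal residual. It assumes, following the projection argument in Appendix C, that the residual plays the role of the noise-dependent component up to a scaling constant. The names `v`, `v_nr`, and `split_noise` and the 3-dimensional values are hypothetical; in the experiments the vectors would be image-sized, and the noise-reduced perturbation would come from attacking the other retrained models.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def split_noise(v, v_nr):
    """Split v into (part along v_nr, residual orthogonal to v_nr).

    Assumption: the residual stands in for the noise-dependent
    component, up to a scaling constant.
    """
    coef = dot(v, v_nr) / dot(v_nr, v_nr)
    along = [coef * x for x in v_nr]
    residual = [a - b for a, b in zip(v, along)]
    return along, residual

# Hypothetical perturbations (illustrative values only).
v = [2.0, 1.0, 0.0]      # perturbation from attacking one model
v_nr = [1.0, 0.0, 0.0]   # perturbation from attacking other retrained models

along, v_noise = split_noise(v, v_nr)
print(along)               # [2.0, 0.0, 0.0]
print(v_noise)             # [0.0, 1.0, 0.0]
print(dot(v_noise, v_nr))  # 0.0
```

By construction the two parts sum back to the original perturbation, so nothing is lost in the split.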

2.3 Architecture-Dependent and Data-Dependent Decomposition

Description: We reuse notation from the above section, except that we now consider a set of different architectures .

Proposition: can be decomposed into such that the attack is effective on but transfers poorly to , while transfers well on all models.

We can calculate the value for and with the below equations (see Appendix C for justification). We set to be the noise reduced perturbation generated on . Then

3 Results

We empirically verify the approaches given in the motivation above and show that the isolated noise and architecture-dependent perturbations show the desired properties. Unless stated otherwise, all perturbations are generated on CIFAR-10 [5] (original images rescaled to ) using iFGSM [6] with 10 iterations, distance metric , and . All experiments are run on the first 2000 CIFAR-10 test images. In addition, all models are trained for only 10 epochs due to computational constraints. All percentages reported are fooling ratios [7]. For results with other settings, see Appendix A.
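For readers unfamiliar with iFGSM, the following is a minimal sketch on a toy logistic model rather than a deep network. The step size (budget divided by the number of iterations), the valid input range [0, 1], the L-infinity ball, and the toy weights are all assumptions made for illustration; the defaults of 10 iterations and a budget of 0.03 mirror the iteration count stated above and the .03 setting listed in Appendix A.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_grad(x, y, w, b):
    """Gradient of binary cross-entropy w.r.t. the input x for a
    logistic model p = sigmoid(w.x + b); it equals (p - y) * w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def sign(t):
    return (t > 0) - (t < 0)

def ifgsm(x, y, w, b, eps=0.03, iters=10, lo=0.0, hi=1.0):
    """Iterative FGSM under an L-infinity budget eps.

    Assumptions (not from the paper): step size eps / iters and a
    valid input range [lo, hi].
    """
    alpha = eps / iters
    x_adv = list(x)
    for _ in range(iters):
        g = loss_grad(x_adv, y, w, b)
        for i in range(len(x)):
            xi = x_adv[i] + alpha * sign(g[i])          # ascend the loss
            xi = min(max(xi, x[i] - eps), x[i] + eps)  # stay in the eps-ball
            x_adv[i] = min(max(xi, lo), hi)            # stay a valid input
    return x_adv

# Hypothetical toy model and input.
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [0.5, 0.5, 0.5], 1
x_adv = ifgsm(x, y, w, b)
perturbation = [a - c for a, c in zip(x_adv, x)]
print(max(abs(d) for d in perturbation) <= 0.03 + 1e-12)  # True
```

For a deep network the analytic gradient would be replaced by backpropagation through the model, but the update and clipping steps keep the same shape.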

3.1 Noise-Reduced and Noise-Dependent Decomposition

We start with a set of 10 retrained ResNet18 [2] models . We attack the first ResNet18 model () to get perturbation . We then follow the process in Section 2.2 to obtain from the other 9 retrained ResNet18 models (). We then test on an untouched set of 5 retrained ResNet18 models . We repeat the same process with DenseNet121 [4] in place of ResNet18 and report the respective results in Tables 1 and 2.

We note that achieves a far lower transfer rate than either or while still maintaining relatively high error rate on the original model, providing evidence for the success of this decomposition. To the best of our knowledge, this is the first methodology that is able to construct adversarial examples with especially low transferability. Although this is of low practical use, this is theoretically interesting. We note that although we attempt to generate by multi-fooling across 9 retrained models, reducing noise in high dimensions is difficult, so we are unable to achieve a perfect decomposition of . Ablation studies in Appendix B suggest that we may be able to achieve a better decomposition with a larger set of retrained models.

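| Perturbation | Original model | 9 retrained models | 5 held-out models |
|---|---|---|---|
| Original perturbation | 68.3% | 45.6% | 46.7% |
| Noise-reduced component | 63.7% | 61.9% | 59.5% |
| Noise-dependent component | 60.2% | 19.8% | 20.3% |

Table 2: Noise-reduced and noise-dependent decomposition (DenseNet121)

| Perturbation | Original model | 9 retrained models | 5 held-out models |
|---|---|---|---|
| Original perturbation | 70.0% | 47.1% | 49.8% |
| Noise-reduced component | 64.3% | 65.3% | 66.6% |
| Noise-dependent component | 64.9% | 27.1% | 29.6% |

All numbers reported are fooling ratios. Observe that the noise-dependent component exhibits exceptionally low transferability.
Table 1: Noise-reduced and noise-dependent decomposition (ResNet18)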
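The fooling ratio reported in these tables can be computed as below. This sketch assumes the usual definition from the universal-perturbation literature [7], namely the fraction of inputs whose predicted label changes once the perturbation is added; the label lists are hypothetical.

```python
def fooling_ratio(clean_preds, adv_preds):
    """Fraction of inputs whose predicted label changes after perturbation."""
    if len(clean_preds) != len(adv_preds):
        raise ValueError("prediction lists must have the same length")
    changed = sum(c != a for c, a in zip(clean_preds, adv_preds))
    return changed / len(clean_preds)

# Hypothetical predicted labels for five images.
clean = [3, 1, 4, 1, 5]
adv = [3, 7, 4, 2, 5]
print(f"{fooling_ratio(clean, adv):.1%}")  # 40.0%
```

Note that this compares predictions before and after perturbation, so it does not depend on whether the clean prediction was correct.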

3.1.1 Recombining components

As the components and are linearly independent unit vectors and, by definition, is in the span of these vectors, we can find unique scalars and such that . Experimentally, we find that under our setting, and . For our original perturbation, this is perhaps an undue amount of weight placed on the noise-specific perturbation. We can now set and to different ratios, which correspond to how much we wish to emphasize attacking the original model versus transferability. Because we can now set and arbitrarily high, which lets us saturate the epsilon constraint, we sign-maximize (i.e., ), as motivated in [1], to level the playing field. Table 3 shows the results of these experiments on ResNet18. We are able to generate perturbations that perform equivalently to on , while performing substantially better when transferring to and .
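The recombination can be sketched in two steps: solve for the two scalars, then reweight the directions and sign-maximize. The code below is an illustration under stated assumptions: the direction vectors are made up, the names `nr_hat`, `noise_hat`, `coefficients`, and `recombine` are hypothetical, and which weight in a ratio belongs to which component is a choice made here for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(u):
    n = math.sqrt(dot(u, u))
    return [x / n for x in u]

def coefficients(v, u_hat, w_hat):
    """Solve v = a*u_hat + b*w_hat for (a, b) via the 2x2 Gram system.

    Requires u_hat and w_hat to be linearly independent unit vectors
    and v to lie in their span.
    """
    g = dot(u_hat, w_hat)            # off-diagonal Gram entry
    det = 1.0 - g * g                # nonzero iff linearly independent
    vu, vw = dot(v, u_hat), dot(v, w_hat)
    return (vu - g * vw) / det, (vw - g * vu) / det

def recombine(u_hat, w_hat, ratio_u, ratio_w, eps):
    """Weight the two directions, then sign-maximize to fill the eps budget."""
    mixed = [ratio_u * a + ratio_w * b for a, b in zip(u_hat, w_hat)]
    return [eps * ((m > 0) - (m < 0)) for m in mixed]

# Hypothetical stand-ins for the noise-reduced and noise-dependent directions.
nr_hat = unit([1.0, 1.0, 0.0])
noise_hat = unit([0.0, 1.0, -1.0])
v = [3.0 * a + 2.0 * b for a, b in zip(nr_hat, noise_hat)]

a, b = coefficients(v, nr_hat, noise_hat)
print(round(a, 6), round(b, 6))                      # 3.0 2.0
print(recombine(nr_hat, noise_hat, 2.0, 1.0, 0.03))  # [0.03, 0.03, -0.03]
```

Because sign maximization discards magnitudes, only the ratio of the two weights matters, not their absolute size.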

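| Ratio | Original model | 9 retrained models | 5 held-out models |
|---|---|---|---|
| | 65.8% | 63.6% | 65.1% |
| 2:1 | 68.5% | 63.7% | 65.2% |
| 1.5:1 | 69.4% | 61.2% | 62.8% |
| 1:1 | 69.8% | 56.0% | 56.4% |
| 1:2 | 70.0% | 53.1% | 53.5% |
| | 69.8% | 51.0% | 51.0% |

All numbers reported are fooling ratios. Observe that as the ratio increases, we obtain better transferability with lowered effectiveness on the original model. Also note that we are able to construct perturbations that are strictly superior to either or .
Table 3: Linear combinations of the noise-reduced and noise-dependent components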

3.2 Architecture-Dependent and Data-Dependent Decomposition

To evaluate decomposition into architecture and data, we consider four architectures: ResNet18 [2], GoogLeNet [10], DenseNet121 [4], and SENet18 [3]. In each experiment we first fix a source architecture and generate by attacking 4 retrained copies of , denoted as . We then generate by attacking four copies of each , for twelve models total. We then test on another 4 retrained copies of , called , as well as , consisting of four copies of each of the other three architectures . We see that for all four architectures, obtains a significantly higher error rate on than on . In addition, the relative error between and for is close to the relative error between and for when averaged across models, supporting the success of our decomposition.

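| Source | Perturbation | Source arch. (generation set) | Other archs. (generation set) | Source arch. (held-out) | Other archs. (held-out) |
|---|---|---|---|---|---|
| ResNet18 | Noise-reduced perturbation | 60.9% | 50.7% | 59.4% | 50.7% |
| | Data-dependent component | 54.6% | 61.4% | 54.8% | 60.8% |
| | Architecture-dependent component | 52.4% | 26.7% | 36.9% | 30.3% |
| DenseNet121 | Noise-reduced perturbation | 62.8% | 46.2% | 62.9% | 47.1% |
| | Data-dependent component | 58.4% | 58.3% | 57.2% | 55.7% |
| | Architecture-dependent component | 54.1% | 24.4% | 43.1% | 26.0% |
| GoogLeNet | Noise-reduced perturbation | 65.3% | 41.9% | 65.7% | 41.9% |
| | Data-dependent component | 59.5% | 59.2% | 59.5% | 58.3% |
| | Architecture-dependent component | 57.9% | 22.8% | 44.8% | 26.2% |
| SENet18 | Noise-reduced perturbation | 53.8% | 48.4% | 53.2% | 49.0% |
| | Data-dependent component | 55.7% | 64.5% | 54.8% | 63.8% |
| | Architecture-dependent component | 47.1% | 28.1% | 38.6% | 29.8% |

All numbers reported are fooling ratios. Note that for all architectures, the adversarial decomposition holds. Namely, the architecture-dependent component is more transferable to its specific architecture than to others, whereas the data-dependent component is equally transferable across architectures.
Table 4: Architecture-dependent and data-dependent decomposition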
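The claim that the low-transfer component favors its own architecture can be checked directly against the numbers in Table 4. The snippet below takes the third row of each source block and compares its third and fourth values. Reading those two columns as held-out models of the source architecture versus held-out models of the other architectures is an assumption inferred from the experiment description, not something the table states explicitly.

```python
# Third row of each source block in Table 4: (third value, fourth value),
# read here as fooling ratios (%) on held-out models of the source
# architecture vs. held-out models of the other architectures.
# That column reading is an assumption inferred from the text.
third_row = {
    "ResNet18": (36.9, 30.3),
    "DenseNet121": (43.1, 26.0),
    "GoogLeNet": (44.8, 26.2),
    "SENet18": (38.6, 29.8),
}

gaps = {src: round(same - other, 1) for src, (same, other) in third_row.items()}
for src, gap in gaps.items():
    print(f"{src}: {gap:+.1f} points")
# ResNet18: +6.6 points
# DenseNet121: +17.1 points
# GoogLeNet: +18.6 points
# SENet18: +8.8 points
```

The gap is positive for every source architecture, though its size varies considerably between them.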

3.3 Ablation

Orthogonality. We assume that the , , and terms are orthogonal. If these vectors had no relation to each other, then due to the properties of high-dimensional space, they would be approximately orthogonal with very high probability.
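The high-dimensional intuition is easy to check numerically. The sketch below draws pairs of independent Gaussian vectors and compares the average absolute cosine similarity in 3 dimensions with that in 3072 dimensions, the size of a 32 x 32 x 3 CIFAR-10 image. The Gaussian sampling and the trial count are illustrative assumptions; real perturbation components are not claimed to be Gaussian.

```python
import math
import random

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def mean_abs_cosine(dim, trials, rng):
    """Average |cos| between pairs of independent Gaussian vectors."""
    total = 0.0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        b = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        total += abs(cosine(a, b))
    return total / trials

rng = random.Random(0)
low = mean_abs_cosine(3, 100, rng)
high = mean_abs_cosine(3072, 100, rng)   # 3072 = 32 * 32 * 3
print(low > high)  # True
```

In 3072 dimensions the typical absolute cosine is on the order of one over the square root of the dimension, so unrelated vectors are close to orthogonal.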

We vary orthogonality by modifying the method in 2.2 to generate with . When , we recover the original algorithm, and when , . We also experimentally vary the orthogonality of and in Table 5 and note that we achieve the greatest difference in efficacy between the original model and transferred models when they are near-orthogonal, suggesting that the assumption we made is reasonable.

However, exactly orthogonal components do not achieve the best isolation. This suggests that our current method of decomposition may only approximate the true components, and that a more nuanced method may be necessary for better isolation.

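| Setting | Original model | Transferred models | Difference |
|---|---|---|---|
| Original perturbation | 68.3% | 45.6% | 22.7 |
| Noise-reduced component | 63.7% | 61.9% | 1.8 |
| 0.1 | 66.7% | 42.2% | 24.5 |
| 0.5 | 63.4% | 29.1% | 34.3 |
| 0.8 | 59.7% | 21.4% | 38.3 |
| 1.0 | 52.7% | 16.6% | 36.1 |
| 1.2 | 51.9% | 11.3% | 40.6 |
| 1.5 | 42.9% | 9.4% | 33.5 |
| 2.0 | 33.5% | 7.2% | 26.3 |

All percentages reported are fooling ratios. Note that the setting 1.2 produces the maximal difference (40.6), which is slightly different from the assumed orthogonality.
Table 5: Varying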
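The last column of Table 5 is the gap between the first two columns, and the best setting can be read off by recomputing it. The snippet below uses only the numeric-setting rows of the table; the tuple layout is an illustrative choice.

```python
# (setting, fooling ratio on original model %, fooling ratio on transfer %)
rows = [
    (0.1, 66.7, 42.2), (0.5, 63.4, 29.1), (0.8, 59.7, 21.4), (1.0, 52.7, 16.6),
    (1.2, 51.9, 11.3), (1.5, 42.9, 9.4), (2.0, 33.5, 7.2),
]
diffs = {s: round(o - t, 1) for s, o, t in rows}
best = max(diffs, key=diffs.get)
print(diffs[0.8], diffs[1.0], diffs[1.2])  # 38.3 36.1 40.6
print(best)  # 1.2
```

The difference is not monotone in the setting: it dips at 1.0 between the values at 0.8 and 1.2.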

Number of Models. We find that the more models are used to approximate , the more successfully we are able to isolate . See Appendix B for full results.

4 Conclusion

We demonstrate that it is possible to decompose adversarial perturbations into noise-dependent and data-dependent components, a hypothesis reviewers thought was interesting but unsupported in [12]. We go further by decomposing an adversarial perturbation into model-related, data-related, and noise-related perturbations. A major contribution here is a new method of analyzing adversarial examples; this creates many potential future directions for research. One interesting direction would be extending these decompositions to universal perturbations [7, 8], thus removing the dependence on individual data points. Another avenue to explore is analyzing various attacks and defenses and how they interact with these components.


A. Different attack settings

To show that our decomposition is effective across a variety of attack settings, we perform the experiment of Section 3.1 with three different iFGSM settings corresponding to ε ∈ {0.01, 0.03, 0.06}. Results are shown in Table 6.

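| ε | Perturbation | Original model | 9 retrained models | 5 held-out models |
|---|---|---|---|---|
| 0.01 | Original perturbation | 39.0% | 16.4% | 14.4% |
| | Noise-reduced component | 25.1% | 28.4% | 22.2% |
| | Noise-dependent component | 26.6% | 6.2% | 5.3% |
| 0.03 | Original perturbation | 68.3% | 45.6% | 46.7% |
| | Noise-reduced component | 63.7% | 61.9% | 59.5% |
| | Noise-dependent component | 60.2% | 19.8% | 20.3% |
| 0.06 | Original perturbation | 81.2% | 69.7% | 73.6% |
| | Noise-reduced component | 81.1% | 80.5% | 85.8% |
| | Noise-dependent component | 77.7% | 39.4% | 40.0% |

Table 6: Varying ε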

B. Varying number of models/iterations

We investigate the effectiveness of the Section 3.1 decomposition as we vary hyper-parameters. Results for increasing the number of iFGSM iterations are given in Table 7, and results for increasing the number of models are given in Table 8.

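| # of iterations | Perturbation | Original model | Transferred models |
|---|---|---|---|
| 5 | Original perturbation | 65.2% | 43.4% |
| | Noise-reduced component | 58.8% | 58.8% |
| | Noise-dependent component | 55.5% | 20.6% |
| 10 | Original perturbation | 68.3% | 46.7% |
| | Noise-reduced component | 63.7% | 61.9% |
| | Noise-dependent component | 60.2% | 19.8% |
| 100 | Original perturbation | 72.9% | 48.6% |
| | Noise-reduced component | 67.3% | 65.2% |
| | Noise-dependent component | 60.3% | 18.7% |

Table 7: Varying number of iterations used for iFGSM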
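| # of models | Perturbation | Original model | Retrained models | Held-out models |
|---|---|---|---|---|
| 3 | Original perturbation | 69.4% | 46.6% | 45.6% |
| | Noise-reduced component | 57.6% | 62.1% | 51.9% |
| | Noise-dependent component | 60.1% | 24.9% | 29.2% |
| 5 | Original perturbation | 68.4% | 47.0% | 44.8% |
| | Noise-reduced component | 60.1% | 62.0% | 55.2% |
| | Noise-dependent component | 57.5% | 22.4% | 24.6% |
| 10 | Original perturbation | 68.3% | 45.6% | 46.7% |
| | Noise-reduced component | 63.7% | 61.9% | 59.5% |
| | Noise-dependent component | 60.2% | 19.8% | 20.3% |

Table 8: Varying number of models used to approximate the noise-reduced component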

C. Justification of Equations

Justification of Equations in 2.2

Recall that the equations are given by

We assume that the expected value of our noise term is 0 over all random noise. This is motivated by the fact that the random noise at initialization follows a Gaussian distribution centered at 0, and it is reasonable to assume that the model distribution and the noise distribution follow a similar pattern.

Letting over all random initializations , we claim that . Since and are noise-independent, it follows that

where is the noise component corresponding with the noise of model . Therefore, it follows that

By the law of large numbers, it follows that . Therefore, for sufficiently large , it follows that

We see that, since the cross-entropy loss is additive and the attacks that we examine are first-order differentiation methods, we have

To prove the other claim, recall that we have shown, through empirical results and the intuition that and are linearly independent, that and are very close to orthogonal and compose . Therefore, taking the projection of implies that

up to a scaling constant.
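The averaging argument can be illustrated with a small simulation. The sketch below models each retrained model's perturbation as a shared, noise-independent vector plus zero-mean Gaussian noise, then measures how far the average over n models is from the shared vector. The additive Gaussian model, the dimension, and the names `shared` and `sample_perturbation` are all assumptions for illustration; the argument above only requires the noise term to have zero expectation.

```python
import math
import random

def mean_vector(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

rng = random.Random(0)
dim = 500
shared = [rng.gauss(0.0, 1.0) for _ in range(dim)]   # noise-independent part

def sample_perturbation():
    """Shared part plus zero-mean noise, standing in for one retrained model."""
    return [s + rng.gauss(0.0, 1.0) for s in shared]

errors = []
for n in (1, 10, 100):
    avg = mean_vector([sample_perturbation() for _ in range(n)])
    errors.append(distance(avg, shared) / math.sqrt(dim))

# Per-coordinate error shrinks roughly like 1 / sqrt(n).
print([round(e, 2) for e in errors])
```

The slow 1/sqrt(n) decay is consistent with the observation that a handful of retrained models reduces the noise only partially.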

Justification of Equations in 2.3

Recall that the equations are, given generated on ,

We make two core assumptions:

  • The value of . This is a reasonable assumption since our generated architectures should produce roughly symmetric error vectors .

  • is equivalent in the sense that the former produces a noise-reduced gradient closer to . This is reasonable because there are many adversarial perturbations (in different directions), and changing our start location won’t cripple our search space. Furthermore, we use this to generate a close to .

We claim that where we take over architecture . To see this, we note that

and so again we can approximate it with where is the component generated for model . For sufficiently large , it follows that

Therefore we have

and by our assumption this is roughly equivalent to

as desired. To prove the other claim, we use an argument analogous to the one above: we have shown that and are orthogonal, and applying the same projection technique yields

up to a scaling constant.