1 Introduction
Due to the recent successes of neural networks on a wide variety of tasks, they are now widely deployed in the real world. However, despite these successes, recent works [9, 1] have shown that they fail catastrophically in the presence of adversarially perturbed inputs. Moreover, [9, 1] showed that inputs adversarially generated for one model often cause other models to misclassify as well, a phenomenon commonly called transferability.
Our understanding of the causes of transferability is fairly limited. [11] analyzes local similarity of decision boundaries to define a local decision boundary metric that predicts how transferable adversarial examples between two models are likely to be. However, many questions remain open. A recent work [12] hypothesized that adversarial perturbations can be decomposed into initialization-specific and data-dependent components, and further hypothesized that the data-dependent component is what primarily contributes to transfer. However, that paper provides neither theoretical nor empirical evidence to justify this hypothesis.
Our work aims to examine this hypothesis in greater detail. We first augment the previous hypothesis to a decomposition into three parts: architecture-dependent, data-dependent, and noise-dependent components. We then propose a method for decomposing adversarial perturbations into noise-dependent and noise-reduced components, and for further decomposing the noise-reduced component into architecture-dependent and data-dependent components. We also conduct ablation studies to show the significance of the choices made in our methodology.
2 Motivation and Approach
Motivated by the reviewers’ comments on [12], we seek to provide further evidence that an adversarial example can be decomposed into model-dependent and data-dependent portions. First, we augment the hypothesis to claim that an adversarial perturbation can be decomposed into architecture-dependent, data-dependent, and noise-dependent components. Since the architecture, the training data, and the random initialization noise are the only inputs to the training process, these are the only factors that can contribute in some way to the adversarial example. An intuition for why noise-dependent components exist and would not transfer, despite working on the original dataset, is shown in Figure 2. Not drawn in the figure is the architecture-dependent component. As neural networks induce biases in the decision boundary, and specific network architectures induce specific biases, we would expect that an adversarial example could exploit these biases across all models with the same architecture.
2.1 Notation
We denote $\mathcal{A}$ to be the set of model architectures. Let $M^a = \{m^a_1, \ldots, m^a_n\}$ be a set of fully trained models of architecture $a \in \mathcal{A}$, each initialized with independent random noise. The superscript $a$ will be omitted when the architecture is clear.
We define an attack $v = A(x, y, m, \ell, v_0)$, where $x$ is an image, $y$ its corresponding label, $m$ is a neural network model as defined above, $\ell$ is a loss function, $v_0$ the initial perturbation of $x$, and $v$ a perturbation of $x$ such that $\ell(m(x + v), y)$ is maximal. For fixed architecture $a$, model $m$, and attack $A$, we denote $v_{arch}$, $v_{data}$, and $v_{noise}$ to be the three components of $v$ introduced in the previous sections. Let $v_{nr} = v_{arch} + v_{data}$; we will use the shorthand "nr" for "noise-reduced".
Let $\mathrm{proj}_u(w)$ denote the projection of vector $w$ onto vector $u$. Let $\hat{w}$ be the unit vector with the same direction as $w$.
2.2 $v_{noise}$ and $v_{nr}$ Decomposition
Description: We fix our architecture $a$ and take $M = \{m_1, \ldots, m_n\}$ as our set of trained models. Set $\ell$ to be the cross-entropy loss and let $v = A(x, y, m_1, \ell, 0)$ be the generated adversarial perturbation for $m_1$.
Proposition: $v$ can be decomposed into $v_{noise}$ and $v_{nr}$ such that the attack $v_{noise}$ is effective on $m_1$ but transfers poorly to $m_2, \ldots, m_n$, while $v_{nr}$ transfers well on all models.
The values for $v_{nr}$ and $v_{noise}$ are given by the equations below (see Appendix C for justification), where an attack on a set of models maximizes the sum of the per-model losses. The technique is illustrated in Figure 1.
$$v_{nr} = A\Big(x, y, \{m_2, \ldots, m_n\}, \sum_{i=2}^{n} \ell, v\Big), \qquad v_{noise} = v - \mathrm{proj}_{\hat{v}_{nr}}(v).$$
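As a concrete illustration, the projection step of this decomposition can be sketched in a few lines of NumPy. This is a hypothetical sketch, not the authors' implementation; `v` and `v_nr` are assumed to be flattened perturbation vectors.

```python
import numpy as np

def project(w, u):
    # Projection of w onto the direction of u: (w . u_hat) * u_hat.
    u_hat = u / np.linalg.norm(u)
    return np.dot(w, u_hat) * u_hat

def decompose(v, v_nr):
    # v_noise = v - proj_{v_nr}(v): remove the noise-reduced direction
    # from v. The residual is orthogonal to v_nr by construction.
    return v - project(v, v_nr)
```

By construction the returned residual is orthogonal to `v_nr`, which matches the orthogonality assumption examined in the ablation of Section 3.3.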
2.3 $v_{arch}$ and $v_{data}$ Decomposition
Description: We reuse the notation from the above section, except that we now consider a set of $k$ different architectures $\{a_1, \ldots, a_k\}$.
Proposition: $v_{nr}$ can be decomposed into $v_{arch}$ and $v_{data}$ such that the attack $v_{arch}$ is effective on models of architecture $a_1$ but transfers poorly to other architectures, while $v_{data}$ transfers well on all models.
We can calculate the values for $v_{arch}$ and $v_{data}$ with the equations below (see Appendix C for justification). We set $v_{nr}$ to be the noise-reduced perturbation generated on $a_1$. Then
$$v_{data} = A\Big(x, y, \{m^{a_j}_i : j \neq 1\}, \sum \ell, v_{nr}\Big), \qquad v_{arch} = v_{nr} - \mathrm{proj}_{\hat{v}_{data}}(v_{nr}).$$
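The multi-fooling attacks above exploit the additivity of the cross-entropy loss: one iFGSM step on the summed loss of several models is simply a step on the sum of their gradients. A minimal sketch, assuming hypothetical per-model gradient callables `grad_fns` in place of real backpropagation:

```python
import numpy as np

def multifool_step(x, v, grad_fns, eps, step_size):
    # One iFGSM step on the summed loss of several models. grad_fns are
    # callables returning dL_i/dx evaluated at x + v (stand-ins for
    # per-model backprop). Because cross-entropy is additive, the gradient
    # of the summed loss is the sum of the per-model gradients.
    g = sum(fn(x + v) for fn in grad_fns)
    v = v + step_size * np.sign(g)
    # Project back into the l_inf ball of radius eps.
    return np.clip(v, -eps, eps)
```

Iterating this step over the models of the other architectures (starting from $v_{nr}$) is the shape of the $v_{data}$ computation; the same step over retrained copies of one architecture yields $v_{nr}$.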
3 Results
We empirically verify the approaches given in the motivation above and show that the isolated noise- and architecture-dependent perturbations exhibit the desired properties. Unless stated otherwise, all perturbations are generated on CIFAR-10 [5] (pixel values rescaled to $[0, 1]$) using iFGSM [6] with 10 iterations, the $\ell_\infty$ distance metric, and $\epsilon = 0.03$. All experiments are run on the first 2000 CIFAR-10 test images. In addition, all models are trained for only 10 epochs due to computational constraints. All percentages reported are fooling ratios [7]. For results with other settings, see Appendix A.
3.1 $v_{noise}$ and $v_{nr}$ Decomposition
We start off with a set of 10 retrained ResNet18 [2] models $M = \{m_1, \ldots, m_{10}\}$. We attack the first ResNet18 model ($m_1$) to get perturbation $v$. We then follow the process in 2.2 to obtain $v_{nr}$ and $v_{noise}$ from the other 9 retrained ResNet18 models ($m_2, \ldots, m_{10}$). We then test on an untouched set of 5 retrained ResNet18 models $M_{test}$. We repeat the same process with DenseNet121 [4] in place of ResNet18 and report the respective results in Tables 1 and 2.
We note that $v_{noise}$ achieves a far lower transfer rate than either $v$ or $v_{nr}$ while still maintaining a relatively high error rate on the original model, providing evidence for the success of this decomposition. To the best of our knowledge, this is the first methodology able to construct adversarial examples with especially low transferability. Although this is of low practical use, it is theoretically interesting. We note that although we attempt to generate $v_{nr}$ by multi-fooling across 9 retrained models, reducing noise in high dimensions is difficult, so we are unable to achieve a perfect decomposition of $v$. Ablation studies in Appendix B suggest that we may be able to achieve a better decomposition with a larger set of retrained models.
Table 1: Fooling ratios on ResNet18.
perturbation | $m_1$ | $m_2, \ldots, m_{10}$ | $M_{test}$
$v$ | 68.3% | 45.6% | 46.7%
$v_{nr}$ | 63.7% | 61.9% | 59.5%
$v_{noise}$ | 60.2% | 19.8% | 20.3%

Table 2: Fooling ratios on DenseNet121.
perturbation | $m_1$ | $m_2, \ldots, m_{10}$ | $M_{test}$
$v$ | 70.0% | 47.1% | 49.8%
$v_{nr}$ | 64.3% | 65.3% | 66.6%
$v_{noise}$ | 64.9% | 27.1% | 29.6%
3.1.1 Recombining components
As the components $\hat{v}_{noise}$ and $\hat{v}_{nr}$ are linearly independent unit vectors, and by definition $v$ is in the span of these vectors, we can find unique scalars $a$ and $b$ such that $v = a\,\hat{v}_{noise} + b\,\hat{v}_{nr}$. Experimentally, we find that under our setting the recovered coefficient $a$ is large relative to $b$; for our original perturbation, this is perhaps an undue amount of focus paid to the noise-specific component. We can now set $a$ and $b$ to different ratios, which correspond to how much we wish to emphasize attacking the original model vs. transferability. As we are now able to scale $a$ and $b$ arbitrarily, allowing us to saturate the epsilon constraint, we sign-maximize (i.e., take $\epsilon \cdot \mathrm{sign}(a\,\hat{v}_{noise} + b\,\hat{v}_{nr})$, as motivated in [1]) to level the playing field. Table 3 shows the results of performing these experiments on ResNet18. We find that we are able to generate perturbations that perform equivalently with $v$ on $m_1$, while performing substantially better when transferring to $m_2, \ldots, m_{10}$ and $M_{test}$.
Table 3: Fooling ratios for recombined, sign-maximized perturbations on ResNet18.
$b : a$ ($v_{nr} : v_{noise}$) | $m_1$ | $m_2, \ldots, m_{10}$ | $M_{test}$
$v_{nr}$ only | 65.8% | 63.6% | 65.1%
2:1 | 68.5% | 63.7% | 65.2%
1.5:1 | 69.4% | 61.2% | 62.8%
1:1 | 69.8% | 56.0% | 56.4%
1:2 | 70.0% | 53.1% | 53.5%
$v_{noise}$ only | 69.8% | 51.0% | 51.0%
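The sign-maximized recombination described above can be sketched in NumPy (a minimal sketch, not the authors' code; `eps` plays the role of the $\ell_\infty$ budget):

```python
import numpy as np

def recombine(v_noise, v_nr, a, b, eps):
    # eps * sign(a * v_noise_hat + b * v_nr_hat): saturate the l_inf
    # budget while weighting the noise-specific vs. noise-reduced
    # directions in the ratio a : b.
    vn = v_noise / np.linalg.norm(v_noise)
    vr = v_nr / np.linalg.norm(v_nr)
    return eps * np.sign(a * vn + b * vr)
```

Because every coordinate of the result is $\pm\epsilon$ (where the combined direction is nonzero), each ratio uses the full perturbation budget, which is what levels the playing field between rows of Table 3.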
3.2 $v_{arch}$ and $v_{data}$ Decomposition
To evaluate the decomposition into architecture- and data-dependent components, we consider the 4 architectures ResNet18 [2], GoogLeNet [10], DenseNet121 [4], and SENet18 [3]. In each experiment we first fix a source architecture $a_1$ and generate $v_{nr}$ by attacking 4 retrained copies of $a_1$, denoted $M^{a_1}_{gen}$. We then generate $v_{data}$ by attacking four copies of each of the other three architectures, twelve models in total, denoted $M^{other}_{gen}$. We then test on another 4 retrained copies of $a_1$, called $M^{a_1}_{test}$, as well as on $M^{other}_{test}$, consisting of four copies of each of the other three architectures. We see that for all four source architectures, $v_{arch}$ obtains a significantly higher error rate on $M^{a_1}_{test}$ than on $M^{other}_{test}$. In addition, for $v_{data}$ the relative error between $M^{a_1}_{gen}$ and $M^{other}_{gen}$ is close to the relative error between $M^{a_1}_{test}$ and $M^{other}_{test}$ when averaged across models, supporting the success of our decomposition.
Table 4: Fooling ratios for the $v_{arch}$/$v_{data}$ decomposition across architectures.
source | perturbation | $M^{a_1}_{gen}$ | $M^{other}_{gen}$ | $M^{a_1}_{test}$ | $M^{other}_{test}$
ResNet18 | $v_{nr}$ | 60.9% | 50.7% | 59.4% | 50.7%
ResNet18 | $v_{data}$ | 54.6% | 61.4% | 54.8% | 60.8%
ResNet18 | $v_{arch}$ | 52.4% | 26.7% | 36.9% | 30.3%
DenseNet121 | $v_{nr}$ | 62.8% | 46.2% | 62.9% | 47.1%
DenseNet121 | $v_{data}$ | 58.4% | 58.3% | 57.2% | 55.7%
DenseNet121 | $v_{arch}$ | 54.1% | 24.4% | 43.1% | 26.0%
GoogLeNet | $v_{nr}$ | 65.3% | 41.9% | 65.7% | 41.9%
GoogLeNet | $v_{data}$ | 59.5% | 59.2% | 59.5% | 58.3%
GoogLeNet | $v_{arch}$ | 57.9% | 22.8% | 44.8% | 26.2%
SENet18 | $v_{nr}$ | 53.8% | 48.4% | 53.2% | 49.0%
SENet18 | $v_{data}$ | 55.7% | 64.5% | 54.8% | 63.8%
SENet18 | $v_{arch}$ | 47.1% | 28.1% | 38.6% | 29.8%
3.3 Ablation
Orthogonality We assume that the $v_{arch}$, $v_{data}$, and $v_{noise}$ terms are orthogonal. We note that if these vectors had no relation to each other, then due to the properties of high-dimensional space, they would be approximately orthogonal with very high probability.
We vary orthogonality by modifying the method in 2.2 to generate $v^{(\alpha)}_{noise} = v - \alpha \cdot \mathrm{proj}_{\hat{v}_{nr}}(v)$ with varying $\alpha$. When $\alpha = 1$, we recover the original algorithm, and when $\alpha = 0$, $v^{(0)}_{noise} = v$. We experimentally vary the orthogonality of $v_{noise}$ and $v_{nr}$ in Table 5 and note that we achieve the greatest difference in efficacy between the original model and transferred models when they are near-orthogonal, suggesting that the assumption we made is reasonable.
However, exactly orthogonal components do not achieve the best isolation. This suggests that our current method of decomposition may simply be an approximation of the true components, and that a more nuanced method may be necessary for better isolation.
Table 5: Fooling ratios for varying $\alpha$. Difference is the $m_1$ ratio minus the transfer ratio, in percentage points.
$\alpha$ | $m_1$ | $m_2, \ldots, m_{10}$ | Difference
$v$ (baseline) | 68.3% | 45.6% | 22.7
$v_{nr}$ (baseline) | 63.7% | 61.9% | 1.8
0.1 | 66.7% | 42.2% | 24.5
0.5 | 63.4% | 29.1% | 34.3
0.8 | 59.7% | 21.4% | 38.3
1.0 | 52.7% | 16.6% | 36.1
1.2 | 51.9% | 11.3% | 40.6
1.5 | 42.9% | 9.4% | 33.5
2.0 | 33.5% | 7.2% | 26.3
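The $\alpha$-sweep generalizes the projection step of Section 2.2; a minimal sketch (hypothetical, operating on flattened vectors):

```python
import numpy as np

def alpha_component(v, v_nr, alpha):
    # v^(alpha) = v - alpha * proj_{v_nr}(v).
    # alpha = 1 yields the component orthogonal to v_nr (the original
    # algorithm); alpha = 0 returns v unchanged; alpha > 1 overshoots
    # past orthogonality, tilting the component against v_nr.
    u = v_nr / np.linalg.norm(v_nr)
    return v - alpha * np.dot(v, u) * u
```

Values of $\alpha$ slightly above 1 tilt the component away from $v_{nr}$, which is where Table 5 shows the largest gap between original-model efficacy and transfer.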
Number of Models We find that the more models are used to approximate $v_{nr}$, the more successfully we are able to isolate $v_{noise}$. See Appendix B for full results.
4 Conclusion
We demonstrate that it is possible to decompose adversarial perturbations into noise-dependent and data-dependent components, a hypothesis reviewers thought was interesting but unsupported in [12]. We go further by decomposing an adversarial perturbation into architecture-related, data-related, and noise-related components. A major contribution here is a new method of analyzing adversarial examples, which creates many potential directions for future research. One interesting direction would be extending these decompositions to universal perturbations [7, 8], thus removing the dependence on individual data points. Another avenue to explore is analyzing various attacks and defenses and how they interact with these components.
References
 Goodfellow et al. [2014] I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. CoRR, abs/1412.6572, 2014.

He et al. [2016] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.
 Hu et al. [2017] J. Hu, L. Shen, and G. Sun. Squeeze-and-excitation networks. CoRR, abs/1709.01507, 2017.
 Huang et al. [2017] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2261–2269, 2017.
 Krizhevsky [2009] A. Krizhevsky. Learning multiple layers of features from tiny images. 2009.
 Kurakin et al. [2016] A. Kurakin, I. J. Goodfellow, and S. Bengio. Adversarial examples in the physical world. CoRR, abs/1607.02533, 2016.
 Moosavi-Dezfooli et al. [2017] S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard. Universal adversarial perturbations. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 86–94, 2017.
 Poursaeed et al. [2017] O. Poursaeed, I. Katsman, B. Gao, and S. J. Belongie. Generative adversarial perturbations. CoRR, abs/1712.02328, 2017.
 Szegedy et al. [2013] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013.
 Szegedy et al. [2015] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–9, 2015.
 Tramèr et al. [2017] F. Tramèr, A. Kurakin, N. Papernot, D. Boneh, and P. D. McDaniel. Ensemble adversarial training: Attacks and defenses. CoRR, abs/1705.07204, 2017.
 Wu et al. [2018] L. Wu, Z. Zhu, C. Tai, and W. E. Enhancing the transferability of adversarial examples with noise reduced gradient, 2018. URL https://openreview.net/forum?id=ryvxcPeAb.
A. Different attack settings
To show that our decomposition is effective across a variety of attack settings, we perform the experiment of Section 3.1 with three different iFGSM settings corresponding to $\epsilon \in \{0.01, 0.03, 0.06\}$. Results are shown in Table 6.
Table 6: Fooling ratios for varying $\epsilon$.
$\epsilon$ | perturbation | $m_1$ | $m_2, \ldots, m_{10}$ | $M_{test}$
0.01 | $v$ | 39.0% | 16.4% | 14.4%
0.01 | $v_{nr}$ | 25.1% | 28.4% | 22.2%
0.01 | $v_{noise}$ | 26.6% | 6.2% | 5.3%
0.03 | $v$ | 68.3% | 45.6% | 46.7%
0.03 | $v_{nr}$ | 63.7% | 61.9% | 59.5%
0.03 | $v_{noise}$ | 60.2% | 19.8% | 20.3%
0.06 | $v$ | 81.2% | 69.7% | 73.6%
0.06 | $v_{nr}$ | 81.1% | 80.5% | 85.8%
0.06 | $v_{noise}$ | 77.7% | 39.4% | 40.0%
B. Varying number of models/iterations
We investigate the effectiveness of the Section 3.1 decomposition as we vary hyperparameters. Results for increasing iFGSM iterations are given in Table 7, and results for increasing the number of models are given in Table 8.
Table 7: Fooling ratios for varying iFGSM iteration counts.
# of iters | perturbation | $m_1$ | transfer
5 | $v$ | 65.2% | 43.4%
5 | $v_{nr}$ | 58.8% | 58.8%
5 | $v_{noise}$ | 55.5% | 20.6%
10 | $v$ | 68.3% | 46.7%
10 | $v_{nr}$ | 63.7% | 61.9%
10 | $v_{noise}$ | 60.2% | 19.8%
100 | $v$ | 72.9% | 48.6%
100 | $v_{nr}$ | 67.3% | 65.2%
100 | $v_{noise}$ | 60.3% | 18.7%
Table 8: Fooling ratios for varying numbers of retrained models used to generate $v_{nr}$.
# of models | perturbation | $m_1$ | generation models | $M_{test}$
3 | $v$ | 69.4% | 46.6% | 45.6%
3 | $v_{nr}$ | 57.6% | 62.1% | 51.9%
3 | $v_{noise}$ | 60.1% | 24.9% | 29.2%
5 | $v$ | 68.4% | 47.0% | 44.8%
5 | $v_{nr}$ | 60.1% | 62.0% | 55.2%
5 | $v_{noise}$ | 57.5% | 22.4% | 24.6%
10 | $v$ | 68.3% | 45.6% | 46.7%
10 | $v_{nr}$ | 63.7% | 61.9% | 59.5%
10 | $v_{noise}$ | 60.2% | 19.8% | 20.3%
C. Justification of Equations
Justification of Equations in 3.1
Recall that the equations are given by
$$v_{nr} = A\Big(x, y, \{m_2, \ldots, m_n\}, \sum_{i=2}^{n} \ell, v\Big), \qquad v_{noise} = v - \mathrm{proj}_{\hat{v}_{nr}}(v).$$
We assume that the expected value of our noise term is $0$ over all random noise. This is motivated by the fact that the random noise at initialization is a Gaussian distribution centered at $0$, and it is reasonable to assume that the distribution of noise components follows a similar pattern. Letting $\bar{v} = \mathbb{E}[v]$ over all random initializations, we claim that $\bar{v} = v_{arch} + v_{data}$. Since $v_{arch}$ and $v_{data}$ are noise-independent,
$$\mathbb{E}[v] = v_{arch} + v_{data} + \mathbb{E}\big[v^{(i)}_{noise}\big] = v_{arch} + v_{data},$$
where $v^{(i)}_{noise}$ is the noise component corresponding to the noise of model $m_i$. By the law of large numbers, it follows that
$$\frac{1}{n-1} \sum_{i=2}^{n} v_i \to \bar{v},$$
where $v_i = A(x, y, m_i, \ell, v)$. Therefore, for sufficiently large $n$,
$$\frac{1}{n-1} \sum_{i=2}^{n} v_i \approx v_{arch} + v_{data} = v_{nr}.$$
We see that, since the cross-entropy loss is additive and the attacks that we examine are first-order differentiation methods,
$$A\Big(x, y, \{m_2, \ldots, m_n\}, \sum_{i=2}^{n} \ell, v\Big) \approx \frac{1}{n-1} \sum_{i=2}^{n} A(x, y, m_i, \ell, v),$$
so the multi-fooled perturbation approximates $v_{nr}$. To prove the other claim, we have shown, through empirical results and the intuition that independent high-dimensional vectors are very close to orthogonal, that $v_{noise}$ and $v_{nr}$ are near-orthogonal and compose $v$. Therefore, taking the projection of $v$ onto $\hat{v}_{nr}$ implies that
$$v - \mathrm{proj}_{\hat{v}_{nr}}(v) \approx v_{noise}$$
up to a scaling constant.
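The near-orthogonality intuition used above (independent high-dimensional directions are almost orthogonal) is easy to check numerically; a small sketch assuming Gaussian draws as stand-ins for the perturbation components:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3072  # dimensionality of a flattened CIFAR-10 image (3 x 32 x 32)

# The cosine similarity of two independent random directions in R^d
# concentrates around 0 at a rate of roughly 1/sqrt(d).
u = rng.standard_normal(d)
w = rng.standard_normal(d)
cos = float(np.dot(u, w) / (np.linalg.norm(u) * np.linalg.norm(w)))
```

At $d = 3072$ the typical cosine similarity is on the order of $1/\sqrt{d} \approx 0.018$, consistent with treating unrelated components as approximately orthogonal.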
Justification of Equations in 3.2
Recall that the equations are, given $v_{nr}$ generated on $a_1$,
$$v_{data} = A\Big(x, y, \{m^{a_j}_i : j \neq 1\}, \sum \ell, v_{nr}\Big), \qquad v_{arch} = v_{nr} - \mathrm{proj}_{\hat{v}_{data}}(v_{nr}).$$
We make two core assumptions:
1. The value of $\mathbb{E}_a\big[v^{(a)}_{arch}\big]$ is $0$. This is a reasonable assumption since our chosen architectures should produce roughly symmetric error vectors $v^{(a)}_{arch}$.
2. Starting the attack from $v_{nr}$ rather than from $0$ is equivalent, in the sense that the former produces a noise-reduced gradient closer to $v$. This is reasonable because there are many adversarial perturbations (in different directions), and changing our start location will not cripple our search space. Furthermore, starting from $v_{nr}$ lets us generate a $v_{data}$ close to $v_{nr}$.
We claim that $\bar{v}_{nr} = v_{data}$, where we take $\bar{v}_{nr} = \mathbb{E}_a\big[v^{(a)}_{nr}\big]$ over architectures $a$. To see this, we note that
$$\mathbb{E}_a\big[v^{(a)}_{nr}\big] = v_{data} + \mathbb{E}_a\big[v^{(a)}_{arch}\big],$$
and so again we can approximate it with $\frac{1}{k-1} \sum_{j=2}^{k} v^{(a_j)}_{nr}$, where $v^{(a_j)}_{nr}$ is the component generated for architecture $a_j$. For sufficiently large $k$, it follows that
$$\frac{1}{k-1} \sum_{j=2}^{k} v^{(a_j)}_{nr} \approx v_{data} + \mathbb{E}_a\big[v^{(a)}_{arch}\big] = v_{data},$$
and by our second assumption the multi-fooled perturbation is roughly equivalent to $v_{data}$, as desired. To prove the other claim, we use an argument analogous to the one above: we have shown that $v_{arch}$ and $v_{data}$ are near-orthogonal, and applying the same projection technique yields
$$v_{nr} - \mathrm{proj}_{\hat{v}_{data}}(v_{nr}) \approx v_{arch}$$
up to a scaling constant.