Adversarial examples, which can mislead deep networks, are one of the major obstacles to applying deep learning in security-sensitive applications (Szegedy et al., 2014). Researchers found that some adversarial examples co-exist in models with different architectures and parameters (Papernot et al., 2016b, 2017). By exploiting this property, an adversary can derive adversarial examples through a surrogate model and attack other models (Liu et al., 2017). Seeking these co-existing adversarial examples would benefit many areas, including evaluating network robustness, developing defense schemes, and understanding deep learning (Goodfellow et al., 2015).
Adversarial examples are commonly studied under two threat models: white-box and black-box attacks (Kurakin et al., 2018). In the white-box setting, adversaries have full knowledge of the victim model, including its structure, parameter weights, and the loss function used to train it. Accordingly, they can directly obtain the gradients of the victim model and seek adversarial examples (Goodfellow et al., 2015; Madry et al., 2018; Carlini and Wagner, 2017). White-box attacks are important for evaluating and developing robust models (Goodfellow et al., 2015). In the black-box setting, adversaries have no knowledge of victim models. Two types of approaches, the query-based approach and the transfer-based approach, are commonly studied for black-box attacks. The query-based approach estimates the gradients of victim models through the outputs of query images (Guo et al., 2019; Ilyas et al., 2018, 2019; Tu et al., 2019). Because it requires a huge number of queries, it can be easily defended against. The transfer-based approach, which uses surrogate models to estimate the gradients, is a more practical way to mount black-box attacks. Some researchers have joined these two approaches together to reduce the number of queries (Cheng et al., 2019; Guo et al., 2019; Huang and Zhang, 2020). This paper focuses on the transfer-based approach because of its practicality and its use in the combined methods.
There are three approaches to crafting adversarial examples: standard objective optimization, attention modification, and smoothing. In general, combining methods based on different approaches yields stronger black-box attacks. Along this line, we propose a new and simple algorithm named Transferable Attack based on Integrated Gradients (TAIG) to generate highly transferable adversarial examples. The fundamental difference from previous methods is that TAIG uses a single term to carry out all three approaches simultaneously. Two versions of TAIG, TAIG-S and TAIG-R, are studied. TAIG-S uses the original integrated gradients (Sundararajan et al., 2017) computed on a straight-line path, while TAIG-R calculates the integrated gradients on a random piecewise linear path. TAIG can also be combined with other methods to further increase its transferability.
The rest of the paper is organized as follows. Section 2 summarizes the related works. Section 3 describes TAIG and discusses it from the three perspectives. Section 4 reports the experimental results and comparisons. Section 5 gives some conclusive remarks.
2 Related Works
The standard objective optimization approach mainly uses the gradients of a surrogate model to optimize a standard objective function, such as maximizing a training loss or minimizing the score or logit output of a benign image. Examples are Projected Gradient Descent (PGD) (Madry et al., 2018), the Fast Gradient Sign Method (FGSM) (Goodfellow et al., 2015), the Basic Iterative Method (BIM) (Kurakin et al., 2016), Momentum Iterative FGSM (MIFGSM) (Dong et al., 2018) and Carlini-Wagner's (C&W) (Carlini and Wagner, 2017) attacks. They are commonly referred to as gradient-based attacks. These methods were originally designed for white-box attacks, but are commonly used as a back-end component in other methods for black-box attacks.
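As an illustration of this family of attacks, a one-step FGSM-style update can be sketched as follows; the analytic toy gradient stands in for a surrogate network's loss gradient, which would normally come from autograd (a sketch, not the implementation of any cited work):

```python
import numpy as np

def fgsm(x, loss_grad, eps):
    """FGSM: one step along the sign of the loss gradient,
    clipped back to the valid pixel range [0, 1]."""
    return np.clip(x + eps * np.sign(loss_grad(x)), 0.0, 1.0)

# toy stand-in for a network's loss gradient at x (hypothetical values)
loss_grad = lambda x: np.array([0.3, -0.7, 0.0])

x = np.array([0.2, 0.5, 0.9])          # three "pixels" in [0, 1]
x_adv = fgsm(x, loss_grad, eps=8 / 255)
print(x_adv)
```

Iterating this update with a smaller step size yields BIM/IFGSM, and adding a momentum term on the gradient yields MIFGSM.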
The attention modification approach assumes that different deep networks classify the same image based on similar features. Therefore, adversarial examples generated by modifying the features in benign images are expected to be more transferable. Examples are the Jacobian-based Saliency Map Attack (JSMA) (Papernot et al., 2016a), Attack on Attention (AoA) (Chen et al., 2020) and the Attention-guided Transfer Attack (ATA) (Wu et al., 2020b), which use attention maps to identify potential common features for attacks. JSMA uses the Jacobian matrix to compute its attention map, but its objective function is unclear. AoA utilizes SGLRP (Iwana et al., 2019) to compute the attention map, while ATA uses the gradients of an objective function with respect to neuron outputs to derive an attention map. Both AoA and ATA seek an adversarial example that maximizes the difference between its attention map and the attention map of the corresponding benign sample. In addition to the attention terms, AoA and ATA also include typical attack losses, e.g., logit outputs, in their objective functions, and use a hyperparameter to balance the two terms. Transferable Adversarial Perturbations (TAP) (Zhou et al., 2018), the Activation Attack (AA) (Inkawhich et al., 2019) and the Intermediate Level Attack (ILA) (Huang et al., 2019), which all directly maximize the distance between the feature maps of benign images and adversarial examples, also belong to this category. TAP and AA generate adversarial examples by employing multi-layer and single-layer feature maps, respectively. ILA fine-tunes existing adversarial examples by increasing the perturbation on a specific hidden layer for higher black-box transferability.
The smoothing approach aims at avoiding overfitting the decision surface of the surrogate model. The methods based on the smoothing approach can be divided into two branches. One branch uses smoothed gradients derived from multiple points of the decision surface. Examples are the Diverse Inputs Iterative method (DI) (Xie et al., 2019), the Scale Invariance Attack (SI) (Lin et al., 2020), the Translation-invariant Attack (TI) (Dong et al., 2019), the Smoothed Gradient Attack (SG) (Wu and Zhu, 2020), the Admix Attack (Admix) (Wang et al., 2021) and Variance Tuning (VT) (Wang and He, 2021). Most of these methods smooth the gradients by calculating the average gradients of augmented images. The other branch modifies the gradient calculations in order to estimate the gradients of a smoother surface. Examples are the Skip Gradient Method (SGM) (Wu et al., 2020a) and the Linear Backpropagation Attack (LinBP) (Guo et al., 2020). SGM is specifically designed for models with skip connections, such as ResNet. It forces backpropagation to use skip connections more than routes with non-linear functions. LinBP takes a similar approach: it computes the forward loss as normal and skips some non-linear activations in backpropagation. By diminishing non-linear paths in a surrogate model, gradients are computed from a smoother surface.
3 Preliminaries and TAIG
3.1 Notations and Integrated Gradients
For the sake of clear presentation, a set of notations is given first. Let $f$ be a classification network that maps an input $\boldsymbol{x} \in \mathbb{R}^n$ to a vector $f(\boldsymbol{x}) = (f_1(\boldsymbol{x}), \dots, f_K(\boldsymbol{x}))^T$ whose $k$-th element represents the value of the $k$-th output node in the logit layer, and $f_k$ be the network mapping $\boldsymbol{x}$ to the output value of the $k$-th class, where $T$ is a transpose operator. To simplify the notations, the subscript $k$ is omitted, i.e., $f := f_k$, when $k$ represents an arbitrary class or the class label is clear. $\boldsymbol{x}$ and $\boldsymbol{x}^{adv}$ represent a benign image and an adversarial example respectively, and $x_i$ and $x^{adv}_i$ represent their $i$-th pixels. The class label of $\boldsymbol{x}$ is denoted as $y$. Bold symbols, e.g., $\boldsymbol{x}$, are used to indicate images, matrices and vectors, and non-bold symbols, e.g., $x_i$, are used to indicate scalars.
Integrated gradients (Sundararajan et al., 2017) is a method attributing the prediction of a deep network to its input features. The attributions it computes indicate the importance of each pixel to the network output and can be regarded as attention or saliency values. Integrated gradients is developed based on two axioms, Sensitivity and Implementation Invariance, and satisfies another two axioms, Linearity and Completeness. To discuss the proposed TAIG, the completeness axiom is needed. Thus, we briefly introduce integrated gradients and the completeness axiom below. Integrated gradients is a line integral of the gradients from a reference image $\boldsymbol{x}'$ to an input image $\boldsymbol{x}$. The integrated gradient of the $i$-th pixel of the input is defined as
$$IG_i(\boldsymbol{x}) = (x_i - x'_i) \int_0^1 \frac{\partial f(\boldsymbol{x}' + \alpha(\boldsymbol{x} - \boldsymbol{x}'))}{\partial x_i}\, d\alpha, \qquad (1)$$
where $x'_i$ is the $i$-th pixel of $\boldsymbol{x}'$. In this work, a black image is selected as the reference $\boldsymbol{x}'$.
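For concreteness, equation 1 can be approximated with a Riemann sum over the straight-line path. In the sketch below, a toy analytic function with a known gradient stands in for a network's class logit (an illustrative assumption, not the surrogate models used in the experiments):

```python
import numpy as np

def integrated_gradients(f_grad, x, x_ref, steps=300):
    """Approximate IG_i = (x_i - x'_i) * integral of df/dx_i along the
    straight-line path, using a midpoint Riemann sum with `steps` points."""
    alphas = (np.arange(steps) + 0.5) / steps            # midpoints in (0, 1)
    grads = np.stack([f_grad(x_ref + a * (x - x_ref)) for a in alphas])
    return (x - x_ref) * grads.mean(axis=0)

# toy "logit": f(x) = (w.x)^2, with analytic gradient 2(w.x)w
w = np.array([0.5, -1.0, 2.0])
f = lambda x: np.dot(w, x) ** 2
f_grad = lambda x: 2.0 * np.dot(w, x) * w

x = np.array([1.0, 2.0, 3.0])
x_ref = np.zeros(3)                                      # black-image reference
ig = integrated_gradients(f_grad, x, x_ref)

# completeness check: f(x) - f(x_ref) should equal sum_i IG_i
print(abs(ig.sum() - (f(x) - f(x_ref))))
```

For this quadratic toy function the midpoint sum is exact, so the completeness residual is at the level of floating-point error; for real networks the residual shrinks as `steps` grows.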
The completeness axiom states that the difference between $f(\boldsymbol{x})$ and $f(\boldsymbol{x}')$ is equal to the sum of the $IG_i(\boldsymbol{x})$, i.e.,
$$f(\boldsymbol{x}) - f(\boldsymbol{x}') = \sum_{i=1}^{n} IG_i(\boldsymbol{x}). \qquad (2)$$
To simplify the notations, both $IG$ and $IG(\boldsymbol{x})$ are used to represent $IG(f, \boldsymbol{x}, \boldsymbol{x}')$, and $IG_i$ and $IG_i(\boldsymbol{x})$ are used to represent $IG_i(f, \boldsymbol{x}, \boldsymbol{x}')$, when $f$ and $\boldsymbol{x}'$ are clear. The details of the other axioms and the properties of integrated gradients can be found in Sundararajan et al. (2017).
3.2 The Two Versions — TAIG-S and TAIG-R
We propose two versions of TAIG for untargeted attacks. The first one, based on the original integrated gradients, performs the integration on a straight-line path. This version is named Transferable Attack based on Integrated Gradients on a Straight-line Path (TAIG-S) and its attack equation is defined as
$$\boldsymbol{x}^{adv} = \boldsymbol{x} - \epsilon \cdot \operatorname{sign}(IG(\boldsymbol{x})), \qquad (3)$$
where the integrated gradients are computed from the label of $\boldsymbol{x}$, i.e., $f_y$, and $\epsilon$ controls the step size.
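A minimal sketch of this one-step update follows; the analytic toy gradient is a stand-in for the surrogate network's autograd, and the step size and number of sampling points are illustrative assumptions:

```python
import numpy as np

def taig_s(x, f_grad, eps, steps=30):
    """One TAIG-S step: x_adv = clip(x - eps * sign(IG(x))),
    with a black-image reference and a `steps`-point Riemann sum."""
    alphas = (np.arange(steps) + 0.5) / steps
    avg_grad = np.mean([f_grad(a * x) for a in alphas], axis=0)
    ig = x * avg_grad                       # (x_i - 0) times averaged gradient
    return np.clip(x - eps * np.sign(ig), 0.0, 1.0)

# toy stand-in for the surrogate's logit gradient: f(x) = (w.x)^2
w = np.array([0.5, -1.0, 2.0])
f_grad = lambda x: 2.0 * np.dot(w, x) * w

x = np.array([0.2, 0.5, 0.9])
x_adv = taig_s(x, f_grad, eps=0.05)
print(x_adv)                                # each pixel moves against sign(IG)
```

Repeating the call with `x_adv` as the new input gives the iterative variant mentioned below for TAIG.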
The second version is named Transferable Attack based on Integrated Gradients on a Random Piecewise Linear Path (TAIG-R). Let $\boldsymbol{x}^0, \dots, \boldsymbol{x}^E$ be the turning points of a random piecewise linear path, including the starting point $\boldsymbol{x}^0 = \boldsymbol{x}'$ and the endpoint $\boldsymbol{x}^E = \boldsymbol{x}$. The line segment from $\boldsymbol{x}^e$ to $\boldsymbol{x}^{e+1}$ is defined as
$$\boldsymbol{x}^e + \alpha(\boldsymbol{x}^{e+1} - \boldsymbol{x}^e), \qquad (4)$$
where $\alpha \in [0, 1]$. When computing the integrated gradients of the line segment, $\boldsymbol{x}^e$ is used as a reference, and the corresponding integrated gradients $IG(f, \boldsymbol{x}^{e+1}, \boldsymbol{x}^e)$ can be computed by equation 1. The integrated gradients of the entire path are defined as
$$RIG(\boldsymbol{x}) = \sum_{e=0}^{E-1} IG(f, \boldsymbol{x}^{e+1}, \boldsymbol{x}^e). \qquad (5)$$
The integrated gradients computed from the random piecewise linear path are called random path integrated gradients (RIG). Note that RIG still fulfills the completeness axiom:
$$f(\boldsymbol{x}) - f(\boldsymbol{x}') = \sum_{i=1}^{n} RIG_i(\boldsymbol{x}).$$
It should be highlighted that integrated gradients computed from other paths also fulfill the completeness axiom (Sundararajan et al., 2017). In this paper, the turning points $\boldsymbol{x}^e$, $0 < e < E$, in the random path are generated by
$$\boldsymbol{x}^e = \frac{e}{E}\boldsymbol{x} + \boldsymbol{u}^e, \qquad (6)$$
where $\boldsymbol{u}^e$ is a random vector following a uniform distribution with support $[-\tau, \tau]^n$. The attack equation of TAIG-R is
$$\boldsymbol{x}^{adv} = \boldsymbol{x} - \epsilon \cdot \operatorname{sign}(RIG(\boldsymbol{x})), \qquad (7)$$
which is the same as TAIG-S, except that $IG$ in TAIG-S is replaced by $RIG$. As with PGD and BIM, TAIG can be applied iteratively. The sign function is used in TAIG because the distance between $\boldsymbol{x}$ and $\boldsymbol{x}^{adv}$ is measured by the $\ell_\infty$ norm in this study.
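The random path construction and the completeness of RIG can be sketched numerically. The noise model for the interior turning points below (uniform perturbations of the straight-line path, with both endpoints kept exact) and the one-midpoint-per-segment estimator are assumptions of this sketch; a toy quadratic logit stands in for the network:

```python
import numpy as np

rng = np.random.default_rng(0)

def taig_r_rig(x, f_grad, num_turns=30, tau=0.1):
    """RIG on a random piecewise linear path from the black image to x.
    Each segment's integrated gradients use one midpoint gradient sample."""
    E = num_turns
    pts = [(e / E) * x for e in range(E + 1)]         # straight-line skeleton
    for e in range(1, E):                             # perturb interior points only
        pts[e] = pts[e] + rng.uniform(-tau, tau, size=x.shape)
    rig = np.zeros_like(x)
    for a, b in zip(pts[:-1], pts[1:]):
        rig += (b - a) * f_grad(0.5 * (a + b))        # per-segment IG estimate
    return rig

# toy logit f(x) = (w.x)^2; its gradient is linear, so the midpoint estimate
# is exact and completeness (sum_i RIG_i = f(x) - f(0)) holds to float error
w = np.array([0.5, -1.0, 2.0])
f = lambda x: np.dot(w, x) ** 2
f_grad = lambda x: 2.0 * np.dot(w, x) * w

x = np.array([1.0, 2.0, 3.0])
rig = taig_r_rig(x, f_grad)
print(rig.sum(), f(x) - f(np.zeros(3)))
```

Note that the per-segment sums telescope, which is why completeness survives the random detours through the path.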
3.3 Viewing TAIG from The Three Perspectives
In the TAIG attack equations, i.e., equation 3 and equation 7, the integration of the optimization, attention, and smoothing approaches is not obvious. This subsection explains TAIG from the perspective of optimization first, followed by the perspectives of attention and smoothing. TAIG-S is used in the following discussion, as the discussion of TAIG-R is similar.
Using the completeness axiom, the minimization of $f_y(\boldsymbol{x}^{adv})$ can be written as
$$\min_{\boldsymbol{x}^{adv}} f_y(\boldsymbol{x}^{adv}) = \min_{\boldsymbol{x}^{adv}} \left( \sum_i IG_i(\boldsymbol{x}^{adv}) + f_y(\boldsymbol{x}') \right). \qquad (8)$$
Since $f_y(\boldsymbol{x}')$ is independent of $\boldsymbol{x}^{adv}$, it can be ignored. Taking the gradient on both sides of equation 8, we have
$$\sum_i \nabla IG_i(\boldsymbol{x}^{adv}) = \nabla f_y(\boldsymbol{x}^{adv}), \qquad (9)$$
which is the same as the gradient used in PGD and FGSM for white-box attacks. For ReLU networks, it can be proven that
$$\nabla \sum_i IG_i(\boldsymbol{x}) = \boldsymbol{g}(\boldsymbol{x}), \qquad (10)$$
where the $j$-th element of $\boldsymbol{g}(\boldsymbol{x})$ is
$$g_j(\boldsymbol{x}) = \frac{IG_j(\boldsymbol{x})}{x_j - x'_j}. \qquad (11)$$
The proof is given in Appendix A.1. Computing the derivative of $f_y$ with respect to $x_i$ and using the definition of the derivative,
$$\frac{\partial f_y(\boldsymbol{x})}{\partial x_i} = \lim_{h \to 0} \frac{f_y(\boldsymbol{x} + h\boldsymbol{e}_i) - f_y(\boldsymbol{x})}{h},$$
where all the elements of $\boldsymbol{e}_i$ are zero, except for the $i$-th element being one. Using the backward difference, it can be approximated as
$$\frac{\partial f_y(\boldsymbol{x})}{\partial x_i} \approx \frac{f_y(\boldsymbol{x}) - f_y(\boldsymbol{x} - h\boldsymbol{e}_i)}{h} \approx \frac{IG_i(\boldsymbol{x}) - \widetilde{IG}_i}{h}, \qquad (12)$$
where $h > 0$. According to the completeness axiom, if an adversarial example $\boldsymbol{x}^{adv}$ has $\sum_i IG_i(\boldsymbol{x}^{adv}) = 0$, then $f_y(\boldsymbol{x}^{adv}) = f_y(\boldsymbol{x}')$, where $\boldsymbol{x}'$ is a black image in this study. In other words, the network outputs of the adversarial example and the black image are the same, which implies that the adversarial example has a high probability of being misclassified. In equation 12, the quotient represents the slope between $f_y(\boldsymbol{x})$ and $f_y(\boldsymbol{x} - h\boldsymbol{e}_i)$, and $IG_i(\boldsymbol{x})$ and $\widetilde{IG}_i$ can be regarded as the integrated gradients of the $i$-th element of the current and the target adversarial example. To minimize equation 8, we seek an adversarial example whose integrated gradients are zero. Thus, the target integrated gradient, i.e., $\widetilde{IG}_i$, is set to zero such that it would not contribute to the network output. Setting $\widetilde{IG}_i$ to zero,
$$\frac{\partial f_y(\boldsymbol{x})}{\partial x_i} \approx \frac{IG_i(\boldsymbol{x})}{h} \qquad (13)$$
is obtained. Given that $h$ in equation 13 is positive and TAIG-S uses the sign of $IG$, we can draw the following conclusions: 1) $\operatorname{sign}(IG(\boldsymbol{x}))$ can be used to approximate $\operatorname{sign}(\nabla f_y(\boldsymbol{x}))$ of ReLU networks; 2) the quality of this approximation depends on equation 12, meaning that $\operatorname{sign}(IG(\boldsymbol{x}))$ and $\operatorname{sign}(\nabla f_y(\boldsymbol{x}))$ are not necessarily very close. To maintain the sign of $IG$ for the minimization, we choose the backward difference instead of the forward difference in equation 12. If the forward difference is used, the term $-\operatorname{sign}(IG(\boldsymbol{x}))$ in equation 3 would become $+\operatorname{sign}(IG(\boldsymbol{x}))$ and TAIG-S would maximize $f_y$ instead. A more detailed explanation can be found in Appendix A.2.
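Conclusion 1 can be checked numerically. On a toy quadratic logit with non-negative inputs (an illustrative stand-in, not a ReLU network, so the agreement here is exact rather than the weak agreement reported for real models), the signs of $IG$ and the gradient coincide:

```python
import numpy as np

# toy logit f(x) = (w.x)^2 with analytic gradient
w = np.array([0.5, -1.0, 2.0])
f_grad = lambda x: 2.0 * np.dot(w, x) * w

x = np.array([1.0, 2.0, 3.0])                 # non-negative "pixels"
alphas = (np.arange(30) + 0.5) / 30           # midpoint Riemann sum, black reference
ig = x * np.mean([f_grad(a * x) for a in alphas], axis=0)

# equation 13 predicts sign(IG_i) should approximate sign(df/dx_i)
agreement = np.mean(np.sign(ig) == np.sign(f_grad(x)))
print(agreement)
```

For real surrogate networks the agreement is partial (the 68% reported above), which is exactly the "weak approximation" the text describes.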
Fig. 2 shows the distribution of the normalized $\ell_1$ distance between $\operatorname{sign}(IG)$ and $\operatorname{sign}(\nabla f_y)$. On average, 68% of the elements of $\operatorname{sign}(IG)$ and $\operatorname{sign}(\nabla f_y)$ are the same, indicating that $\operatorname{sign}(IG)$ weakly approximates $\operatorname{sign}(\nabla f_y)$. (It should be highlighted that a better approximation is not our goal. When $\operatorname{sign}(IG)$ is close to $\operatorname{sign}(\nabla f_y)$, it only implies that it is stronger for white-box attacks.) The previous discussion is also applicable to RIG, because it also fulfills the completeness axiom. Fig. 2 also shows the corresponding distribution for $\operatorname{sign}(RIG)$. On average, 58% of the elements of $\operatorname{sign}(RIG)$ and $\operatorname{sign}(\nabla f_y)$ are the same. Because $IG$ can weakly approximate $\nabla f_y$, TAIG-S can be used to perform white-box attacks. As for how it can enhance transferability in black-box attacks, we believe more insights can be gained by viewing TAIG from the perspectives of attention and smoothing.
Viewed from the perspective of attention, integrated gradients identify the key features that deep networks use in their predictions and allot higher integrated gradient values to them. If networks are trained to perform the same classification task, they likely use similar key features. In equation 12, the target integrated gradient in the backward difference is set to zero. By modifying the input image through its integrated gradients, the key features are amended, and the transferability can be enhanced. To justify these arguments, Fig. 1 shows the integrated gradients of an original image from different networks, and Fig. 3 shows the integrated gradients before and after the TAIG-S and TAIG-R attacks. These figures indicate that 1) different models have similar integrated gradients for the same images, and 2) TAIG-S and TAIG-R significantly modify the integrated gradients.
To avoid overfitting the surface of the surrogate model, smoothing based on augmentation is commonly used. Equation 1 shows that TAIG-S employs intensity augmentation if the reference is a black image. Since $\boldsymbol{x}' = \boldsymbol{0}$ and TAIG-S only uses the sign function, the term $(x_i - x'_i)$ in equation 1 can be ignored. Therefore, $\operatorname{sign}(IG)$ in equation 3 only depends on $\int_0^1 \frac{\partial f(\alpha \boldsymbol{x})}{\partial x_i}\, d\alpha$, whose discrete version is $\frac{1}{S}\sum_{s=1}^{S} \frac{\partial f(\frac{s}{S}\boldsymbol{x})}{\partial x_i}$. This discrete version reveals that TAIG-S uses the sum of the gradients from intensity-augmented images to perform the attack. Similarly, it can be observed from equation 1, equation 4 and equation 6 that TAIG-R applies both noise and intensity augmentation to perform the attack.
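The discrete version described above can be written out directly: with a black reference, $\operatorname{sign}(IG)$ is the sign of the input times the averaged gradient over intensity-scaled copies of the input. The toy gradient below is a stand-in for a surrogate network:

```python
import numpy as np

def sign_ig_via_intensity_augmentation(x, f_grad, steps=30):
    """With a black reference, sign(IG) reduces to the sign of x times the
    average gradient over the intensity-augmented copies (s/steps) * x."""
    scales = np.arange(1, steps + 1) / steps
    avg_grad = np.mean([f_grad(s * x) for s in scales], axis=0)
    return np.sign(x * avg_grad)

# toy gradient oracle for f(x) = (w.x)^2 (illustrative assumption)
w = np.array([0.5, -1.0, 2.0])
f_grad = lambda x: 2.0 * np.dot(w, x) * w

x = np.array([1.0, 2.0, 3.0])
print(sign_ig_via_intensity_augmentation(x, f_grad))
```

Each scaled copy `s * x` plays the role of one augmented image, which is the sense in which TAIG-S performs implicit intensity augmentation.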
4 Experimental Results
In this section, we compare the performance of the proposed TAIG-S and TAIG-R with state-of-the-art methods, including LinBP (Guo et al., 2020), SI (Lin et al., 2020), AoA (Chen et al., 2020) and VT (Wang and He, 2021). As LinBP has shown its superiority over SGM (Wu et al., 2020a), TAP (Zhou et al., 2018) and ILA (Huang et al., 2019), these methods are not included in the comparisons. The comparisons are done on both undefended and advanced defense models. To verify that IG and RIG can be good replacements for the standard gradient in transferable attacks, they are also evaluated in combination with different methods. In the last experiment, we use different surrogate models to demonstrate that TAIG is applicable to different models and has similar performance across them.
4.1 Experimental Setting
ResNet50 (He et al., 2016) is selected as the surrogate model to generate adversarial examples for the comparison with the state-of-the-art methods. InceptionV3 (Szegedy et al., 2016), DenseNet121 (Huang et al., 2017), MobileNetV2 (Sandler et al., 2018), SENet154 (Hu et al., 2018) and PNASNet-5-Large (Liu et al., 2018) are selected to evaluate the transferability of the adversarial examples. For the sake of convenience, the names of the networks are shortened to ResNet, Inception, DenseNet, MobileNet, SENet, and PNASNet in the rest of the paper. The networks selected for the black-box attacks were invented after, or at almost the same time as, ResNet, and the architecture of PNASNet was found by neural architecture search. The experiments are conducted on the ImageNet (Russakovsky et al., 2015) validation set. 5000 images are randomly selected from the images that can be correctly classified by all the networks. As in the previous works (Guo et al., 2020; Lin et al., 2020), the $\ell_\infty$ norm is used to measure the difference between a benign image and the corresponding adversarial example. By following LinBP, the maximum allowable perturbations $\epsilon$ are set as in that work. We also report the experimental results for a further $\epsilon$ setting in Table 6 in the appendix. The preprocessing procedure of the input varies with the models. We follow each model's requirement to preprocess the adversarial examples before feeding them into the networks. For example, for SENet the adversarial examples are preprocessed in three steps: (1) resized, (2) center cropped, and (3) normalized using the model's prescribed mean and standard deviation.
The attack success rate is used as the evaluation metric to compare all the methods. Thirty sampling points are used to estimate the integral in TAIG-S. For TAIG-R, the number of turning points is set to 30, and the support of the uniform noise is set equal to $\epsilon$. Because each line segment is short, it is estimated with one sampling point in TAIG-R. An ablation study on the number of sampling points is provided in the supplemental material, Section A.6. All the experiments are performed on two NVIDIA GeForce RTX 3090 GPUs, with the main code implemented in PyTorch.
4.2 Comparison with State-of-the-Arts
Usually, one method has several versions using different back-end attacks such as FGSM, IFGSM, and MIFGSM. In the comparison, FGSM and IFGSM are selected as the back-end methods for one-step and multi-step attacks, respectively. In one-step attacks, the step size for all the methods is set to $\epsilon$. In multi-step attacks, for LinBP, we keep the default setting, where the number of iterations is 300 and the step size is 1/255 for all the different $\epsilon$. The linear backpropagation starts at the first residual block in the third convolution layer, which provides the best performance (Guo et al., 2020). TAIG-S and TAIG-R are run for 20, 50, and 100 iterations with the same step size as LinBP for the different $\epsilon$ settings, respectively. For SI, the number of scale copies is set to 5, which is the same as the default setting. The default numbers of iterations of SI and AoA are 10 and 10-20 for the different $\epsilon$, respectively. We found that more iterations would improve the performance of SI and AoA. Therefore, the numbers of iterations of SI and AoA for the different $\epsilon$ are kept the same as those of TAIG-S and TAIG-R. The authors of the AoA method provide a public dataset named DAmageNet, which consists of adversarial examples generated by combining SI and AoA, with the ImageNet validation set as the source images. VGG19 was taken as the surrogate model and $\epsilon$ was set to 0.1. For comparison, TAIG-R is used to generate another set of adversarial examples with $\epsilon = 0.1$ and VGG19 as the surrogate model. 5000 adversarial images generated by TAIG-R and 5000 images sampled from DAmageNet with the same source images are used in the evaluation. Table 1 lists the experimental results of untargeted multi-step attacks. It demonstrates that the proposed TAIG-S outperforms AoA and SI significantly but is weaker than LinBP. The proposed TAIG-R outperforms all the state-of-the-art methods on all models, except for SENet. In terms of average attack success rate, TAIG-R achieves 52.82% under this setting. Comparison results for the one-step attacks are given in Table 5 in the appendix, which also shows the superiority of TAIG-R. We also provide the results of LinBP and SI under the same number of gradient calculations as TAIG-S and TAIG-R in Table 8 in the appendix. VT (Wang and He, 2021) and Admix (Wang et al., 2021), which use NIFGSM (Lin et al., 2020) and MIFGSM (Dong et al., 2018) as their back-end methods, are also compared, and the experimental results are given in Table 11 in the appendix.
LinBP and SI are also examined in the targeted attack setting. AoA is excluded because it was not designed for targeted attacks. The experimental results show that TAIG-R achieves an average attack success rate of 26%, which is the best among all the examined methods. The second best is LinBP, with an average attack success rate of 10.1%. Due to limited space, the full experimental results are given in Table 7 in the appendix.
4.3 Evaluation on advanced defense models
In addition to the undefended models, we further examine TAIG-S and TAIG-R against six advanced defense methods. By following SI, we include the top-3 defense methods in the NIPS competition (https://www.kaggle.com/c/nips-2017-defense-against-adversarial-attack) and three recently proposed defense methods in our evaluation. These methods are the high-level representation guided denoiser (HGD, rank-1) (Liao et al., 2018), random resizing and padding (R&P, rank-2) (Xie et al., 2018), the rank-3 submission (MMD, https://github.com/anlthms/nips-2017/tree/master/mmd), feature distillation (FD) (Liu et al., 2019), purifying perturbations via an image compression model (ComDefend) (Jia et al., 2019) and random smoothing (RS) (Cohen et al., 2019). $\epsilon$ is set to 16/255, which is the same as the setting in the SI study (Lin et al., 2020), and ResNet is taken as the surrogate model in this evaluation. The results are listed in Table 2. The average attack success rate of TAIG-R is 70.82%, which is 25.61 percentage points higher than that of the second best, LinBP. This shows that TAIG-R outperforms all the state-of-the-art methods in attacking the advanced defense models. We also evaluate the performance of TAIG-S and TAIG-R on two advanced defense models based on adversarial image detection in Section A.5.
4.4 Combination with Other Methods
TAIG-S and TAIG-R can be used as back-end attacks, like IFGSM and MIFGSM, by other methods to achieve higher transferability. Specifically, the $IG$ and $RIG$ used in TAIG-S and TAIG-R can replace the standard gradients in the previous methods to produce stronger attacks. In this experiment, LinBP, DI, and ILA with different back-end attacks are investigated. LinBP and ILA use the same set of back-end attacks, including IFGSM, TAIG-S, and TAIG-R. For LinBP, the back-end attacks are used to seek adversarial examples, while for ILA, the back-end attacks are used to generate directional guides. For DI, $IG$ and $RIG$ are used to replace the gradients in MIFGSM to produce two new attacks, named DI+MTAIG-S and DI+MTAIG-R. The numbers of iterations of all the back-end attacks are 20, 50, and 100 for the different $\epsilon$ settings, respectively. As ILA and its back-end attacks are performed in sequence, the number of iterations for ILA is fixed to 100, which is more effective than the default setting. The intermediate output layer is set to the third residual block in the second convolution layer of ResNet50, as suggested by Guo et al. (2020). Table 4 lists the results. They indicate that TAIG-S and TAIG-R effectively enhance the transferability of other methods. As in the other experiments, TAIG-R performs the best. The results for the other $\epsilon$ settings are given in Table 13 in the appendix.
4.5 Evaluation on other surrogate models
To demonstrate that TAIG is also applicable to other surrogate models, we select SENet, VGG19, Inception, and DenseNet as the surrogate models and test TAIG on the same group of black-box models. Because TAIG-S and TAIG-R are almost the same except for the paths used to compute the integrated gradients, and TAIG-R outperforms TAIG-S in the previous experiments, we select TAIG-R as an example. In this experiment, we set $\epsilon$ to 16/255, and the results are reported in Table 3. They indicate that TAIG-R performs similarly across different surrogate models.
5 Conclusion
In this paper, we propose a new attack algorithm named Transferable Attack based on Integrated Gradients (TAIG). It tightly integrates three common approaches for crafting adversarial examples to generate highly transferable adversarial examples. Two versions of the algorithm, one based on a straight-line path integral, TAIG-S, and the other based on a random piecewise linear path integral, TAIG-R, are studied. Extensive experiments, including attacks on undefended and defended models, untargeted and targeted attacks, and combinations with previous methods, are conducted. The experimental results demonstrate the effectiveness of the proposed algorithm. In particular, TAIG-R outperforms all the state-of-the-art methods in all the settings.
This work is partially supported by the Ministry of Education, Singapore through Academic Research Fund Tier 1, RG73/21.
We assure that the experimental results are reported accurately and honestly. The experimental settings, including data splits, hyperparameters, and GPU resources used, are clearly described in Section 4, and the source code will be shared after this paper is published. The ImageNet dataset and the source code of the AoA and LinBP attacks are used in this paper, and they are cited in Section 4.
References

- Towards evaluating the robustness of neural networks. In IEEE Symposium on Security and Privacy.
- Universal adversarial attack on attention and the resulting dataset DAmageNet. IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Improving black-box adversarial attacks with a transfer-based prior. In Advances in Neural Information Processing Systems.
- Certified adversarial robustness via randomized smoothing. In International Conference on Machine Learning.
- Boosting adversarial attacks with momentum. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- Evading defenses to transferable adversarial examples by translation-invariant attacks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- Explaining and harnessing adversarial examples. In International Conference on Learning Representations.
- Backpropagating linearly improves transferability of adversarial examples. In Advances in Neural Information Processing Systems.
- Subspace attack: exploiting promising subspaces for query-efficient black-box attacks. In Advances in Neural Information Processing Systems.
- Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- A new defense against adversarial images: turning a weakness into a strength. In Advances in Neural Information Processing Systems.
- Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- Enhancing adversarial example transferability with an intermediate level attack. In Proceedings of the IEEE International Conference on Computer Vision.
- Black-box adversarial attack with transferable model-based embedding. In International Conference on Learning Representations.
- Black-box adversarial attacks with limited queries and information. In International Conference on Machine Learning.
- Prior convictions: black-box adversarial attacks with bandits and priors. In International Conference on Learning Representations.
- Feature space perturbations yield more transferable adversarial examples. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- Explaining convolutional neural networks using softmax gradient layer-wise relevance propagation.
- ComDefend: an efficient image compression model to defend adversarial examples. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- Adversarial attacks and defences competition. In The NIPS'17 Competition: Building Intelligent Systems.
- Adversarial examples in the physical world. CoRR.
- Defense against adversarial attacks using high-level representation guided denoiser. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- Nesterov accelerated gradient and scale invariance for adversarial attacks. In International Conference on Learning Representations.
- Progressive neural architecture search. In Proceedings of the European Conference on Computer Vision.
- Delving into transferable adversarial examples and black-box attacks. In International Conference on Learning Representations.
- Feature distillation: DNN-oriented JPEG compression against adversarial examples. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- Towards deep learning models resistant to adversarial attacks. In International Conference on Learning Representations.
- The limitations of deep learning in adversarial settings. In IEEE European Symposium on Security and Privacy.
- Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM Asia Conference on Computer and Communications Security.
- Transferability in machine learning: from phenomena to black-box attacks using adversarial samples. arXiv preprint arXiv:1605.07277.
- ImageNet large scale visual recognition challenge. International Journal of Computer Vision.
- MobileNetV2: inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- Image information and visual quality. IEEE Transactions on Image Processing 15 (2), pp. 430-444.
- Axiomatic attribution for deep networks. In International Conference on Machine Learning.
- Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- Intriguing properties of neural networks. In International Conference on Learning Representations.
- AutoZOOM: autoencoder-based zeroth order optimization method for attacking black-box neural networks. In AAAI Conference on Artificial Intelligence.
- Enhancing the transferability of adversarial attacks through variance tuning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- Admix: enhancing the transferability of adversarial attacks. In Proceedings of the IEEE International Conference on Computer Vision.
- Multiscale structural similarity for image quality assessment. In The Thirty-Seventh Asilomar Conference on Signals, Systems and Computers, 2003, Vol. 2, pp. 1398-1402.
- Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing 13 (4), pp. 600-612.
- A universal image quality index. IEEE Signal Processing Letters 9 (3), pp. 81-84.
- Smoothed geometry for robust attribution. In Advances in Neural Information Processing Systems.
- Skip connections matter: on the transferability of adversarial examples generated with ResNets. In International Conference on Learning Representations.
- Towards understanding and improving the transferability of adversarial examples in deep neural networks. In Asian Conference on Machine Learning.
- Boosting the transferability of adversarial samples via attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- Mitigating adversarial effects through randomization. In International Conference on Learning Representations.
- Improving transferability of adversarial examples with input diversity. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
- Feature squeezing: detecting adversarial examples in deep neural networks. In 25th Annual Network and Distributed System Security Symposium (NDSS 2018).
- Transferable adversarial perturbations. In Proceedings of the European Conference on Computer Vision.
Appendix A
A.1 The proof of equation 10
For ReLU networks (the term ReLU network refers to networks whose second-order derivatives are zero (Wang et al., 2020) due to their computational unit, max(0, wᵀz), where w is a network parameter and z is the input from a previous layer), it can be proven that the j-th element of the integrated gradients is:
Considering . Using the product rule, we have
; otherwise,. Thus, the first term becomes
in the second term is zero because of the ReLU functions in the network. Thus,
Since , equation 14 can be written as
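As background for the definition above, a generic one-unit argument (not the paper's exact derivation) shows why all second-order derivatives of a ReLU network vanish almost everywhere:

```latex
% Single ReLU unit u(z) = \max(0, w^\top z).
% Away from the kink w^\top z = 0, the first derivative is piecewise constant,
\frac{\partial u}{\partial z_j} = w_j \,\mathbb{1}\!\left[\, w^\top z > 0 \,\right],
% so every second-order derivative vanishes wherever it is defined:
\frac{\partial^2 u}{\partial z_j \,\partial z_k} = 0 \quad \text{(a.e.)}
```

Since a ReLU network composes such units with linear maps, its second-order derivatives are zero almost everywhere, which is the property the proof above relies on.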
A.2 Why use backward difference
One may question why backward difference is used in equation 12 instead of forward difference, i.e., . If forward difference is used and the target integrated gradient, i.e., in forward difference, is set to zero,
where . The attack equation would become
Considering the discrete vector form of ,
where is the Hadamard (element-wise) product, it is noted that when , which is the discrete version of in equation 1, is close to one, can approximate . If we only use , whose is close to one, to compute equation 17, then equation 17 is an approximation of gradient ascent. Conversely, if we only use , whose is close to one, to compute the integrated gradients in the proposed attack equation, i.e., equation 3, then equation 3 is in fact an approximation of gradient descent. Since we would like to minimize , backward difference is applied to equation 12, and TAIG-S in equation 3 performs an approximation of gradient descent.
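The effect of the integration direction can be illustrated numerically. Below is a minimal sketch (a toy one-dimensional function and our own helper name, not the paper's models or notation) showing that the backward (right-endpoint) and forward (left-endpoint) Riemann sums both approximate the integrated gradients, differing only in which endpoint of each sub-interval supplies the gradient:

```python
import numpy as np

# Toy illustration: integrated gradients of F(v) = v**3 along the
# straight path from baseline 0 to x, approximated with m sampling
# points using either Riemann-sum rule.
def integrated_gradient(x, m, rule="backward"):
    grad = lambda v: 3.0 * v ** 2           # F'(v) for F(v) = v**3
    if rule == "backward":                  # right endpoints: alpha = k/m
        alphas = np.arange(1, m + 1) / m
    else:                                   # left endpoints: alpha = (k-1)/m
        alphas = np.arange(m) / m
    return (x - 0.0) * np.mean(grad(alphas * x))

# Completeness gives the exact value F(x) - F(0) = x**3; for x = 2 and
# m = 1000 the backward rule returns about 8.012 and the forward rule
# about 7.988, both converging to 8 as m grows.
```

Because F' is increasing on the path for x > 0, the backward rule slightly overshoots and the forward rule slightly undershoots the integral; the paper's choice between them is about which endpoint the attack update effectively evaluates gradients at.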
A.3 Comparison with the state of the art
A.3.1 More results on the main comparisons
In this section, we provide more experimental results for the comparisons with LinBP, SI, and AOA that were omitted from the main paper. The experimental results of TAIG-S and TAIG-R for untargeted attacks under , running with 20, 50, and 100 iterations, are given in Table 6. The results of different methods using FGSM as the back-end method for one-step attacks are summarized in Table 5. Table 7 lists the experimental results of different methods in the targeted attack setting. These results highlight the effectiveness of the proposed methods, in particular TAIG-R.
The comparison experiments between LinBP, SI, TAIG-S, and TAIG-R are also conducted under different with the same number of gradient calculations, and the results are listed in Table 8. For AOA, the calculation of SGLRP is too slow, and we find that the average attack success rate of AOA only changes from 32.07 (100 iterations) to 32.19 (300 iterations) under . As the results of AOA are not as good as those of LinBP and SI, we do not run it for more iterations. Table 8 shows that a larger number of gradient calculations slightly improves the performance of SI and LinBP, but they are still not as good as TAIG-R.
A.3.2 Comparison with LinBP on VGG19
Section 4.2 shows that LinBP performs second-best among all the methods. Therefore, we compare TAIG-R with LinBP using VGG19 as the surrogate model under . LinBP is sensitive to the choice of the position from which the network is modified. To find the best position to start the model modification, we test all possible positions in VGG19 separately and generate adversarial examples from these 16 different positions with 100 iterations. Then we use the six black-box models to compare their performance. Table 9 gives the experimental results. We select SENet and ResNet as two examples and show in Fig. 4 how the attack success rate of LinBP varies with the choice of position in VGG19. Based on Table 9, we select the two modified positions with the best performance and use them to generate adversarial examples with 3000 gradient calculations. The results are listed in Table 10.
A.3.3 Comparison with VT and Admix
In this section, we compare TAIG-R with Admix (Wang et al., 2021) and VT (Wang and He, 2021), which use other back-end methods. Admix uses MIFGSM as its basic iteration method (MI-Admix), and VT has two versions, NI-VT and MI-VT, which use NIFGSM and MIFGSM as their back-end methods, respectively. Both VT and Admix use InceptionV3 as the surrogate model in their studies, and they also provide the 1000 images used in their evaluations. The maximum perturbation was set to 16/255 in their experiments. Thus, we run TAIG-R on InceptionV3 with the same 1000 images under the same perturbation bound. All the methods are run with 3000 gradient calculations. Table 11 lists the attack success rates of these methods on different models and shows that TAIG-R outperforms MI-Admix, MI-VT, and NI-VT on all the models. NI-VT is slightly better than MI-VT. As NI-VT performs the best, we further run NI-VT on the 5000 images used in our evaluation with ResNet50 as the surrogate model under different , and the results are listed in Table 12.
A.4 Combinations with other methods
In Section 4.3, LinBP, DI, and ILA with different back-end attacks are investigated under . In this section, we provide the experimental results of the same setting under . The results, given in Table 13, demonstrate that TAIG-R and TAIG-S improve the transferability of the other methods. The same combinations are also examined in the targeted setting, and the experimental results are listed in Table 14. They demonstrate that TAIG-S and TAIG-R also improve the transferability of different methods in targeted attacks. However, the combinations of TAIG-S and TAIG-R with other methods do not consistently improve the performance of TAIG-S and TAIG-R themselves, which differs from the observations in untargeted attacks. Comparing Table 7 and Table 14, it is noted that the transferability of TAIG-R is harmed when it is combined with other methods in the targeted setting, and the performance of TAIG-S decreases when it is combined with ILA in the targeted setting.
A.5 Evaluation on advanced defense models based on detection
In this section, we evaluate the performance of TAIG-S and TAIG-R against two advanced defense methods based on adversarial image detection: Feature Squeezing (FS; Xu et al., 2018) and Turning a Weakness into a Strength (TWS; Hu et al., 2019). We use their default settings and examine their detection rates on the different attack methods. For each method, 5000 adversarial images generated under with 3000 gradient calculations are used in this evaluation. The detection rate and the attack success rate after detection (ASRD) are listed in Table 15, where ASRD is defined as the number of successful attacks divided by 5000. The detection rates of both detectors on AOA are low, but the ASRD of AOA is also the lowest. Table 15 shows that TAIG-R has the highest ASRD among all the methods.
A.6 Ablation Study
In this section, we investigate the influence of the number of sampling points. In this experiment, is set to 8/255, and 1000 images are sampled from the 5000 images used in the evaluation. Sundararajan et al. (2017) pointed out that 20 to 300 sampling points are enough to approximate the integral. Following their suggestion, the number of sampling points for TAIG-S starts from 20, with an increment of 10 in each test. We find that as the number of sampling points increases, the attack success rate of TAIG-S only fluctuates slightly. Thus, we stop at 70 sampling points; the results are listed in Table 16. For TAIG-R, we follow the same setting. The number of turning points also starts from 20, with an increment of 10 in each test, and each segment in the path is estimated by one sampling point. We find that the attack success rate improves slightly as the number of turning points increases, but the improvement slows when the number of turning points is larger than 50. The results are listed in Table 17. To study the influence of the number of sampling points in each line segment, we fix the number of turning points to 20 and vary the number of sampling points per segment from 1 to 5. As the number of sampling points increases, the attack success rate improves slightly. The results are listed in Table 18.
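The path construction varied in this ablation can be sketched as follows (our own illustrative helper, not the authors' code): a random piecewise-linear path from a baseline to the input, parameterized by the number of turning points and the number of sampling points per segment.

```python
import numpy as np

# Illustrative sketch: build a random piecewise-linear path from a
# baseline to the input x with `n_turning` turning points, then place
# `n_per_segment` sample points on every line segment. The number of
# gradient evaluations per estimate is (n_turning + 1) * n_per_segment.
def random_path_samples(x, baseline, n_turning, n_per_segment, rng=None):
    rng = np.random.default_rng(rng)
    # random turning points drawn between the baseline and the input
    turning = [baseline + rng.random(x.shape) * (x - baseline)
               for _ in range(n_turning)]
    anchors = [baseline] + turning + [x]
    samples = []
    for a, b in zip(anchors[:-1], anchors[1:]):
        for k in range(1, n_per_segment + 1):
            samples.append(a + (k / n_per_segment) * (b - a))
    return samples
```

With 20 turning points and one sample per segment, each estimate costs 21 gradient evaluations; raising the samples per segment to 5 raises the cost to 105, which is why the gains reported above must be weighed against the extra gradient calculations.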
A.7 Perceptual Evaluation
In this section, we provide a perceptual evaluation of the different methods on seven full-reference objective quality metrics, namely Root Mean Square Error (RMSE), distance, Peak Signal-to-Noise Ratio (PSNR), the Structural Similarity Index (SSIM) (Wang et al., 2004), Visual Information Fidelity (VIFP) (Sheikh and Bovik, 2006), the Multi-Scale SSIM index (MS-SSIM) (Wang et al., 2003), and the Universal Quality Index (UQI) (Wang and Bovik, 2002). 1000 images are used in this evaluation. The results are listed in Table 19. It shows that these methods perform similarly, with TAIG-S slightly better than the others.
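Two of the simpler metrics above can be computed directly; here is a minimal sketch (our own helper names) for images stored as arrays of 8-bit intensities in [0, 255]:

```python
import numpy as np

# RMSE: root of the mean squared pixel difference between a reference
# image and a distorted (e.g., adversarial) image.
def rmse(ref, img):
    diff = ref.astype(np.float64) - img.astype(np.float64)
    return np.sqrt(np.mean(diff ** 2))

# PSNR in dB, relative to the maximum possible pixel value.
def psnr(ref, img, max_val=255.0):
    err = rmse(ref, img)
    if err == 0.0:
        return float("inf")  # identical images
    return 20.0 * np.log10(max_val / err)
```

Lower RMSE and higher PSNR indicate perturbations that are closer to the original image; the structural metrics (SSIM, MS-SSIM, VIFP, UQI) additionally model perceived structure rather than raw pixel error.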
A.8 Visualization of IG and RIG
In this section, we provide more IG and RIG visualizations of different images. Fig. 5 shows the original images used in the visualization. Fig. 6 and Fig. 7 show the IG and RIG of the images in Fig. 5 computed from different networks. Fig. 8 and Fig. 9 show the IG and RIG of the images in Fig. 5 before and after the attacks.