1 Introduction
With the rapid growth in network depth and width, as well as improvements in network structure and topology, deep neural networks (DNNs) have been successfully extended to many applications. For instance, state-of-the-art DNNs can achieve extremely high accuracy in image classification, sometimes even exceeding the level humans can reach.
However, recent studies [1] revealed the high vulnerability of neural network models to adversarial attacks: adding a carefully designed, small perturbation to an image can cause a sharp drop in classification confidence, or even misclassification by a well-trained network, even though the perturbation is too small to be noticed by humans. Such images with small perturbations, namely adversarial samples, pose a severe security threat to deep learning technology. Extensive research has been carried out both on adversarial attacks, to explore the vulnerability of neural networks, and on defense techniques, to protect systems and applications.
Among the many attack methods proposed in recent years, the fast gradient sign method (FGSM) [2] and its iterative variant (I-FGSM) [3] have drawn significant attention. These methods exploit the gradients of the classification loss to craft adversarial images. The principle is straightforward: gradients indicate the direction along which a change can influence the classification most significantly. The methods, however, treat every pixel the same; that is, the magnitude of the introduced perturbation is exactly the same for all pixels. This often makes the adversarial samples more noticeable, since humans are much more sensitive to differences in low-variance areas
[4], e.g., the background of an image. Compared to the original images, the adversaries generated by FGSM can look noticeably blurrier; examples include Figure 2(d) in [2] and Figure 4 in [5].

CW attack [6]
is another widely adopted adversarial attack method. As an optimization-based approach, the CW attack first defines an objective (loss) function and then searches for the optimal perturbation while keeping the distortion small. As such, CW can produce adversaries with much smaller and less perceptible distortion. This performance, however, comes at the cost of speed. For instance, [6] performed 200,000 optimization iterations for every image when evaluating the method. Such a high computational cost is impractical, especially considering the fast growth of dataset sizes. In addition, the implementation of the CW attack can be tricky and requires careful parameter selection to reproduce the results reported in the paper; this is a common situation for optimization-based methods.

Our work aims at a strong and efficient adversarial attack. We propose to leverage gradient information in the search for adversarial examples. Inspired by the FGSM concept, the gradients are used as guidance in calculating the perturbation. Unlike FGSM, which applies a universal magnitude to all pixels, our approach assigns each pixel its own perturbation magnitude. The magnitude optimization uses the same loss function as CW, which corresponds to the general expectation in crafting adversarial attacks. In a nutshell, our method constrains the perturbation with the gradients, which in turn reduces the search space of adversarial examples.
In this work, we intuitively demonstrate how our method reduces the search space. Empirically, our experimental results show that our method reaches a higher attack success rate while applying smaller distortion and requiring far fewer iterations than the original CW attack [6]. The effectiveness of our method is further demonstrated through comparisons with I-FGSM [3] and L-BFGS [1].
2 The Proposed Method
As mentioned above, although FGSM and I-FGSM can quickly generate perturbations from gradients, the large universal magnitude applied to each pixel can make such perturbations easily perceptible to human eyes. The CW attack, which optimizes the perturbation generation process using a well-defined loss function, solves this issue by finding adversarial images with small perturbations. However, the computational cost of the CW attack can be huge, since a large number of iterations must be performed to minimize the loss and adjust the constant c. This greatly hinders the application of such optimization-based attacks to large datasets.
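For reference, the CW ℓ2 objective referred to here combines the ℓ2 distortion with a hinge loss on the logits of the target class. Below is a minimal sketch, in which `logits_fn`, the constant `c`, and the confidence `kappa` are placeholders rather than the exact implementation of [6]:

```python
import numpy as np

def cw_objective(delta, x, logits_fn, target, c=1.0, kappa=0.0):
    # CW's L2 objective: ||delta||_2^2 + c * f(x + delta), where the hinge
    # f(x') = max(max_{i != target} Z(x')_i - Z(x')_target, -kappa)
    z = logits_fn(x + delta)
    z_other = np.max(np.delete(z, target))   # best logit among non-target classes
    f = max(z_other - z[target], -kappa)
    return float(np.dot(delta, delta) + c * f)
```

The hinge term becomes negative (clipped at -kappa) once the target logit dominates, so minimizing the objective trades off distortion against reaching the target class.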
Thus, we propose to incorporate gradient information into the optimization process of the CW attack to guide the search and reduce the search time. We use δ to denote the perturbation, i.e., x' = x + δ. While CW updates δ during each iteration, we instead set δ = w · g (the multiplication here is element-wise) and use Adam to update w. Each element of w controls the perturbation magnitude for the corresponding pixel, and all elements of w are initialized to the same value. Here g is the normalized gradient of the hinge loss w.r.t. x. This hinge loss corresponds to the goal of causing the network to classify the input as a target class, so its gradient tells us how we should perturb the legitimate image. The normalized gradient is taken so that the norm of δ is independent of the scale of the gradient. Note that although x' will not strictly follow the direction of g or move along that straight line the way FGSM does, the technique still works, as shown by the experimental results. The pseudo-code is shown in Algorithm 1. The confidence parameter is designed for high-confidence adversaries, which are more likely to transfer to another model, since they usually need greater perturbation.
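As an illustration, this update rule (the perturbation as the element-wise product of learned per-pixel magnitudes and the normalized loss gradient, with Adam updating the magnitudes) can be sketched for a toy linear classifier. The linear model, learning rate, and zero initialization below are illustrative assumptions, not the exact setup of Algorithm 1:

```python
import numpy as np

def hinge_and_grad(x, W, target, kappa=0.0):
    # CW-style hinge max(max_{i != t} Z_i - Z_t, -kappa) for linear logits
    # Z = W x, together with its gradient w.r.t. x
    z = W @ x
    masked = np.where(np.arange(len(z)) == target, -np.inf, z)
    other = int(np.argmax(masked))
    diff = z[other] - z[target]
    grad = W[other] - W[target] if diff > -kappa else np.zeros_like(x)
    return max(diff, -kappa), grad

def gradient_constrained_attack(x, W, target, steps=200, lr=0.1, kappa=0.0):
    _, g0 = hinge_and_grad(x, W, target, kappa)
    g = g0 / (np.linalg.norm(g0) + 1e-12)       # normalized gradient direction
    w = np.zeros_like(x)                        # per-pixel magnitudes (illustrative init)
    m, v = np.zeros_like(x), np.zeros_like(x)   # Adam moment estimates
    b1, b2, eps = 0.9, 0.999, 1e-8
    for t in range(1, steps + 1):
        delta = w * g                           # perturbation constrained by the gradient
        loss, grad_x = hinge_and_grad(x + delta, W, target, kappa)
        if loss <= -kappa:                      # target class reached: stop
            break
        grad_w = grad_x * g                     # chain rule: dL/dw = (dL/dx') * g, element-wise
        m = b1 * m + (1 - b1) * grad_w
        v = b2 * v + (1 - b2) * grad_w ** 2
        w -= lr * (m / (1 - b1 ** t)) / (np.sqrt(v / (1 - b2 ** t)) + eps)
    return x + w * g
```

Because every update moves the candidate only along the fixed direction g, the search is effectively one-dimensional per pixel, which is the search-space reduction the text describes.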
2.1 Intuitive Explanation
We first illustrate the principle of our method in a simple scenario. We create a synthetic dataset consisting of data points on the 2-D plane. As shown in Figure 1, the dataset has two classes, each with 1,000 points for training and 200 for testing. It is a naive classification problem whose ideal decision boundary is a straight line in the plane.
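A construction of such a synthetic dataset, together with a stand-in classifier, might look like the sketch below. The Gaussian cluster centers, spread, and logistic-regression model are illustrative assumptions; the paper uses its own point distribution and a small multi-layer perceptron:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    # Two 2-D classes as Gaussian blobs on either side of a linear
    # ideal boundary (illustrative layout, not the paper's exact data)
    a = rng.normal([-1.0, -1.0], 0.4, size=(n, 2))   # class 0
    b = rng.normal([1.0, 1.0], 0.4, size=(n, 2))     # class 1
    return np.vstack([a, b]), np.array([0] * n + [1] * n)

X_train, y_train = make_data(1000)   # 1,000 points per class for training
X_test, y_test = make_data(200)      # 200 points per class for testing

# A tiny logistic-regression classifier trained by gradient descent
# (a stand-in for the small multi-layer perceptron in the text)
w, b = np.zeros(2), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X_train @ w + b)))
    err = p - y_train
    w -= 0.1 * (X_train.T @ err) / len(y_train)
    b -= 0.1 * err.mean()

test_acc = np.mean(((X_test @ w + b) > 0) == (y_test == 1))
```

On such a linearly separable dataset, the learned boundary approximates the ideal straight line, and the loss gradient at any point is perpendicular to it.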
We train a small multi-layer perceptron as the classifier, which reaches high accuracy on both the training and testing data. We then apply CW and our method to find the adversary for a randomly picked point. The intermediate and final results of the search process are presented in Figure 1. By constraining the perturbation with gradients, our method actually reduces the search space. In this simple scenario, where the decision boundary is a straight line, the fastest way to find an adversary is to move along the direction of the gradient, which is perpendicular to the decision boundary; that is exactly what our method's search trajectory looks like. Also note that the last several steps of the CW attack fall directly into our method's search space. This indicates that the reduction of the search space by our method does not exclude the optimal results the CW attack tries to find, which in turn supports the rationality of our method. Even though this example is far simpler than real image recognition tasks and neural networks, it still offers some intuition. The effectiveness of our method is further demonstrated by the experimental results.

3 Experimental Evaluations
Dataset. We perform our experiments on CIFAR-10 [7] and ImageNet [8]. For CIFAR-10, we pick the first 1,000 test images when applying the attacks. ImageNet is a large-scale dataset with 1,000 classes; we randomly choose 500 validation images from 500 different classes to evaluate our method. The target label of each attack is also randomly chosen while being ensured to differ from the true label.

Model Topology. For CIFAR-10, we choose the same CNN used in [6], which reaches good accuracy on the test images. For ImageNet, we use the pre-trained Inception v3 from TensorFlow [9].

Baseline. We compare our method with the CW attack, L-BFGS, and I-FGSM. We use the ℓ2 version of I-FGSM to ensure comparisons are made under the same distance metric. The classification loss for I-FGSM and L-BFGS is also changed to the aforementioned hinge loss for direct comparison. We adopt the implementations in Cleverhans [10] for these baselines.
Parameter Setting. For I-FGSM, we gradually increase the maximum allowed perturbation magnitude and stop when it successfully finds adversaries for all test images. For L-BFGS and CW, we first fix the maximum number of iterations to 100, which bounds how long the attack can update its perturbation under a given constant c. We then tune their parameters, e.g., the learning rate for the CW attack, to make them first reach a high attack success rate within relatively few iterations, since the attack's success should be the highest priority. This tuning process is based either on the empirical results reported in the original papers or on our own experimental observations. When implementing our own method, we assign parameter values empirically, also aiming to guarantee successful attacks within only a few iterations.
3.1 Result Analysis
When running the experiments, we mainly focus on two aspects: the attack success rate and the ℓ2 distance. An adversary counts as a success if the neural network indeed classifies it as the target class. The average ℓ2 distance between the legitimate image and its adversarial counterpart is calculated to show how much distortion an algorithm introduces. When the success rate is not 100%, we average only over the successes. We also pay close attention to the average total number of update iterations of CW and our method. For both methods, each iteration corresponds to a single step of the Adam optimizer. We also apply the same abort-early technique in the implementation of our method as in CW, i.e., when no improvement is gained, both algorithms abort early to avoid a meaningless search. Thus, the number of iterations indicates the amount of computation taken by the algorithms. However, since I-FGSM just computes the gradient every iteration without any optimization, and L-BFGS performs a line search within each of its so-called iterations, the computational cost behind each step is not comparable to that of our method or the CW attack; a fair comparison between them in this respect is therefore difficult.
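The abort-early technique can be sketched as follows; the check interval and improvement threshold here are illustrative assumptions, not the exact values used by CW or by our implementation:

```python
def run_with_abort_early(step, max_iters=1000, check_every=100):
    # step(i) performs one optimizer update and returns the current loss.
    # Periodically compare the best loss in the recent window against the
    # best loss seen before it; abort when there is no meaningful improvement.
    best = float("inf")
    losses = []
    for i in range(max_iters):
        losses.append(step(i))
        if (i + 1) % check_every == 0:
            recent = min(losses[-check_every:])
            if recent >= best * 0.9999:   # no improvement in this window
                return i + 1, min(losses)
            best = min(best, recent)
    return max_iters, min(losses)
```

A search whose loss plateaus stops after one extra window, while a steadily improving search runs to the iteration cap, which is why iteration counts track actual computation for both CW and our method.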
Table 1: Attack success rate (prob, in %) and average ℓ2 distortion (dist) on CIFAR-10 under confidence values 0-25.

Confidence      0            5            10           15           20           25
            prob dist    prob dist    prob dist    prob dist    prob dist    prob dist
Our1        100  0.856   100  1.008   100  1.146   100  1.280   100  1.415   100  1.547
CW1         100  3.220   100  3.609   100  3.937   100  4.225   100  4.475   100  4.708
CW3         100  2.459   100  2.685   100  2.888   100  3.109   100  3.335   100  3.531
CW6         100  1.876   100  1.977   100  2.032   100  2.058   100  2.089   100  2.096
LBFGS2      100  0.941   100  1.152   100  1.306   100  1.533   100  1.715   100  1.885
IFGSM       100  1.409   100  1.531   100  1.625   100  1.704   100  1.754   100  1.794
Table 2: Average number of update iterations on CIFAR-10 under confidence values 0-30.

Confidence   0     5     10    15    20    25    30
Our1         28    28    30    34    38    42    47
CW1          27    25    23    22    22    21    21
CW3          109   108   108   106   103   100   98
CW6          197   200   207   213   219   223   228
Table 3: Results on ImageNet: success rate (prob, in %), average ℓ2 distortion (dist), and average number of iterations (iter).

          prob   dist    iter
Our1      100    1.083   52
CW1       99.6   4.978   36
CW3       100    3.030   116
CW6       100    1.813   216
CW10      100    1.497   312
LBFGS2    95.8   2.000   /
IFGSM     100    1.625   /
For CIFAR-10, we generate adversaries for confidence values in {0, 5, 10, 15, 20, 25}. The results are shown in Tables 1 and 2. Note that our method reaches the smallest distortion while successfully attacking all test images. While Our1 spends far fewer iterations than CW3, its distortion is less than half of that of CW3. Moreover, even though CW6 searches for far more iterations than Our1, its perturbation is still greater than ours. Compared with L-BFGS and I-FGSM, our method also produces superior results.
For ImageNet, we present only the results of adversaries at a single confidence value. As shown in Table 3, Our1 successfully finds all adversaries while introducing the least amount of perturbation. In contrast, the CW attack does not produce comparable results even with six times our iterations. I-FGSM and LBFGS2 also lead to inferior results, and the latter incurs a higher computational cost. Our method thus maintains its effectiveness on a large-scale dataset.
4 Conclusion
Adversarial attacks have recently drawn significant attention in the deep learning community. Gradient-based adversarial example crafting schemes, i.e., FGSM and its variants, often introduce visually perceptible perturbations on the adversarial examples. The CW attack, as an optimization-based scheme, solves this problem by searching for adversaries with small distortion. However, the incurred computational cost can be intolerable in real applications. In this work, we propose to leverage gradient information in the optimization process of crafting adversaries by incorporating it into the perturbation. We show that our proposed method reduces the search space of adversarial examples and thus requires fewer search iterations. Experimental results show that, compared to the other tested methods, our method achieves higher attack efficiency together with smaller perturbation or fewer iterations.
References
 [1] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
 [2] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
 [3] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.
 [4] Anmin Liu, Weisi Lin, Manoranjan Paul, Chenwei Deng, and Fan Zhang. Just noticeable difference for images with decomposition model for separating edge and textured regions. IEEE Transactions on Circuits and Systems for Video Technology, 20(11):1648–1652, 2010.
 [5] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
 [6] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. arXiv preprint arXiv:1608.04644, 2016.
 [7] Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. The CIFAR-10 dataset. Online: http://www.cs.toronto.edu/~kriz/cifar.html, 2014.
 [8] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.

 [9] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
 [10] Nicolas Papernot, Fartash Faghri, Nicholas Carlini, Ian Goodfellow, Reuben Feinman, Alexey Kurakin, Cihang Xie, Yash Sharma, Tom Brown, Aurko Roy, Alexander Matyasko, Vahid Behzadan, Karen Hambardzumyan, Zhishuai Zhang, Yi-Lin Juang, Zhi Li, Ryan Sheatsley, Abhibhav Garg, Jonathan Uesato, Willi Gierke, Yinpeng Dong, David Berthelot, Paul Hendricks, Jonas Rauber, and Rujun Long. Technical report on the CleverHans v2.1.0 adversarial examples library. arXiv preprint arXiv:1610.00768, 2018.