Deep Neural Networks (DNNs) [1, 2, 3, 4, 5] have recently led to dramatic improvements on image, text, and audio tasks. As research has moved into specific domains, many works have revealed that deployed deep neural classification models are easily fooled by adversarial examples [14, 15, 16, 17, 18, 19, 20], which are crafted by adding appropriate perturbations to clean images. A perturbation must be strong enough to push the original image across the decision boundary [17, 19, 20] between classes, so that the generated adversarial example's latent-space representation approaches that of another class and acquires enough of its features to fool the model.
From the gradient-updating perspective, adversarial attacks on image classification can be divided into single-step and iterative attacks. From the perspective of model structure, if the attacker knows the deployed network's structure and parameters, the method is categorized as a white-box attack [14, 19]; otherwise it is a black-box attack [23, 24]. From the perspective of the attack goal, methods are either targeted or non-targeted [8, 25]: a targeted attack seeks a perturbation that makes the model output a specific chosen label, while a non-targeted attack seeks any perturbation that makes the model output a label other than the correct one.
Many proposed attack methods can fool DNNs with high prediction confidence on image classification tasks [3, 26]. However, we find that most of them craft adversarial examples with large perturbations yet achieve only moderate attack success rates. In this paper we propose an adaptive perturbation generation algorithm that crafts adversarial examples which fool deep neural networks with only slight perturbations and cause misclassification with high confidence.
We summarize the contributions of this paper as follows:
We propose an algorithm that generates adversarial perturbations with adaptive gradients, computing perturbations of small magnitude that fool deployed deep neural networks with high probability, and we give a formulation for calculating perturbation strength. With this formulation we can quantify the perturbation strength between a clean image and its adversarial counterpart, reflecting the pixel-level modification between them.
2 Related Work
In this section, we first give background knowledge on adversarial examples and the formalization used by our proposed algorithms. Let μ denote a distribution of images in R^d, i.e., images drawn from a distribution with d dimensions, and let x be an image drawn from μ. Let f denote a classification function that maps input data (in this paper, images) to an estimated label.
The search for an adversarial perturbation finds a perturbation tensor r such that adding r to the original image x yields a generated image x + r that fools the deep neural model into misclassifying x from its correct label f(x) to some other label. In general, this process can be described as:

f(x + r) ≠ f(x), subject to a norm constraint on r.
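The fooling condition f(x + r) ≠ f(x) can be sketched with a toy example; here a hypothetical two-class linear scorer stands in for a deep network (the weights, inputs, and perturbation below are all illustrative assumptions, not from the paper):

```python
import numpy as np

# Toy sketch of the fooling condition f(x + r) != f(x):
# a two-class linear "classifier" stands in for a DNN.
W = np.array([[1.0, -1.0],
              [-1.0, 1.0]])        # weights of the toy classifier

def predict(x):
    """Return the predicted class label (argmax of logits) for x."""
    return int(np.argmax(W @ x))

x = np.array([1.0, 0.2])           # "clean" input, classified as class 0
r = np.array([-1.5, 1.5])          # crafted perturbation
x_adv = x + r                      # adversarial example

print(predict(x), predict(x_adv))  # labels differ -> the model is fooled
```

In practice r is far smaller than this toy perturbation and is found by optimization, as the methods below describe.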
2.1 Methods for Generating Adversarial Examples and Adversarial Defense
The iterative gradient-based algorithm (I-FGSM) [22, 16] learns perturbations better than FGSM because it updates along the gradient direction step by step; as a result, I-FGSM generates perturbations with good transferability and high attack success rates against models in the white-box setting.
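The I-FGSM loop can be sketched as follows; a simple quadratic toy loss stands in for the network's cross-entropy, and the step size `alpha`, budget `eps`, and step count are illustrative assumptions:

```python
import numpy as np

# Minimal I-FGSM sketch under an L-infinity budget eps, assuming a
# differentiable loss whose gradient w.r.t. the input is available.

def loss_grad(x, target):
    """Gradient of the toy loss 0.5*||x - target||^2 w.r.t. x."""
    return x - target

def i_fgsm(x, target, eps=0.1, alpha=0.02, steps=10):
    """Iteratively step along sign(grad) and clip back to the eps-ball."""
    x_adv = x.copy()
    for _ in range(steps):
        g = loss_grad(x_adv, target)
        x_adv = x_adv + alpha * np.sign(g)        # signed gradient step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project into the ball
    return x_adv

x = np.zeros(4)
x_adv = i_fgsm(x, target=-np.ones(4))
print(np.max(np.abs(x_adv - x)))  # never exceeds eps = 0.1
```

The clipping step is what distinguishes the iterative variant: each small step is constrained so the accumulated perturbation stays within the allowed budget.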
Carlini and Wagner proposed a targeted attack, known as the C&W attack, which generates adversarial examples that lower the detection rates of defenses. It can be written as:

minimize ||r||_p + c · g(x + r),

where p denotes the norm constraint, set to 0, 1, 2, or ∞, and g is a surrogate objective that is minimized when the target label is predicted. Moosavi-Dezfooli et al. seek a single universal perturbation that fools a deep neural model on many inputs.
In that formulation, a parameter is set to control the attack success rate of the adversarial examples. Besides, the minimal boundary distance between an original image and its adversarial counterpart can be sought under a norm constraint, expressed as:

r* = argmin_r ||r|| subject to f(x + r) ≠ f(x).

In this formulation, the method optimizes the boundary distance between the original and adversarial examples. Papernot et al. introduce a simple iterative method, the Jacobian-based Saliency Map Attack (JSMA), for targeted attacks.
Papernot and Fawzi [11, 31] argue that injecting adversarial examples into the training set increases the robustness of deployed deep neural models. [24, 32] proposed ensemble adversarial training methods to improve defenses; the resulting ensemble models perform well against gradient-based and black-box attack strategies. Weilin Xu et al. describe Feature Squeezing, which reduces the search space available to an adversary by coalescing samples that correspond to different feature vectors in the original space into a single sample.
Most adversarial defenses are effective against known attacks and can reduce their success rates. However, adversarial examples generated with low distortion and high similarity to the original images remain difficult to defend against, so defenses in this domain urgently need updating.
We now describe our proposed method, which uses a slight adaptive gradient that contributes directly to the gradient direction and makes the generated adversarial examples fool deep neural models effectively. Before presenting the method in detail, let X = {x_1, ..., x_n} be a set of images drawn from a distribution μ, and let f be the classifier of the deep neural model; we use the norm constraint mentioned above to raise the attack success rate and to limit the size of pixel changes.
3.1 Adversarial Example Generation with Adaptive Gradient Refinement
Updating the gradient with a fixed step size may cause the search to become trapped in a local minimum: a fixed step can overshoot extrema, miss the deepest valley, and oscillate back and forth between boundaries while missing the global optimum. This phenomenon is especially familiar with FGSM. Prior work shows that updating the gradient with an adaptive step size escapes local and poor extrema thanks to its adjustment mechanism. AdaGrad (the adaptive gradient algorithm) is a modified stochastic gradient descent with a per-parameter learning rate; example applications include natural language processing and image recognition. It still has a base learning rate, but this rate is multiplied element-wise by a vector formed from the diagonal of the outer-product matrix of past gradients. We formulate it as follows:
Here g_t denotes the gradient at time t, which we accumulate into G_t; under the adaptive gradient mechanism, the model's parameters are updated after every iteration as follows:

θ_{t+1} = θ_t − η / √(G_t + ε) · g_t.
This reveals that the parameter update depends on the hyper-parameter η and on the denominator √(G_t + ε): once G_t grows large, the parameter is updated at a low rate, which helps escape local minima or maxima when searching for directions. We compute the original adaptive pace G_t by:

G_t = G_{t−1} + g_t ⊙ g_t.
We compute G_t with the sign function applied to the squared loss gradient, following the adaptive gradient process illustrated in the figure below. G_t is the key adaptive idea of our method: using the iterative fast gradient sign method strategy, the pace of generating the perturbation is adjusted by the next equation:

x_{t+1}^{adv} = clip_{x,ε}( x_t^{adv} + η / √(G_t + ε) · sign(∇_x L(x_t^{adv}, y)) ).
This method adapts the learning rate to the parameters, performing smaller updates for parameters associated with frequently occurring features and larger updates for parameters associated with infrequent features.
Figure: Process of generating adversarial examples.
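A single AdaGrad update can be sketched as below; the toy loss 0.5·||θ||², base rate `eta`, and starting point are illustrative assumptions, chosen so the two coordinates have gradients that differ by 100x:

```python
import numpy as np

# Sketch of the AdaGrad rule described above: accumulate squared
# gradients G and divide the base rate eta by sqrt(G + eps) per
# parameter, so parameters with large gradients take smaller steps.
# The gradient of the toy loss 0.5*||theta||^2 is theta itself.

eta, eps = 1.0, 1e-8
theta = np.array([100.0, 1.0])      # gradient magnitudes differ by 100x
G = np.zeros_like(theta)

g = theta.copy()                    # g_t: current gradient
G += g * g                          # G_t = G_{t-1} + g_t^2
step = eta / np.sqrt(G + eps) * g   # per-parameter adaptive step
theta -= step

print(step)   # both coordinates step by ~1.0 despite the 100x gap
```

The normalization by √(G_t + ε) is exactly why the large-gradient coordinate does not overshoot: its accumulated history shrinks its effective learning rate.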
As shown in Algorithm 1, we focus on the output vector generated by Eq. 11, then update the perturbation tensor; after several iterations in the direction of the generated perturbation, the classifier of the model yields the desired perturbation tensor. We use cosine similarity to evaluate the difference between original and adversarial images.
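The overall loop can be sketched as follows, combining the adaptive pace with signed gradient steps and scoring the result with cosine similarity. The linear toy gradient, budget `eps`, base rate `eta`, and step count are illustrative assumptions standing in for the deployed network and Algorithm 1's actual settings:

```python
import numpy as np

# Hedged sketch of the loop in Algorithm 1: take adaptively scaled
# signed gradient steps, clip to the L-infinity ball, then compare
# the result to the clean input with cosine similarity.

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def adaptive_attack(x, grad_fn, eps=0.05, eta=0.02, steps=10):
    x_adv = x.copy()
    G = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x_adv)
        G += g * g                                  # accumulate pace
        x_adv = x_adv + eta / np.sqrt(G + 1e-8) * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)    # stay in the ball
    return x_adv

x = np.array([1.0, 2.0, 3.0, 4.0])
x_adv = adaptive_attack(x, grad_fn=lambda z: z)     # toy gradient
print(cosine_similarity(x, x_adv))                  # close to 1.0
```

Because the adaptive factor shrinks the step as G grows, the final perturbation stays small and the cosine similarity to the clean input remains high.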
4 Experimental Results
In this section we conduct experiments on the ImageNet dataset to validate the effectiveness of our proposed method, with the attack settings described below; the experimental setup is kept the same under both the L∞ and L2 norm constraints. We report attack success rates, cosine similarity, and perturbation strength for our method on the preprocessed ILSVRC2012 (Val) dataset.
Our proposed method generates adversarial examples with high cosine similarity because the adaptive-gradient approach reduces the step size when approaching the decision boundary during perturbation calculation. Visually, under the same constraints, the adversarial examples produced by our method are less perturbed than those generated by I-FGSM or MI-FGSM: the images are clearer and harder to recognize as processed. Our method achieves high attack success rates under the white-box strategy and outperforms I-FGSM and MI-FGSM under the black-box condition, which means it exhibits better network transferability; the transferability is best when the network models are similar in structure, e.g., Inception-v3 and Inception-v4.
We use the Absolute Mean Perturbation (AMP), AMP = (1/N) Σ_i |x_i^{adv} − x_i| averaged over all N pixels, as a measure of the magnitude of the disturbance; it represents the range of values added to the pixels of the clean image.
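The AMP measure, as reconstructed above, is a simple mean absolute pixel difference; the arrays below are toy stand-ins for real image batches:

```python
import numpy as np

# AMP = mean over all images and pixels of |x_adv - x|.

def amp(clean_batch, adv_batch):
    """Absolute Mean Perturbation between clean and adversarial batches."""
    return float(np.mean(np.abs(adv_batch - clean_batch)))

clean = np.zeros((2, 4, 4))      # two 4x4 "images"
adv = clean + 0.02               # uniform 0.02 perturbation
value = amp(clean, adv)
print(value)                     # -> 0.02
```

A lower AMP means the adversarial image modifies each pixel less on average, which is what the cosine-similarity results also capture.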
We have described an adversarial example generation method that produces perturbations with an adaptive pace, bridging the boundary distance between class representations in latent space; we consider two norm constraints under both non-targeted and targeted attack strategies. The proposed method generates adversarial examples that fool deep neural networks with high probability; the generated perturbations transfer well across different deep neural models, giving good results in the black-box setting, and retain high similarity to the original images, as directly reflected in the AMP. In future work we will focus on image-independent adversarial attacks.
-  Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
-  Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
-  Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural networks. In International Conference on Neural Information Processing Systems, pages 1097–1105, 2012.
-  Ian J. Goodfellow, Jean Pougetabadie, Mehdi Mirza, Bing Xu, David Wardefarley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks. Advances in Neural Information Processing Systems, 3:2672–2680, 2014.
-  Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European conference on computer vision, pages 630–645. Springer, 2016.
-  Yinpeng Dong, Hang Su, Jun Zhu, and Fan Bao. Towards interpretable deep neural networks by leveraging adversarial examples. arXiv preprint arXiv:1708.05493, 2017.
-  Yann Lecun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436, 2015.
-  Kathrin Grosse, Praveen Manoharan, Nicolas Papernot, Michael Backes, and Patrick McDaniel. On the (statistical) detection of adversarial examples. arXiv preprint arXiv:1702.06280, 2017.
-  Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics, pages 311–318. Association for Computational Linguistics, 2002.
-  Edward Marcus Batchelder and R Pito Salas. Text abstraction method and apparatus, November 25 1997. US Patent 5,691,708.
-  Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pages 582–597. IEEE, 2016.
-  Sara Sabour, Yanshuai Cao, Fartash Faghri, and David J Fleet. Adversarial manipulation of deep representations. arXiv preprint arXiv:1511.05122, 2015.
-  Pedro Tabacof and Eduardo Valle. Exploring the space of adversarial images. In 2016 International Joint Conference on Neural Networks (IJCNN), pages 426–433. IEEE, 2016.
-  Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn Song. Delving into transferable adversarial examples and black-box attacks. arXiv preprint arXiv:1611.02770, 2016.
-  Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. CoRR, abs/1412.6572, 2014.
-  Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Xiaolin Hu, and Jun Zhu. Discovering adversarial examples with momentum. arXiv preprint arXiv:1710.06081, 2017.
-  Cihang Xie, Jianyu Wang, Zhishuai Zhang, Yuyin Zhou, Lingxi Xie, and Alan L. Yuille. Adversarial examples for semantic segmentation and object detection. 2017 IEEE International Conference on Computer Vision (ICCV), pages 1378–1387, 2017.
-  Nicholas Carlini and David Wagner. Audio adversarial examples: Targeted attacks on speech-to-text. arXiv preprint arXiv:1801.01944, 2018.
-  Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 427–436, 2015.
-  Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael Wellman. Towards the science of security and privacy in machine learning. arXiv preprint arXiv:1611.03814, 2016.
-  Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826, 2016.
-  Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533, 2016.
-  Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 387–402. Springer, 2013.
-  Ruitong Huang, Bing Xu, Dale Schuurmans, and Csaba Szepesvári. Learning with a strong adversary. arXiv preprint arXiv:1511.03034, 2015.
-  Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
-  Lie Lu, Hong-Jiang Zhang, and Hao Jiang. Content analysis for audio classification and segmentation. IEEE Transactions on speech and audio processing, 10(7):504–516, 2002.
-  Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. Universal adversarial perturbations. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 86–94. IEEE, 2017.
-  Osbert Bastani, Yani Ioannou, Leonidas Lampropoulos, Dimitrios Vytiniotis, Aditya Nori, and Antonio Criminisi. Measuring neural net robustness with constraints. In Advances in neural information processing systems, pages 2613–2621, 2016.
-  Nicolas Papernot, Patrick McDaniel, Somesh Jha, Matt Fredrikson, Z. Berkay Celik, and Ananthram Swami. The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pages 372–387, 2016.
-  Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep neural networks. arXiv preprint arXiv:1704.01155, 2017.
-  Alhussein Fawzi, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. Robustness of classifiers: from adversarial to random noise. In Advances in Neural Information Processing Systems, pages 1632–1640, 2016.
-  Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, Pascal Frossard, and Stefano Soatto. Analysis of universal adversarial perturbations. arXiv preprint arXiv:1705.09554, 2017.
-  Geoffrey Hinton, Nitish Srivastava, and Kevin Swersky. Neural networks for machine learning. Lecture slides, http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf.
-  Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 248–255. Ieee, 2009.