Deep neural networks are vulnerable to adversarial examples, which are often generated by adding perturbations to the clean input [Szegedy et al.2013]. Understanding how to generate adversarial examples can improve model robustness [Arnab et al.2018] and help to develop better training algorithms [Goodfellow et al.2014, Kurakin et al.2016, Tramèr et al.2017]. Recently, several methods [Szegedy et al.2013, Kurakin et al.2016, Xiao et al.2018, Carlini and Wagner2017] have been proposed to find such examples. Generally, these methods can be divided into two types: 1) maximum-allowable attacks and 2) regularization-based attacks. Typically, maximum-allowable attacks are faster but produce larger perturbations than regularization-based attacks. Both types of attacks rely on a hyper-parameter called the step size (or learning rate), which controls the maximum allowed perturbation or the rate of convergence. In most cases the step size is a fixed number, whereas the Decoupled Direction and Norm (DDN) method [Rony et al.2018] decays or increases its step size depending on the predicted label at the current iteration.
Ensembles of models are commonly used in competitions and research to enhance performance and improve robustness [Hansen and Salamon1990, Krogh and Vedelsby1995, Caruana et al.2004]. Attacking an ensemble of models that shares one set of inputs is therefore also necessary. Currently, studies and competition solutions in adversarial attacks often apply fusion in logits with fixed weights as a simple ensemble method [Dong et al.2018, Rony et al.2018].
Inspired by DDN, we propose an efficient projected gradient descent method for ensemble adversarial attack, which improves the search direction and step size over an ensemble of models. Our method won first place in the IJCAI-19 Targeted Adversarial Attack competition. Using the same code, we ranked twenty-third in the Non-Targeted Track.
2 Related Work
We focus on the competition's problem formulation and its evaluation metric, as well as some well-known attacks.
2.1 Problem Formulation
Let x be a sample from the input space, where H and W are the input width and height, with ground-truth label y and a randomly assigned target label t from the label set. Let f be the attacked model, which outputs logit values over the possible labels. Let D(x, x_adv) be the distance measure that compares the similarity of the clean image x and the adversarial image x_adv. In this report, we use the following distance measure, which is consistent with the competition criteria:

D(x, x_adv) = (1 / (H · W)) Σ_{i,j} ‖x_adv[i, j, :] − x[i, j, :]‖₂
For the targeted attack, success means fooling the classifier into the assigned target label t, while for the non-targeted attack, success only requires fooling the classifier away from its ground-truth label y. One thing needs to be pointed out: this distance measure is not the same as the L2 (Frobenius) norm of the whole perturbation. Instead, it measures the average, over spatial positions, of the pointwise distance taken across the three RGB channels. This specific design encourages us to generate more spatially sparse adversarial examples.
Given a set of evaluation models, the final score is averaged over all N testing images. In the final stage, N is equal to 550. For this task, the smaller the score, the better the performance.
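As a concrete sketch of this metric, the snippet below implements the per-image distance described above and a score that averages it over images. The fixed penalty for failed attacks is an assumption of ours (the report elides the scoring formula), and all names are ours:

```python
import numpy as np

def perturbation_distance(x_clean, x_adv):
    """Per-image distance: the average over spatial positions of the
    Euclidean norm taken across the 3 RGB channels (H, W, 3 arrays)."""
    per_pixel = np.sqrt(np.sum((x_adv - x_clean) ** 2, axis=-1))
    return per_pixel.mean()

def final_score(distances, successes, penalty=64.0):
    """Average over images: the distance when the attack succeeded,
    otherwise a fixed penalty (the penalty value is an assumption)."""
    return np.mean([d if ok else penalty
                    for d, ok in zip(distances, successes)])
```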
2.2 Attack Methods

In this subsection, we review several adversarial attack methods that are related to ours.
2.2.1 Fast Gradient Sign Method (FGSM)
Fast Gradient Sign Method (FGSM) performs a single-step update on the original sample x along the sign of the gradient of a loss function [Goodfellow et al.2014]:

x_adv = clip(x + ε · sign(∇_x J(x, y)))

The loss can be either an accuracy metric like cross-entropy or a dispersion metric like standard deviation [Jia et al.2019].
where ε controls the maximum perturbation of the adversarial samples, and the clip function forces the output to stay within the valid input range. FGSM can easily be extended to other norm criteria that fit the competition metric better. In the following sections, we focus on the L2 norm to keep this report close to the competition scenario.
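A minimal NumPy sketch of the one-step update, together with an L2 variant of the kind alluded to above (the loss gradient is assumed to be precomputed; the function names are ours):

```python
import numpy as np

def fgsm(x, grad, eps, lo=0.0, hi=1.0):
    """One-step FGSM: move along the sign of the loss gradient,
    then clip back into the valid pixel range."""
    x_adv = x + eps * np.sign(grad)
    return np.clip(x_adv, lo, hi)

def fgm_l2(x, grad, eps, lo=0.0, hi=1.0):
    """L2 variant: step along the normalized gradient direction."""
    g = grad / (np.linalg.norm(grad) + 1e-12)
    return np.clip(x + eps * g, lo, hi)
```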
2.2.2 Projected Gradient Descent (PGD)
Projected Gradient Descent (PGD) is an iterative version of FGSM. In each iteration, PGD greedily solves the problem of maximizing the loss function:
Here, clip is a projection function and can be replaced by other functions, such as tanh, to force the output of each iteration into the valid range. To further exploit gradient history, momentum or other optimization methods such as RMSProp and Adam can be applied to speed up the iteration process and enhance transferability. Momentum-based PGD can be formalized as follows [Zheng et al.2018, Dong et al.2018]:

g_{t+1} = μ · g_t + ∇_x J(x_t, y) / ‖∇_x J(x_t, y)‖₁,   x_{t+1} = clip(x_t + α · sign(g_{t+1}))

where μ is a parameter that balances the current gradient against the accumulated historical gradient.
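The momentum iteration above can be sketched as follows; `grad_fn` stands in for the model's loss gradient, and the eps-ball projection assumes an L∞ constraint (a simplifying assumption on our part, since the competition metric itself is L2-like):

```python
import numpy as np

def momentum_pgd(x0, grad_fn, eps, alpha, mu, steps, lo=0.0, hi=1.0):
    """Momentum-based PGD: accumulate the L1-normalized gradient with
    decay factor mu, take an FGSM-style sign step, then project back
    into the eps-ball around x0 and the valid pixel range."""
    x, g = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        grad = grad_fn(x)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)
        x = x + alpha * np.sign(g)
        x = np.clip(x, x0 - eps, x0 + eps)  # project into the eps-ball
        x = np.clip(x, lo, hi)              # stay in the valid range
    return x
```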
2.2.3 Decoupled Direction and Norm
For the NeurIPS 18 adversarial attacks competition, [Rony et al.2018] proposed Decoupled Direction and Norm (DDN), which decouples the search direction from the perturbation norm in PGD. Generally speaking, at each iteration DDN adapts its search direction and step size to the currently predicted label: the perturbation norm is decreased when the current iterate is already adversarial and increased otherwise. This method performs better and also speeds up the search process.
2.2.4 Ensemble Adversarial Attack
In the NIPS 17 adversarial attack competition, [Dong et al.2018] report that fusing in logits with a softmax cross-entropy loss achieves the best performance. Let l_k(x) be the logit output of the k-th model. This ensemble method can be formed as follows:

l(x) = Σ_k w_k · l_k(x)

where w_k is the k-th model's weight, with Σ_k w_k = 1. However, in their work the weights are fixed and do not change during the iterations.
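A small sketch of this fusion (names are ours; the softmax is included only to turn the fused logits into probabilities for a cross-entropy loss):

```python
import numpy as np

def fused_logits(logit_list, weights):
    """Fuse-in-logits ensemble: weighted sum of each model's logits,
    with the weights normalized to sum to one."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * li for wi, li in zip(w, logit_list))

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()
```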
3 Methods

In this section, we first present Algorithm 1, EPGD, where the capital E stands for both Efficient and Ensemble. We then discuss and explain each modification.
3.1 Changes of Step Size
Changing the step size is not a new idea, in either training deep neural networks or adversarial attacks. DDN [Rony et al.2018] modifies the step size with a decay per iteration. Here we propose our step size modification: instead of exponential decay, we perform a truncated linear min-max scaling of the step size, formally:
Here, the relevant probability is the ensemble model's probability output for the attacked label, and the confidence factor (0.5 in this competition) truncates that probability. If the probability is above 0.5, the step size is truncated to a constant; otherwise it is linearly proportional to the probability within the given range. Indeed, this method is a greedy estimation of the best step size, and it performs better than DDN's decay in this competition.
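One plausible reading of this scaling rule as code (the exact formula is elided in the text, so both the direction of the scaling and the bounds below are assumptions of ours):

```python
def scaled_step(p_target, step_min, step_max, c=0.5):
    """Truncated linear min-max scaling of the step size.
    p_target: the ensemble probability of the attacked label.
    At or above the confidence factor c, the step is truncated to its
    minimum; below c, it grows linearly with the remaining gap.
    (The exact rule is an assumption; the report elides the formula.)"""
    if p_target >= c:
        return step_min
    frac = (c - p_target) / c  # in (0, 1]
    return step_min + frac * (step_max - step_min)
```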
3.2 Changes of Model Ensemble Weights
Fig. 2 shows the general pattern of the individual models' input gradients with respect to the loss function. In most cases there are some overlapping regions, which leads to a degree of transferability for black-box attacks. We observed that with fixed ensemble weights, a single model sometimes has difficulty reaching the decision boundary; we define this as the point where fixed weights lead to a local optimum. Unlike DDN, history-aware search methods such as momentum can speed up the process, but they still have difficulty reaching the global optimum. We address this issue by changing the model ensemble weights: at each iteration, we greedily use the average gradient of all models that have not yet been attacked successfully as the search direction. Testing all of the above methods, we found that ours performs best among the ensemble approaches.
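In its simplest reading, the greedy reweighting step reduces to averaging the gradients of only the not-yet-fooled models (names are ours):

```python
import numpy as np

def ensemble_direction(grads, fooled):
    """Average the input gradients of only those models that are not
    yet fooled; if all models are fooled, fall back to the full average."""
    active = [g for g, ok in zip(grads, fooled) if not ok]
    if not active:
        active = grads
    return np.mean(active, axis=0)
```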
3.3 Choices of Masks
We define the mask as m, a binary spatial map over the image, and modify the final output as x_adv = x + m ⊙ δ, where δ is the perturbation.
Using masks is a very natural idea in attacks, and it has been used in both competitions and research [Karmon et al.2018, Brendel and Brendel2018]. Under our competition criteria, applying spatial masks is better than applying channel masks. For the targeted attack, instead of updating the whole image space, we fix the noise region to a relatively small but contiguous area in order to reduce the overall distance, since some features are obviously much more important than others. For the non-targeted attack, a disperse grid mask with cells spaced 7 pixels apart performs best. One possible reason is that most deep neural network architectures start with a strided convolution or a pooling operation.
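A sketch of such a disperse grid mask; the cell size is an assumption of ours (the report elides it) — only the 7-pixel spacing is stated:

```python
import numpy as np

def grid_mask(h, w, cell=2, space=7):
    """Disperse grid mask: square cells of side `cell`, repeated every
    `space` pixels in both directions. Only the 7-pixel spacing comes
    from the report; the cell size is an assumption."""
    m = np.zeros((h, w), dtype=np.float32)
    for i in range(0, h, space):
        for j in range(0, w, space):
            m[i:i + cell, j:j + cell] = 1.0
    return m
```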
4 Implementation and Scores
Table 1: Major milestones for changes in method and framework.

| Method | Score |
| EPGD+Tensorflow+3 Models+All tricks | |
| EPGD+Tensorflow+2 Models+All tricks | 38.62 |
| EPGD+Tensorflow+2 Models+No tricks | 39.82 |
| PGD+Tensorflow+2 Models+No tricks | 41.00 |
| PGD+Pytorch+2 Models+No tricks | |
In a competition, some tricks also help to increase the online score and may significantly influence the final ranking. Algorithm 2 is our implemented algorithm, which addresses some specific issues. Table 1 shows our major milestones across changes of method and framework, and Table 2 lists our final submission parameters. In this section, we summarize some useful tricks for adversarial attack competitions. For the final submission, we use one proxy model, an Inception V3 trained by ourselves, and two official, publicly available models, ResNet 50 and VGG 16 [Simonyan and Zisserman2014, He et al.2016, Szegedy et al.2016].
4.1 Float Point Mapping
Considering that the evaluation process requires us to map float values to integers in RGB format, the method of mapping floats to integers affects both the attack success rate and the overall distance. We define the floor operation as:
Based on this definition, there are three different rounding methods:
Eq. 12 and Eq. 13 are the standard definitions of floor and round and need little discussion. Eq. 14 is our implementation of mapping float numbers to integers: the main idea is to round each number as close as possible to the raw input. Our method performs best among the three mapping operations in this competition.
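Our reading of the three mappings as code; the "round toward the raw input" rule of Eq. 14 is reconstructed from the description above, so its exact form is an assumption:

```python
import numpy as np

def round_floor(x_adv, x_clean):
    """Eq. 12: plain floor, ignoring the clean image."""
    return np.floor(x_adv)

def round_nearest(x_adv, x_clean):
    """Eq. 13: standard round-to-nearest, ignoring the clean image."""
    return np.floor(x_adv + 0.5)

def round_toward_clean(x_adv, x_clean):
    """Our reading of Eq. 14: for each value, pick floor or ceil of the
    adversarial pixel, whichever lands closer to the clean pixel, so
    the rounded image stays as close as possible to the raw input."""
    lo, hi = np.floor(x_adv), np.ceil(x_adv)
    return np.where(np.abs(lo - x_clean) <= np.abs(hi - x_clean), lo, hi)
```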
4.2 Resize Methods
Given that the input sizes of the evaluation models vary while the distance is measured at the original resolution, the best choice is to generate adversarial examples at the original resolution. We need a transformation function that resizes an example to the required input size of each evaluation model. Typical resize transformations include bilinear interpolation, nearest-neighbor interpolation, and so on. Since the official examples use PIL.Image.BILINEAR as the interpolation method, it is natural to use bilinear interpolation, and unsurprisingly it performs best. However, for TensorFlow versions up to 1.13, tf.image.resize_images does not behave consistently with PIL, which lowers the attack success rate when applying the early-stop trick (see https://github.com/tensorflow/tensorflow/issues/6720). This problem is fixed in TensorFlow 1.14 and TensorFlow 2.0 with the new v2 image resize API (https://github.com/tensorflow/tensorflow/releases).
4.3 Parameters Selection
The ultimate aim of this competition is to generate adversarial examples with as small a perturbation as possible within a limited time (550 images in 25 minutes). Using fixed parameters such as step size and mask size for every image is therefore usually suboptimal: a small step size and mask size with more iterations depletes the time budget, while a large step size and mask size causes large perturbations. Although the step size modification partially addresses this, other tricks are needed to balance it. In our implementation (Algorithm 2), we apply a step-like, filter-like strategy: start from small perturbation parameters and gradually increase the perturbation limit.
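The step-like strategy can be sketched as a schedule of progressively relaxed limits (all values below are illustrative assumptions; the report's Algorithm 2 defines the actual stages):

```python
def parameter_schedule():
    """Step-like (filter-like) schedule: yield (max step size, mask
    cell size) pairs from tight to loose limits; an attack would stop
    at the first stage that succeeds. Values are illustrative only."""
    for step_max, mask_cell in [(2.0, 1), (4.0, 2), (8.0, 4), (16.0, 8)]:
        yield step_max, mask_cell
```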
4.4 Adversarial Example Evaluation
Adversarial example evaluation is a critical step for efficient computation and offline testing. Based on the above analysis, this step must be applied carefully, since image transformations differ slightly across libraries. To evaluate exactly the image that will be submitted, at each EPGD iteration we generate a rounded output and resize it with PIL.Image.BILINEAR as the input for adversarial example evaluation.
4.5 Framework Selection
TensorFlow and PyTorch are the two most important deep learning libraries. During the preliminary stage and the first half of the final stage, we used PyTorch for its usability. However, differences in batch-norm behavior between TensorFlow and PyTorch led us to switch to TensorFlow, since 2 of the 3 given models are used as online evaluation models and run under TensorFlow.
We also tested several versions of TensorFlow, since performance and APIs may differ between versions. TensorFlow 1.14 seemed the best choice for this competition, as it has both the tf.slim module and the resize v2 method. However, we found it too slow compared to version 1.4; another GitHub issue discusses this performance difference (https://github.com/tensorflow/tensorflow/issues/25606).
5 Conclusion

In this report, we propose an efficient modified PGD method, EPGD, for attacking ensemble models by automatically changing the ensemble weights and step size per iteration and per input. We also present some useful implementation techniques aimed at efficiently searching for small-noise adversarial examples. Experiments show that our solution generates adversarial examples with smaller perturbations than PGD while remaining efficient. With this method, we won first place in the IJCAI-19 Targeted Adversarial Attack competition.
- [Arnab et al.2018] Anurag Arnab, Ondrej Miksik, and Philip HS Torr. On the robustness of semantic segmentation models to adversarial attacks. In , pages 888–897, 2018.
- [Brendel and Brendel2018] Wieland Brendel. Results of the nips adversarial vision challenge 2018, Nov 2018.
- [Carlini and Wagner2017] Nicholas Carlini and David Wagner. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pages 39–57. IEEE, 2017.
- [Caruana et al.2004] Rich Caruana, Alexandru Niculescu-Mizil, Geoff Crew, and Alex Ksikes. Ensemble selection from libraries of models. In Proceedings of the twenty-first international conference on Machine learning, page 18. ACM, 2004.
- [Dong et al.2018] Yinpeng Dong, Fangzhou Liao, Tianyu Pang, Hang Su, Jun Zhu, Xiaolin Hu, and Jianguo Li. Boosting adversarial attacks with momentum. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9185–9193, 2018.
- [Goodfellow et al.2014] Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572, 2014.
- [Hansen and Salamon1990] Lars Kai Hansen and Peter Salamon. Neural network ensembles. IEEE Transactions on Pattern Analysis & Machine Intelligence, (10):993–1001, 1990.
- [He et al.2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
- [Jia et al.2019] Yunhan Jia, Yantao Lu, Senem Velipasalar, Zhenyu Zhong, and Tao Wei. Enhancing cross-task transferability of adversarial examples with dispersion reduction. arXiv preprint arXiv:1905.03333, 2019.
- [Karmon et al.2018] Danny Karmon, Daniel Zoran, and Yoav Goldberg. Lavan: Localized and visible adversarial noise. arXiv preprint arXiv:1801.02608, 2018.
- [Krogh and Vedelsby1995] Anders Krogh and Jesper Vedelsby. Neural network ensembles, cross validation, and active learning. In Advances in neural information processing systems, pages 231–238, 1995.
- [Kurakin et al.2016] Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236, 2016.
- [Rony et al.2018] Jérôme Rony, Luiz G Hafemann, Luis S Oliveira, Ismail Ben Ayed, Robert Sabourin, and Eric Granger. Decoupling direction and norm for efficient gradient-based l2 adversarial attacks and defenses. arXiv preprint arXiv:1811.09600, 2018.
- [Simonyan and Zisserman2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- [Szegedy et al.2013] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
- [Szegedy et al.2016] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2818–2826, 2016.
- [Tramèr et al.2017] Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel. Ensemble adversarial training: Attacks and defenses. arXiv preprint arXiv:1705.07204, 2017.
- [Xiao et al.2018] Chaowei Xiao, Jun-Yan Zhu, Bo Li, Warren He, Mingyan Liu, and Dawn Song. Spatially transformed adversarial examples. arXiv preprint arXiv:1801.02612, 2018.
- [Zheng et al.2018] Tianhang Zheng, Changyou Chen, and Kui Ren. Distributionally adversarial attack. arXiv preprint arXiv:1808.05537, 2018.