Efficient Project Gradient Descent for Ensemble Adversarial Attack

06/07/2019
by Fanyou Wu, et al.
Purdue University

Recent advances show that deep neural networks are not robust to deliberately crafted adversarial examples, many of which are generated by adding human-imperceptible perturbations to clean inputs. Considering $\ell_2$ norm attacks, Projected Gradient Descent (PGD) and the Carlini and Wagner (C&W) attacks are the two main methods: PGD controls the maximum perturbation of the adversarial examples, while the C&W approach treats the perturbation as a regularization term and optimizes it together with the loss function. If we carefully set the parameters for each individual input, both methods become similar. In general, with parameters fixed for all inputs, PGD attacks run faster but find adversarial examples with larger perturbations than C&W. In this report, we propose an efficient modified PGD method for attacking ensemble models by automatically changing the ensemble weights and step size per iteration and per input. This method generates adversarial examples with smaller perturbations than the PGD method while remaining efficient compared to the C&W method. Our method won first place in the IJCAI-19 Targeted Adversarial Attack competition.



1 Introduction

Deep neural networks are vulnerable to adversarial examples, which are often generated by adding perturbations to the clean input [Szegedy et al.2013]. Understanding how to construct adversarial examples can improve model robustness [Arnab et al.2018] and help to develop better training algorithms [Goodfellow et al.2014, Kurakin et al.2016, Tramèr et al.2017]. Recently, several methods [Szegedy et al.2013, Kurakin et al.2016, Xiao et al.2018, Carlini and Wagner2017] have been proposed to find such examples. Generally, these methods can be divided into two types: 1) maximum-allowable attacks and 2) regularization-based attacks. Typically, maximum-allowable attacks are faster but produce larger perturbations than regularization-based attacks. Both types of attack rely on a hyper-parameter called the step size or learning rate, which controls the maximum allowable perturbation or the rate of convergence. In most cases the step size is a fixed number, whereas the Decoupling Direction and Norm (DDN) method [Rony et al.2018] changes its step size at every iteration, applying either a decay or an increase depending on the predicted label at the current iteration.

[Figure 1 panels: (a) Women's Jacket, (b) adversarial noise, (c) Kid's Jacket; (d) Kid's Jacket, (e) adversarial noise, (f) Window Cleaner]
Figure 1: Examples of adversarial examples generated by an ensemble of VGG16 and ResNet50 [Simonyan and Zisserman2014, He et al.2016]. Left column: the original images. Middle column: the adversarial noise, where gray represents zero perturbation. Right column: adversarial images generated by our method.

Ensembles of models are commonly used in competitions and research to enhance performance and improve robustness [Hansen and Salamon1990, Krogh and Vedelsby1995, Caruana et al.2004]. Attacking an ensemble of models that shares one set of inputs is therefore also necessary. Current studies and competition solutions for adversarial attacks often fuse the models in the logits with fixed weights as a simple ensemble method [Dong et al.2018, Rony et al.2018].

Inspired by DDN, we propose an efficient projected gradient descent method for ensemble adversarial attacks which improves the search direction and step size for ensemble models. Our method won first place in the IJCAI-19 Targeted Adversarial Attack competition. Using the same code, we ranked twenty-third in the Non-Targeted Track.

2 Related Work

We focus on the competition's problem formulation and its scoring criteria, as well as some well-known attacks.

2.1 Problem Formulation

Let $x \in [0,1]^{w \times h \times 3}$ be a sample from the input space, where $w$ and $h$ are the input width and height, with ground-truth label $y$ and randomly assigned attack target label $y_t$ from the label set $\mathcal{Y}$. Let $f$ be the attacked model, which outputs logit values over the possible labels. Let $d(x, x^{adv})$ be the distance measurement that compares the similarity of the clean image $x$ and the adversarial image $x^{adv}$. In this report, we use the following distance measurement, which is consistent with the competition criteria:

$d(x, x^{adv}) = \frac{1}{wh}\sum_{i=1}^{w}\sum_{j=1}^{h}\sqrt{\sum_{c=1}^{3}\left(x_{ijc} - x^{adv}_{ijc}\right)^{2}}$   (1)

For a targeted attack, success means fooling the classifier into predicting the assigned target label $y_t$, while for a non-targeted attack, success simply means fooling the classifier away from its ground-truth label $y$. One thing needs to be pointed out: this distance measurement is not the same as the $\ell_2$ (or so-called Frobenius) norm of the perturbation. Instead, it measures the average, over spatial locations, of the per-pixel distance across the 3 RGB channels. This specific design encourages us to generate more spatially sparse adversarial examples.
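As a concrete illustration, here is a minimal NumPy sketch of this distance, assuming the reconstruction in Eq. (1) (mean over spatial locations of the per-pixel $\ell_2$ distance across RGB channels); the function name and array shapes are ours, not from the original code.

```python
import numpy as np

def spatial_l2_distance(x: np.ndarray, x_adv: np.ndarray) -> float:
    """Mean over spatial locations of the per-pixel L2 distance across channels.

    Both inputs are assumed to be float arrays of shape (H, W, 3).
    """
    diff = x.astype(np.float64) - x_adv.astype(np.float64)
    per_pixel = np.sqrt(np.sum(diff ** 2, axis=-1))  # shape (H, W)
    return float(per_pixel.mean())

# A perturbation confined to a small patch yields a small score,
# which is why spatially sparse noise is favored by this metric.
x = np.zeros((299, 299, 3))
x_adv = x.copy()
x_adv[:10, :10, :] += 10.0
print(spatial_l2_distance(x, x_adv))
```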

Given a set of evaluation models $\{f_1, \dots, f_M\}$, the final score is calculated by averaging the distance measurement over all test images and evaluation models:

$S = \frac{1}{NM}\sum_{n=1}^{N}\sum_{m=1}^{M} d\left(x_n, x^{adv}_n\right)$   (2)

where $N$ is the number of testing images. In the final stage, $N$ is equal to 550. For this task, the smaller the score, the better the performance.

2.2 Attacks

In this section, we review some adversarial attack methods that are related to our method.

2.2.1 Fast Gradient Sign Method (FGSM)

The Fast Gradient Sign Method (FGSM) performs a single-step update on the original sample $x$ along the direction of the gradient of a loss function $J$ [Goodfellow et al.2014]. The loss could be either an accuracy metric like cross-entropy or a dispersion metric like the standard deviation [Jia et al.2019]:

$x^{adv} = \mathrm{clip}\left(x + \epsilon \cdot \mathrm{sign}\left(\nabla_x J(x, y)\right)\right)$   (3)

where $\epsilon$ controls the maximum perturbation of the adversarial samples, and the clip function forces $x^{adv}$ to reside in the valid input range. FGSM can easily be extended to the $\ell_2$ norm criterion, which fits the competition criteria better. In the following sections, we focus on $\ell_2$ norms to keep this report closer to the competition scenario:

$x^{adv} = \mathrm{clip}\left(x + \epsilon \cdot \frac{\nabla_x J(x, y)}{\lVert\nabla_x J(x, y)\rVert_2}\right)$   (4)
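Below is a minimal PyTorch sketch of the $\ell_2$ variant in Eq. (4), written against a generic classifier; the model, the epsilon value, and the assumption of inputs in $[0,1]$ are placeholders rather than the competition setup.

```python
import torch
import torch.nn.functional as F

def fgm_l2(model, x, y, eps=3.0):
    """Single-step L2 fast gradient attack (untargeted form of Eq. (4))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    # Normalize the gradient per sample so the step has L2 norm eps.
    flat = grad.view(grad.size(0), -1)
    norm = flat.norm(p=2, dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
    x_adv = x + eps * grad / norm
    return x_adv.clamp(0.0, 1.0).detach()
```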

2.2.2 Projected Gradient Descent (PGD)

Projected Gradient Descent (PGD) is an iterative version of FGSM. In each iteration, PGD greedily takes a step that maximizes the loss function:

$x^{adv}_{t+1} = \mathrm{clip}\left(x^{adv}_t + \alpha \cdot \frac{\nabla_x J(x^{adv}_t, y)}{\lVert\nabla_x J(x^{adv}_t, y)\rVert_2}\right)$   (5)

Here, clip is a projection function and can be replaced by other functions like tanh to force the output of each iteration into the valid range. To further exploit historical gradient information, momentum-based methods or other optimizers like RMSprop and Adam can be applied to speed up the iterations and enhance transferability. Momentum-based PGD can be formulated as below [Zheng et al.2018, Dong et al.2018]:

$g_{t+1} = \mu \cdot g_t + \frac{\nabla_x J(x^{adv}_t, y)}{\lVert\nabla_x J(x^{adv}_t, y)\rVert_1}, \qquad x^{adv}_{t+1} = \mathrm{clip}\left(x^{adv}_t + \alpha \cdot \frac{g_{t+1}}{\lVert g_{t+1}\rVert_2}\right)$   (6)

where $\mu$ is a parameter that balances the current gradient and the historical gradient.
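A compact PyTorch sketch of an $\ell_2$ momentum PGD loop in the spirit of Eqs. (5)-(6); the step size, iteration count, and decay factor mu are placeholder values, not the parameters used in the competition, and a single-image batch is assumed for brevity.

```python
import torch
import torch.nn.functional as F

def momentum_pgd_l2(model, x, y, alpha=1.0, mu=1.0, steps=10):
    """Iterative L2 attack with momentum accumulation (untargeted)."""
    x_adv = x.clone().detach()
    g = torch.zeros_like(x)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        # Accumulate L1-normalized gradients, then take an L2-normalized step.
        grad = grad / grad.abs().sum().clamp_min(1e-12)
        g = mu * g + grad
        step = g / g.norm(p=2).clamp_min(1e-12)
        x_adv = (x_adv + alpha * step).clamp(0.0, 1.0).detach()
    return x_adv
```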

2.2.3 Decoupled Direction and Norm

For the NeurIPS 18 adversarial attacks competition, [Rony et al.2018] proposed a method that decouples the direction and norm of the perturbation for PGD. Generally speaking, at each iteration DDN adjusts its search direction and step size according to the current predicted label: the perturbation norm is decreased if the current iterate is already adversarial and increased otherwise. This method performs better and can also speed up the search process.

2.2.4 Ensemble Adversarial Attack

In the NIPS 17 adversarial attack competition, [Dong et al.2018] report that fusing the models in the logits, combined with a softmax cross-entropy loss, achieves the best performance. Let $l_m(x)$ be the logit output of the $m$-th model. This ensemble method can be formulated as follows:

$J(x, y) = -\log\left(\mathrm{softmax}\big(l(x)\big)_{y}\right)$   (7)

where $l(x)$ is defined as:

$l(x) = \sum_{m=1}^{M} w_m \, l_m(x)$   (8)

Here $w_m$ is the $m$-th model's weight. However, in their work it is a fixed number and does not change between iterations.
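A short PyTorch sketch of the fuse-in-logits ensemble loss in Eqs. (7)-(8); the model list, the weights, and the helper name are ours.

```python
import torch
import torch.nn.functional as F

def ensemble_logit_loss(models, weights, x, y):
    """Cross-entropy on the weighted sum of the models' logits (Eqs. (7)-(8))."""
    fused = sum(w * m(x) for w, m in zip(weights, models))
    return F.cross_entropy(fused, y)
```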

3 EPGD

In this section, we first present Algorithm 1, EPGD, where the capital E stands for both Efficient and Ensemble. We then discuss and explain these modifications.

Input: input image $x$, target class $y_t$
Input: models $f_m$ and weights $w_m$ for $m = 1, \dots, M$
Input: mask $\mathcal{M}$
Input: minimal step size $\alpha_{\min}$, maximal step size $\alpha_{\max}$, maximal iteration $T$, confidence level $\gamma$
Output: adversarial example $x^{adv}$

1:  Initialize
2:  for  to  do
3:     
4:     
5:     
6:     
7:     
8:     
9:     
10:     
11:     if  is adversarial example for all models then
12:        return
13:     else
14:        for  to  do
15:           if  is adversarial example for  then
16:              
17:           else
18:              
19:           end if
20:           
21:        end for
22:     end if
23:  end for
24:  return
Algorithm 1 EPGD

3.1 Changes of Step Size

Changing the step size is not a new idea in either training deep neural networks or adversarial attacks. DDN [Rony et al.2018] modifies the step size with a multiplicative decay per iteration. Here we propose our step size modification method: instead of using exponential decay, we perform a truncated linear min-max scaling of the step size. Formally:

$\alpha_t = \alpha_{\min} + (\alpha_{\max} - \alpha_{\min}) \cdot \max\left(1 - \frac{p_t}{\gamma},\, 0\right)$   (9)

where $p_t$ represents the probability output of the target label $y_t$ of the ensemble model, and $\gamma$ is the confidence factor (0.5 in this competition) which truncates the ensemble probability. If $p_t$ is above 0.5, then $\alpha_t$ will be $\alpha_{\min}$; otherwise it scales linearly with $p_t$ within the range $[\alpha_{\min}, \alpha_{\max}]$. Indeed, this method is a greedy estimate of the best step size, and it performed better than the DDN weight decay in this competition.
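A tiny Python sketch of this scaling rule as reconstructed above (our reading of Eq. (9)): the step shrinks linearly as the ensemble's confidence in the target label approaches the truncation level gamma. The default values are illustrative, not the competition settings.

```python
def scaled_step_size(p_target: float,
                     alpha_min: float = 1.0,
                     alpha_max: float = 40.0,
                     gamma: float = 0.5) -> float:
    """Truncated linear min-max scaling of the step size.

    p_target is the ensemble's probability for the target label; once it
    exceeds gamma, the step size is clamped to alpha_min.
    """
    scale = max(1.0 - p_target / gamma, 0.0)
    return alpha_min + (alpha_max - alpha_min) * scale
```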

Figure 2: Examples of adversarial noise generated by the single models (a) VGG 16, (b) Inception V3, and (c) ResNet 50. Here, black points indicate that at least one channel value has been modified.

3.2 Changes of Model Ensemble Weights

Fig. 2 shows the general pattern of each individual model's gradient of the loss function with respect to the input image. In most cases there are some overlapping regions, which lead to a certain degree of transferability for black-box attacks. We observed that with fixed model ensemble weights, a single model sometimes has difficulty reaching the decision boundary, which we define as the situation where the fixed weights lead to a local optimum. Other second-order-like search methods, such as momentum, can speed up the process, but this approach still has difficulty reaching the global optimum. We address this issue by changing the model ensemble weights: at each iteration, we greedily use the average gradient of all models that have not yet been attacked successfully as the direction to follow. Testing all of the above methods, we found that ours performs best on the ensemble of models.
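As an illustration of this weight-update idea, here is a hedged PyTorch sketch that averages the loss gradients of only those models not yet fooled by the current iterate; it is our reading of Section 3.2, not the authors' exact Algorithm 1, and the targeted cross-entropy loss and single-image batch are assumptions.

```python
import torch
import torch.nn.functional as F

def ensemble_direction(models, x_adv, y_target):
    """Average gradient of the targeted loss over models not yet fooled.

    Models that already predict y_target receive zero weight this iteration.
    Assumes a single-image batch.
    """
    x_adv = x_adv.clone().detach().requires_grad_(True)
    grads = []
    for m in models:
        logits = m(x_adv)
        if logits.argmax(dim=1).item() == y_target.item():
            continue  # this model is already fooled; skip it
        # Targeted attack: descend the cross-entropy toward the target label.
        loss = F.cross_entropy(logits, y_target)
        grad, = torch.autograd.grad(loss, x_adv)
        grads.append(grad)
    if not grads:
        return torch.zeros_like(x_adv)
    return -torch.stack(grads).mean(dim=0)  # negative: move toward the target
```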

3.3 Choices of Masks

We define the mask as $\mathcal{M} \in \{0,1\}^{w \times h}$ and modify the final output so that only masked spatial locations are perturbed:

$x^{adv} \leftarrow x + \mathcal{M} \odot (x^{adv} - x)$   (10)

where $\odot$ denotes element-wise multiplication, broadcast over the RGB channels.

Using a mask is a very natural idea for attacks and has been used in both competitions and research [Karmon et al.2018, Brendel and Brendel2018]. Under our competition criteria, applying spatial masks works better than applying channel masks. For the targeted attack, instead of updating the whole image, we fix the noise region to a relatively small but continuous area in order to reduce the overall distance, since some features are clearly much more important than others. For the non-targeted attack, a dispersed grid mask with a spacing of 7 pixels performs best. One possible reason is that most deep neural network architectures start with a large-stride convolution or pooling operation.
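A small NumPy sketch of a dispersed grid mask with 7-pixel spacing, as we understand the description above; the image size and the convention that masked entries equal 1 are assumptions.

```python
import numpy as np

def grid_mask(height: int = 299, width: int = 299, spacing: int = 7) -> np.ndarray:
    """Binary spatial mask that keeps one pixel every `spacing` pixels."""
    mask = np.zeros((height, width), dtype=np.float32)
    mask[::spacing, ::spacing] = 1.0
    return mask

# Only masked locations receive perturbation, e.g. x_adv = x + mask[..., None] * delta.
```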

4 Implementation and Scores

Input: clean image $x$, target class $y_t$, ResNet50 model $f_1$, VGG16 model $f_2$, Inception V3 model $f_3$
Input: number of parameter sets $K$ for EPGD; in principle, the smaller the index, the smaller the noise it will generate
Output: adversarial example $x^{adv}$

1:  $x^{adv} \leftarrow x$
2:  for $k = 0$ to $K-1$ do
3:     $x^{adv} \leftarrow \mathrm{EPGD}(x, y_t, \{f_1, f_2, f_3\}, \text{parameter set } k)$
4:     if $x^{adv}$ is an adversarial example for both $f_1$ and $f_2$ then
5:        return $x^{adv}$
6:     end if
7:  end for
8:  return $x^{adv}$
Algorithm 2 Final Implementation Algorithm
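A hedged Python sketch of this cascading strategy (Algorithm 2 as we read it): try progressively larger-perturbation parameter sets and stop as soon as the two official models are fooled. The epgd_attack and is_adversarial helpers are placeholders standing in for the authors' EPGD routine and evaluation check.

```python
def cascade_attack(x, y_target, models, param_sets, epgd_attack, is_adversarial):
    """Run EPGD with increasingly permissive parameters until both
    official models (assumed to be the first two in `models`) are fooled."""
    x_adv = x
    for params in param_sets:  # ordered from smallest to largest perturbation
        x_adv = epgd_attack(x, y_target, models, **params)
        if all(is_adversarial(m, x_adv, y_target) for m in models[:2]):
            break
    return x_adv
```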
Scenario                                     Score
EPGD + Tensorflow + 3 Models + All tricks    34.98
EPGD + Tensorflow + 2 Models + All tricks    38.62
EPGD + Tensorflow + 2 Models + No tricks     39.82
PGD + Tensorflow + 2 Models + No tricks      41.00
PGD + Pytorch + 2 Models + No tricks         41.64
Table 1: Major milestones for the changes of method and framework.

In a competition, some tricks also help to increase the online score and may have a significant influence on the final rank. Algorithm 2 is our implementation algorithm, which addresses some specific issues. Table 1 shows our major milestones for the changes in method and framework, and Table 2 lists our final submission parameters. In this section, we summarize some useful tricks for adversarial attack competitions. For the final submission, we use one proxy model, Inception V3, which we trained ourselves, and two official models which are publicly available, ResNet 50 and VGG 16 [Simonyan and Zisserman2014, He et al.2016, Szegedy et al.2016].

4.1 Float Point Mapping

Considering that the evaluation process requires mapping each image from floating-point values to integers in $\{0, \dots, 255\}$ in RGB format, the method used to map float numbers to integers also affects the attack success rate as well as the overall distance. We define the floor operation as:

$\lfloor z \rfloor = \max\{k \in \mathbb{Z} : k \le z\}$   (11)

Based on this definition, there are three different rounding methods:

$\mathrm{int}_1(z) = \lfloor z \rfloor$   (12)

$\mathrm{int}_2(z) = \lfloor z + 0.5 \rfloor$   (13)

(14)

Here $z$ is a floating-point pixel value and $k$ is an integer. Eq. (12) and Eq. (13) are the standard definitions of floor and round, which need no further discussion. Eq. (14) is our mapping of float numbers to integers; the main idea is to round each number as close as possible to the raw input. Our method performs best among the three mapping operations in this competition.
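A short NumPy sketch of the floor and round mappings in Eqs. (12)-(13), plus one plausible reading of Eq. (14) in which, between the floor and ceiling candidates, we keep the one closer to the corresponding pixel of the clean image; that third variant is our assumption, since the exact formula is not recoverable from the text.

```python
import numpy as np

def int_floor(x_adv: np.ndarray) -> np.ndarray:
    """Eq. (12): plain floor. Values are assumed already clipped to [0, 255]."""
    return np.floor(x_adv).astype(np.uint8)

def int_round(x_adv: np.ndarray) -> np.ndarray:
    """Eq. (13): round half up via floor(x + 0.5)."""
    return np.floor(np.clip(x_adv + 0.5, 0, 255)).astype(np.uint8)

def int_toward_clean(x_adv: np.ndarray, x_clean: np.ndarray) -> np.ndarray:
    """One possible reading of Eq. (14): between floor and ceil of each
    adversarial pixel, keep the candidate closer to the clean image,
    which slightly reduces the distance score."""
    lo = np.floor(x_adv)
    hi = np.clip(np.ceil(x_adv), 0, 255)
    pick_hi = np.abs(hi - x_clean) < np.abs(lo - x_clean)
    return np.where(pick_hi, hi, lo).astype(np.uint8)
```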

4.2 Resize Methods

Given that the input sizes of the evaluation models vary while the distance measurement is computed at the original resolution, the best choice is to generate adversarial examples at the original resolution. We therefore need a transformation function that resizes the adversarial example to the required input size of each evaluation model. Typical resize transformations include bilinear interpolation, nearest-neighbor interpolation, and so on. Since the official examples use PIL.Image.BILINEAR as the interpolation method, it is natural to use bilinear interpolation, and unsurprisingly it performs best. However, for Tensorflow versions up to 1.13, tf.image.resize_images does not behave consistently with PIL and lowers the attack success rate when the early-stop trick is applied (https://github.com/tensorflow/tensorflow/issues/6720). This is no longer an issue for Tensorflow 1.14 or Tensorflow 2.0 given the new v2 image resize API (https://github.com/tensorflow/tensorflow/releases).
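A minimal sketch of resizing a NumPy image with PIL's bilinear interpolation to match the official evaluation behavior; the 224x224 target size is only an example.

```python
import numpy as np
from PIL import Image

def resize_bilinear(x_uint8: np.ndarray, size=(224, 224)) -> np.ndarray:
    """Resize an HxWx3 uint8 image with PIL.Image.BILINEAR, matching the
    interpolation used by the official evaluation pipeline."""
    img = Image.fromarray(x_uint8, mode="RGB")
    resized = img.resize(size, resample=Image.BILINEAR)
    return np.asarray(resized)
```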

4.3 Parameters Selection

The ultimate aim of this competition is to generate adversarial examples with perturbations as small as possible within a limited time (550 images in 25 minutes). Using fixed parameters such as step size and mask size for every image is therefore usually not optimal: a small step size and mask size with a larger iteration budget will deplete the available time, while a large step size and mask size will cause large perturbations. Although the step size modification partially addresses this issue, additional tricks are needed to balance it. In our actual implementation, shown in Algorithm 2, we apply a step-like, or filter-like, strategy: we start from parameters that produce small perturbations and gradually increase the perturbation limit.

4.4 Adversarial Example Evaluation

Adversarial example evaluation is a critical step for efficient computation and offline testing. Based on the above analysis, we should apply this step cautiously, since image transformations differ slightly between libraries. To evaluate exactly the same image that will be submitted, at each EPGD iteration we generate a rounded output and use PIL.Image.BILINEAR to resize the image as the input for adversarial example evaluation.

4.5 Framework Selection

Tensorflow and Pytorch are the two most important deep learning libraries. In the preliminary stage and for half of the final stage, we used Pytorch due to its usability. However, the differing behavior of the batch-norm implementations in Tensorflow and Pytorch pushed us to switch to Tensorflow, since we know that 2 of the 3 given models are used as online evaluation models and run under Tensorflow.

We also tested several versions of Tensorflow, because the performance and APIs may differ between Tensorflow versions. Tensorflow 1.14 seemed to be the best choice for this competition, since it has both the tf.slim module and the resize v2 method. However, we found that it is too slow compared to version 1.4. There is another GitHub issue discussing this performance difference (https://github.com/tensorflow/tensorflow/issues/25606).

Model
0    50    100    50    50    0.5
1    1     100    30    40    0.5
2    1     300    0     40    0.5
3    1     600    0     40    0.5
Table 2: Parameter settings for the final submission. One of the columns gives the border size of the mask.

5 Conclusion

In this report, we propose an efficient modified PGD method called EPGD for attacking ensemble models by automatically changing the ensemble weights and step size per iteration and per input. We also present some useful implementation tricks aimed at efficiently searching for adversarial examples with small noise. Experiments show that our solution generates adversarial examples with smaller perturbations than the PGD method while remaining efficient. With this method, we won first place in the IJCAI-19 Targeted Adversarial Attack competition.

References