Adaptive Gradient Refinement for Adversarial Perturbation Generation

02/01/2019 · by Yatie Xiao, et al.

Deep neural networks have achieved remarkable success in computer vision, natural language processing, and audio tasks. However, in classification domains, research has shown that deep neural models are easily fooled into making different or wrong predictions, which may cause severe consequences. Many attack methods generate adversarial perturbations with large-scale pixel modification and low cosine similarity between the original and the corresponding adversarial examples. To address these issues, we propose an adversarial method that adaptively adjusts the perturbation strength and refines the gradient update direction: it generates perturbation tensors whose strength is adjusted adaptively, and it updates the gradient in a direction that can escape local minima or maxima by combining it with previously calculated gradients. In this paper, we compare several traditional perturbation-generation methods on image classification with ours. Experimental results show that our approach works well, outperforms recent techniques in the rate of misclassification, and fools deep network models with excellent efficiency.

1 Introduction

Deep Neural Networks (DNNs) [1, 2, 3, 4, 5] have recently led to dramatic improvements on image, text, and audio tasks. As research moves into specific domains, many works reveal that in image classification, deployed deep neural classification models are easily fooled by adversarial examples [14, 15, 16, 17, 18, 19, 20] created by adding appropriate perturbations to clean images. The perturbations should be strong enough to push the original images over the decision boundary [17, 19, 20] between classes, so that the latent-space representations of the generated adversarial examples approach the space of other classes and acquire more of their features, thereby fooling deep neural models.

From the gradient-updating perspective, adversarial attacks on image classification can be classified into single-step attacks [15] and iterative attacks [22]. From the perspective of the model's structure, if the attacker understands the structure and parameters of the deployed network, the attack is categorized as a white-box attack [14, 19]; otherwise it is categorized as a black-box attack [23, 24]. From the attack-goal perspective, attacks can be classified into targeted and non-targeted methods [8, 25]: a targeted attack aims to find a perturbation that fools the model into predicting a specified label, while a non-targeted attack tries to find a perturbation that fools the model into predicting any other label.

Many proposed attack methods can fool DNNs with high prediction confidence and attack deep neural networks in image classification [3, 26]. We find that most of these methods craft adversarial examples with large perturbations and not very high attack success rates. In this paper we propose an adaptive perturbation generation algorithm which crafts adversarial examples that fool deep neural networks with slight perturbations and cause misclassification with high confidence.

We summarize the contributions of this paper as follows:

We propose an algorithm for generating adversarial perturbations with an adaptive gradient, which computes slight, small-sized perturbations that fool deployed deep neural networks with high fooling probability, and we give a formulation for calculating the perturbation strength. Using this formulation, we can quantify the perturbation strength between a clean image and its corresponding adversarial example and use it to reflect the pixel modification between them.

2 Related Work

In this section, we first give the background knowledge of adversarial examples and the corresponding formalization of the proposed algorithm. First, let $\mu$ denote a distribution of images in $\mathbb{R}^d$, i.e., images drawn from a distribution with $d$ dimensions, and let $x \in \mathbb{R}^d$ be an image from $\mu$. Let $f(\cdot)$ denote a classification function that maps the input data (in this paper, images) to an estimated label $\hat{y} = f(x)$. The adversarial perturbation seeking process finds a perturbation tensor $\eta$ such that adding it to the original image $x$ yields a generated image $x^{adv} = x + \eta$ that fools the deep neural model into misclassifying $x$ from its correct label $y$ to some other label $y'$. Generally, we can describe this process as follows:

$$f(x + \eta) = y' \neq y, \quad \text{s.t. } \|\eta\|_p \leq \epsilon \qquad (1)$$
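For concreteness, a minimal PyTorch sketch of this condition is given below. The classifier `model`, the image batches `x` and `x_adv`, the pixel range [0, 1], and the helper name `is_adversarial` are assumptions for illustration, not part of the paper.

```python
import torch

def is_adversarial(model, x, x_adv, eps, p=float("inf")):
    """Check the condition in Eq. (1): the predicted label changes while
    the perturbation stays within the L_p budget eps."""
    eta = (x_adv - x).flatten(1)                 # perturbation tensor, one row per image
    within_budget = eta.norm(p=p, dim=1) <= eps  # L_p constraint
    with torch.no_grad():
        y = model(x).argmax(dim=1)               # label of the clean image
        y_adv = model(x_adv).argmax(dim=1)       # label after perturbation
    return (y_adv != y) & within_budget          # per-image boolean
```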

2.1 Methods for Generating Adversarial Examples and Adversarial Defense

The one-step generation algorithm (Fast Gradient Sign Method, FGSM [15, 16]) tries to find a proper perturbation vector by maximizing the loss function $J(x, y_{true})$; the algorithm takes a single step across the boundary between different classes to generate the perturbation tensor. The formulation is shown below:

$$x^{adv} = x + \epsilon \cdot \mathrm{sign}\big(\nabla_x J(x, y_{true})\big) \qquad (2)$$
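The FGSM update can be sketched in a few lines of PyTorch; the cross-entropy loss, the [0, 1] pixel range, and the function name `fgsm` are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """One-step FGSM as in Eq. (2): x_adv = x + eps * sign(grad_x J(x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # loss J(x, y_true) to maximize
    loss.backward()
    x_adv = x + eps * x.grad.sign()       # single step along the gradient sign
    return x_adv.clamp(0, 1).detach()     # keep pixels in the valid range
```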

The iterative gradient-based algorithm (I-FGSM) [22, 16] follows the model's loss surface better than FGSM by updating the gradient direction step by step, so I-FGSM can generate perturbations with good transferability and high attack success rates under the white-box setting.

$$x_{t+1}^{adv} = \mathrm{Clip}_{x,\epsilon}\Big\{\, x_t^{adv} + \alpha \cdot \mathrm{sign}\big(\nabla_x J(x_t^{adv}, y_{true})\big) \Big\} \qquad (3)$$
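A corresponding I-FGSM sketch, again assuming a PyTorch classifier, cross-entropy loss, and images in [0, 1]; `alpha` is the per-step size and `steps` the number of iterations.

```python
import torch
import torch.nn.functional as F

def i_fgsm(model, x, y, eps, alpha, steps):
    """Iterative FGSM as in Eq. (3): small sign steps, each result clipped
    back into the eps-ball around the original image x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project onto eps-ball
        x_adv = x_adv.clamp(0, 1)                               # keep valid pixel range
    return x_adv
```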

Carlini and Wagner proposed a targeted attack method called the C&W attack [25], which generates adversarial examples that reduce the detection rate of defenses. It is formulated as:

$$\min_{\delta}\ \|\delta\|_p + c \cdot g(x + \delta), \quad \text{s.t. } x + \delta \in [0, 1]^n \qquad (4)$$

The parameter $p$ is the norm constraint, which is set to 0, 1, 2, or $\infty$. Moosavi-Dezfooli et al. [27] seek a universal adversarial perturbation which fools a deep neural model with only one perturbation:

$$\mathbb{P}_{x \sim \mu}\big(f(x + v) \neq f(x)\big) \geq 1 - \delta, \quad \text{s.t. } \|v\|_p \leq \xi \qquad (5)$$

In this equation, $\delta$ is set to control the attack success rate of the adversarial examples. Besides, [28] minimizes the boundary distance between the original images and the corresponding adversarial examples under a norm constraint, which is expressed as follows:

$$\min_{r} \|r\|_2 \quad \text{s.t. } f(x + r) \neq f(x) \qquad (6)$$

In this equation, the method optimizes the boundary distance between the original and adversarial examples. [29] introduces a simple iterative, saliency-map-based method (JSMA) for targeted attacks:

$$S(x, t)[i] = \begin{cases} 0, & \text{if } \dfrac{\partial F_t(x)}{\partial x_i} < 0 \ \text{or} \ \displaystyle\sum_{j \neq t} \dfrac{\partial F_j(x)}{\partial x_i} > 0 \\[6pt] \dfrac{\partial F_t(x)}{\partial x_i}\,\Bigg|\displaystyle\sum_{j \neq t} \dfrac{\partial F_j(x)}{\partial x_i}\Bigg|, & \text{otherwise} \end{cases} \qquad (7)$$

Papernot and Fawzi [11, 31] argue that injecting adversarial examples into the training dataset increases the robustness of the deployed deep neural models, and [24, 32] propose ensemble adversarial training to improve defense ability; as a result, ensemble adversarial models perform well against gradient-based and black-box attack strategies. Xu et al. present the Feature Squeezing method [30], which reduces the search space available to an adversarial example by coalescing samples that correspond to different feature vectors in the original space into a single sample.

Most adversarial defenses are effective against existing attacks and can reduce attack success rates. However, adversarial examples generated with low distortion and high similarity to the original images remain difficult to defend against, and the corresponding defenses urgently need to be updated in these domains.

3 Methodology

We describe our proposed method, which uses a slight adaptive gradient that contributes directly to the gradient direction and makes the generated adversarial examples fool deep neural models well. Before presenting the method in detail, let $X = \{x_1, x_2, \dots, x_n\}$ be a set of images from a distribution $\mu$ and let $f(\cdot)$ be the classifier of the deep neural model; besides, we use the norm constraints mentioned above to raise the attack success rate and to limit the size of pixel changes in the image.

3.1 Adversarial Example Generation with Adaptive Gradient Refinement

Updating the gradient with a fixed step size may cause the search to become trapped in local minima, because a steady step size leads the gradient update to overshoot maxima and miss the deepest valley points of the model. When seeking the global optimum, a fixed-size gradient update searching for a minimal position between boundaries may encounter back-and-forth oscillation and escape the global position. This phenomenon is most familiar with FGSM. Research shows that updating the gradient with an adaptive pace escapes local and poor maxima because of its adjusting mechanism. AdaGrad (the adaptive gradient algorithm) is a modified stochastic gradient descent method with a per-parameter learning rate; example applications include natural language processing and image recognition. It still has a base learning rate $\eta$, but this is multiplied element-wise by a vector that is the diagonal of the outer-product matrix of past gradients. We formulate this as follows:

$$G_t = \sum_{\tau=1}^{t} g_\tau \odot g_\tau \qquad (8)$$

Here $g_t$ is the gradient at time $t$, which is accumulated into $G_t$; in the adaptive gradient mechanism, the model parameter $\theta$ is updated after every iteration as follows, where $\delta$ is a small smoothing constant:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{G_t} + \delta} \odot g_t \qquad (9)$$
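A compact NumPy sketch of this per-parameter update follows; the function name and the argument names `lr` and `delta` are ours.

```python
import numpy as np

def adagrad_step(theta, grad, G, lr=0.01, delta=1e-8):
    """One AdaGrad update following Eqs. (8)-(9): accumulate squared
    gradients in G, then shrink each parameter's step as G grows."""
    G = G + grad * grad                          # G_t = G_{t-1} + g_t * g_t
    theta = theta - lr / (np.sqrt(G) + delta) * grad
    return theta, G
```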

Equation (9) reveals that the parameter update relies on the hyperparameter $\eta$ and on the denominator $\sqrt{G_t} + \delta$: once $G_t$ grows large, the parameter is updated at a low rate, which helps escape local minima or maxima when searching directions. In our attack, $g_t = \nabla_x J(x_t^{adv}, y_{true})$ is the gradient with respect to the input image, and we calculate the original adaptive pace $\alpha_t$ from the base step size $\alpha$ by:

$$\alpha_t = \frac{\alpha}{\sqrt{G_t} + \delta} \qquad (10)$$

The adaptive pace $\alpha_t$, computed from the accumulated squared gradients and applied together with the sign of the loss gradient, is the key adaptive idea of our method; according to this adaptive gradient processing, we draw a figure in the following. Using the iterative fast gradient sign method strategy, we adjust the pace of generating $x^{adv}$ by the next equation:

$$x_{t+1}^{adv} = \mathrm{Clip}_{x,\epsilon}\Big\{\, x_t^{adv} + \alpha_t \odot \mathrm{sign}\big(\nabla_x J(x_t^{adv}, y_{true})\big) \Big\} \qquad (11)$$

This method adapts the learning rate to the parameters, performing smaller updates for parameters associated with frequently occurring features and larger updates for parameters associated with infrequent features.

Algorithm 1: Process of generating adversarial examples.
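To make the idea concrete, the following is a hedged PyTorch sketch of an AdaGrad-style iterative sign attack in the spirit of Eqs. (8)-(11); the hyperparameter names (`alpha`, `eps`, `delta`), the cross-entropy loss, and the [0, 1] pixel range are our assumptions, and the exact form of the authors' Algorithm 1 may differ.

```python
import torch
import torch.nn.functional as F

def adaptive_gradient_attack(model, x, y, eps, alpha, steps, delta=1e-8):
    """Sketch of an AdaGrad-style iterative sign attack: each pixel's step
    size shrinks as its squared gradients accumulate, damping oscillation
    near the decision boundary. Illustrative, not the authors' exact code."""
    x_adv = x.clone().detach()
    G = torch.zeros_like(x)                          # accumulated squared gradients (Eq. 8)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        g = torch.autograd.grad(loss, x_adv)[0]      # gradient g_t w.r.t. the image
        G = G + g * g
        pace = alpha / (G.sqrt() + delta)            # adaptive per-pixel pace (Eq. 10)
        x_adv = x_adv.detach() + pace * g.sign()     # adaptive sign step (Eq. 11)
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # clip to eps-ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv
```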

As we can see from Algorithm 1, we focus on the output tensor generated by Eq. (11) and then update the perturbation tensor; after iterating in the direction of the generated perturbation, the attack finds the right perturbation tensor against the model's classifier. We use cosine similarity to evaluate the difference between the original images and the adversarial images.
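The cosine-similarity evaluation can be computed directly on flattened image tensors, for example as below (PyTorch, assuming a batch layout of N x C x H x W).

```python
import torch
import torch.nn.functional as F

def image_cosine_similarity(x, x_adv):
    """Cosine similarity between each flattened original image and its
    adversarial counterpart; values near 1 indicate minor modification."""
    return F.cosine_similarity(x.flatten(1), x_adv.flatten(1), dim=1)
```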

4 Experimental Results

We conduct experiments on the ImageNet dataset to validate the effectiveness of our proposed method, with the attack settings given below; the experimental settings are kept the same under both the $L_\infty$ and $L_2$ norm constraints. We report attack success rates, cosine similarity, and perturbation strength for our proposed method on the preprocessed ILSVRC2012 (Val) dataset.

Source   Attack    Inc-v3    Inc-v4    IR-v2
Inc-v3   I-FGSM    96.41%*   28.86%    27.25%
         MI-FGSM   95.62%*   25.22%    25.27%
         Ours      99.65%*   28.99%    28.81%
Inc-v4   I-FGSM    29.36%    96.72%*   25.45%
         MI-FGSM   28.17%    95.27%*   26.41%
         Ours      31.55%    99.11%*   31.44%
IR-v2    I-FGSM    28.77%    28.26%    96.65%*
         MI-FGSM   26.67%    25.65%    97.01%*
         Ours      35.16%    32.54%    98.14%*

Source   Attack    VGG16     VGG19     Res152
VGG16    I-FGSM    94.83%*   58.29%    30.21%
         MI-FGSM   94.22%*   57.69%    31.75%
         Ours      96.54%*   58.35%    34.12%
VGG19    I-FGSM    58.70%    92.62%*   32.53%
         MI-FGSM   56.72%    95.17%*   31.31%
         Ours      59.46%    98.43%*   32.49%
Res152   I-FGSM    33.63%    34.92%    99.29%*
         MI-FGSM   34.11%    34.17%    92.31%*
         Ours      34.25%    35.76%    99.41%*

Table 1: Attack success rates (ASR) on six deep neural models under the first norm constraint. Rows give the source model and attack method; columns give the target model. * indicates white-box attacks; IR-v2 denotes Inception-ResNet-v2.
Source   Attack    Inc-v3    Inc-v4    IR-v2
Inc-v3   I-FGSM    98.72%*   55.17%    57.31%
         MI-FGSM   94.94%*   56.55%    58.11%
         Ours      99.86%*   59.23%    63.01%
Inc-v4   I-FGSM    59.41%    98.34%*   60.41%
         MI-FGSM   61.65%    95.88%*   61.01%
         Ours      67.44%    99.62%*   62.89%
IR-v2    I-FGSM    63.86%    60.17%    95.15%*
         MI-FGSM   66.20%    59.88%    97.41%*
         Ours      67.61%    64.42%    99.15%*

Source   Attack    VGG16     VGG19     Res152
VGG16    I-FGSM    99.31%*   69.01%    62.45%
         MI-FGSM   98.11%*   65.47%    61.65%
         Ours      99.97%*   69.52%    64.63%
VGG19    I-FGSM    71.36%    98.98%*   57.88%
         MI-FGSM   66.52%    90.17%*   57.01%
         Ours      72.25%    99.56%*   59.34%
Res152   I-FGSM    65.41%    59.65%    99.08%*
         MI-FGSM   63.55%    55.25%    87.31%*
         Ours      64.07%    61.12%    99.28%*

Table 2: Attack success rates (ASR) on six deep neural models under the second norm constraint. Rows give the source model and attack method; columns give the target model. * indicates white-box attacks; IR-v2 denotes Inception-ResNet-v2.

Our proposed method generates adversarial examples with high cosine similarity because the adaptive-gradient-based step size is reduced as the perturbation calculation approaches the decision boundary. The visually intuitive consequence is that, under the same constraints, the adversarial examples produced by our method are less perturbed than those generated by I-FGSM or MI-FGSM, and the adversarial images are clearer and harder to recognize as processed images. Our proposed method achieves high attack success under the white-box attack strategy and is better than I-FGSM and MI-FGSM under the black-box attack condition, which means that the proposed method exhibits better transferability across networks; the transferability is especially good when the network models are similar in structure, e.g., Inception-v3 and Inception-v4.

Model    Attack    ε = 10    ε = 1500
Inc-v3   I-FGSM    0.041     0.015
         MI-FGSM   0.086     0.019
         Ours      0.016     0.008
Table 3: AMP values under the two norm constraints (budgets ε = 10 and ε = 1500) on Inception-v3

We use the average of the Absolute Mean Perturbation values, $\mathrm{AMP} = \frac{1}{N}\sum_{i=1}^{N}\overline{\,|x_i^{adv} - x_i|\,}$ (the per-image mean absolute pixel difference averaged over the $N$ evaluation images), as a measure of the magnitude of the disturbance; it reflects the range of values of the disturbance added to the pixels of the clean images.
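Under this reading of the definition, AMP can be computed as follows (a sketch, assuming image batches as PyTorch tensors; the function name is ours).

```python
import torch

def absolute_mean_perturbation(x, x_adv):
    """AMP as described above: the mean absolute per-pixel difference of
    each image, averaged over the whole evaluation set."""
    per_image = (x_adv - x).abs().flatten(1).mean(dim=1)
    return per_image.mean().item()
```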

5 Conclusion

We describe an adversarial example generation method which produces perturbations with an adaptive pace that crosses the boundary between the latent-space representations of different classes; we take two norm constraints into consideration with both non-targeted and targeted attack strategies. Our proposed method generates adversarial examples which fool deep neural networks with high probability, and the generated perturbations show good transferability across different deep neural models, which gives a good effect in the black-box attack setup; the adversarial examples also have high similarity to the original images, which is directly reflected in the AMP. Next, we will focus our attention on image-independent adversarial attacks.

References