Harmonic Adversarial Attack Method

07/18/2018 · Wen Heng, et al. · Megvii Technology Limited, Peking University

Adversarial attacks find perturbations that can fool models into misclassifying images. Previous works have succeeded in generating noisy/edge-rich adversarial perturbations, at the cost of degraded image quality. Such perturbations, even when small in scale, are usually easy for human vision to spot. In contrast, we propose the Harmonic Adversarial Attack Method (HAAM), which generates edge-free perturbations using harmonic functions. The edge-free property guarantees that the generated adversarial images preserve visual quality even when the perturbations have large magnitudes. Experiments also show that adversaries generated by HAAM often have higher success rates when transferring between models. In addition, we find that harmonic perturbations can simulate natural phenomena such as natural lighting and shadows. This makes it possible to find corner cases for given models, as a first step toward improving them.


Introduction

Figure 1: (a) Reference image. (b) Adversarial image generated using HAAM. (c) Harmonic perturbation of (b). (d) Adversarial image generated using FGSM. (e) Noisy perturbation of (d). The adversarial image generated using HAAM is indistinguishable from natural images.

Deep neural networks (DNNs) have made great progress in a variety of application domains, such as computer vision, speech and many other tasks [Krizhevsky, Sutskever, and Hinton 2012, Simonyan and Zisserman 2014, He et al. 2016, Hinton et al. 2012, Clark and Storkey 2015, Socher et al. 2011]. However, many works have shown that state-of-the-art DNNs are vulnerable to adversarial examples, which are images generated by adding carefully designed perturbations to natural images [Goodfellow, Shlens, and Szegedy 2014, Kurakin, Goodfellow, and Bengio 2016a, Papernot, McDaniel, and Goodfellow 2016, Moosavi-Dezfooli, Fawzi, and Frossard 2016, Fawzi, Fawzi, and Frossard 2018]. Adversarial attacks reveal the weakness of DNN models, even though these models achieve human-competitive performance in many tasks. More importantly, adversarial examples also pose potential security threats to machine learning systems and may hinder the practical deployment of DNNs. Therefore, the study of adversarial attacks is crucial to improving the robustness of models.

Figure 2: The first row shows the reference image and its adversarial images generated by five adversarial methods. The second row shows the corresponding Laplacian maps. We calculate the Edge-SSIM (ESSIM), which applies SSIM to the Laplacian maps, between the reference and distorted Laplacian maps. Perturbations generated by HAAM do not disturb the Laplacian maps, and the adversaries show good visual quality. (Zoom in to see details)

Most existing adversarial methods [Goodfellow, Shlens, and Szegedy 2014, Kurakin, Goodfellow, and Bengio 2016a, Moosavi-Dezfooli, Fawzi, and Frossard 2016, Carlini and Wagner 2017] generate pixel-wise perturbations of limited magnitude. The perturbations usually show random patterns that are rich in edges, as shown in Fig. 1 (e). Such perturbations inevitably change the spatial frequency of natural images. Human vision is quite sensitive to edge information, as the primary visual cortex (V1) is devoted to edge extraction [Stevens 2015]. Hence the adversarial patterns generated by adding pixel-wise noisy perturbations tend to be easily spotted by human vision. In addition, to reduce the magnitude of perturbations while achieving a high attack success rate, carefully designed adversarial methods usually only work effectively when the model is known (white-box attack), but the adversarial examples may not transfer to unknown models (black-box attack) [Kurakin, Goodfellow, and Bengio 2016b]. On the other hand, adversarial examples generated by the Fast Gradient Sign Method (FGSM) [Goodfellow, Shlens, and Szegedy 2014] have good transferability, since FGSM is a one-step attack method that generates noisy/edge-rich adversarial images, but they suffer in visual quality. In short, high visual quality and transferability are hard to achieve at the same time.

We propose the Harmonic Adversarial Attack Method (HAAM), a novel adversarial method to narrow the gap between visual quality and transferability. Different from previous noisy/edge-rich perturbations, the proposed method generates edge-free perturbations by requiring the Laplacian of the generated perturbations to be zero at every point. Consequently, there are no detectable edges in the perturbations [Shapley and Tolhurst 1973]. We enforce this constraint by using harmonic functions as parametric models to generate perturbations, since harmonic functions satisfy Laplace's equation [Boas 2006] and can be constructed from analytic complex functions. In experiments, we find that the adversarial images produced by HAAM are still of good visual quality, even when the magnitudes of the added perturbations are quite large. In turn, the large magnitudes of the perturbations yield good transferability between models. Moreover, by using special harmonic functions, the generated adversarial images may resemble natural phenomena. This can be used to generate corner cases for target models and reveal their weakness in natural environments.

In summary, our contributions are as follows.

  • We propose an adversarial method which can generate edge-free adversarial perturbations. In experiments, we find the adversarial examples can strike a balance between visual quality and transferability.

  • We propose using analytic complex functions as parametric models to systematically construct harmonic functions. The parameters are learned in an end-to-end fashion.

  • With special harmonic functions, we find the generated adversarial examples can simulate some natural phenomena and/or photographic effects. This helps to find real-life corner cases for target models and give suggestions for improving the models.

Related Works

The research on adversarial attacks against neural networks has made great progress in recent years. The Fast Gradient Sign Method (FGSM) [Goodfellow, Shlens, and Szegedy 2014] is a one-step attack method which shifts the input by ε in the direction that minimizes the adversarial loss. Based on FGSM, the Basic Iterative Method (BIM) was proposed [Kurakin, Goodfellow, and Bengio 2016a]. Compared to FGSM, adversaries generated by BIM show smaller perturbations and consequently have a higher attack success rate. DeepFool [Moosavi-Dezfooli, Fawzi, and Frossard 2016] is a powerful attack method which shifts the input with the least perturbation toward its nearest classification boundary; adversaries generated by DeepFool have sparser perturbations than those of FGSM and BIM. In addition, the C&W attacks [Carlini and Wagner 2017] were proposed based on three distance metrics: L0, L2 and L∞. With a carefully designed optimization loss for perturbation search, the C&W methods break defensive distillation [Papernot et al. 2016] with a 100% success rate. In summary, most existing adversarial methods pursue a high attack success rate but ignore the natural properties of adversarial images. Since the perturbations generated by these methods are usually noisy/edge-rich, the adversarial images look unnatural to some degree. More importantly, such images barely exist in the real world.
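For concreteness, a minimal single-step FGSM sketch in PyTorch follows; the function name and the assumption that pixel values lie in [0, 1] are ours, not taken from any particular implementation.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    """One-step FGSM: perturb the input by eps along the sign of the
    gradient of the classification loss with respect to the input."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # loss of the true class
    loss.backward()
    x_adv = x + eps * x.grad.sign()       # single gradient-sign step
    return x_adv.clamp(0.0, 1.0).detach() # keep pixels in the valid range
```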

Different from methods that generate noisy/edge-rich perturbations, some works focus on generating physical-world adversaries, which can successfully attack target models in the physical environment. [Sharif et al. 2016] proposes generating adversarial perturbations on a glasses mask which can be printed out and worn by people to fool face recognizers. In [Brown et al. 2017], the authors generate a universal sticker which can make any object be recognized as a 'toaster' by a neural network classifier. Similarly, in [Athalye and Sutskever 2017] the authors use 3D printing to build an adversarial object, which is recognized as the target class no matter from which angle it is captured by a camera. [Evtimov et al. 2017] proposes generating subtle posters or stickers which can be placed on traffic signs to fool traffic-sign recognizers, which are crucial to autonomous driving systems. [Zhao, Dua, and Singh 2017] tries to generate more natural adversaries using Generative Adversarial Networks (GANs) [Goodfellow et al. 2014]. However, although the adversaries generated by the above methods do work in physical environments, in most cases they are still easily distinguishable by human vision because of their unnatural patterns.

Harmonic Adversarial Attack Method

In this section, we first introduce the harmonic functions that are used as parametric models for constructing perturbations, and present an end-to-end procedure to learn the parameters. We also present methods to increase the diversity of perturbations by exploiting properties of harmonic functions.

Why harmonic function?

Harmonic functions can generate edge-free perturbations, as can be seen from an analysis of image edges and the frequency domain.

First, in mathematics, let f: U → ℝ be a twice continuously differentiable function, where U is an open subset of ℝⁿ. If f satisfies Laplace's equation everywhere on U, it is a harmonic function [Gamelin 2003]. Laplace's equation is

\[
\frac{\partial^2 f}{\partial x_1^2} + \frac{\partial^2 f}{\partial x_2^2} + \cdots + \frac{\partial^2 f}{\partial x_n^2} = 0, \tag{1}
\]

which is also written as ∇²f = 0.

Considering the coordinate space of natural images, the domain U of our harmonic functions is a subset of ℝ². According to Laplace's equation, the sum of the second-order derivatives with respect to the coordinate variables equals 0, which means a Laplacian edge detector never detects edges in the perturbations. Fig. 2 shows the differences between the Laplacian maps of adversarial images generated using different methods.

In terms of the frequency domain, harmonic perturbations do not significantly affect the frequency components of natural images, since they are very smooth in nature. This is quite different from noisy/edge-rich perturbations, which add extra high-frequency components to the image's frequency spectrum [Rabiner and Gold 1975].
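As a quick numerical illustration of this point (our own sketch, not part of the original experiments), one can compare the Laplacian map of a harmonic perturbation, e.g. Re(z²) = x² − y², with that of random noise:

```python
import numpy as np
from scipy.ndimage import laplace

# Zero-centered unit-square coordinate grid, as used for the perturbations.
n = 224
y, x = np.meshgrid(np.linspace(-0.5, 0.5, n), np.linspace(-0.5, 0.5, n), indexing="ij")

harmonic = x**2 - y**2                    # real part of z^2, a harmonic function
noise = np.random.uniform(-1, 1, (n, n))  # noisy/edge-rich perturbation

# The discrete Laplacian of the harmonic perturbation is essentially zero in the
# interior, so a Laplacian edge detector finds no edges in it; noise is full of edges.
print(np.abs(laplace(harmonic)).mean())   # ~0, up to boundary/discretization error
print(np.abs(laplace(noise)).mean())      # clearly non-zero
```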

Generation of Harmonic Perturbation

In this section, we will describe how to generate harmonic perturbations for natural images. The key is to construct a flexible harmonic function on the coordinate space of natural images.

Let D be a natural image set, and let X ∈ D denote a natural image of size H × W. We regard the space of pixel coordinates in X as a complex plane. The coordinate of a pixel can then be represented as a complex number z = x + iy, where x and y are the real and imaginary parts respectively. To define a universal framework for images of different sizes in D, we normalize the coordinate space of all images into a zero-centered unit square, i.e. we normalize both x and y into [-0.5, 0.5].

To be compatible with mathematical theory, we assume the coordinate space (complex plane) of a natural image is continuous instead of discrete. On this coordinate space, we define a complex function f(z; θ), where θ denotes the parameters of f. For example, a quadratic polynomial function can be written as f(z; θ) = az² + bz + c, where θ = {a, b, c}. We write f(z) for f(z; θ) in the following for simplicity.

Preliminarily, we give three properties which are crucial for deriving harmonic functions from f(z).

Property 1: If f(z) satisfies the Cauchy-Riemann equations, it is an analytic function [Gamelin 2003].

Property 2: If f(z) = u(x, y) + i·v(x, y) is analytic in a region, then u and v satisfy Laplace's equation in that region, i.e. u and v are harmonic functions [Gamelin 2003].

Property 3: The linear combination of analytic functions is still an analytic function.

According to Property 1, we require f(z) to satisfy the Cauchy-Riemann equations in order to make sure that it is an analytic function. Then, based on Property 2, the real and imaginary parts of f(z) are harmonic functions, and they are conjugate harmonic functions [Boas 2006]. In fact, many well-known functions have been proved analytic [Gamelin 2003], e.g. polynomial, trigonometric and exponential functions. More generally, we can build any complex function as long as it satisfies the Cauchy-Riemann equations. We can then directly take the real part or the imaginary part of the complex function as the harmonic function used to generate harmonic perturbations. We denote the selected harmonic function by h(x, y).

Next, with the aim of making the input image adversarial, we generate harmonic perturbations based on h(x, y), which is defined on the coordinate space. We first normalize the range of h(x, y) into [-1, 1], obtaining h̃(x, y). Then we use a coefficient ε to control the scale of the harmonic perturbation when it is added to the input image:

\[
X_{\mathrm{adv}} = \mathrm{Clip}_{[0,255]}\bigl(X + \varepsilon \cdot \tilde{h}(x, y)\bigr). \tag{2}
\]

X_adv is the generated distorted image, and Clip_[0,255] denotes clipping the pixel intensities of X_adv into [0, 255]. To make X_adv adversarial to the target model, careful tuning of the parameters in h is required. We adopt an adversarial learning strategy to learn the parameters, which is introduced in Sec. Learning Strategy.
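A minimal NumPy sketch of this generation step (Eqn. (2)) is given below. It assumes the harmonic part is taken as the real part of a quadratic polynomial; all function names and default parameter values are illustrative, and in HAAM the parameters would be learned adversarially rather than fixed.

```python
import numpy as np

def harmonic_perturbation(h, w, a=1.0, b=0.0, c=0.0):
    """Evaluate Re(a*z^2 + b*z + c) on a zero-centered unit-square grid."""
    ys, xs = np.meshgrid(np.linspace(-0.5, 0.5, h), np.linspace(-0.5, 0.5, w), indexing="ij")
    z = xs + 1j * ys
    u = np.real(a * z**2 + b * z + c)
    return u / (np.abs(u).max() + 1e-12)    # normalize into [-1, 1]

def apply_haam(image, eps, **params):
    """Eqn. (2): add the scaled harmonic perturbation and clip to [0, 255]."""
    h, w = image.shape[:2]
    pert = harmonic_perturbation(h, w, **params)[..., None]  # shared across channels
    return np.clip(image.astype(np.float32) + eps * pert, 0, 255).astype(np.uint8)
```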

(a) Combination of the real parts of the two basic functions.
(b) Affine transformation example; the harmonic function is the real part of a basic analytic function. R denotes rotation, S denotes scaling, T denotes translation.
Figure 3: Examples showing how the expansion tricks work.

Expansion Tricks for Perturbation Generation

In addition, to generate more flexible and powerful harmonic perturbations, we suggest two expansion tricks, based on the properties of harmonic functions.

Linear Combination of Harmonic Functions

The selection of harmonic functions is key to generating adversarial perturbations. We refer to known harmonic functions as basic functions. From basic functions, we can construct more complicated functions according to Property 3.

We select two known analytic functions as basic functions: a quadratic polynomial, denoted f₁(z), and the sine function, denoted f₂(z). Any basic functions could be selected; we take these two as an example. The linearly combined function is f(z) = α·f₁(z) + β·f₂(z), where α and β are two learnable coefficients. Its real and imaginary parts are α·Re(f₁) + β·Re(f₂) and α·Im(f₁) + β·Im(f₂) respectively, and both are harmonic functions according to Property 2 and Property 3. We use θ_h to denote the set of parameters of the basic functions together with the learnable coefficients of the combined function.
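A sketch of such a combination, assuming f₁(z) = z² and f₂(z) = sin(z) as the two basic functions (these concrete forms are our assumption):

```python
import numpy as np

def combined_harmonic(z, alpha, beta):
    """f(z) = alpha*z^2 + beta*sin(z): a linear combination of two analytic
    basic functions; its real and imaginary parts are both harmonic."""
    f = alpha * z**2 + beta * np.sin(z)   # np.sin supports complex arguments
    return f.real, f.imag                 # either part can serve as the perturbation
```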

Coordinate Space Affine Transformation

We find that applying affine transformations to the coordinate space can augment the harmonic function to generate more powerful adversarial perturbations. We consider three transformations: rotation, scaling and translation. For any point z = x + iy in the coordinate space (complex plane), the three transformations below are performed in turn, each mapping (x, y) to (x', y') and feeding its output to the next.

  • Rotation: Let c denote the cosine of the rotation angle φ (the learnable parameter), restricted to a narrow range.

    \[ x' = x\cos\varphi - y\sin\varphi, \qquad y' = x\sin\varphi + y\cos\varphi \tag{3} \]

  • Scaling: Let s_x and s_y be the scale factors, restricted to a narrow range.

    \[ x' = s_x\, x, \qquad y' = s_y\, y \tag{4} \]

  • Translation: Let t_x and t_y be the translation distances, restricted to a narrow range.

    \[ x' = x + t_x, \qquad y' = y + t_y \tag{5} \]

These parameters are learned together with the parameters of the harmonic function, but, unlike those, they are restricted to a narrow range, so we give them a smaller learning rate during optimization. We denote the parameter set of the affine transformations by θ_a.
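A sketch of the three transformations applied in turn to the complex coordinate grid (the parameterization by an explicit angle is our simplification):

```python
import numpy as np

def affine_coords(z, angle, sx, sy, tx, ty):
    """Rotate, then scale per axis, then translate the complex coordinates
    z = x + iy before the harmonic function is evaluated on them."""
    x, y = z.real, z.imag
    xr = np.cos(angle) * x - np.sin(angle) * y      # rotation, Eqn. (3)
    yr = np.sin(angle) * x + np.cos(angle) * y
    xs, ys = sx * xr, sy * yr                        # scaling, Eqn. (4)
    return (xs + tx) + 1j * (ys + ty)                # translation, Eqn. (5)
```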

To show how the expansion tricks augment the generated perturbations, we give some examples in Fig. 3.

Learning Strategy

Figure 4: End-to-end workflow of HAAM. Blue arrows show the forward pass, and red arrows show the backward pass for updating parameters.

The generation of harmonic perturbations depends on two groups of parameters: θ_h (harmonic function) and θ_a (affine transformations). We use an adversarial loss to learn these parameters. The learning process is illustrated in Fig. 4.

We assume the target model is a neural network classifier F(X) which predicts the logits for a given image. The number of classes is denoted K. X is an image in D, and its true class is y, where y ∈ {1, …, K}. For a classifier with a softmax prediction, when the class labels are integers the cross-entropy cost function equals the negative log-probability of the true class given the image. So we have the loss function

\[
J(\theta_h, \theta_a) = \log p\bigl(y \mid X_{\mathrm{adv}}\bigr), \tag{6}
\]

where X_adv is generated from Eqn. (2). The above loss function is a non-targeted adversarial attack loss; minimizing it decreases the predictive confidence of the given image with respect to its true class label.

We use the back-propagation algorithm to optimize the parameters. Since the parameters of the affine transformations are restricted to a narrow range while the parameters of the harmonic function are freely adjustable, we set different learning rates for the two modules: a small base learning rate for the affine-transformation parameters and a learning rate that is larger by a fixed factor for the harmonic-function parameters.
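One way to realize these two learning rates is via PyTorch parameter groups; the tensors below and the factor of 10 are illustrative assumptions, not the settings used in the paper.

```python
import torch

# theta_a: narrow-range affine parameters; theta_h: harmonic-function parameters.
theta_a = [torch.tensor(0.0, requires_grad=True),   # e.g. rotation parameter
           torch.tensor(0.0, requires_grad=True)]   # e.g. translation parameter
theta_h = [torch.tensor(1.0, requires_grad=True),   # e.g. alpha
           torch.tensor(0.5, requires_grad=True)]   # e.g. beta

base_lr = 1e-3   # assumed value
optimizer = torch.optim.SGD([
    {"params": theta_a, "lr": base_lr},          # small learning rate
    {"params": theta_h, "lr": base_lr * 10.0},   # larger learning rate
])
```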

For the optimization procedure of the proposed algorithm please refer to Algorithm  1.

Input: natural image X, label y, harmonic function h with parameters θ_h, affine parameters θ_a, target model F, maximum number of iterations T, learning rate η_a for θ_a and η_h for θ_h
Output: adversarial image X_adv, indicator for adversary: is_adv
Initialization: initialize θ_h and θ_a, t ← 0, is_adv ← False

while t < T and X_adv is not adversarial do
    apply the affine transformation with θ_a to the coordinate space
    generate the perturbation with h and θ_h
    generate X_adv according to Eqn. (2)
    update θ_a: θ_a ← θ_a − η_a · ∇_{θ_a} J
    update θ_h: θ_h ← θ_h − η_h · ∇_{θ_h} J
    t ← t + 1
if argmax F(X_adv) ≠ y then
    is_adv ← True
return X_adv, is_adv
Algorithm 1: Optimization algorithm for HAAM

Gray-scale and Color Harmonic Perturbations

Since natural images have three color channels, we introduce two kinds of harmonic perturbations: gray-scale and color perturbations. A gray-scale perturbation means we learn a single harmonic perturbation shared across all three channels, while a color perturbation means we learn a separate perturbation for each channel of an image. In the experiments, HAAM-g denotes HAAM with gray-scale perturbations and HAAM-c denotes HAAM with color perturbations.

Harmonic Perturbations vs. Natural Phenomena

We find that the harmonic perturbations generated using special harmonic functions can simulate some natural phenomena or photographic effects seen in everyday life.

The adversarial images generated with harmonic functions taken from the real part of the sine function show a stripe-like pattern. This pattern looks like natural shadows or light in the scene when the photo was shot; we give some examples in Fig. 5 (a). If we instead use harmonic functions taken from the real part of polynomial analytic functions, the adversarial images appear to show photographic effects such as uneven exposure; some adversarial images are shown in Fig. 5 (b). If we generate channel-wise perturbations using combined harmonic functions, the adversarial images look as if their colors had been adjusted with a photo editor; examples are shown in Fig. 5 (c).

In summary, the adversarial images generated by HAAM with specific harmonic functions can simulate some natural phenomena or photographic effects. In this respect, HAAM is quite different from previous methods, which only generate noisy perturbations. HAAM can help to find corner cases that could plausibly occur in real life and fool the target model; this is useful for guiding the design of data augmentation when training models, which in turn makes the models more robust in practice.

Figure 5: Examples showing that harmonic perturbations can look like natural phenomena or photographic effects. (a) Gray-scale perturbations generated by harmonic functions from the real part of the sine function with affine transformations; the perturbations look like natural shadows. (b) Gray-scale perturbations generated by harmonic functions from the real part of polynomial functions with affine transformations; the perturbations look like uneven exposure. (c) Color perturbations that look like color adjustment with a photo editor. (Best viewed in color)

Experiments

Figure 6: (a) SSIM and (b) Edge-SSIM comparisons of different adversarial attack methods. (Zoom in to see details)

We experiment on the ImageNet [Deng et al. 2009] classification task. The dataset is from the NIPS 2017 competition 'Defense Against Adversarial Attack' (https://www.kaggle.com/c/nips-2017-defense-against-adversarial-attack). A dev dataset (1000 images) and a test dataset (5000 images) were released; we only use the test dataset in our experiments.

Multiple models trained on the ImageNet dataset are considered in our experiments, including Resnet50 [He et al. 2016], SqueezeNet [Iandola et al. 2016], VGG16 [Simonyan and Zisserman 2014] and Densenet121 [Huang et al. 2017]. We take Resnet50 as the main target model in the following experiments. All models are obtained from the PyTorch official model zoo (http://pytorch.org/docs/master/torchvision/models.html).

 

SSIM            Adv. Method      SqueezeNet  VGG16  Densenet121  Average
[0.967, 1.0)    FGSM             0.349       0.208  0.195        0.251
                BIM              0.350       0.239  0.247        0.279
                DeepFool         0.287       0.123  0.094        0.168
                CWL2             0.459       0.188  0.088        0.245
                HAAM-g (ours)    0.594       0.312  0.242        0.383
                HAAM-c (ours)    0.606       0.413  0.282        0.434
[0.933, 0.967)  FGSM             0.424       0.318  0.341        0.361
                BIM              0.425       0.409  0.495        0.443
                DeepFool         -           -      -            -
                CWL2             -           -      -            -
                HAAM-g (ours)    0.625       0.371  0.284        0.427
                HAAM-c (ours)    0.652       0.432  0.350        0.478
[0.900, 0.933)  FGSM             0.476       0.384  0.401        0.420
                BIM              0.487       0.503  0.608        0.533
                DeepFool         -           -      -            -
                CWL2             -           -      -            -
                HAAM-g (ours)    0.673       0.381  0.381        0.478
                HAAM-c (ours)    0.705       0.314  0.339        0.453

Table 1: Transfer rate comparisons for different SSIM scores. The SSIM scores are in the range [0.9, 1.0), which we split uniformly into three buckets to calculate the transfer rate. The source model is Resnet50, and the target models are SqueezeNet, VGG16 and Densenet121. No adversarial examples generated by DeepFool or CWL2 have SSIM scores below 0.967.

 

Edge-SSIM       Adv. Method      SqueezeNet  VGG16  Densenet121  Average
[0.933, 1.0)    FGSM             0.353       0.184  0.172        0.236
                BIM              0.347       0.196  0.188        0.247
                DeepFool         0.292       0.122  0.092        0.169
                CWL2             0.501       0.216  0.092        0.270
                HAAM-g (ours)    0.620       0.357  0.286        0.421
                HAAM-c (ours)    0.637       0.304  0.408        0.450
[0.867, 0.933)  FGSM             0.329       0.201  0.344        0.291
                BIM              0.348       0.231  0.391        0.323
                DeepFool         0.283       0.133  0.165        0.194
                CWL2             0.402       0.150  0.086        0.213
                HAAM-g (ours)    0.690       0.386  0.317        0.464
                HAAM-c (ours)    0.620       0.325  0.319        0.421
[0.80, 0.867)   FGSM             0.359       0.245  0.240        0.281
                BIM              0.363       0.272  0.311        0.315
                DeepFool         0.272       0.100  0.093        0.155
                CWL2             0.342       0.104  0.082        0.176
                HAAM-g (ours)    0.667       0.394  0.364        0.475
                HAAM-c (ours)    0.617       0.209  0.252        0.359

Table 2: Transfer rate comparisons for different Edge-SSIM scores. The Edge-SSIM scores are in the range [0.8, 1.0), which we split uniformly into three buckets to calculate the transfer rate. The source model is Resnet50, and the target models are SqueezeNet, VGG16 and Densenet121.

We make comparisons to existing adversarial attack methods from two aspects: visual quality and transferability.

Two metrics are selected to measure the visual quality.

  • SSIM [Wang et al. 2004]: SSIM is a full-reference image quality assessment (IQA) method. It measures how much the quality of a distorted image degrades compared to the reference image. Different from PSNR, another full-reference IQA method, SSIM is more consistent with human vision. A higher SSIM score indicates less quality degradation.

  • Edge-SSIM (ESSIM): ESSIM is our proposed metric, and it is also a full-reference quality assessment metric. ESSIM applies SSIM to the Laplacian maps of the distorted and reference images, and thus measures how much distortion is introduced in the edge map (a minimal sketch follows this list). Since human vision is sensitive to edges in natural images, this metric is also very important for visual quality measurement.
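A minimal ESSIM sketch, assuming SciPy and scikit-image are available (the function name and gray-scale inputs are our assumptions):

```python
import numpy as np
from scipy.ndimage import laplace
from skimage.metrics import structural_similarity

def edge_ssim(reference, distorted):
    """ESSIM: apply SSIM to the Laplacian (edge) maps of the two images.
    Both inputs are 2-D gray-scale arrays of the same shape."""
    ref_edges = laplace(reference.astype(np.float64))
    dis_edges = laplace(distorted.astype(np.float64))
    rng = max(ref_edges.max() - ref_edges.min(),
              dis_edges.max() - dis_edges.min(), 1e-8)
    return structural_similarity(ref_edges, dis_edges, data_range=rng)
```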

Transferability is measured by the transfer rate.

  • Transfer Rate (TR) [Kurakin, Goodfellow, and Bengio 2016b]: The transfer rate is calculated as

    \[ \mathrm{TR} = \frac{N_{\mathrm{target}}}{N_{\mathrm{source}}}, \tag{7} \]

    where N_source denotes the number of adversarial images generated on the source model, and N_target denotes the number of those adversarial images that also successfully fool the target model. TR reflects the transferability of the adversarial samples generated by an adversarial method: a higher TR means better transferability. (A minimal sketch follows this list.)
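A minimal sketch of the transfer-rate computation as defined in Eqn. (7) (names are ours):

```python
def transfer_rate(fools_target):
    """Eqn. (7): fraction of source-model adversaries that also fool the target model.
    `fools_target` holds one boolean per adversarial image generated on the source model."""
    n_source = len(fools_target)
    n_target = sum(fools_target)
    return n_target / n_source if n_source else 0.0
```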

FGSM [Goodfellow, Shlens, and Szegedy 2014], BIM [Kurakin, Goodfellow, and Bengio 2016a], DeepFool [Moosavi-Dezfooli, Fawzi, and Frossard 2016] and CWL2 [Carlini and Wagner 2017] are considered for comparison in our experiments; all of them are classical and effective adversarial methods. All methods, including our HAAM, are implemented in the PyTorch framework. FGSM, BIM and CWL2 are implemented by referring to their TensorFlow versions in Cleverhans [Papernot et al. 2017], and the implementation of DeepFool is from its authors (https://github.com/LTS4/DeepFool). It is worth noting that FGSM, BIM and HAAM have a hyper-parameter ε that controls the maximum magnitude of the generated perturbations, and we generate adversarial examples with these methods over a set of ε values. DeepFool and CWL2 have no such ε to control the perturbation magnitude, so we simply run each of them once to generate adversarial examples on the test dataset. The generated adversarial examples of all compared methods are used in all the remaining experimental analyses.

Visual Quality Comparison

Objective Comparison

As highlighted in our paper, the adversarial examples generated using HAAM usually have good visual quality in both the RGB color space and the edge space. To make a fair comparison with other adversarial methods in terms of visual quality, we compare several adversarial methods with the SSIM and Edge-SSIM metrics under the same perturbation magnitude. The magnitude of a perturbation is measured by the Perturbation Norm Ratio (PNR) [Moosavi-Dezfooli, Fawzi, and Frossard 2016], here taken as the ratio between the norm of the perturbation and the norm of the image, PNR = ||r||₂ / ||X||₂, where r denotes the adversarial perturbation generated for image X. A higher PNR indicates larger changes in the intensity of an image.
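A minimal sketch of PNR as reconstructed above (the exact normalization used in the paper may differ):

```python
import numpy as np

def pnr(image, adv_image):
    """Perturbation Norm Ratio: L2 norm of the perturbation divided by
    the L2 norm of the reference image."""
    x = image.astype(np.float64).ravel()
    r = adv_image.astype(np.float64).ravel() - x
    return np.linalg.norm(r) / (np.linalg.norm(x) + 1e-12)
```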

We only consider adversarial examples whose PNR values fall in the range (0, 0.2], because over 99% of the adversarial examples of each method lie in this range. To compare visual quality under the same PNR score, we first split (0, 0.2] uniformly into 10 buckets and assign the adversarial examples to buckets according to their PNR scores. We plot the mean SSIM and the center PNR value of each bucket in Fig. 6 (a), and similarly for ESSIM and PNR in Fig. 6 (b).
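The bucketing described above could be done as follows (a sketch under our own naming; the per-bucket statistics in Fig. 6 come from the paper's data, not this snippet):

```python
import numpy as np

def mean_metric_per_bucket(pnr_scores, metric_scores, n_buckets=10, lo=0.0, hi=0.2):
    """Split (lo, hi] into equal-width PNR buckets and average a quality
    metric (e.g. SSIM or ESSIM) within each non-empty bucket."""
    edges = np.linspace(lo, hi, n_buckets + 1)
    centers, means = [], []
    for i in range(n_buckets):
        mask = (pnr_scores > edges[i]) & (pnr_scores <= edges[i + 1])
        if mask.any():
            centers.append((edges[i] + edges[i + 1]) / 2)
            means.append(metric_scores[mask].mean())
    return centers, means
```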

The PNR values of the adversarial examples generated by DeepFool and CWL2 lie in a narrow range, and these methods show competitive visual quality compared to HAAM with respect to the SSIM metric. For the Edge-SSIM metric, however, DeepFool and CWL2 show clearly worse Laplacian map quality than HAAM. This indicates that although a slight noisy perturbation does not significantly affect the SSIM metric, it inevitably changes the edge space of natural images. Compared to FGSM and BIM, the adversarial examples generated by HAAM show significant advantages with respect to both the SSIM and Edge-SSIM metrics.

Subjective Comparison

Besides the comparison based on objective metrics, we further conduct subjective experiments to compare the visual quality of adversaries generated by HAAM-g and HAAM-c with that of other methods. We recruited 14 subjects (10 male, 4 female) for the subjective experiments.

Take the comparison between HAAM-g and FGSM as an example. First, we sample 100 pairs of adversarial images. In each pair, the two images are generated by HAAM-g and FGSM respectively from the same original image and have very similar PNR values. Then we let the subjects vote for the image in each pair with better visual quality. If one method gets the majority of the 14 votes, it wins that pair. Finally, we calculate the fraction of the 100 image pairs won by HAAM-g, which is 0.96 in Table 3.

                 FGSM   BIM    DeepFool  CWL2
HAAM-g           0.96   0.99   0.54      0.50
HAAM-c           0.97   0.98   0.37      0.29
Table 3: Subjective visual quality comparisons (fraction of image pairs won) between HAAM and other adversarial methods.

Similarly, we perform the same procedure for other pairs of comparison methods. The ratios of HAAM are listed in Table 3. It’s shown that HAAM-g and HAAM-c outperform other methods in most cases, except comparing HAAM-c to DeepFool and CWL2.

Transferability Comparison

Transfer testing is a kind of black-box attack [Papernot, McDaniel, and Goodfellow 2016]. In reality, the target model is usually not accessible to attackers, so a feasible approach is to use adversarial examples generated on a substitute model to attack the target model. In our experiments, we take Resnet50 as the source model to generate adversarial examples, then use all of them to attack SqueezeNet, VGG16 and Densenet121. We use the transfer rate (TR, Eqn. (7)) to measure transferability.

We compare the transferability of adversarial examples generated by different adversarial methods under the condition of similar visual quality, measured by the SSIM and Edge-SSIM metrics. A statistical analysis shows that the SSIM and Edge-SSIM score distributions of the compared adversarial methods differ substantially: over 99% of the adversarial examples generated by HAAM, DeepFool and CWL2 have high SSIM and Edge-SSIM scores, whereas the scores of the examples generated by FGSM and BIM are spread nearly uniformly over much wider ranges. To make an efficient comparison, we only consider adversarial examples with SSIM scores in [0.9, 1.0) and Edge-SSIM scores in [0.8, 1.0).

We split the ranges of SSIM and Edge-SSIM scores uniformly into three buckets each, assign adversarial examples to the corresponding buckets according to their SSIM or Edge-SSIM scores, and calculate the transfer rate for each bucket using Eqn. (7).

Transfer rates of the different adversarial methods are shown in Table 1 and Table 2. In terms of the SSIM metric, there is a trend that adversarial examples with lower SSIM scores have higher transferability (higher TR). HAAM outperforms the other methods in the high-SSIM regime, but underperforms BIM in the low-SSIM regime, because the TR of HAAM grows more slowly than that of BIM as the SSIM score decreases. In terms of the Edge-SSIM metric, HAAM outperforms the other adversarial methods in all Edge-SSIM buckets and on all target models. That means that with similar edge (Laplacian) map quality, the adversarial examples from HAAM always have higher transferability.

Failure Cases Analyses

Figure 7: Adversarial examples that look unnatural to human vision. (a) HAAM-g. (b) HAAM-g. (c) HAAM-c.

We find that adversarial examples generated using HAAM with a large ε show patterns that look unnatural to human vision in some particular scenes, e.g. sky or sea. Such images often have a simple, flat background, which highlights the unnaturalness of the perturbations.

We therefore suggest that, when attacking these kinds of images with HAAM, one should choose a small ε and select harmonic functions without regular patterns (e.g. the stripe-like pattern of the sine function). Moreover, gray-scale perturbations are more suitable for this scenario than color perturbations.

Conclusion

In this paper, we propose a new adversarial attack method, HAAM. Different from previous adversarial methods, HAAM generates edge-free perturbations that are less disruptive to human vision than noisy/edge-rich perturbations. Experimentally, we find that the adversarial examples generated by HAAM strike a balance between visual quality and transferability between models. In addition, we find that the adversarial examples generated by HAAM can simulate some natural phenomena or real-life photographic effects, which can be useful for improving current DNNs, e.g. by guiding the design of data augmentations that make the models more robust in practice.

References