Adversarial Attacks against Deep Saliency Models

04/02/2019 ∙ by Zhaohui Che, et al.

Currently, a plethora of saliency models based on deep neural networks have led to great breakthroughs in many complex high-level vision tasks (e.g., scene description, object detection). The robustness of these models, however, has not yet been studied. In this paper, we propose a sparse feature-space adversarial attack method against deep saliency models for the first time. The proposed attack requires only part of the model information, and generates a sparser and more insidious adversarial perturbation than traditional image-space attacks. These perturbations are so subtle that a human observer cannot notice their presence, yet they completely change the model outputs. This phenomenon raises security threats to deep saliency models in practical applications. We also explore some intriguing properties of the feature-space attack, e.g., 1) hidden layers with bigger receptive fields generate sparser perturbations, 2) deeper hidden layers achieve higher attack success rates, and 3) different loss functions and different attacked layers result in diverse perturbations. Experiments indicate that the proposed method successfully attacks different model architectures across various image scenes.


1 Introduction

Human visual attention is an advanced internal mechanism for selecting informative and conspicuous regions from external visual stimuli. A plethora of saliency models based on deep neural networks [5, 3, 4, 6, 7] have been proposed in the past decades to predict human gaze by simulating biological attention mechanisms [8, 9, 10]. Bottom-up saliency is an efficient front end to complex back-end high-level vision tasks such as scene understanding, fine-grained classification, object recognition, visual description, and security-related autonomous driving applications [11, 12, 13, 1, 2].

However, most current deep neural networks are highly vulnerable to adversarial examples [14], which are generated by adding negligible but deliberate adversarial perturbations to the original image, as shown in Fig. 1. The adversarial perturbation is so subtle that a human observer cannot notice its presence, yet the model output is completely changed. This phenomenon raises security threats to deep networks in practical applications.

In this paper, we dig into the feature space of deep networks and investigate feature-space attacks against the saliency detection task. We ask and answer the following questions: Which layers admit a successful feature-space attack? What is the difference between perturbations generated by image-space and feature-space attacks? Where (at which layer) does the output of the threat model start to degrade when we feed it the adversarial example? How can we defend against these attacks? To the best of our knowledge, this is the first work addressing adversarial attacks against saliency models.

1.1 Related works and concepts

Most previous works focused on image classification tasks [15, 16, 17, 18, 19], aiming to fool image classifiers into predicting a wrong category label. Recently, Xie et al. [20] extended adversarial attacks to semantic segmentation and object detection using a dense adversary generation method. Generating adversarial examples for semantic segmentation and object detection is much more difficult than for classification, because there are more targets to be attacked, e.g., multiple pixels or proposals. Metzen et al. [21] explored universal adversarial perturbations for semantic segmentation, and verified the existence of universal perturbations for segmentation models. Universal perturbations are input-agnostic and are able to fool deep networks on the majority of input images. Zeng et al. [22] investigated the subset of adversarial examples that correspond to meaningful changes in 3D physical properties (i.e., rotation, translation, and illumination conditions), and pointed out that adversarial attacks in 3D physical space are more difficult than in the traditional 2D image space.

One explanation for adversarial examples is that the subtle adversarial perturbation falls into regions of the large, high-dimensional feature space that have not been explored during training [20]. Compared to image classification, saliency detection is a basic pixel-to-pixel translation problem, which provides an opportunity to directly observe the internal attention regions of deep networks across different levels of representation. Therefore, investigating feature-space adversarial attacks against saliency models helps us understand the internal attention mechanism of deep networks, and may point to ways of improving their robustness.

Most previous works perform adversarial attacks in image space. For semantic segmentation [20, 21], an image-space attack generates the adversarial perturbation by calculating a loss between the predicted segmentation labels of the original image and the guide image in image space (i.e., the predicted segmentation results have the shape height × width × channels, normally channels = 3 for an RGB image), then back-propagating the gradients of this image-space loss with respect to the original image through the entire threat model. Notably, the threat model is the pretrained model to be attacked. The adversarial attacks may have knowledge of the threat model, including its parameters and architecture, but are not allowed to modify it [23].

Accordingly, a feature-space attack generates the adversarial perturbation by computing a loss between the intermediate representations of the original image and the guide image (the intermediate representation also has the shape height × width × channels, where height and width are the downsampled image size and channels is the extended channel dimension, e.g., channels = 1024 in a deep convolutional layer of ResNet-101 [24]), then back-propagating the gradients only through the layers before the attacked layer, rather than through the entire network. Thus, the feature-space attack requires less information about the threat model, and manipulates intermediate high-dimensional representations with smaller spatial resolutions but more channels.
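For concreteness, the following minimal PyTorch sketch shows how an attacker with only partial (probing) access could capture such an intermediate representation through a forward hook, so that gradients only need to flow through the layers before the attacked layer. The VGG-16 backbone and the chosen layer are stand-ins for illustration, not a specific threat model from this paper.

```python
# Sketch: grab an intermediate representation with a forward hook.
# "saliency_net" and "context_layer" are hypothetical placeholders.
import torch
import torchvision.models as models

# Stand-in encoder; a real threat model would be a pretrained saliency network.
saliency_net = models.vgg16(weights=None).features.eval()
context_layer = saliency_net[28]          # a deep conv layer (conv5_3), chosen arbitrarily

features = {}
def save_activation(module, inputs, output):
    features["attacked_layer"] = output   # small spatial size, many channels

handle = context_layer.register_forward_hook(save_activation)

image = torch.rand(1, 3, 192, 256, requires_grad=True)
_ = saliency_net(image)                   # forward pass fills `features`
phi = features["attacked_layer"]          # e.g. shape (1, 512, 12, 16) here

# An image-space attack would instead back-propagate a loss defined on the
# final saliency map through the *entire* network.
handle.remove()
```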

Generally speaking, adversarial attacks can be categorized along different dimensions. First, attacks can be classified by the type of desired output. In the Targeted Attack Scenario, the attack aims to change the threat model's output toward some specific guide direction. In the Nontargeted Attack Scenario, the attack only needs to destroy the original correct prediction; the specific guide direction does not matter. Second, attacks can also be classified by the amount of knowledge the attacker has about the threat model [14]. White Box Attacks have full knowledge of the entire threat model. Black Box Attacks have no knowledge of the threat model. Black Box Attacks with Probing do not have full information about the threat model, but can obtain part of its parameters and architecture. In this paper, the proposed feature-space attack is a variant of the Black Box Attack with Probing, while the image-space attack is a White Box Attack.

A previous work studied adversarial examples in feature space for image classification. Sabour et al. [25] fooled an image classifier by minimizing the distance between the internal representations of the original image and the guide image using a non-iterative optimization method. However, that work did not reveal the sparsity of feature-space adversarial attacks, nor did it investigate the finer-grained relations between the feature-space perturbation and its diversity, perceptibility, and aggressivity.

1.2 Our contributions

Our contributions and lessons include:

  • We propose targeted and nontargeted adversarial attack methods against deep saliency models. The proposed methods perform successful attacks in both image space and feature space across different network architectures and various image scenes.

  • The proposed feature-space attack requires only part of the information of the threat model, yet can completely change the output of the entire model. The generated perturbation is sparser and more imperceptible than that of the traditional image-space attack, as shown in Fig. 1.

  • On the one hand, the success rate of the feature-space attack is highly related to the depth of the attacked layer, i.e., deeper layers, which detect more semantic information, achieve higher success rates, while shallower layers, which extract fine textures and edges, mostly fail to generate an effective adversarial perturbation. On the other hand, the perceptibility/sparsity of the generated perturbation depends heavily on the receptive field of the attacked layer, e.g., when we attack the SalGAN [3] model, which uses a classic encoder-decoder architecture, the context hidden layer between the encoder and decoder layers, which has the biggest receptive field, yields the sparsest adversarial perturbation compared to other layers and to the image-space attack.

  • For the targeted attack scenario, when we fix the loss function, perturbations from different attacked layers have discrepant patterns but result in similar model predictions toward the guide image. Likewise, when we fix the attacked layer, perturbations generated by different losses also result in similar outputs.

  • For the nontargeted attack scenario, different attacked layers and different loss functions produce diverse adversarial perturbations and result in totally different outputs, as shown in Fig. 2.

  • For defending against the nontargeted attack, we find that perturbations generated by the image-space attack can mitigate perturbations generated by the feature-space attack to a certain extent; however, perturbations generated by the feature-space attack cannot counteract the image-space attack. The targeted attack is more difficult to defend against, because perturbations generated by different losses cannot counteract each other.

2 Proposed Adversarial Attack Method

Figure 3: This diagram illustrates the main idea behind the proposed sparse feature-space adversarial attack method. In this figure, we select the SalGAN model [3], which uses a classic encoder-decoder architecture, as the threat model. Specifically, we perform the attack from the context hidden layer between the encoder and decoder layers of SalGAN, which yields 1024 feature maps with a small spatial resolution of 6×8. This is because the context hidden layer has the biggest receptive field and encodes the most important semantic information into the high-dimensional representations. We further uniformly select 32 feature maps from the 1024 feature maps of the adversarial example and of the guide image, respectively. Notably, the positions of the selected sparse feature maps of the adversarial example and of the guide image are the same. We then compute the losses between the selected sparse feature-map pairs, and back-propagate the gradients of the feature-space loss with respect to the adversarial example through the known layers before the attacked layer. This way, we generate a subtle adversarial perturbation that thoroughly changes the final prediction of the entire threat model. The method can be extended to attack any layer; when we attack the final output layer, it becomes the image-space attack. The sparsity of the proposed method has two meanings. First, we select only a fraction of the hundreds of feature maps of the attacked layer to perform the attack. Second, the generated adversarial perturbation is sparse and visually imperceptible.

2.1 Targeted Attack Method

Unlike traditional attacks for image classification [15, 16, 17] and semantic segmentation [20], which aim to change the predicted category label of the entire image or of pixels/proposals, we perform the attack by reducing the distance between the high-dimensional representations of the adversarial example and the guide image. The general idea behind the proposed method is shown in Fig. 3.

To generate a targeted adversarial perturbation against a saliency model, the goal is to distract the model attention from the salient regions of the original image I towards the salient regions of the guide image G, while keeping the adversarial example I_adv visually almost identical to the original image, i.e.:

minimize_{I_adv}  D_feat( φ_l(I_adv), φ_l(G) )   s.t.   D_out( f(I_adv), f(G) ) ≤ τ_t,   D_perc( I, I_adv ) ≤ ε_p,     (1)

where f denotes the threat model, and f(G) and f(I_adv) represent the predicted saliency maps of the guide image G and the adversarial example I_adv. In addition, φ_l(G) and φ_l(I_adv) represent the high-dimensional representations of G and I_adv taken from the l-th convolution layer of f. D_feat measures the distance between the high-dimensional representations of the adversarial example and the guide image, D_out measures the distance between the model predictions of the adversarial example and the guide image, and D_perc measures the perceptual difference between the original input I and the adversarial example I_adv; τ_t and ε_p are small thresholds. Smaller values of D_feat, D_out, and D_perc mean higher similarity.

In most cases, we cannot obtain full information about the threat model. Suppose we can only obtain the information of some front-end layers of the threat model f; we can then perform the attack from a known convolution layer l, as shown in Fig. 3. Specifically, we feed the adversarial example I_adv into the threat model f and obtain the feature maps φ_l(I_adv) from the hidden layer l. Next, we select a fraction of these feature maps, denoted Φ_adv; e.g., we uniformly select 32 of the 1024 feature maps of the context hidden layer of the SalGAN model, as shown in Fig. 3. In Section 3.3, we further discuss how many sparse feature maps must be selected to guarantee a successful attack. Similarly, we select 32 sparse feature maps Φ_G of the guide image G. Importantly, the position of each feature map in Φ_adv is the same as that in Φ_G: we tried to permute and mismatch the feature maps from Φ_adv and Φ_G, but failed to attack the model.
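A minimal PyTorch sketch of this uniform sparse channel selection is shown below; the 1024-channel, 6×8 feature-map shape follows the SalGAN example above, while the stride-based indexing is an assumption of the sketch.

```python
# Sketch: uniformly pick a sparse subset of channels (e.g. 32 of 1024) from
# the attacked layer, using the *same* channel indices for the adversarial
# example and the guide image. Tensor shapes are illustrative.
import torch

def select_sparse_channels(feature_map: torch.Tensor, num_selected: int = 32) -> torch.Tensor:
    """feature_map: (batch, channels, h, w) activation of the attacked layer."""
    total = feature_map.shape[1]                       # e.g. 1024
    stride = total // num_selected                     # e.g. 32
    idx = torch.arange(0, total, stride)[:num_selected]  # uniformly spaced indices
    return feature_map[:, idx]                         # (batch, num_selected, h, w)

phi_adv = torch.rand(1, 1024, 6, 8, requires_grad=True)   # adversarial example's features
phi_guide = torch.rand(1, 1024, 6, 8)                      # guide image's features
sparse_adv = select_sparse_channels(phi_adv)               # same channel positions for both,
sparse_guide = select_sparse_channels(phi_guide)           # as required by the method
```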

Then, we calculate the feature-space loss between Φ_adv and Φ_G:

Loss_feat = (1/C) Σ_{c=1}^{C} KL( Φ_adv^c, Φ_G^c ),     (2)

where C = 32 is the number of channels of the selected sparse feature maps, and Φ^c is the c-th feature map within Φ. KL denotes the Kullback-Leibler divergence [26], which is widely used to measure the similarity between two saliency maps; a smaller KL score means higher similarity. Other loss functions can replace the KL metric, e.g., Pearson's linear correlation coefficient (CC) [26], normalized scanpath saliency (NSS) [27], and the L distance. We compare the performance of different losses in Section 3.2.
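As an illustration of Eq. (2), the sketch below averages a channel-wise KL divergence over the selected feature-map pairs; the per-channel normalization and the direction of the divergence are assumptions of this sketch, not details specified by the paper.

```python
# Sketch of the feature-space loss in Eq. (2): average KL divergence over the
# selected channel pairs, treating each feature map as a spatial distribution.
import torch

def kl_feature_loss(sparse_adv: torch.Tensor, sparse_guide: torch.Tensor,
                    eps: float = 1e-8) -> torch.Tensor:
    """sparse_adv, sparse_guide: (batch, C, h, w) selected feature maps, e.g. C = 32."""
    b, c, h, w = sparse_adv.shape
    p = sparse_adv.view(b, c, -1)
    q = sparse_guide.view(b, c, -1)
    # Normalize each channel to sum to 1 so the KL divergence is well defined.
    p = p.clamp_min(0) + eps
    q = q.clamp_min(0) + eps
    p = p / p.sum(dim=-1, keepdim=True)
    q = q / q.sum(dim=-1, keepdim=True)
    kl = (q * (q / p).log()).sum(dim=-1)    # KL(guide || adversarial) per channel
    return kl.mean()                         # average over the C selected channels

loss = kl_feature_loss(torch.rand(1, 32, 6, 8, requires_grad=True), torch.rand(1, 32, 6, 8))
```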

By minimizing Loss_feat, especially when attacking layers that extract high-level semantic information, we reduce the distance between the high-dimensional representations of the adversarial example and the guide image. As a result, the final model prediction is modified towards the salient regions of the guide image. Surprisingly, by attacking layers with big receptive fields, we can generate much sparser perturbations than by attacking the final output layer, because the high-dimensional representations in these layers are able to distract the model attention to the explicit target salient region while leaving the smooth background region undisturbed. In other words, the final model prediction depends heavily on some intermediate hidden layers: if we can obtain the information of these layers, we can easily destroy the entire network. We further explore which hidden layers can generate sparse and valid perturbations in Section 3.3.

We use an iterative gradient-descent optimization method to generate the targeted adversarial example. We set the initial perturbation r_0 = 0 and the initial adversarial example I_adv^0 = I. For the n-th iteration, we obtain:

r_{n+1} = r_n − β · α · ∇_{I_adv^n} Loss_feat / ( ‖∇_{I_adv^n} Loss_feat‖_2 + ε ),    I_adv^{n+1} = I + r_{n+1},     (3)

where α is the step size that controls the magnitude of the gradient-descent step, β is a scaling hyper-parameter that limits the intensity of the perturbation, and ε is a very small positive number used to keep the divisor nonzero. The targeted attack terminates when the model prediction of the adversarial example is almost the same as that of the guide image, i.e., D_out( f(I_adv^n), f(G) ) ≤ τ_t, where τ_t is the termination threshold, or when it reaches the maximum number of iterations.
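As a minimal illustration of the update in Eq. (3), the PyTorch sketch below performs one L2-normalized gradient step on the adversarial example; the default alpha and beta values are arbitrary placeholders rather than the authors' settings, and the perturbation is accumulated on the current iterate for simplicity.

```python
# Sketch of one targeted iteration of Eq. (3): an L2-normalized gradient step.
import torch

def targeted_step(adv_image: torch.Tensor, loss: torch.Tensor,
                  alpha: float = 1.0, beta: float = 0.01,
                  eps: float = 1e-12) -> torch.Tensor:
    """adv_image: current adversarial example (requires_grad); loss: Loss_feat."""
    grad, = torch.autograd.grad(loss, adv_image)
    step = beta * alpha * grad / (grad.norm(p=2) + eps)     # keep the perturbation small
    # Descend so that the representations move toward the guide image's.
    return (adv_image - step).detach().requires_grad_(True)
```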

2.2 Nontargeted Attack Method

In the nontargeted attack scenario, the goal is to distract the model attention away from the salient regions of the original input, and there is no explicit guide image. We achieve this by maximizing the distance between the high-dimensional representations of the original input and the adversarial example, while keeping the original image and the adversarial example visually identical. Analogously to the targeted attack, we perform the nontargeted attack as follows:

r_{n+1} = r_n + β · α · ∇_{I_adv^n} Loss_feat / ( ‖∇_{I_adv^n} Loss_feat‖_2 + ε ),    I_adv^{n+1} = I + r_{n+1},     (4)

where Loss_feat is now computed between the selected sparse feature maps of the adversarial example and of the original image I.

The nontargeted attack terminates when the distance between the model predictions for the adversarial example and the original image is large enough, i.e., D_out( f(I_adv^n), f(I) ) ≥ τ_n, or when it reaches the maximum number of iterations.

Input: original image I; guide image G; the threat model f; the index l of the intermediate layer of f to be attacked; the number C of channels of the selected feature maps; the step size α and scaling factor β of the iterative gradient descent; the maximum number of iterations X; the saliency-map evaluation metric D_out; the termination thresholds τ_t and τ_n for targeted and nontargeted attacks, respectively;
Output: the adversarial example I_adv;
1:  Initialization: r_0 = 0; I_adv^0 = I; n = 0;
2:  if (Targeted Attack) then
3:      Select C sparse feature maps Φ_G from φ_l(G);
4:      while ( D_out( f(I_adv^n), f(G) ) > τ_t  and  n < X ) do
5:          Select C feature maps Φ_adv from φ_l(I_adv^n);
6:          Compute Loss_feat( Φ_adv, Φ_G ) by Eq. (2);
7:          r_{n+1} = r_n − β · α · ∇_{I_adv^n} Loss_feat / ( ‖∇_{I_adv^n} Loss_feat‖_2 + ε );
8:          I_adv^{n+1} = I + r_{n+1};
9:          n = n + 1;
10:      end while
11:  else
12:      Select C sparse feature maps Φ_I from φ_l(I);
13:      while ( D_out( f(I_adv^n), f(I) ) < τ_n  and  n < X ) do
14:          Select C feature maps Φ_adv from φ_l(I_adv^n);
15:          Compute Loss_feat( Φ_adv, Φ_I ) by Eq. (2);
16:          r_{n+1} = r_n + β · α · ∇_{I_adv^n} Loss_feat / ( ‖∇_{I_adv^n} Loss_feat‖_2 + ε );
17:          I_adv^{n+1} = I + r_{n+1};
18:          n = n + 1;
19:      end while
20:  end if
21:  return I_adv.
Algorithm 1: Sparse Feature-Space Adversarial Attack
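Putting the pieces together, the following is a hedged end-to-end PyTorch sketch of the targeted branch of Algorithm 1. The toy VGG-16 backbone with a 1×1 prediction head, the attacked layer, the L1-based stopping check, and all hyper-parameter values are stand-ins for illustration, not the authors' configuration.

```python
# End-to-end sketch of the targeted branch of Algorithm 1 (illustrative only).
import torch
import torch.nn.functional as F
import torchvision.models as models

def normalized_kl(p, q, eps=1e-8):
    """KL between two (batch, C, h*w) tensors, each channel normalized to sum to 1."""
    p = p.clamp_min(0) + eps; q = q.clamp_min(0) + eps
    p = p / p.sum(-1, keepdim=True); q = q / q.sum(-1, keepdim=True)
    return (q * (q / p).log()).sum(-1).mean()

def sparse_feature_attack(threat_model, attacked_layer, image, guide,
                          num_channels=32, alpha=1.0, beta=0.01,
                          tau=0.1, max_iters=100, eps=1e-12):
    feats = {}
    handle = attacked_layer.register_forward_hook(lambda m, i, o: feats.update(rep=o))

    with torch.no_grad():
        guide_pred = threat_model(guide)                 # f(G)
        guide_rep = feats["rep"]                         # phi_l(G)
    stride = guide_rep.shape[1] // num_channels
    idx = torch.arange(0, guide_rep.shape[1], stride)[:num_channels]
    guide_sparse = guide_rep[:, idx].flatten(2)          # fixed channel positions

    adv = image.clone().requires_grad_(True)             # I_adv^0 = I, r_0 = 0
    for _ in range(max_iters):
        pred = threat_model(adv)                         # forward also fills feats["rep"]
        adv_sparse = feats["rep"][:, idx].flatten(2)
        loss = normalized_kl(adv_sparse, guide_sparse)   # Eq. (2)
        if F.l1_loss(pred, guide_pred).item() < tau:     # stand-in for D_out <= tau_t
            break
        grad, = torch.autograd.grad(loss, adv)
        step = beta * alpha * grad / (grad.norm(p=2) + eps)
        adv = (adv - step).detach().requires_grad_(True) # Eq. (3)

    handle.remove()
    return adv.detach()

# Toy usage with a stand-in "saliency model" (VGG-16 features + 1x1 head).
backbone = models.vgg16(weights=None).features
head = torch.nn.Conv2d(512, 1, kernel_size=1)
toy_model = torch.nn.Sequential(backbone, head).eval()
adv_example = sparse_feature_attack(toy_model, backbone[28],
                                    torch.rand(1, 3, 192, 256),
                                    torch.rand(1, 3, 192, 256))
```

The nontargeted branch follows the same loop with the original image in place of the guide and a gradient-ascent step, as in lines 11-19 of Algorithm 1.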

3 Experiments and Discussions

Threat Model Base Network Attacked Layer Original Performance Targeted Attack Nontargeted Attack
SALICON [4] multi-stream VGG-16 conv5-3-fine 0.748, 0.726, 0.723 0.279, 0.501, 0.469 -0.262, 0.390, 0.041
SALICON [4] multi-stream GoogLeNet inception4-fine 0.755, 0.731, 0.742 0.288, 0.514, 0.457 -0.255, 0.385, 0.042
SALICON [4] multi-stream AlexNet conv5-1-fine 0.719, 0.705, 0.691 0.291, 0.520, 0.463 -0.280, 0.382, 0.039
SALICON [4] single-stream VGG-16 conv5-3-coarse 0.700, 0.695, 0.714 0.266, 0.482, 0.428 -0.307, 0.346, 0.035
GazeGAN [29] single-stream U-Net decoder1-coarse 0.796, 0.740, 0.764 0.388, 0.505, 0.513 -0.424, 0.371, 0.060
GazeGAN [29] multi-stream U-Net decoder1-fine 0.808, 0.743, 0.769 0.419, 0.511, 0.534 -0.356, 0.382, 0.069
SalGAN [3] single-stream VGG-19 conv5-3-coarse 0.703, 0.707, 0.713 0.365, 0.502, 0.468 -0.471, 0.354, 0.055
SalGAN [3] multi-stream VGG-19 conv5-3-fine 0.719, 0.715, 0.722 0.373, 0.511, 0.486 -0.454, 0.360, 0.058

Global pix2pix [30] single-stream ResNet res9-coarse 0.766, 0.729, 0.748 0.289, 0.498, 0.469 -0.164, 0.475, 0.046
Local pix2pix [30] multi-stream ResNet res3-fine 0.773, 0.737, 0.751 0.313, 0.504, 0.435 -0.188, 0.490, 0.048
DVA [31] single-stream VGG-16 conv5-3-coarse 0.774, 0.733, 0.754 0.308, 0.514, 0.517 -0.400, 0.415, 0.054
DVA [31] multi-stream VGG-16 conv5-3-fine 0.782, 0.736, 0.760 0.312, 0.519, 0.522 -0.373, 0.454, 0.057
Table 1: Performance of the proposed feature-space attack method on different threat model architectures. In the Original Performance, Targeted Attack, and Nontargeted Attack columns, the three scores are (from left to right) the CC, sAUC, and SIM metrics [28], which measure the similarity between model predictions and human fixation ground truth; higher CC, sAUC, and SIM values mean better performance. The Attacked Layer column indicates the intermediate layer of the threat model to be attacked. The Original Performance, Targeted Attack, and Nontargeted Attack columns report the performance of the threat model on original images and on the adversarial examples generated by targeted and nontargeted attacks, respectively.
Figure 4: The convergence of the proposed attack when using different losses. Panels (a)-(d): targeted attack with the KL, CC, NSS, and L losses, respectively; panels (e)-(h): nontargeted attack with the same four losses.

3.1 Threat Model and Attack Performance

As shown in Table 1, we adopt 5 state-of-the-art deep networks and their variants to test the proposed method. These threat models are based on different base networks and have different architectures. All threat models are trained and tested on the SALICON dataset [32]. More details about the architectures and parameters of these threat models are provided in the Supplementary Material.

Experiments indicate that the proposed targeted and nontargeted attacks cause significant performance drops for different saliency models. Hendrycks et al. [33] pointed out that multi-scale architectures achieve better robustness by propagating features across different scales at each layer, rather than slowly gaining a global representation of the input as in traditional neural networks. To verify this point, we extend some existing single-stream saliency models to multi-scale architectures, e.g., SalGAN [3], GazeGAN [29], and DVA [31]. Specifically, we duplicate the single-stream network into two sub-networks, i.e., a coarse sub-network and a fine sub-network. We then feed the original image and the downsampled image into the fine and coarse sub-networks, respectively. Finally, we concatenate the feature maps from the coarse sub-network with the feature maps from the fine sub-network along the channel dimension to predict the final saliency map. However, the experiments in Table 1 indicate that multi-scale architectures are also highly vulnerable to adversarial attacks on the saliency detection task.
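The coarse/fine duplication described above might look roughly like the following sketch, assuming a VGG-16 backbone and a 1×1 fusion head; it is illustrative, not the authors' exact multi-scale variants.

```python
# Sketch of the multi-scale extension: a fine branch (full-resolution input)
# and a coarse branch (downsampled input), fused by channel-wise concatenation.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class MultiScaleSaliency(nn.Module):
    def __init__(self):
        super().__init__()
        self.fine = models.vgg16(weights=None).features     # original image
        self.coarse = models.vgg16(weights=None).features   # downsampled image
        self.head = nn.Conv2d(512 * 2, 1, kernel_size=1)    # fuse and predict

    def forward(self, x):
        f_fine = self.fine(x)
        x_small = F.interpolate(x, scale_factor=0.5, mode="bilinear",
                                align_corners=False)
        f_coarse = self.coarse(x_small)
        # Upsample coarse features to the fine resolution, then concatenate.
        f_coarse = F.interpolate(f_coarse, size=f_fine.shape[-2:],
                                 mode="bilinear", align_corners=False)
        return self.head(torch.cat([f_fine, f_coarse], dim=1))

saliency_map = MultiScaleSaliency()(torch.rand(1, 3, 192, 256))  # (1, 1, 6, 8)
```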

3.2 Convergence and Losses

The convergence of the proposed attack when using different losses is shown in Fig. 4. For the targeted attack, the losses measure the similarity between the high-dimensional representations of the guide image and the adversarial example, where smaller losses mean better similarity. We notice that all losses decrease rapidly in the first 20 iterations and then level off in the remaining iterations. KL and CC converge to their theoretical boundary values, i.e., KL = 0 and CC = 1. For the nontargeted attack, the losses measure the similarity between the representations of the original image and the adversarial example. The KL, CC, and L losses increase rapidly in the first 20 iterations, while the NSS loss decreases first and then increases rapidly, approaching its boundary value, i.e., NSS = 0. This is because the NSS metric computes the mean value of the normalized saliency map at fixation locations and is sensitive to false positives [28]. In the early iterations, the model attention on some secondary salient regions is distracted to the most salient regions, so the mean value at the obvious fixation locations increases, i.e., the −NSS loss decreases. In later iterations, the model attention is further distracted away from the original salient regions until there is no overlap with the original fixation locations, i.e., NSS = 0. We provide more visualizations of the convergence process in the Supplementary Material.

3.3 Perceptibility, Aggressivity, and Attacked Layer

In this section, we explore the finer-grained correlation between the properties of the attacked layer (i.e., layer depth and receptive field) and the quality (i.e., perceptibility and aggressivity) of the perturbations generated from this layer.

We first investigate the relationship between the targeted attack performance and the depth of the attacked layer, as shown in Fig. 5. We adopt the CC, sAUC, SIM, and AUC-Borji metrics to measure the similarity between the predicted results of the adversarial example and the guide image; thus, higher CC, sAUC, SIM, and AUC-Borji scores represent better targeted attack performance. We notice that, in general, deeper hidden layers achieve better attack performance than shallower layers. Besides, the attack performance increases greatly at certain intermediate hidden layers. For the SALICON model, the concatenation layer significantly increases the attack performance, because SALICON adopts a coarse-scale and a fine-scale sub-network, and this layer concatenates the feature maps from the two sub-networks. This indicates that, for a multi-scale architecture, attacking the layers of the front-end sub-networks has only a slight impact on the final model prediction, while attacking the deeper layers that pool the features from the sub-networks revolutionizes the final output. For GazeGAN, a certain intermediate layer already achieves a satisfying attack performance compared to the deeper layers, including the image-space attack (i.e., the final output layer). In other words, if we obtain the information of the layers up to that intermediate layer, we can completely change the output of the entire model.

Figure 5: The relationship between the targeted attack performance and the depth of the attacked layer, for the GazeGAN and SALICON models.
Figure 6: The relationship between the perceptibility of the adversarial perturbation and the receptive field of the attacked layer, for the GazeGAN and SALICON models.

Next, we investigate the relationship between the perceptibility of the generated perturbation and the receptive field of the attacked layer, as shown in Fig. 6. We use two metrics to evaluate the perceptibility of the adversarial perturbation, i.e., SSIM and the L2-norm distance. SSIM [34] is a popular quality metric that measures the perceptual similarity between the original image and the adversarial example, and the L2-norm distance [23] measures the magnitude of the perturbation. A higher SSIM and a lower L2-norm distance represent lower perceptibility. Experiments indicate that the layers with bigger receptive fields generate more imperceptible perturbations.
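A small sketch of the two perceptibility measures is given below, assuming float RGB images in [0, 1] and a scikit-image version that accepts the channel_axis argument.

```python
# Sketch: SSIM between original and adversarial images, plus the L2 norm of
# the perturbation itself (higher SSIM and lower L2 => less perceptible).
import numpy as np
from skimage.metrics import structural_similarity

def perceptibility(original: np.ndarray, adversarial: np.ndarray):
    """original, adversarial: float images in [0, 1] with shape (H, W, 3)."""
    ssim = structural_similarity(original, adversarial,
                                 data_range=1.0, channel_axis=-1)
    l2 = np.linalg.norm((adversarial - original).ravel(), ord=2)
    return ssim, l2

img = np.random.rand(192, 256, 3)
adv = np.clip(img + 0.01 * np.random.randn(192, 256, 3), 0, 1)
print(perceptibility(img, adv))
```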

Figure 7: How many sparse feature maps are needed to perform a successful attack from a given hidden layer, for the GazeGAN and SALICON models.
Figure 8: Where (at which layers) the threat model (SalGAN) starts to degrade when suffering from feature-space and image-space attacks.

We then explore, for a given hidden layer, how many sparse feature maps suffice to perform a successful feature-space attack. The correlation between the attack performance and the number of channels of the sparse feature maps is shown in Fig. 7. Specifically, we select 1, 8, 16, 32, ... feature maps from the 1024 feature maps of the attacked hidden layer of the GazeGAN (SALICON) model, respectively, and compare the attack performance obtained with different numbers of feature channels. We notice that, for GazeGAN, 32 feature maps already generate a satisfying perturbation compared to using all 1024 feature maps. For SALICON, 64 feature maps can perform a satisfying attack, but there is still a gap compared to using all 1024 feature maps. In other words, GazeGAN is easier to attack than SALICON, because there are many similar and redundant representations in the feature space of GazeGAN. However, the performance of SALICON on clean images is worse than that of GazeGAN. We borrow the conclusion of Su et al. [35] to explain this: for image classification models, there is a clear trade-off between accuracy and robustness, and better testing accuracy reduces robustness. For the saliency detection task, most deep saliency models adopt base networks similar to those used in classification as feature extractors.

Finally, we investigate where (at which layer) the threat model starts to degrade when suffering from feature-space and image-space attacks, as shown in Fig. 8. For each hidden layer, we adopt the SSIM and CC metrics to measure the similarity between the intermediate feature maps of the original image and the adversarial example. Lower SSIM and CC values mean that the model attention has been distracted away from the salient regions of the original image. For a fair comparison, we unify the resolution (i.e., height × width) of the feature maps from different layers to 60×80. We find that the SalGAN model starts to degrade from the context hidden layer, which has a big receptive field and extracts high-level semantic information rather than low-level textures and edges. Besides, the feature-space and image-space attacks destroy the SalGAN model at a similar position, i.e., at this context hidden layer.
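For one layer, this diagnosis might be sketched as below: both activations are resized to 60×80 and a Pearson correlation (CC) is computed over all channels; flattening the channels together into one correlation is an assumption of this sketch.

```python
# Sketch: layer-wise similarity between the activations of the clean and
# adversarial inputs, after resizing both to a common 60x80 grid.
import torch
import torch.nn.functional as F

def layerwise_cc(feat_clean: torch.Tensor, feat_adv: torch.Tensor) -> float:
    """feat_*: (1, C, h, w) activations of the same layer for the two inputs."""
    a = F.interpolate(feat_clean, size=(60, 80), mode="bilinear", align_corners=False)
    b = F.interpolate(feat_adv, size=(60, 80), mode="bilinear", align_corners=False)
    a = a.flatten() - a.mean()              # center, then compute Pearson correlation
    b = b.flatten() - b.mean()
    return float((a * b).sum() / (a.norm() * b.norm() + 1e-8))

print(layerwise_cc(torch.rand(1, 512, 12, 16), torch.rand(1, 512, 12, 16)))
```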

3.4 Transferability and Countervailing Relation

Fig. 9 shows the transferability of the proposed feature-space attack across different models. We adopt the CC metric to measure the performance drop caused by perturbations generated by different models; a higher performance drop represents better transferability. Experiments indicate that the transferability between different networks is weak: perturbations generated by other networks only cause a slight performance drop. This is consistent with attacks on segmentation and object detection tasks [20]. Moreover, even between the single-stream and multi-scale variants that use similar base networks, i.e., single-stream vs. multi-scale DVA, single-stream vs. multi-scale SalGAN, and single-stream vs. multi-scale GazeGAN, the transferability is still weak. This indicates that the validity of the proposed perturbation depends heavily on the threat model, including its base network and architecture.

In Fig. 10, we explore whether the perturbations generated by different losses (or spaces) can mitigate each other. Specifically, we select one type of loss to generate a perturbation from image/feature space, and subtract it from the adversarial examples generated by unknown attacks. Then we adopt the CC metric to measure the performance drop caused by the modified adversarial example. GazeGAN serves as the threat model here. Experiments indicate that, for the targeted attack, perturbations generated by the KL and CC losses in image space can mitigate each other, but the other perturbations have discrepant patterns and cannot counteract each other. For the nontargeted attack, the perturbations generated by the KL and CC losses in image space are able to mitigate, to a certain extent, the perturbations generated by different losses in both image and feature spaces. Besides, the feature-space perturbations can mitigate other perturbations generated by feature-space attacks (i.e., the regions in the yellow rectangle in Fig. 10(b)), but have only a slight impact on the perturbations from image space (i.e., the regions in the red rectangle in Fig. 10(b)). This indicates that the perturbations from feature space have more similar patterns than those from image space. Besides, the perturbations from image space are much denser, and can be harnessed to mitigate the sparser feature-space perturbations to a certain extent.

We also perform an experiment similar to [20], randomly permuting the rows or columns of the adversarial perturbation. We find that the permuted perturbation fails to attack the models, indicating that the spatial structure of the perturbation is critical to the adversarial attack. However, we cannot use this method to defend against the attack, because it also destroys the spatial information of the original image.
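A tiny NumPy sketch of this permutation check follows; the perturbation values here are random stand-ins.

```python
# Sketch: shuffle the rows of an adversarial perturbation. The magnitude is
# unchanged, but the spatial structure (and hence the attack) is destroyed;
# applying the same shuffle to the input would also destroy the image itself,
# which is why this is not usable as a defense.
import numpy as np

rng = np.random.default_rng(0)
perturbation = rng.normal(scale=0.01, size=(192, 256, 3))   # H x W x 3

row_perm = rng.permutation(perturbation.shape[0])
permuted = perturbation[row_perm]            # rows shuffled, values preserved

# Same pixel statistics, same norm...
assert np.isclose(np.linalg.norm(permuted), np.linalg.norm(perturbation))
# ...but, per the experiment above, the permuted perturbation no longer fools
# the saliency model when added to the original image.
```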

Figure 9: The transferability of the proposed attack across different models, for targeted and nontargeted attacks. The vertical axis represents the target threat models, while the horizontal axis represents the source threat models; we use the perturbations generated by the source models to attack the target models.
Figure 10: The countervailing relationship between perturbations generated by different losses (Mix denotes the linear combination of the KL, CC, NSS, and L losses) and by image-space and feature-space attacks, for targeted and nontargeted attacks. The vertical axis represents the target losses, while the horizontal axis represents the source losses; we use the perturbations generated by the source losses to mitigate the perturbations generated by the target losses.

4 Conclusion

We propose a sparse feature-space adversarial attack method against saliency models for the first time. The proposed method generates a sparser and more visually imperceptible adversarial perturbation. Moreover, it requires only partial information about the threat model, yet completely changes the final prediction of the entire network. We further verify that the quality (i.e., aggressivity, perceptibility, and diversity) of the generated adversarial perturbation is related to the loss function, the depth and receptive field of the attacked layer, and the number of channels of the selected sparse feature maps. Our work also provides lessons for devising defense methods in the future. First, a good defense method should consider the category of the perturbation, because different losses and attacked layers result in discrepant perturbation patterns. Second, a defense method should discriminate the validity of perturbations in advance, because some invalid perturbations cannot disturb the model prediction at all, e.g., perturbations generated from shallower hidden layers or by other model architectures.

References

  • [1] S. C. Yang and Y. L. Hsu. Full speed region sensorless drive of permanent-magnet machine combining saliency-based and back-emf-based drive. IEEE transactions on Industrial Electronics, Vol. 64, No. 2, pp.1092-1101, 2017.
  • [2] S. Alletto, A. Palazzi, F. Solera, S. Calderara, and R. Cucchiara. DR(eye)VE: a dataset for attention-based tasks with applications to autonomous and assisted driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 54-60, 2016.
  • [3] J. Pan, C. Canton, K. McGuinness, et al. Salgan: Visual saliency prediction with generative adversarial networks. In arXiv preprint arXiv:1701.01081, 2017.
  • [4] X. Huang, C. Shen, X. Boix, and Q. Zhao. Salicon: Reducing the semantic gap in saliency prediction by adapting deep neural networks. In Proceedings of IEEE International Conference on Computer Vision, pp. 262-270, 2015.
  • [5] M. Cornia, L. Baraldi, G. Serra, and R. Cucchiara. A deep multi-level network for saliency prediction. In Proceedings of IEEE International Conference on Pattern Recognition, pp. 3488-3493, 2016.
  • [6] J. Pan, K. McGuiness, E. Sayrol, N. Conner, et al. Shallow and deep convolutional networks for saliency prediction. In Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, pp. 598-606, 2016.
  • [7] M. Cornia, L. Baraldi, G. Serra, et al. Predicting human eye fixations via an LSTM-based saliency attentive model. IEEE Transactions on Image Processing, Vol. 27, No. 10, pp. 5142-5154, 2018.
  • [8] A. Borji and L. Itti. State-of-the-art in visual attention modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 35, No. 1, pp. 185-207, 2013.
  • [9] A. Borji, D. N. Sihite, and L. Itti. Quantitative analysis of human-model agreement in visual saliency modeling: A comparative study. IEEE Transactions on Image Processing, Vol. 22, No. 1, pp. 55-69, 2013.
  • [10] A. Borji, H. R. Tavakoli, D. N. Sihite, and L. Itti. Analysis of scores, datasets, and models in visual saliency prediction. In Proceedings of IEEE International Conference on Computer Vision, pp. 921-928, 2013.
  • [11] A. Oliva, A. Torralba, M. S. Castelhano, and J. M. Henderson. Top-down control of visual attention in object detection. In Proceedings of IEEE International Conference on Image Processing, pp. I-253-I-257, 2003.
  • [12] S. Frintrop. A visual attention system for object detection and goal-directed search. Springer, 2005.
  • [13] A. Mishra, Y. Aloimonos, and C. L. Fah. Active segmentation with fixation. In Proceedings of IEEE 12th International Conference on Computer Vision, pp. 468-475, 2009.
  • [14] A. Kurakin, I. Goodfellow, S. Bengio, Y. Dong, F. Liao, M. Liang, J. Wang, et al. Adversarial attacks and defences competition. In The NIPS'17 Competition: Building Intelligent Systems, Springer, pp. 195-231, 2018.
  • [15] I. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. In Proceedings of the International Conference on Learning Representations, 2015.
  • [16] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, and R. Fergus. Intriguing properties of neural networks. In Proceedings of the International Conference on Learning Representations, 2014.
  • [17] A. Kurakin, I. Goodfellow, and S. Bengio. Adversarial examples in the physical world. In Proceedings of the Workshop of International Conference on Learning Representations, 2016.
  • [18] N. Carlini and D. Wagner. Towards evaluating the robustness of neural networks. In Proceedings of the IEEE Symposium on Security and Privacy, 2017.
  • [19] S. Baluja and I. Fischer. Adversarial transformation networks. 2017.
  • [20] C. Xie, J. Wang, Z. Zhang, Y. Zhou, L. Xie, and A. Yuille. Adversarial examples for semantic segmentation and object detection. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1369-1378, 2017.
  • [21] J. H. Metzen, M. C. Kumar, T. Brox, and V. Fischer. Universal adversarial perturbations against semantic image segmentation. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2774-2783, 2017.
  • [22] X. Zeng, C. Liu, Y. Wang, W. Qiu, L. Xie, Y. Tai, and A. Yuille. Adversarial attacks beyond the image space. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, 2019.
  • [23] X. Yuan, P. He, Q. Zhu, and X. Li. Adversarial examples: Attacks and defenses for deep learning. IEEE Transactions on Neural Networks and Learning Systems, 2019.
  • [24] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.
  • [25] S. Sabour, Y. Cao, F. Faghri, and D. Fleet. Adversarial manipulation of deep representations. In Proceedings of the International Conference on Learning Representations, 2016.
  • [26] Z. Bylinskii, T. Judd, A. Borji, L. Itti, F. Durand, A. Oliva, and A. Torralba. Mit saliency benchmark. http://saliency.mit.edu/.
  • [27] R. Peters, A. Iyer, L. Itti, and C. Koch. Components of bottom-up gaze allocation in natural images. Vision Research, Vol. 45, No. 8, pp. 2397-2416, 2005.
  • [28] Z. Bylinskii, T. Judd, A. Oliva, A. Torralba, and F. Durand. What do different evaluation metrics tell us about saliency models? In arXiv preprint cs.CV, 2017.
  • [29] Z. Che, A. Borji, G. Zhai, G. Guo, and P.L. Callet. Gazegan: Invariance analysis and a robust new model. https://github.com/CZHQuality/Sal-CFS-GAN, 2019.
  • [30] T. C. Wang, M. Y. Liu, J. Y. Zhu, A. Tao, J. Kautz, and B. Catanzaro. High-resolution image synthesis and semantic manipulation with conditional gans. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8798-8807, 2018.
  • [31] W. Wang and J. Shen. Deep visual attention prediction. IEEE Transactions on Image Processing, Vol.27, No.6, pp. 2368-2378, 2018.
  • [32] M. Jiang, S. Huang, J. Duan, and Q. Zhao. Salicon: Saliency in context. In Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, pp. 1072-1080, 2015.
  • [33] D. Hendrycks and T. G. Dietterich. Benchmarking neural network robustness to common corruptions and surface variations. In arXiv preprint arXiv:1807.01697, 2018.
  • [34] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, Vol. 13, No. 4, pp.600-612, 2004.
  • [35] D. Su, H. Zhang, H. Chen, J. Yi, P.Y. Chen, and Y. Gao. Is robustness the cost of accuracy?–a comprehensive study on the robustness of 18 deep image classification models. In Proceedings of the European Conference on Computer Vision, pp. 631-648, 2018.