Characterizing Adversarial Examples Based on Spatial Consistency Information for Semantic Segmentation

10/11/2018, by Chaowei Xiao et al.

Deep Neural Networks (DNNs) have been widely applied in various recognition tasks. However, recently DNNs have been shown to be vulnerable against adversarial examples, which can mislead DNNs to make arbitrary incorrect predictions. While adversarial examples are well studied in classification tasks, other learning problems may have different properties. For instance, semantic segmentation requires additional components such as dilated convolutions and multiscale processing. In this paper, we aim to characterize adversarial examples based on spatial context information in semantic segmentation. We observe that spatial consistency information can be potentially leveraged to detect adversarial examples robustly even when a strong adaptive attacker has access to the model and detection strategies. We also show that adversarial examples based on attacks considered within the paper barely transfer among models, even though transferability is common in classification. Our observations shed new light on developing adversarial attacks and defenses to better understand the vulnerabilities of DNNs.


1 Introduction

Deep Neural Networks (DNNs) have been shown to be highly expressive and have achieved state-of-the-art performance on a wide range of tasks, such as speech recognition [20], image classification [24], natural language understanding [54], and robotics [32]. However, recent studies have found that DNNs are vulnerable to adversarial examples [38, 17, 31, 47, 45, 40, 9, 8, 7]. Such examples are intentionally perturbed inputs with small magnitude adversarial perturbation added, which can induce the network to make arbitrary incorrect predictions at test time, even when the examples are generated against different models [27, 5, 33, 46]. The fact that the adversarial perturbation required to fool a model is often small and (in the case of images) imperceptible to human observers makes detecting such examples very challenging. This undesirable property of deep networks has become a major security concern in real-world applications of DNNs, such as self-driving cars and identity recognition systems [16, 37]. Furthermore, both white-box and black-box attacks have been performed against DNNs successfully when an attacker is given full or zero knowledge about the target systems [2, 17, 45]. Among black-box attacks, transferability is widely used for generating attacks against real-world systems which do not allow white-box access. Transferability refers to the property of adversarial examples in classification tasks where one adversarial example generated against a local model can mislead another unseen model without any modification [33].

Given these intriguing properties of adversarial examples, various analyses for understanding adversarial examples have been proposed [29, 30, 43, 42], and potential defense/detection techniques have also been discussed, mainly for the image classification problem [13, 21, 30]. For instance, image pre-processing [14], adding another type of random noise to the inputs [48], and adversarial retraining [17] have been proposed for defending against or detecting adversarial examples when classifying images. However, researchers [4, 19] have shown that these defense or detection methods can be easily circumvented by attackers with, or even without, knowledge of the defender's strategy. Such observations raise concerns about safety problems in diverse machine learning based systems.

In order to better understand adversarial examples against different tasks, in this paper we aim to analyze adversarial examples in the semantic segmentation task instead of classification. We hypothesize that adversarial examples in different tasks may contain unique properties that provide in-depth understanding of such examples and encourage potential defensive mechanisms. Different from image classification, in semantic segmentation each pixel is assigned a prediction label based on its surrounding information [12]. Such spatial context information plays a more important role for segmentation algorithms, such as [50, 55, 26, 23]. Whether adversarial perturbation would break such spatial context is unknown to the community. In this paper we propose and conduct image spatial consistency analysis, which randomly selects overlapping patches from a given image and checks how consistent the segmentation results are for the overlapping regions. Our pipeline of spatial consistency analysis for adversarial/benign instances is shown in Figure 1. We find that in the segmentation task, adversarial perturbation can be weakened for separately selected patches, and therefore adversarial and benign images show very different behaviors in terms of spatial consistency information. Moreover, since such spatial consistency is highly random, it is hard for adversaries to take such constraints into account when performing adaptive attacks. This renders the system less brittle even when facing sophisticated adversaries who have full knowledge of the model as well as the detection/defense method applied.

We use image scale transformation to perform detection of adversarial examples as a baseline, which has been used for detection in classification tasks [39]. We show that by randomly scaling the images, adversarial perturbation can be destroyed and therefore adversarial examples can be detected. However, when the attacker knows the detection strategy (an adaptive attacker), even without exact knowledge of the scaling rate, the attacker can still perform adaptive attacks against the detection mechanism, which is similar to the findings in classification tasks [4]. On the other hand, we show that by incorporating the spatial consistency check, existing semantic segmentation networks can detect adversarial examples (average AUC 100%) generated by the state-of-the-art attacks considered in this paper, regardless of whether the adversary knows the detection method. Here, we allow the adversaries to have full access to the model and any detection method applied, in order to analyze the robustness of the model against adaptive attacks. We additionally analyze the defense in a black-box setting, which is more practical for real-world systems.

In this paper, our goal is to further understand adversarial attacks by conducting spatial consistency analysis in the semantic segmentation task, and we make the following contributions:

  1. We propose the spatial consistency analysis for benign/adversarial images and conduct large scale experiments on two state-of-the-art attack strategies against both DRN and DLA segmentation models with diverse adversarial targets on different datasets, including Cityscapes and a real-world autonomous driving video dataset.

  2. We are the first to analyze spatial information for adversarial examples in segmentation models. We show that spatial consistency information can be potentially leveraged to distinguish adversarial examples. We also show that the spatial consistency check mechanism induces a high degree of randomness and is therefore robust against adaptive adversaries. We evaluate image scaling and spatial consistency, and show that spatial consistency outperforms the standard scaling based method.

  3. In addition, we empirically show that adversarial examples generated by the attack methods considered in our studies barely transfer among models, even when these models share the same architecture but differ in initialization, in contrast to the transferability phenomenon in classification tasks.

Figure 1: Spatial consistency analysis for adversarial and benign instances in semantic segmentation.

2 Related work

Semantic Segmentation

has received long-lasting attention in the computer vision community [25]. Recent advances in deep learning [24] also show that deep convolutional networks can achieve much better results than traditional methods [28]. Yu et al. [50] proposed using dilated convolutions to build high-resolution feature maps for semantic segmentation, which significantly improves performance compared to upsampling approaches [28, 34, 1]. Most of the recent state-of-the-art approaches are based on dilated convolutions [51, 55, 44] and residual networks [18]. Therefore, in this work, we choose dilated residual networks (DRN) [51] and deep layer aggregation (DLA) [52] as our target models for attack and defense.

Adversarial Examples for Semantic Segmentation

have been studied recently in addition to adversarial examples in image classification. Xie et al. proposed a gradient based algorithm, called dense adversary generation (DAG), to attack pixels within the whole image iteratively until most of the pixels have been misclassified into the target class [49]. Later, an optimization based attack algorithm was studied by introducing a surrogate loss function called Houdini into the objective function [10]. The Houdini loss function is made up of two parts. The first part represents the stochastic margin between the scores of the actual and predicted targets, which reflects the confidence of the model prediction. The second part is the task loss, which is independent of the model and corresponds to the actual task. The task loss enables the Houdini algorithm to generate adversarial examples for different tasks, including image segmentation, human pose estimation, and speech recognition.
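To make the structure of this surrogate concrete, the following is a minimal PyTorch-style sketch of a Houdini-style loss assembled from the two parts described above (a stochastic margin term and a task loss term); the function name, tensor shapes, and reduction are our own illustrative assumptions rather than the exact formulation in [10].

import torch

# A hedged sketch of a Houdini-style surrogate: stochastic margin times task loss.
# Shapes and reduction are illustrative assumptions, not the exact loss of [10].
def houdini_loss(scores, y_true, y_target, task_loss):
    # scores: (C, H, W) per-pixel class scores; y_true / y_target: (H, W) label maps;
    # task_loss: (H, W) model-independent task loss term.
    s_true = scores.gather(0, y_true.unsqueeze(0)).squeeze(0)      # score of actual label
    s_target = scores.gather(0, y_target.unsqueeze(0)).squeeze(0)  # score of target label
    margin = s_true - s_target
    normal = torch.distributions.Normal(0.0, 1.0)
    stochastic_margin = 1.0 - normal.cdf(margin)   # P[margin < gamma] with gamma ~ N(0, 1)
    return (stochastic_margin * task_loss).mean()  # weight the task loss by the margin term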

Various detection and defense methods have also been studied against adversarial examples in image classification. For instance, adversarial training [17] and its variations [41, 30] have been proposed and demonstrated to be effective for the classification task, but are hard to adapt to the segmentation task. To date, no defense or detection methods have been studied for image segmentation.

(a) Cityscapes
(b) BDD
Figure 2: Samples of benign and adversarial examples generated by Houdini on Cityscapes [11] (targeting on Kitty/Pure) and BDD100K [53] (targeting on Kitty/Scene). We select DRN as our target model here. Within each subfigure, the first column shows benign images and corresponding segmentation results, and the second and third columns show adversarial examples with different adversarial targets.

3 Spatial Consistency Based Method

In this section, we explore the effects that spatial context information has on benign and adversarial examples in segmentation models. We conduct different experiments based on various models and datasets; due to space limitations, we use a small set of examples to demonstrate our findings and relegate other examples to the supplementary materials. Figure 2 shows benign and adversarial examples targeting diverse adversarial targets: “Hello Kitty” (Kitty) and a random pure color (Pure) on Cityscapes, and “Hello Kitty” (Kitty) and a real scene without any cars (Scene) on the BDD video dataset, respectively. In the rest of the paper, we use the format “attack method | target” to label each adversarial example. Here we consider both the DAG [49] and Houdini [10] attack methods.

(a) Benign example
(b) Heatmap of benign image
(c) DAG | Kitty
(d) DAG | Pure
(e) Houdini | Kitty
(f) Houdini | Pure
Figure 3: Heatmap of per-pixel self-entropy on Cityscapes dataset against DRN model. (a) and (b) show a benign image and its corresponding per-pixel self-entropy heatmap. (c)-(f) show the heatmaps of the adversarial examples generated by DAG and Houdini attacks targeting “Hello Kitty” (Kitty) and random pure color (Pure).
Figure 4: Examples of the spatial consistency based method on adversarial examples generated by DAG and Houdini attacks targeting on Kitty and Pure. The first column shows the original image and corresponding segmentation results. The next two columns show two randomly selected patches, and the final columns show the segmentation results of the overlapping regions from these two patches, respectively. The mIOU between the two overlapping regions is reported. It is clear that the segmentation results of the overlapping regions from two random patches are very different for adversarial images (low mIOU), but relatively consistent for benign instances (high mIOU).

3.1 Spatial Context Analysis

To quantitatively analyze the contribution of spatial context information to the segmentation task, we first evaluate the entropy of the prediction based on different spatial contexts. For each pixel p within an image, we randomly select K patches P_1, ..., P_K which contain p. Within each patch P_i, the pixel p is assigned a confidence vector based on the Softmax prediction, so pixel p corresponds to K such vectors in total. We discretize each vector into a one-hot vector and sum these one-hot vectors to obtain a vector v_p. Each component v_p[j] represents the number of times pixel p is predicted to be class j. We then normalize v_p by dividing by K. Finally, for each pixel p, we calculate its self-entropy H(p) = -∑_j v_p[j] log v_p[j], obtaining a self-entropy value for each pixel. We utilize this per-pixel entropy to convey the consistency of different surrounding patches and plot it in the heatmaps in Figure 3. It is clear that for benign instances, the boundaries of the original objects have higher entropy, indicating that these regions are harder to predict and can gain more information by considering different surrounding spatial contexts.
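As an illustration of this procedure, the following NumPy sketch approximates the per-pixel self-entropy map by sampling K random patches over the whole image rather than K patches per pixel; the `predict` function, the patch size, and the default K are assumptions made for the sketch, not the exact setup used in our experiments.

import numpy as np

# A minimal sketch (under the assumptions stated above) of the per-pixel self-entropy.
# `predict` is assumed to map an image patch to an (h, w) map of per-pixel class labels.
def self_entropy_map(image, predict, num_classes, K=10, patch=256, rng=np.random):
    H, W = image.shape[:2]
    counts = np.zeros((H, W, num_classes))          # class votes per pixel
    covered = np.zeros((H, W))                      # number of patches containing each pixel
    for _ in range(K):
        u = rng.randint(0, H - patch + 1)
        v = rng.randint(0, W - patch + 1)
        labels = predict(image[u:u + patch, v:v + patch])               # (patch, patch) label map
        counts[u:u + patch, v:v + patch] += np.eye(num_classes)[labels]  # one-hot votes
        covered[u:u + patch, v:v + patch] += 1
    probs = counts / np.maximum(covered, 1)[..., None]                   # normalized vote vectors
    # Per-pixel self-entropy H(p) = -sum_j v_p[j] log v_p[j], with 0 log 0 treated as 0.
    return -np.sum(np.where(probs > 0, probs * np.log(probs), 0.0), axis=-1)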

3.2 Patch Based Spatial Consistency

The fact that surrounding spatial context information shows different consistency behaviors for benign and adversarial examples motivates us to perform a spatial consistency check, in the hope of telling these two data distributions apart.

First, we introduce how to generate overlapping spatial contexts by selecting random patches, and then we validate the spatial consistency information. Let s be the patch size and w, h be the width and height of an image X. We define the first and second patches based on the coordinates of their top-left and bottom-right vertices, (u_1, v_1, u_1 + s, v_1 + s) and (u_2, v_2, u_2 + s, v_2 + s). Let (d_u, d_v) be the displacement between the top-left coordinates of the first and second patch: (d_u, d_v) = (u_2 - u_1, v_2 - v_1). To guarantee that there is enough overlap, we require d_u and d_v to be within a bound [b_low, b_high]. Here we randomly select the two patches, aiming to capture diverse enough surrounding spatial context, including information both near and far from the target pixel. The patch selection algorithm (getOverlapPatches) is shown in the supplementary materials.

Next we show how to apply the spatial consistency based method to a given input and thereby recognize adversarial examples. The detailed algorithm is shown in Algorithm 1. Here K denotes the number of overlapping regions for which we check the spatial consistency. We use the mean Intersection Over Union (mIOU) between the overlapping regions O_1, O_2 from two patches P_1, P_2 to measure their spatial consistency. The mIOU is defined as (1/n_cls) ∑_i n_ii / (∑_j n_ij + ∑_j n_ji - n_ii), where n_ij denotes the number of pixels predicted to be class i in O_1 and class j in O_2, and n_cls is the number of unique classes appearing in both O_1 and O_2. getmIOU is a function that computes the mIOU given the patches P_1, P_2 along with their overlapping regions O_1, O_2, and is shown in the supplementary materials.
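The following is a minimal sketch of such an overlap mIOU computation on two aligned predicted label maps; averaging over the classes appearing in either region is our simplification and may differ slightly from the exact getmIOU definition.

import numpy as np

# A minimal sketch: O1 and O2 are aligned predicted label maps of the overlapping region.
def overlap_miou(O1, O2, num_classes):
    # n[i, j] = number of pixels predicted as class i in O1 and class j in O2.
    n = np.zeros((num_classes, num_classes))
    np.add.at(n, (O1.ravel(), O2.ravel()), 1)
    classes = np.union1d(np.unique(O1), np.unique(O2))   # classes present in either region
    ious = []
    for c in classes:
        union = n[c, :].sum() + n[:, c].sum() - n[c, c]
        if union > 0:
            ious.append(n[c, c] / union)
    return float(np.mean(ious)) if ious else 1.0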

input:
Input image X;
number of overlapping regions K;
patch size s;
segmentation model f;
bound [b_low, b_high];
output:
Spatial consistency value c;
1 cs ← [];
2 for k ← 1 to K do
3       (P_1, P_2) ← getOverlapPatches(X, s, b_low, b_high);
4       /* get prediction results of the two random patches from X */
5       pred_1 ← f(P_1); pred_2 ← f(P_2);
6       /* get predictions of the overlapping area between the two patches */
7       O_1 ← overlapRegion(pred_1); O_2 ← overlapRegion(pred_2);
8       /* get consistency value (mIOU) from the two patches */
9       cs.append(getmIOU(P_1, P_2, O_1, O_2));
10 end for
11 c ← mean(cs);
Return: c
Algorithm 1 Spatial Consistency Check Algorithm
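The following Python sketch mirrors Algorithm 1, reusing overlap_miou from the previous sketch; the `predict` function, the default patch size, bound, and K are illustrative assumptions rather than the exact values used in our experiments.

import numpy as np

def get_overlap_patches(H, W, s, b_low, b_high, rng=np.random):
    # Top-left corner of the first patch, leaving room for the displaced second patch.
    u1 = rng.randint(0, H - s - b_high + 1)
    v1 = rng.randint(0, W - s - b_high + 1)
    # Displace the second patch by (du, dv) within the bound to guarantee overlap.
    du, dv = rng.randint(b_low, b_high + 1, size=2)
    return (u1, v1), (u1 + du, v1 + dv)

def spatial_consistency(image, predict, num_classes, K=5, s=256, b_low=32, b_high=64):
    H, W = image.shape[:2]
    scores = []
    for _ in range(K):
        (u1, v1), (u2, v2) = get_overlap_patches(H, W, s, b_low, b_high)
        pred1 = predict(image[u1:u1 + s, v1:v1 + s])
        pred2 = predict(image[u2:u2 + s, v2:v2 + s])
        # Crop both predictions down to the shared (overlapping) region.
        O1 = pred1[u2 - u1:, v2 - v1:]
        O2 = pred2[:s - (u2 - u1), :s - (v2 - v1)]
        scores.append(overlap_miou(O1, O2, num_classes))
    return float(np.mean(scores))   # low consistency suggests an adversarial input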

4 Scale Consistency Analysis

We have discussed how spatial consistency can be utilized to potentially characterize adversarial examples in the segmentation task. In this section, we discuss another baseline method: image scale transformation, which is another natural factor considered in semantic segmentation [22, 28]. Here we focus on the image blur operation, applying Gaussian blur to given images [6], which has been studied for detecting adversarial examples in image classification [39]. Similarly, we analyze the effects of image scaling on benign/adversarial samples. Since spatial context information is important for the segmentation task, scaling or performing segmentation on small patches may damage the global information and therefore affect the final prediction. Here we aim to provide quantitative results to understand and explore how image scale transformation affects adversarial perturbation.
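As a concrete illustration of this baseline, the sketch below blurs an image with a Gaussian kernel and compares the segmentation of the original and blurred copies using the mIOU helper from Section 3.2; the use of scipy and the default std are our own choices for the sketch.

from scipy.ndimage import gaussian_filter

# A minimal sketch of the blur-based consistency score for an (H, W, 3) image array.
def scale_consistency(image, predict, num_classes, std=3.0):
    blurred = gaussian_filter(image, sigma=(std, std, 0))   # blur H and W, keep channels
    return overlap_miou(predict(image), predict(blurred), num_classes)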

4.1 Scale Consistency Property

Scale theory is commonly applied in the image segmentation task [35], and therefore we train scale resilient models to obtain robust ones, against which we perform attacks. On these scale resilient models, we first analyze how image scaling affects segmentation results for benign/adversarial samples. We apply the DAG [49] and Houdini [10] attacks against the DRN and DLA models with different adversarial targets. The images and corresponding segmentation results before and after scaling are shown in Figure 5. We apply a Gaussian kernel with different standard deviations (std) to scale both benign and adversarial instances. It is clear that when we apply Gaussian blurring with higher std (3 and 5), the adversarial perturbation is weakened and the segmentation results are no longer the adversarial targets for the scale transformed adversarial examples, as shown in Figure 5 (a)-(e).

(a) Benign example
(b) DAG | Kitty
(c) DAG | Pure
(d) Houdini | Kitty
(e) Houdini | Pure
Figure 5: Examples of images and corresponding segmentation results before/after image scaling on Cityscapes against DRN model. For each subfigure, the first column shows benign/adversarial image, while the later columns represent images after scaling by applying Gaussian kernel with std as 0.5, 3, and 5, respectively. (a) shows benign images before/after image scaling and the corresponding segmentation results; (b)-(e) present similar results for adversarial images generated by DAG and Houdini attacks targeting on Kitty and Pure.

5 Experimental Results

In this section, we conduct comprehensive large scale experiments to evaluate the image spatial and scale consistency information for benign and adversarial examples generated by different attack methods. We will also show that the spatial consistency based detection method is robust against sophisticated adversaries with knowledge about defenders, while scale transformation method is not.

5.1 Implementation Details

Datasets.

We use both Cityscapes [11] and BDD100K [53] in our evaluation. We show results on the validation sets of both datasets, which contain 500 high resolution images with 19 categories of segmentation labels. These two datasets are both outdoor datasets containing instance-level annotations, which would raise real-world safety concerns if they were attacked. Compared with other datasets such as Pascal VOC [15] and CamVid [3], these two datasets are more challenging due to the relatively high resolution and the diverse scenes within each image.

Semantic Segmentation Models.

We use Dilated Residual Networks (DRN) [51] and Deep Layer Aggregation (DLA) [52] as our target models. More specifically, we select DRN-D-22 and DLA-34. For both models, we use a crop size of 512 and a random scale factor of up to 2 during training to obtain scale resilient models for both the BDD and Cityscapes datasets. The mIOU of these two models on pristine training data is shown in Table 1. More results on different models can be found in the supplementary materials.

Adversarial Examples

We generate adversarial examples based on two state-of-the-art attack methods: DAG [49] and Houdini [10], using our own implementation of the methods. We select a complex image, Hello Kitty (Kitty), with different background colors, and a random pure color (Pure) as our targets on the Cityscapes dataset. Furthermore, in order to increase the diversity, we also select a real-world driving scene (Scene) without any cars from the BDD training dataset as another malicious target on BDD. Such attacks potentially show that any image taken in the real world can be attacked so that the segmentation shows the same scene without any car on the road, which raises great security concerns for future autonomous driving systems. We also add three additional adversarial targets, namely “ECCV 2018”, “Remapping”, and “Color strip”, in the supplementary materials to increase the diversity of adversarial targets.

We generate 500 adversarial examples for the Cityscapes and BDD100K datasets against both the DRN and DLA segmentation models, targeting the various malicious targets described above (more results can be found in the supplementary materials).

5.2 Spatial Consistency Analysis

To evaluate the spatial consistency analysis quantitatively for the segmentation task, we use it to build a simple detector to demonstrate its properties. Here we perform the patch based spatial consistency analysis with a fixed patch size and region bound, and a fixed number K of overlapping regions. We first select some benign instances and calculate the normalized mIOU of the overlapping regions from two random patches. We record the lower bound of these mIOU values as the threshold of the detection method. Note that when reporting detection rates in the rest of the paper, we use the threshold learned from a set of benign training data, while we also report the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve of a detection method to evaluate its overall performance. Therefore, given an image, for each overlapping region of two random patches, we calculate the normalized mIOU and compare it with the threshold computed before. If it is larger, the image is recognized as benign, and vice versa. This process is illustrated in Algorithm 1. We report the detection results in terms of AUC in Table 1 for adversarial examples generated in the various settings mentioned above. We observe that this simple detection method based on spatial consistency information achieves AUC of nearly 100% for the adversarial examples studied here. In addition, we also select the patch size as a random number between 384 and 512 (a patch size that is too small would affect the segmentation accuracy even on benign instances, so we avoid small patches for the purpose of controlled comparison) and show the results in the supplementary materials. We observe that random patch sizes achieve similar detection results.
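The detector sketched below follows the procedure above: the threshold is the lowest spatial consistency score observed on benign images, and an input whose score falls below it is flagged as adversarial; sklearn is used only to illustrate the AUC computation, and spatial_consistency refers to the earlier sketch.

from sklearn.metrics import roc_auc_score

def fit_threshold(benign_images, predict, num_classes, K=5):
    scores = [spatial_consistency(x, predict, num_classes, K=K) for x in benign_images]
    return min(scores)                       # lower bound of benign consistency scores

def is_benign(image, predict, num_classes, threshold, K=5):
    return spatial_consistency(image, predict, num_classes, K=K) >= threshold

def detection_auc(benign_images, adv_images, predict, num_classes, K=5):
    scores = [spatial_consistency(x, predict, num_classes, K=K)
              for x in list(benign_images) + list(adv_images)]
    labels = [0] * len(benign_images) + [1] * len(adv_images)    # 1 = adversarial
    return roc_auc_score(labels, [-s for s in scores])           # low consistency => adversarial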

Method Model mIOU Detection Detection Adap
DAG Houdini DAG Houdini
Pure Kitty Pure Kitty Pure Kitty Pure Kitty
Scale (std=0.5) DRN (16.4M) 66.7 100% 95% 100% 99% 100% 67% 100% 78%
Scale (std=3.0) 100% 100% 100% 100% 100% 0% 97% 0%
Scale (std=5.0) 100% 100% 100% 100% 100% 0% 71% 0%
Scale (std=0.5) DLA (18.1M) 74.5 100% 98% 100% 100% 100% 75% 100% 81%
Scale (std=3.0) 100% 100% 100% 100% 100% 24% 100% 34%
Scale (std=5.0) 100% 100% 100% 100% 97% 0% 95% 0%
Spatial (K=1) DRN (16.4M) 66.7 91% 91% 94% 92% 98% 94% 92% 94%
Spatial (K=5) 100% 100% 100% 100% 100% 100% 100% 100%
Spatial (K=10) 100% 100% 100% 100% 100% 100% 100% 100%
Spatial (K=50) 100% 100% 100% 100% 100% 100% 100% 100%
Spatial (K=1) DLA (18.1M) 74.5 96% 98% 97% 97% 99% 99% 100% 100%
Spatial (K=5) 100% 100% 100% 100% 100% 100% 100% 100%
Spatial (K=10) 100% 100% 100% 100% 100% 100% 100% 100%
Spatial (K=50) 100% 100% 100% 100% 100% 100% 100% 100%
Table 1: Detection results (AUC) of image spatial (Spatial) and scale consistency (Scale) based methods on the Cityscapes dataset. The number in parentheses after the model name shows the number of parameters of the target model, and mIOU shows the performance of the segmentation model on pristine data. We color all AUC values less than 80% in red.

5.3 Image Scale Analysis

As a baseline, we also utilize image scale information to build a simple detection method and compare it with the spatial consistency based method. We apply a Gaussian kernel to perform the image scaling based detection and select 0.5, 3, and 5 as the standard deviation of the Gaussian kernel. We compute the normalized mIOU between the segmentation results of the original and scaled images. Similarly, the detection results in terms of AUC are shown in Table 1. The detection method based on image scale information can achieve similarly high AUC compared with the spatial consistency based method.

5.4 Adaptive Attack Evaluation

(a) Image scaling
(b) Convergence analysis
(c) spatial consistency
Figure 6: Performance of adaptive attacks. (a) shows adversarial images and corresponding segmentation results for the adaptive attack against image scaling. The first two rows show benign images and the corresponding segmentation results; the last two rows show the adaptive adversarial images and corresponding segmentation results under different stds of the Gaussian kernel (0.5, 3, 5 for columns 2-4). (b) and (c) show the performance of the adaptive attack against the spatial consistency based method with different K. (b) presents the mIOU of overlapping regions for benign and adversarial images along different iterations. (c) shows the mIOU for overlapping regions of benign and adversarial instances at iteration 200.

Regarding the above detection analysis, it is important to evaluate adaptive attacks, where adversaries have knowledge of the detection strategy.

As Carlini & Wagner suggest [4], we conduct attacks with full access to the detection model to evaluate the adaptive adversary, following Kerckhoffs' principle [36]. To perform the adaptive attack against the image scaling detection mechanism, instead of attacking the original model, we add another convolutional layer after the input layer of the target model, similarly to [4]. We use the same std as the detection model when performing the adaptive attack. To guarantee that the attack methods converge, we select 0.06 as the upper bound for the adversarial perturbation norm (pixel values are in the range [0,1]), since beyond that the perturbation is already very visible. The detection results against such adaptive attacks on Cityscapes are shown in Table 1 (we defer the results on BDD to the supplementary materials). The results show that the image scale based detection method is easily attacked (its AUC drops dramatically), which leads to conclusions similar to those in the classification task [4]. We show qualitative results in Figure 6 (a): it is obvious that even under a Gaussian kernel with large std, the model can still be fooled into predicting the malicious target (Kitty).
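The adaptive setup against this defense can be sketched as a fixed (non-trainable) Gaussian-blur convolution prepended to the segmentation network so that attack gradients flow through the blur, as in [4]; the kernel construction, kernel size, and class name below are illustrative assumptions rather than our exact implementation.

import torch
import torch.nn as nn

def gaussian_kernel2d(std, ksize):
    ax = torch.arange(ksize, dtype=torch.float32) - (ksize - 1) / 2.0
    g = torch.exp(-(ax ** 2) / (2 * std ** 2))
    k = g[:, None] * g[None, :]
    return k / k.sum()

class BlurThenSegment(nn.Module):
    """Prepend a fixed depthwise Gaussian blur to a segmentation model (a sketch)."""
    def __init__(self, segmenter, std=3.0, ksize=11, channels=3):
        super().__init__()
        k = gaussian_kernel2d(std, ksize).expand(channels, 1, ksize, ksize)
        self.blur = nn.Conv2d(channels, channels, ksize, padding=ksize // 2,
                              groups=channels, bias=False)
        self.blur.weight.data.copy_(k)
        self.blur.weight.requires_grad_(False)   # the blur layer is fixed, not trained
        self.segmenter = segmenter

    def forward(self, x):
        return self.segmenter(self.blur(x))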

(a) Kitty
(b) Pure
Figure 7: Detection performance of the spatial consistency based method against adaptive attacks with different numbers of attacked patches and overlapping regions on Cityscapes with the DRN model. The X-axis indicates the number of patches selected to perform the adaptive attack (0 means the regular attack). The Y-axis indicates the number of overlapping regions selected during detection.

Next, we apply the adaptive attack against the spatial consistency based method. Due to the randomness of the approach, we develop the strongest adaptive adversary we can think of by randomly selecting K patches (the same number used by the defender). The adversary then tries to attack both the whole image and the selected patches toward the corresponding parts of the malicious target. The detailed attack algorithm is shown in the supplementary materials. The corresponding detection results of the spatial consistency based method against such adaptive attacks on Cityscapes are shown in Table 1. It is interesting to see that even against such strong adaptive attacks, the spatial consistency based method can still achieve nearly 100% detection results. We hypothesize that this is because of the high-dimensional randomness induced by the spatial consistency based method, since the search space of patches and overlapping regions is very large. Figure 6 (b) analyzes the convergence of this adaptive attack against the spatial consistency based method. From Figure 6 (b) and (c), we can see that with different K, the selected overlapping regions still remain inconsistent with high probability.
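One step of such an adaptive attack can be sketched as below: the attacker optimizes a targeted loss over the whole image plus several freshly drawn random patches, keeping the perturbation within the 0.06 bound; the loss function, step size, and patch size are placeholders, and the exact procedures are Algorithms 3 and 4 in the supplementary materials.

import torch

def adaptive_step(model, x, delta, target, seg_loss, C=5, s=256, alpha=1e-3, eps=0.06):
    x_adv = (x + delta).clamp(0, 1)
    loss = seg_loss(model(x_adv), target)                # attack the whole image
    _, _, H, W = x.shape
    for _ in range(C):                                   # also attack C random patches
        u = torch.randint(0, H - s + 1, (1,)).item()
        v = torch.randint(0, W - s + 1, (1,)).item()
        patch = x_adv[:, :, u:u + s, v:v + s]
        patch_target = target[:, u:u + s, v:v + s]       # corresponding target region
        loss = loss + seg_loss(model(patch), patch_target)
    grad = torch.autograd.grad(loss, delta)[0]
    with torch.no_grad():
        delta = (delta - alpha * grad.sign()).clamp(-eps, eps)   # stay within the bound
    return delta.requires_grad_(True)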

Since the spatial consistency based method induces large randomness, we generate a confusion matrix of detection results for adversaries and the detection method choosing various numbers of attacked patches and overlapping regions, as shown in Figure 7. It is clear that for different malicious targets and attack methods, a small number of overlapping regions (e.g., K = 5 in Table 1) is already sufficient to detect sophisticated attacks. In addition, based on our empirical observation, attacking with a larger number of patches increases the computational cost for adversaries dramatically.

(a) DAG
(b) Houdini
Figure 8: Transferability analysis: each cell shows the normalized mIoU value or pixel-wise attack success rate of adversarial examples generated against one model and evaluated on another. Models A, B, C are DRN (DRN-D-22) with different initialization. We select “Hello Kitty” as the target.

5.5 Transferability Analysis

Given the common properties of adversarial examples in both classification and segmentation tasks, we next analyze whether transferability of adversarial examples exists in segmentation models, considering that they are particularly sensitive to spatial and scale information. Transferability is known to be one of the most intriguing properties of adversarial examples in the classification task, where adversarial examples generated against one model are able to mislead another model, even if the two models have different architectures. Given this property, transferability has become the foundation of many black-box attacks in the classification task. Here we aim to analyze whether adversarial examples in the segmentation task still retain high transferability. First, we train three DRN models with the same architecture (DRN-D-22) but different initialization and generate adversarial images with the same target.

Each adversarial image has at least a 96% pixel-wise attack success rate against the original model. We evaluate both the DAG and Houdini attacks and measure transferability using the normalized mIoU, excluding pixels whose ground truth label coincides with the adversarial target. We show the transferability evaluation among different models in the confusion matrices in Figure 8 (since the predictions of certain classes present low IoU values due to imperfect segmentation, we eliminate the K classes with the lowest IoU values to avoid side effects; in our experiments, we set K to be 13). We observe that transferability rarely appears in the segmentation task. More results on different network architectures and datasets are in the supplementary materials.
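The cross-model measurement behind Figure 8 can be sketched as follows: feed an adversarial example crafted against model A into model B and compute the mIoU between B's prediction and the adversarial target, excluding pixels whose ground-truth label already equals the target; the helper name, the reuse of overlap_miou from Section 3.2, and the exclusion rule as written are our assumptions.

# A hedged sketch of the transferability score: how close model B's prediction on an
# adversarial example crafted against model A is to the adversarial target.
def transfer_score(adv_image, predict_b, adv_target, ground_truth, num_classes):
    pred_b = predict_b(adv_image)              # label map from the unseen model B
    keep = ground_truth != adv_target          # drop pixels that trivially match the target
    return overlap_miou(pred_b[keep], adv_target[keep], num_classes)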

As a comparison with the classification task, for each network architecture we train a classifier and evaluate its transferability, as shown in the supplementary materials. In this control experiment, we observe that classifiers with the same architecture still exhibit high transferability, in line with existing findings, which shows that the low transferability is indeed due to the nature of segmentation rather than to specific network architectures.

This observation is quite interesting, as it indicates that black-box attacks against segmentation models may be more challenging. Furthermore, the low transferability in segmentation possibly arises because the adversarial perturbation added to one image may focus on certain regions, while such spatial context information is captured differently by different models. We plan to analyze the actual reason for the low transferability in segmentation in future work.

6 Conclusions

Adversarial examples have been heavily studied recently, revealing vulnerabilities of deep neural networks and raising a number of security concerns. However, most such studies focus on image classification problems, and in this paper we aim to exploit the spatial context information used in the semantic segmentation task to better understand adversarial examples in segmentation scenarios. We propose to apply spatial consistency analysis to recognize adversarial examples in segmentation, which has not been considered as a potential detection mechanism in either image classification or segmentation. We show that such spatial consistency information differs between adversarial and benign instances and can be leveraged to detect adversarial examples even when facing strong adaptive attackers. These observations open a wide door for future research to explore diverse properties of adversarial examples under various scenarios and to develop new attacks to understand the vulnerabilities of DNNs.

Acknowledgments

We thank Warren He, George Philipp, Ziwei Liu, Zhirong Wu, Shizhan Zhu and Xiaoxiao Li for their valuable discussions on this work. This work was supported in part by Berkeley DeepDrive, Compute Canada, NSERC and National Science Foundation under grants CNS-1422211, CNS-1616575, CNS-1739517, JD Grapevine plan, and by the DHS via contract number FA8750-18-2-0011.

References

Appendix 0.A Adversarial Examples For Cityscapes and BDD Datasets Against DRN and DLA models

(a) DAG | DRN | Cityscapes
(b) DAG | DRN | BDD
(c) DAG | DLA | Cityscapes
(d) DAG | DLA | BDD
(e) Houdini | DLA | CityScapes
(f) Houdini | DLA | BDD
Figure 9: Samples of benign and adversarial examples. We use the format “attack method | attack model | dataset” to label the settings of each adversarial example. Within each subfigure, the first column shows benign images and corresponding segmentation results, and the second and third columns show adversarial examples with different adversarial targets (targeting on Kitty/Pure in (a), (c), (e) and on Kitty/Scene in (b), (d), (f)).

Figure 9 shows the benign and adversarial examples targeting diverse adversarial targets: “Hello Kitty” (Kitty) and random pure color (Pure) on Cityscapes [11]; and “Hello Kitty” (Kitty) and a real scene without any cars (Scene) on the BDD dataset [53], against the DRN [51] and DLA [52] segmentation models. In order to increase the diversity of our target set, we also apply different colors for the background of “Hello Kitty” on the BDD dataset against the DLA model.

Figure 10: Attack results of additional targets on Cityscapes. The first column shows a benign instance, while columns 2-4 show adversarial examples with targets “ECCV 2018”, “Remapping”, and “Color strip”, respectively.

Figure 10 shows the additional adversarial targets, including “ECCV 2018”, “Remapping”, and “Color strip”. Here remapping means we generate an adversarial target by shifting the numerical label of each class in the ground truth by a constant offset. This way, we can guarantee that each target has no overlap with the ground truth mask. For “Color strip”, we divide the target into 19 strips evenly, each of which is filled with a class label, aiming to mitigate possible bias for different classes.
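The two synthetic targets can be sketched as follows; wrapping the shifted labels modulo the number of classes and splitting into equal vertical strips are our assumptions about the construction details.

import numpy as np

def remap_target(ground_truth, num_classes=19, offset=1):
    # Shift every ground-truth label by a constant offset so no pixel keeps its label.
    return (ground_truth + offset) % num_classes

def color_strip_target(height, width, num_classes=19):
    # Divide the target evenly into vertical strips, one class label per strip.
    cols = np.linspace(0, num_classes, width, endpoint=False).astype(int)
    return np.tile(cols, (height, 1))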

Appendix 0.B Spatial Consistency Based Method

input :
input image X; patch size s; image width w; image height h; bound [b_low, b_high];
output :
Two random patches P_1 and P_2;
1 Generate two random integers u_1, v_1 for the top-left corner of P_1 such that both patches below lie within the image;
2 Generate two random integers d_u, d_v, where b_low ≤ d_u, d_v ≤ b_high, and set P_1 = X[u_1 : u_1 + s, v_1 : v_1 + s], P_2 = X[u_1 + d_u : u_1 + d_u + s, v_1 + d_v : v_1 + d_v + s];
Return: P_1, P_2
Algorithm 2 Patch Selection Algorithm (getOverlapPatches)
Method Model mIOU Detection Detection Adap
DAG Houdini DAG Houdini
Scene Kitty Scene Kitty Scene Kitty Scene Kitty
Scale (std=0.5) DRN (16.4M) 54.5 96% 100% 99% 100% 69% 89% 46% 91%
Scale (std=3.0) 100% 100% 100% 100% 31% 89% 1% 48%
Scale (std=5.0) 100% 100% 100% 100% 8% 84% 0% 36%
Scale (std=0.5) DLA (18.1M) 46.29 96% 88% 99% 99% 89% 90% 80% 58%
Scale (std=3.0) 100% 100% 100% 100% 66% 88% 11% 26%
Scale (std=5.0) 98% 100% 99% 100% 32% 78% 2% 12%
Spatial (K=1) DRN (16.4M) 54.5 98% 100% 99% 99% 89% 99% 89% 99%
Spatial (K=5) 100% 100% 100% 100% 100% 100% 100% 100%
Spatial (K=10) 100% 100% 100% 100% 100% 100% 100% 100%
Spatial (K=50) 100% 100% 100% 100% 99% 100% 99% 100%
Spatial (K=1) DLA (18.1M) 46.29 98% 99% 95% 95% 96% 99% 98% 95%
Spatial (K=5) 100% 100% 98% 98% 99% 100% 99% 96%
Spatial (K=10) 100% 100% 99% 99% 99% 100% 99% 96%
Spatial (K=50) 100% 100% 99% 99% 100% 100% 99% 93%
Table 2: Detection results (AUC) of image spatial (Spatial) and scale consistency (Scale) based methods on the BDD dataset. The number in parentheses after the model name shows the number of parameters of the target model, and “mIOU” shows the performance of the segmentation model on pristine data. We color all AUC values less than 80% in red.
Method (Spatial) Model mIOU Detection Detection Adap
DAG Houdini DAG Houdini
Pure Kitty Pure Kitty Pure Kitty Pure Kitty
K=1 DRN (16.4M) 66.7 % % % % % % % %
K=5 % % % % % % % %
K=10 % % % % % % % %
K=50 % % % % % % % %
K=1 DLA (18.1M) 74.5 % % % % % % % %
K=5 % % % % % % % %
K=10 % % % % % % % %
K=50 % % % % % % % %
Table 3: Detection results (AUC) of image spatial (Spatial) based method with random patch size on Cityscapes dataset.
Method (Spatial) Model mIOU Detection Detection Adap
DAG Houdini DAG Houdini
Pure Kitty Pure Kitty Pure Kitty Pure Kitty
K=1 DRN (16.4M) 54.5 % % % % % % % %
K=5 % % % % % % % %
K=10 % % % % % % % %
K=50 % % % % % % % %
K=1 DLA (18.1M) 46.29 % % % % % % % %
K=5 % % % % % % % %
K=10 % % % % % % % %
K=50 % % % % % % % %
Table 4: Detection results (AUC) of image spatial (Spatial) based method with random patch size on BDD dataset.
Method (Spatial) Model mIOU Detection Detection Adap
DAG Houdini DAG Houdini
ECCV Remap Strip ECCV Remap Strip ECCV Remap Strip ECCV Remap Strip
K=1 DRN (16.4M) 66.7 93% 91% 91% 91% 91% 91% 90% 92% 89% 90% 92% 90%
K=5 99% 100% 99% 99% 100% 99% 100% 99% 100% 100% 100% 99%
K=10 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 99%
K=50 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
K=1 DLA (18.1M) 74.5 96% 99% 97% 95% 97% 96% 99% 99% 98% 99% 99% 98%
K=5 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
K=10 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 99%
K=50 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100%
Table 5: Detection results (AUC) of spatial consistency (Spatial) based method on Cityscapes dataset for additional targets.
Method (Spatial) Model mIOU Detection Detection Adap
DAG Houdini DAG Houdini
ECCV Remap Strip ECCV Remap Strip ECCV Remap Strip ECCV Remap Strip
K=1 DRN (16.4M) 54.5 99% 99% 99% 99% 99% 99% 99% 99% 99% 99% 98% 97%
K=5 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 98%
K=10 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 99% 97%
K=50 100% 100% 100% 100% 100% 100% 100% 100% 100% 100% 99% 97%
K=1 DLA (18.1M) 46.29 99% 99% 99% 98% 97% 98% 99% 99% 99% 98% 97% 99%
K=5 100% 100% 100% 100% 99% 99% 100% 100% 100% 99% 99% 99%
K=10 100% 100% 100% 100% 100% 100% 100% 100% 100% 99% 99% 99%
K=50 100% 100% 100% 100% 100% 100% 100% 100% 100% 99% 99% 99%
Table 6: Detection results (AUC) of spatial consistency (Spatial) based method on BDD dataset for additional targets.

0.b.1 Spatial Context Analysis

(a) Original example | DLA | Cityscapes
(b) Benign example | DLA | Cityscapes
(c) DAG | Kitty | DLA | Cityscapes
(d) DAG | Pure | DLA | Cityscapes
(e) Houdini | Kitty | DLA | Cityscapes
(f) Houdini | Pure | DLA | Cityscapes
(g) Original example | DRN | BDD
(h) Benign example | DRN | BDD
(i) DAG | Kitty | DRN | BDD
(j) DAG | Scene | DRN | BDD
(k) Houdini | Kitty | DRN | BDD
(l) Houdini | Scene | DRN | BDD
(m) Original example|DLA|BDD
(n) Benign example | DLA | BDD
(o) DAG | Kitty | DLA | BDD
(p) DAG | Scene | DLA | BDD
(q) Houdini | Kitty | DLA | BDD
(r) Houdini | Real | DLA | BDD
Figure 11: Heatmap of per-pixel self-entropy. (a), (b), (g), (h), (m) and (n) show benign images and their corresponding per-pixel self-entropy heatmaps. We use the format “examples | attack model | dataset” to label them. For the rest, we use the format “attack method | target label | attack model | dataset” to label each subcaption.

Algorithm 2 describes the getOverlapPatches procedure. Figure 11 shows the heatmaps of the per-pixel self-entropy on the Cityscapes and BDD datasets against the DRN and DLA models. It is clearly shown that the adversarial instances have higher entropy than benign ones. Table 2 shows the detection results (AUC) of the spatial consistency based method with fixed patch size. It demonstrates that the spatial consistency information can help to detect adversarial examples with AUC nearly 100% on the BDD dataset. Tables 5 and 6 show the results on additional targets on the Cityscapes and BDD datasets. Tables 3 and 4 show the detection results (AUC) of the spatial consistency based method with random patch size. They show that random patch sizes achieve similar detection results.

0.b.2 Scale Consistency Analysis

(a) Benign example | DLA | Cityscapes
(b) DAG | Kitty | DLA | Cityscapes
(c) DAG | Pure | DLA | Cityscapes
(d) Houdini | Kitty | DLA | Cityscapes
(e) Houdini | Pure | DLA | Cityscapes
(f) Benign example | DRN | BDD
(g) DAG | Kitty | DRN | BDD
(h) DAG | Scene | DRN | BDD
(i) Houdini | Kitty | DRN | BDD
(j) Houdini | Scene | DRN | BDD
(k) Benign example | DLA | BDD
(l) DAG | Kitty | DLA | BDD
(m) DAG | Scene | DLA | BDD
(n) Houdini | Kitty | DLA | BDD
(o) Houdini | Scene | DLA | BDD
Figure 12: Examples of images and corresponding segmentation results before/after image scaling. For each subfigure, the first column shows benign/adversarial images, while the following columns represent images after scaling by applying Gaussian kernel with std as 0.5, 3, and 5, respectively. (a),(f) and (k) show benign images before/after image scaling and the corresponding segmentation results and we use the format “example | attack model | dataset” to identify the corresponding model and dataset; (b)-(e), (g)-(j) and (l)-(o) present similar results for adversarial images and we use the format “attack method | target label | attack model | dataset” to label the settings of each image.
(a) Benign example | DRN | Cityscapes
(b) DAG | Kitty | DRN | Cityscapes
(c) DAG | Pure | DRN | Cityscapes
(d) Houdini | Kitty | DRN | Cityscapes
(e) Houdini | Pure | DRN | Cityscapes
(f) Benign example | DLA | Cityscapes
(g) DAG | Kitty | DLA | Cityscapes
(h) DAG | Pure | DLA | Cityscapes
(i) Houdini | Kitty | DLA | Cityscapes
(j) Houdini | Pure | DLA | Cityscapes
(k) Benign example | DRN | BDD
(l) DAG | Kitty | DRN | BDD
(m) DAG | Scene | DRN | BDD
(n) Houdini | Kitty | DRN | BDD
(o) Houdini | Scene | DRN | BDD
(p) Benign example | DLA | BDD
(q) DAG | Kitty | DLA | BDD
(r) DAG | Scene | DLA | BDD
(s) Houdini | Kitty | DLA | BDD
(t) Houdini | Scene | DLA | BDD
Figure 13: Examples of images and corresponding segmentation results for the adaptive attack against image scaling. For each subfigure, the first column shows the benign/adversarial image, while the following columns show images after scaling by applying a Gaussian kernel with std of 0.5, 3, and 5, respectively. (a), (f), (k) and (p) show benign images before/after image scaling and the corresponding segmentation results, labeled with the format “example | attack model | dataset”; (b)-(e), (g)-(j), (l)-(o) and (q)-(t) present similar results for adaptive adversarial images, labeled with the format “attack method | target label | attack model | dataset”.

We applied image scaling to the adversarial examples generated by Houdini [10] and DAG [49] on the Cityscapes and BDD datasets against the DRN and DLA models. The results are shown in Figure 12. We find the same phenomenon: when applying Gaussian blurring with high std (3 and 5), the adversarial perturbation is weakened and the segmentation results are no longer the adversarial targets.

Table 2 shows that the method based on image scale information can achieve similarly high AUC compared with the spatial consistency based method on BDD.

0.b.3 Adaptive Attack Evaluation

input:
Input image ;
Number of attack patches ;
Patch size ;
Segmentation model ;
bound ;
Recognition targets ;
Adversarial label set ;
Maximal iteration ;
output:
Adversarial perturbation ;
;
1 while  do
2       ;
3       ;
4       /* Attack C random patches */;
5       for  to  do
6             Generate two random integers where and ;
7             ;
8             ;
9            
10       end for
11      ;
12       ;
13       ;
14       ;
15      
16 end while
Return:
Algorithm 3 DAG adaptive attack against spatial consistency method
input:
Input image ;
Number of attack patches ;
Patch size ;
Segmentation model ;
bound ;
Adversarial target label ;
Maximal iteration ;
Houdini loss ;
output:
Adversarial perturbation ;
;
1 while  do
2       ;
3       /* get the gradient of the perturbation from the objective */;
4       ;
5       ;
6       for  0 to  do
7             Generate two random integer numbers where ;
8             ;
9            
10       end for
11      ;
12       ;
13      
14 end while
Return:
Algorithm 4 Houdini adaptive attack against spatial consistency method
(a) DLA | Kitty | Cityscapes
(b) DLA | Pure | Citysacpes
(c) DRN | Kitty | BDD
(d) DRN | Scene | BDD
(e) DLA | Kitty | BDD
(f) DLA | Scene | BDD
Figure 14: Detection performance of the spatial consistency based method against adaptive attacks with different numbers of attacked patches and overlapping regions. We use the format “attack model | target label | dataset” to label the settings of each figure. The X-axis indicates the number of patches selected to perform the adaptive attack (0 means the regular attack). The Y-axis indicates the number of overlapping regions selected during detection. We select the minimal mIOU from benign patches as our threshold on Cityscapes, and the threshold which guarantees accuracy above 95% on benign images for BDD.

Here we illustrate the adaptive attack algorithms based on DAG and Houdini against the spatial consistency method in Algorithm 3 and Algorithm 4. Instead of only attacking the whole benign image, the adaptive attack randomly picks some patches and attacks them together with the whole image.

Let X be an image and C be the number of patches selected from X to perform the adaptive attack. We define T as the set comprising the coordinates of the recognition target pixels. f denotes the segmentation network; we use f(X) to denote the classification scores of the entire image and f(X)_p to denote the classification score vector at pixel p. Y denotes the adversarial label set, where y_p represents the adversarial label of pixel p. Given an input perturbation tensor r, ||r|| denotes its norm, which is used to bound the perturbation. In Algorithm 4, we follow the same definition of the Houdini loss as proposed in [10]. We set the maximal number of iterations to approximately 300 (about three times the average number of iterations of the non-adaptive attack) in all settings, and we fix the perturbation bound for simplicity.
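Under the iteration budget and perturbation bound above, the outer loop of such an adaptive attack can be sketched as below, reusing adaptive_step from the sketch in Section 5.4; the 96% pixel-wise success test is a simplified stand-in for the stopping rules of Algorithms 3 and 4.

import torch

def run_adaptive_attack(model, x, target, seg_loss, max_iter=300, eps=0.06, **kw):
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(max_iter):
        delta = adaptive_step(model, x, delta, target, seg_loss, eps=eps, **kw)
        with torch.no_grad():
            pred = model((x + delta).clamp(0, 1)).argmax(1)
            if (pred == target).float().mean() > 0.96:   # e.g. 96% pixel-wise success
                break
    return delta.detach()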

The detection results in terms of AUC of the spatial consistency based method against such adaptive attacks on BDD and Cityscapes are shown in Tables 2-6. Even against such strong adaptive attacks, the spatial consistency based method can still achieve nearly 100% AUC. Figure 14 shows the confusion matrices of detection results for adversaries and the detection method choosing various numbers of attacked patches and overlapping regions. It is clear that a small number of overlapping regions is already sufficient to detect sophisticated attacks on the Cityscapes dataset against the DLA model, and the method also achieves a 100% detection rate while keeping accuracy above 95% on benign images for BDD.

The detection results of the image scaling based method against adaptive attacks on BDD are shown in Table 2. The image scaling based detection method can be easily attacked (its AUC drops drastically). Figure 13 shows the qualitative results. It is obvious that even under a Gaussian kernel with a large standard deviation, the models can still be fooled into predicting the malicious targets (“Kitty”, “Pure”, “Scene”) on the Cityscapes and BDD datasets against the DRN and DLA models.

Appendix 0.C Additional Results for Transferability Analysis

We present additional results for transferability analysis in Figure 15 to Figure 19.

Figures 15 and 16 show the transferability analysis results for segmentation models. We report the pixel-wise attack success rate for the pure target and the normalized mIoU after eliminating the K classes with the lowest IoU values for other targets. We set K to be 13 for the CityScapes dataset and 5 for the BDD dataset. Additional qualitative results are presented in Figures 17 and 18.

Figure 19 shows the transferability results for classification models under targeted attacks. The adversarial images are generated using the iterative FGSM method on the MNIST and CIFAR10 datasets. The caption of each sub-figure indicates the dataset and the attacked classification model.

(a) DAG Pure DRN-D-22
(b) Houdini Pure DRN-D-22
(c) DAG Pure DRN-C-26
(d) Houdini Pure DRN-C-26
(e) DAG Hello Kitty DRN-C-26
(f) Houdini Hello Kitty DRN-C-26
(g) DAG Pure DLA34UP
(h) Houdini Pure DLA34UP
(i) DAG Hello Kitty DLA34UP
(j) Houdini Hello Kitty DLA34UP
Figure 15: Transferability analysis on the CityScapes dataset: each cell shows the normalized mIoU value or pixel-wise attack success rate of adversarial examples generated against one model and evaluated on another. Models A, B, C have the same architecture (DRN-C-26 or DLA34UP) with different initialization. We use the format “attack method | attack target | model” in the caption of each sub-figure.
(a) DAG Scene DRN-D-22
(b) Houdini Scene DRN-D-22
(c) DAG Hello Kitty DRN-D-22
(d) Houdini Hello Kitty DRN-D-22
(e) DAG Scene DRN-C-26
(f) Houdini Scene DRN-C-26
(g) DAG Hello Kitty DRN-C-26
(h) Houdini Hello Kitty DRN-C-26
(i) DAG Scene DLA34UP
(j) Houdini Scene DLA34UP
(k) DAG Hello Kitty DLA34UP
(l) Houdini Hello Kitty DLA34UP
Figure 16: Transferability analysis on BDD dataset.
(a) DAG Pure DRN-D-22
(b) Houdini Pure DRN-D-22
(c) DAG Hello Kitty DRN-D-22
(d) Houdini Hello Kitty DRN-D-22
(e) DAG Pure Rider DRN-C-26
(f) Houdini Pure Rider DRN-C-26
(g) DAG Hello Kitty DRN-C-26
(h) Houdini Hello Kitty DRN-C-26
(i) DAG Pure Rider DLA34UP
(j) Houdini Pure Rider DLA34UP
(k) DAG Hello Kitty DLA34UP
(l) Houdini Hello Kitty DLA34UP
Figure 17: Transferability visualization on the CityScapes dataset. In each sub-figure, the first row presents the segmentation results of the adversarial example on model A (the targeted model) and on model B. The second row shows the adversarial target and the ground truth. We use the format “attack method | attack target | model” in the caption of each sub-figure.
(a) DAG Scene DRN-D-22
(b) Houdini Scene DRN-D-22
(c) DAG Hello Kitty DRN-D-22
(d) Houdini Hello Kitty DRN-D-22
(e) DAG Scene DRN-C-26
(f) Houdini Scene DRN-C-26
(g) DAG Hello Kitty DRN-C-26
(h) Houdini Hello Kitty DRN-C-26
(i) DAG Scene DLA34UP
(j) Houdini Scene DLA34UP
(k) DAG Hello Kitty DLA34UP
(l) Houdini Hello Kitty DLA34UP
Figure 18: Transferability visualization on BDD dataset.
(a) MNIST DRN-D-22
(b) CIFAR10 DRN-D-22
(c) MNIST DRN-C-26
(d) CIFAR10 DRN-C-26
(e) MNIST DLA34
(f) CIFAR10 DLA34
Figure 19: Transferability analysis for classification models: each cell shows the attack success rate of adversarial examples generated against one model and evaluated on another under the targeted attack setting. Models A, B, C are models with the same architecture (DRN-D-22, DRN-C-26 or DLA34UP) and different initialization. All adversarial examples are generated using the iterative fast gradient sign method. The caption of each sub-figure follows the format “dataset | model”.