While machine learning methods, in particular deep neural networks, have led to significant performance gains on numerous tasks, several studies have found them to be vulnerable to adversarial attacks (Szegedy et al., 2014). These attacks compute perturbed versions of input data that fool a classifier into changing its predictions, while the perturbations remain almost imperceptible to the human eye. Most work on adversarial examples focuses on the task of image classification. In this paper, we investigate the effect of adversarial attacks on a localization task, in particular semantic segmentation.
Our work uses a common deep neural network for semantic segmentation and evaluates to what extent this task is vulnerable to adversarial examples. We introduce a way adversarial examples can be defined for semantic segmentation and show how existing methods for creating adversarial perturbations can be adapted to this task. On the Cityscapes dataset, we find that it is easy to make a deep network misclassify entire regions of an image while filling the misclassified areas with a convenient substitute. Finally, we introduce a measure for analysing the effectiveness of a given attack on a semantic segmentation network.
2 Generating adversarial examples
Let $f_\theta$ denote a function for semantic segmentation, such as a deep neural network with parameters $\theta$, and let $x$ be an input image of size $h \times w$ with ground-truth label $y$. As in most semantic segmentation tasks, we assume that $y$ has the same pixel dimensions as the image times the number of classes $c$, i.e. $y \in \{0, 1\}^{h \times w \times c}$. We also denote by $J(f_\theta(x), y)$ the classification loss (i.e. cross-entropy) and assume it is differentiable with respect to $\theta$ and $x$.
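As a minimal sketch of this loss, assuming per-pixel class probabilities of shape $h \times w \times c$ and a one-hot label map of the same shape (the function name and exact reduction are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def pixelwise_cross_entropy(probs, y_onehot, eps=1e-12):
    """Cross-entropy loss averaged over all pixels.

    probs:    predicted class probabilities, shape (h, w, c)
    y_onehot: one-hot ground-truth label map, same shape
    """
    # Per-pixel cross-entropy, then mean over the h x w spatial grid.
    per_pixel = -np.sum(y_onehot * np.log(probs + eps), axis=-1)
    return per_pixel.mean()
```

In a real segmentation framework this reduction is supplied by the library's loss function; the sketch only fixes the notation used below.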
Since adversarial examples were discovered by Szegedy et al. (2014), several methods have been introduced to generate adversarial perturbations (e.g. Kurakin et al., 2016; Liu et al., 2015; Goodfellow et al., 2015). In the scope of this work we focus on the so-called least-likely method presented by Kurakin et al. (2016). Despite its name, this method is not restricted to targeting the least likely class but, in contrast to most other approaches, allows us to explicitly specify a target class we want the classifier to predict. This method can easily be transferred from a single pixel to an entire image.
Figure 1: (a) original image; (b) adversarial example; (c) adversarial example restricted to person pixels; (d) adversarial noise; (e) restricted adversarial noise; (f) adversarial target; (g) prediction on the adversarial example; (h) prediction on the restricted adversarial example; (j) prediction vs. prediction on the adversarial example; (k) prediction vs. prediction on the restricted adversarial example.
For an arbitrary given target label $y^{target}$, adapting Kurakin et al. (2016), the least-likely method iteratively computes an adversarial perturbation as:

$$x^{adv}_0 = x, \qquad x^{adv}_{n+1} = \mathrm{Clip}_{x,\epsilon}\left\{ x^{adv}_n - \alpha \cdot \mathrm{sign}\left(\nabla_x J\big(f_\theta(x^{adv}_n), y^{target}\big)\right) \right\}$$

Here $\epsilon$ denotes the maximum $\ell_\infty$-norm of the perturbation and $\alpha$ the step size of a single iteration. We set $\alpha = 1$, hence changing each pixel by at most 1 in each iteration. As suggested in Kurakin et al. (2016), the number of iterations was set to $\min(\epsilon + 4, \lceil 1.25\,\epsilon \rceil)$, and adversarial perturbations were evaluated for different $\epsilon$. The function $\mathrm{Clip}_{x,\epsilon}$ clips all values of its argument into the interval $[x - \epsilon, x + \epsilon]$.
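The iteration above can be sketched as follows; `grad_loss` is a hypothetical callable returning $\nabla_x J(f_\theta(x), y^{target})$, which in practice would come from the segmentation network via automatic differentiation:

```python
import numpy as np

def targeted_iterative_attack(x, y_target, grad_loss, eps, alpha=1.0, n_iter=None):
    """Iterative targeted attack in the style of Kurakin et al. (2016).

    x:         clean image (pixel values in [0, 255])
    y_target:  adversarial target label
    grad_loss: callable (x_adv, y_target) -> dJ/dx  (assumed, not from the paper)
    eps:       maximum l_inf-norm of the perturbation
    """
    if n_iter is None:
        # Iteration-count heuristic suggested by Kurakin et al. (2016).
        n_iter = int(min(eps + 4, np.ceil(1.25 * eps)))
    x_adv = x.astype(np.float64).copy()
    for _ in range(n_iter):
        # Descend on the loss w.r.t. the *target* label: one signed step
        # of size alpha per pixel and iteration.
        x_adv = x_adv - alpha * np.sign(grad_loss(x_adv, y_target))
        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 255.0)
    return x_adv
```

The sign and projection steps mirror the equation above; only the gradient oracle is left abstract.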
2.1 Definition of the adversarial target for semantic segmentation
In our approach the adversarial target covers the entire image, and all pixels of a specific class $o$ are changed towards other classes. To make the result look more natural, we suggest choosing the target class for each to-be-fooled pixel by replacing it with the class of its nearest neighbouring pixel whose class differs from $o$. For all pixels $(i, j)$ not of class $o$ in the original network prediction, we set $y^{target}_{ij} = f_\theta(x)_{ij}$. We measure the effect of adversarial examples by the percentage of pixels of the chosen class that were changed and the percentage of background pixels that were preserved. Because we want to fool the classifier relative to its actual prediction and not the ground truth, we generate the target label on the basis of the network's prediction on the unaltered image.
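A brute-force sketch of this target construction, assuming a 2-D map `pred` of predicted class ids and Euclidean pixel distance (a real implementation would use a distance transform for speed; names are illustrative):

```python
import numpy as np

def adversarial_target(pred, hide_class):
    """Build a dense target label map: keep all non-`hide_class` pixels,
    and replace each `hide_class` pixel with the class of its spatially
    nearest pixel of a different class."""
    target = pred.copy()
    hide = np.argwhere(pred == hide_class)
    keep = np.argwhere(pred != hide_class)
    if hide.size == 0 or keep.size == 0:
        return target  # nothing to hide, or no substitute class available
    for (i, j) in hide:
        # Squared Euclidean distance to every non-hidden pixel.
        d2 = (keep[:, 0] - i) ** 2 + (keep[:, 1] - j) ** 2
        ni, nj = keep[np.argmin(d2)]
        target[i, j] = pred[ni, nj]
    return target
```

All non-target pixels are carried over unchanged, matching the definition of $y^{target}$ above.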
3 Results & Discussion
Experiments were run with the fully convolutional network architecture introduced by Long et al. (2015) for the VGG16 model. We trained a version of the FCN8 network. Network training and evaluation, as well as generation and evaluation of adversarial examples, were done on the downscaled Cityscapes dataset (Cordts et al., 2016).
For the task of hiding all pixels of one class, we chose to hide all person pixels of the Cityscapes validation dataset. As described above, adversarial target labels were created by first fixing all non-person pixels and then replacing person pixels with their nearest-neighbour non-person class (see the example in Fig. 1, original image (a) and target label (f)). These labels were created for all network predictions on Cityscapes validation images and then used to compute adversarial variations of the images as described above. We used different values of $\epsilon$ to investigate the effect of different magnitudes of adversarial perturbation. Fig. 1 shows a representative example: the deep network can be fooled into recognising almost no person pixels, while predicting the substituted background and preserving the background outside the original persons nearly perfectly. We also see that the adversarial perturbation is hard to detect.
Figure 2: Mean and standard deviation over the Cityscapes validation dataset of the percentage of preserved background pixels (light grey) and fooled person pixels (dark red), for different $\epsilon$, with noise applied to the entire image (left) and restricted to person pixels (right).
Finally, we restricted the adversarial perturbations computed above, applying them only to the person-class pixels of the network's prediction on the original image. The right column of Fig. 1 shows an example of this case. We see that a majority of person pixels can be cloaked while preserving the background, indicating that the noise on the persons in particular is what hides them.
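Restricting the perturbation to person pixels amounts to simple masking: a small sketch, assuming a 2-D image and class-id map (names are illustrative):

```python
import numpy as np

def restricted_adversarial(x, x_adv, pred_clean, hide_class):
    """Keep the adversarial pixels only where the clean prediction is
    `hide_class`; everywhere else fall back to the original image."""
    mask = pred_clean == hide_class
    return np.where(mask, x_adv, x)
```

For multi-channel images the mask would additionally be broadcast over the channel axis.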
Fig. 2 shows the mean and standard deviation of the percentages of person pixels changed and background pixels preserved for different $\epsilon$. For the left plot in Fig. 2 adversarial perturbations were applied to the entire image, for the right plot only to person-class pixels. In the first case we see that already for moderately small $\epsilon$ the large majority of person pixels could be hidden while the large majority of background pixels were preserved. When the adversarial noise is restricted to person pixels (right plot of Fig. 2), the background is preserved even for smaller $\epsilon$, while the number of cloaked person pixels decreases for small $\epsilon$ but recovers for larger values of $\epsilon$.
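The two percentages reported in Fig. 2 can be sketched as follows, both measured against the prediction on the unaltered image as described above (function name is an assumption):

```python
import numpy as np

def attack_metrics(pred_clean, pred_adv, hide_class):
    """Return (fooled, preserved) percentages.

    fooled:    share of `hide_class` pixels whose adversarial prediction changed
    preserved: share of background pixels whose adversarial prediction is unchanged
    """
    person = pred_clean == hide_class
    fooled = 100.0 * np.mean(pred_adv[person] != hide_class)
    preserved = 100.0 * np.mean(pred_adv[~person] == pred_clean[~person])
    return fooled, preserved
```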
4 Conclusion

We adapted the concept of adversarial examples to the task of semantic segmentation and showed that existing approaches for generating adversarial examples for classification can easily be transferred to this task. We showed that there exist imperceptible adversarial perturbations that cloak almost all pixels of a target class while leaving the other classes across the image nearly unchanged. Many open topics remain, such as the use of more performant networks, a comparison of different network architectures, more sophisticated methods to measure the effectiveness of adversarial examples for semantic segmentation, and the question of whether these adversarial examples can be applied in the physical world.
- Cordts et al. (2016) Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. The Cityscapes Dataset for Semantic Urban Scene Understanding. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- Goodfellow et al. (2015) Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and Harnessing Adversarial Examples. In International Conference on Learning Representations (ICLR), 2015.
- Kurakin et al. (2016) Alexey Kurakin, Ian Goodfellow, and Samy Bengio. Adversarial examples in the physical world. arXiv:1607.02533, July 2016.
- Liu et al. (2015) Ziwei Liu, Xiaoxiao Li, Ping Luo, Chen Change Loy, and Xiaoou Tang. Semantic image segmentation via deep parsing network. In The IEEE International Conference on Computer Vision (ICCV), 2015.
- Long et al. (2015) Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- Szegedy et al. (2014) Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. Intriguing properties of neural networks. In International Conference on Learning Representations (ICLR), 2014.
- The Theano Development Team (2016) The Theano Development Team. Theano: A Python framework for fast computation of mathematical expressions. arXiv:1605.02688, May 2016.