Adversarial Attack in the Context of Self-driving

04/05/2021
by   Zhenhua Chen, et al.
Indiana University Bloomington

In this paper, we propose a model that can attack segmentation models with semantic and dynamic targets in the context of self-driving. Specifically, our model is designed to map an input image as well as its corresponding label to perturbations. After adding the perturbation to the input image, the adversarial example can manipulate the labels of the pixels in a semantically meaningful way on dynamic targets. In this way, we can make a potential attack subtle and stealthy. To evaluate the stealthiness of our attacking model, we design three types of tasks, including hiding true labels in the context, generating fake labels, and displacing labels that belong to some category. The experiments show that our model can attack segmentation models efficiently with a relatively high success rate on Cityscapes, Mapillary, and BDD100K. We also evaluate the generalization of our model across different datasets. Finally, we propose a new metric to evaluate the parameter-wise efficiency of attacking models by comparing the number of parameters used by both the attacking models and the target models.

1 Introduction

Many breakthroughs have happened since deep neural networks were re-introduced to the computer vision community a few years ago. At the same time, worries are emerging about whether these deep neural networks are reliable and robust to potential attacks. Unfortunately, more and more research has shown that these deep neural networks are vulnerable to attacks based on adversarial examples [10, 17, 22, 25]. One explanation is that since a neural network is roughly linear, a small perturbation on the input can make a big difference in the output if the network is deep. Many researchers have been working on improving the robustness of neural networks [7, 11, 21, 18, 24, 26, 27], but most of these defenses can be easily circumvented [3], and we still do not have general techniques for ensuring the robustness of neural networks.

In contrast, attacking a deep learning-based model is much easier. FGSM [10] was among the very first attacking algorithms; its primary idea is to convert the attacking problem into an optimization problem. Specifically, FGSM manipulates the input data in some direction in the image space to make the classifier predict incorrect labels. The main difference between FGSM and standard backpropagation is that FGSM treats the perturbation, rather than each layer's parameters, as the trainable parameters.

What is surprising is that we can limit the magnitude of the perturbation to a small range (for example, requiring the infinity norm of the perturbation image to be smaller than 8) and still fool a neural network effectively. Although FGSM is designed to attack deep learning-based classifiers, it can also be applied to higher-level applications like image segmentation and object detection, because, stripped to the essence, semantic segmentation models and object detectors are also composed of multiple "classifiers". However, attacking a segmentation or object detection model is relatively more difficult since more constraints have to be considered.
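As an illustration of this idea, the following is a minimal PyTorch sketch of a single FGSM step; the function name, the epsilon value, and the [0, 1] image range are illustrative assumptions rather than the exact setup of [10].

```python
# Minimal FGSM sketch (illustrative): the perturbation, not the network weights,
# is what receives and uses the gradient.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=8 / 255):
    """Return an adversarial example whose l-infinity distance to `image` is at most epsilon."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)  # works for classification or per-pixel labels
    loss.backward()                              # gradient flows to the input, not the weights
    adversarial = image + epsilon * image.grad.sign()  # one step that increases the loss
    return adversarial.clamp(0.0, 1.0).detach()        # keep the result a valid image
```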

Based on FGSM, several variants have been proposed. The most straightforward modification is replacing the one-step optimization with a multi-step optimization process (here we focus only on attacking segmentation models). For example, [2] explore the robustness of several segmentation models based on FGSM. [29] propose Dense Adversary Generation to attack both segmentation models and two-stage object detectors. Both [29] and [2] fall into the category of untargeted attacks, which make the target segmentation model predict random labels. An untargeted attack can completely collapse a segmentation model's predictions and is thus easy to detect, as Figure 1 shows. A targeted attack, if designed with semantic meaning, can be stealthy. For example, [8] can fool a segmentation model into predicting labels without 'person' even when there are people in the image. From another viewpoint, these attacking models can be classified by how the perturbations are generated, either by backpropagating the gradients to the input space directly or by adopting a separate generator. The former type is simple but not very flexible (the number of parameters always equals the number of input pixels) [2, 29], while the latter is more powerful but also more complex [20].

Considering that deep learning-based segmentation models are being applied to autonomous cars [28, 13, 31, 4] as well as many other applications, a successful attack on these models could cause serious problems in the real world. In this paper, we propose a method to generate perturbations that can strike dynamic, semantically stealthy attacks. Specifically, we start from the input images as well as the segmentation labels predicted by the target model, and generate semantically stealthy perturbations based on them. At deployment time, we use the trained generator to produce a perturbation for each test image. Our main contributions can be summarized as follows:

  • We propose an attacking framework that can strike stealthy, semantically meaningful attacks on dynamic targets.

  • We design three basic types of semantic attacks, namely label vanishing, fake label adding, and label displacement. Other high-level semantic manipulations can be achieved by combining these basic types.

  • We evaluate our model’s generalization ability across different datasets as well as its parameter-wise efficiency.

Figure 1: Untargeted attack [2]. Top: The predicted labels generated by a target segmentation model. Bottom: The target model's prediction collapses completely after the attack.
Figure 2: The overview of our attacking model. The 'target model' is the one we attack. During training, each input image is fed into a generator (composed of several convolutional and deconvolutional layers) as well as the target model. After the 'clean logits' and the 'dirty logits' are acquired, we feed them into a customized 'adversarial softmax loss'. Another critical design choice is that we regularize the embedded features from the generator to predict the target model's original prediction. The 'embedded label' here provides a template for fake labels.

2 Related work

2.1 FGSM-style attacking

Current attacking methods are mostly inspired by FGSM [10]. The core idea of these FGSM-style attacking methods is to choose a label or labels for a target model (the model is treated as a black box), calculate the loss, and then backpropagate the gradients to the perturbation space. Most attacking algorithms designed for segmentation models also follow this idea. For example, [2, 29] attack several segmentation models [12, 19, 5] by setting untargeted labels. After the attack, the target models predict random labels for each pixel. Although these attacking models are quite efficient, they are far from subtle and would easily be detected. [8] overcomes this issue by setting semantically meaningful targets.

Another issue is that these attacking methods, whether targeted or untargeted, are a straightforward generalization of FGSM [10], which treats each image as an independent target. In other words, their target is each input image rather than the whole model. Treating each input image separately results in low efficiency during testing.

2.2 Universal adversary

The universal adversary might be a solution to the computationally intensive nature of FGSM-based attacks. Universal Adversarial Perturbations (UP) [15] can generate sample-agnostic perturbations offline and thus can strike an attack in real time. The primary idea of UP is to find a single universal perturbation that can fool all the training/test images. UP was originally designed for attacking classifiers, but it can easily be extended to attack segmentation models. For example, [14] propose an attacking solution that finds a perturbation maximizing the predictive probability of a target class. Specifically, [14] iteratively backpropagate the gradients to the input space. When training is finished, the accumulated gradients form the perturbation, which is added to each input image at test time.
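The shared-perturbation idea can be summarized in a short sketch. The code below is only a schematic, untargeted variant under assumed names and hyperparameters; [14] optimizes a targeted objective, but the mechanics of backpropagating to a single shared perturbation are the same.

```python
# Schematic universal (sample-agnostic) perturbation: one shared delta is trained
# against many images and then simply added to each test image.
import torch
import torch.nn.functional as F

def train_universal_perturbation(model, loader, steps=1000, lr=0.01, epsilon=10 / 255):
    images, _ = next(iter(loader))
    delta = torch.zeros(1, *images.shape[1:], requires_grad=True)  # shared across all images
    optimizer = torch.optim.Adam([delta], lr=lr)
    data_iter = iter(loader)
    for _ in range(steps):
        try:
            images, labels = next(data_iter)
        except StopIteration:
            data_iter = iter(loader)
            images, labels = next(data_iter)
        logits = model((images + delta).clamp(0, 1))
        loss = -F.cross_entropy(logits, labels)   # untargeted: move away from the clean labels
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        delta.data.clamp_(-epsilon, epsilon)      # keep the perturbation within the budget
    return delta.detach()
```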

2.3 Attacking based on GAN-like structures

UP-based attacking models largely avoid the computational expense. However, they are not very flexible, since the number of free parameters is fixed to the number of input pixels. One straightforward way to solve this issue is adopting a GAN-like structure, in which the discriminator corresponds to the target model while the generator produces the perturbations. The perturbation generator can start from either random noise or an image. If the perturbation comes from an image, it is called an image-dependent perturbation. For example, [20] proposes a unified, GAN-like attacking model that can attack both segmentation models and classifiers. One problem with [20] is that, in our experience, its structure does not perform well on dynamic target labels. To mitigate this issue, we add an extra regularizer that encodes the information of the predicted segmentation labels as well as the input image to generate perturbations for dynamic segmentation targets. The experiments show that our model achieves good performance on several different attacking types.

2.4 Attacking based on physical attributes

Apart from adding perturbations, it is also possible to add physical attributes to the original image to strike an attack. For example, [23] can fool a face recognizer by adding glasses to a face, and [1] can fool a real-world face recognition system by manipulating attributes such as hair color. These methods, which generate physical adversarial examples, are only weakly related to our work.

3 Our Approach

More and more self-driving cars are adopting deep learning-based segmentation models. If such a segmentation model is successfully attacked, an accident could happen.

3.1 Adversary model

We assume that a potential attacker can get access to the model but cannot modify it. The model can be stored either in the self-driving car itself or on a server that communicates with it. We assume the attacker can use the model as a black box and can inject unlimited samples into the target model to observe its output. Thus we can train an adversary model to generate adversarial perturbations.

Figure 3: The detailed structure of the generator in Figure 2.

3.2 System model

Our model is visually similar to a Generative Adversarial Network (GAN), since it also contains a generator and a discriminator (the target segmentation model), as Figure 2 shows. Specifically, we adopt an encoder-decoder structure to map an input image to a perturbation. At the same time, the encoder-decoder is also responsible for predicting the target model's labels. The underlying assumption is that if a generator can generate perturbations that serve a particular target, then the embedded features that come from the generator should also have a good knowledge of the spatial relationships within each input image. In other words, the two branches of the generator (one for the perturbation, the other for predicting the target model's labels) are complementary to each other. After the perturbation is acquired, we forward both the adversarial image and the original image through the target model to get the "clean logits" and the "dirty logits". The logits are then fed into the adversarial softmax loss layer (see Section 3.2.3 for details).
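To make the two-branch design concrete, the following is a minimal sketch of such a generator; the layer widths, kernel sizes, and bound on the perturbation are placeholder assumptions, not the exact architecture shown in Figure 3.

```python
# Two-branch generator sketch: a shared convolutional encoder, one deconvolutional
# head that produces the perturbation, and one head that predicts the target
# model's labels (used only by the regularizer).
import torch
import torch.nn as nn

class PerturbationGenerator(nn.Module):
    def __init__(self, in_channels=3, num_classes=19, width=32, epsilon=10 / 255):
        super().__init__()
        self.epsilon = epsilon
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, width, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.perturb_head = nn.Sequential(
            nn.ConvTranspose2d(width * 2, width, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(width, in_channels, 4, stride=2, padding=1), nn.Tanh(),
        )
        self.label_head = nn.Sequential(  # regularizer branch
            nn.ConvTranspose2d(width * 2, width, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(width, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, image):
        features = self.encoder(image)                      # shared embedded features
        delta = self.epsilon * self.perturb_head(features)  # bounded perturbation
        label_logits = self.label_head(features)            # prediction of the target model's labels
        return delta, label_logits
```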

3.2.1 Problem definition

Let $f$ be the target segmentation model trained on some dataset with image & label pairs $(x, y)$, where $x \in \mathbb{R}^{H \times W \times C}$ and $H$, $W$, and $C$ represent the height, width, and channel number respectively. The perturbation $\delta$ has the same size as the input image. An adversarial example $x' = x + \delta$ is the addition of the perturbation $\delta$ and the input image $x$. For each input image $x$, $f$ outputs a predicted label $f(x)$. Given a semantic target label $y^t$, our attacking model is supposed to achieve

$$f(x + \delta) = y^t. \tag{1}$$

During training, we first load the target model into the discriminator branch and then set the learning rate of the whole discriminator branch to 0 to avoid learning. Then we take a batch of samples and feed them to the generator. Finally, in the adversarial softmax loss layer, we compare the clean logits with the adversarial logits and output the loss according to Equation 2. The whole algorithm is summarized in Algorithm 1.

Load the target model into the discriminator;
Initialize the generator;
while not converged do
       Generate the clean logits and the perturbation $\delta$;
       Generate an adversarial image by adding the perturbation $\delta$ to the clean image;
       Generate the dirty logits;
       Calculate the adversarial softmax loss;
       Backpropagate the gradients and update the generator;
end while
Algorithm 1: Our attacking algorithm.
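A single iteration of Algorithm 1 can be sketched as follows. The generator is assumed to return a perturbation and label logits (the two branches described in Section 3.2); `adversarial_softmax_loss` stands in for Equation 2 (one possible form is sketched at the end of Section 3.2.3), and the regularizer weight is the value reported in Section 4.1.

```python
# One training iteration of Algorithm 1 (sketch). Only the generator's parameters
# are in `optimizer`; the target model stays frozen.
import torch
import torch.nn.functional as F

def training_step(generator, target_model, optimizer, images, target_labels, target_mask,
                  adversarial_softmax_loss, reg_weight=1e-2):
    with torch.no_grad():                                   # clean logits need no gradient
        clean_logits = target_model(images)
    delta, predicted_label_logits = generator(images)       # perturbation + regularizer branch
    dirty_logits = target_model((images + delta).clamp(0, 1))

    # Equation 2: manipulate target pixels, preserve the rest.
    loss = adversarial_softmax_loss(dirty_logits, clean_logits, target_labels, target_mask)
    # Equation 3: make the generator's label branch reproduce the target model's prediction.
    loss = loss + reg_weight * F.cross_entropy(predicted_label_logits, clean_logits.argmax(dim=1))

    optimizer.zero_grad()
    loss.backward()                                          # gradients reach only the generator
    optimizer.step()
    return loss.item()
```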

3.2.2 Attacking type

Focusing on the person labels in the context of self-driving, we design three attacking types: Type #1, person label vanishing; Type #2, fake person label generation; and Type #3, person label displacement. Specifically,

Type #1: A potential attacker makes the target model fail to predict any 'person' labels. As a result, the original 'person' regions look as if they have vanished or blended into the context. We achieve this by training the attacking model to predict the target model's second most probable labels for those pixels that are identified as 'person'.

Type #2: The labels of a set of selected pixels are transformed into person labels. Visually, it looks as if the attacker has created a person out of nowhere.

Type #3: The attacker of this type can "move" some person labels from one position to another. This can be achieved by combining the attacks of Type #1 and Type #2, namely vanishing the person labels at one position and then creating fake person labels at another. Visually, it looks as if a person has been moved from one position to another.

One thing all three attacking types have in common is that, when striking these attacks, we keep the remaining irrelevant pixels from being modified in order to achieve stealthiness.
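As an illustration of how a Type #1 semantic target can be built from the target model's clean prediction, the sketch below replaces every pixel predicted as 'person' with the second most probable class and leaves all other pixels untouched. The person class index is dataset-specific and assumed here only for illustration.

```python
# Sketch: construct a Type #1 (person-vanishing) target label map from clean logits.
import torch

def type1_target(clean_logits, person_class=11):
    """clean_logits: (B, C, H, W) logits from the target model on the clean image."""
    top2 = clean_logits.topk(k=2, dim=1).indices   # (B, 2, H, W): best and runner-up classes
    first, second = top2[:, 0], top2[:, 1]
    target = first.clone()
    person_mask = first == person_class
    target[person_mask] = second[person_mask]      # person pixels fall back to the runner-up class
    return target, person_mask                     # semantic target and the set of target pixels
```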

3.2.3 Adversarial softmax loss

We design our loss function to make sure that the target label is manipulated from the target model's original prediction $f(x)_k$ to the semantic target $y^t_k$. A per-pixel weight $w_k$ equals $\alpha$ if pixel $k$ is one of our target pixels and $1$ otherwise. $B$, $C$, and $N$ are the batch size, the channel size (total number of categories), and the total number of pixels in each image. $p_{k,c}$ represents the probability of pixel $k$ belonging to category $c$, computed from the dirty logits (the logit form of $f(x+\delta)$), while the desired label $\hat{y}_k$ is taken from the semantic target $y^t$ for target pixels and from the clean prediction $f(x)$ (the logit form of the original prediction) for the remaining pixels. $\alpha$ is introduced to control the degree of stealthiness between the targets and the background. The loss function is shown in Equation 2:

$$\mathcal{L}_{adv} = -\frac{1}{BN}\sum_{b=1}^{B}\sum_{k=1}^{N} w_k \sum_{c=1}^{C} \mathbb{1}\left[\hat{y}_k = c\right] \log p_{k,c}, \qquad w_k = \begin{cases} \alpha, & \text{pixel } k \text{ is a target pixel} \\ 1, & \text{otherwise.} \end{cases} \tag{2}$$

Assuming the label logits that come from the generator's second branch are $\hat{z}$, the regularizer loss, which encourages the generator to reproduce the target model's original prediction, can be summarized as

$$\mathcal{L}_{reg} = -\frac{1}{BN}\sum_{b=1}^{B}\sum_{k=1}^{N} \sum_{c=1}^{C} \mathbb{1}\left[f(x)_k = c\right] \log \operatorname{softmax}(\hat{z})_{k,c}. \tag{3}$$

It is a common practice to limit the range of the perturbation, as shown in [10]:

$$\|\delta\|_\infty \le \epsilon. \tag{4}$$

However, we believe that it does not make much sense to make the adversarial image imperceptible if the purpose is to fool a visual system. This view is also supported by Goodfellow [9]: "There is no need for the adversarial example to be made by applying a small or imperceptible perturbation to a clean image." Thus we do not test how different norm ranges affect the performance. If not specified, $\epsilon$ is always equal to 10.
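Because the notation above only paraphrases Equation 2, the following is one plausible instantiation of the adversarial softmax loss rather than the paper's exact formulation: a weighted cross-entropy over the dirty logits in which target pixels are pushed toward the semantic target (weighted by $\alpha$) and the remaining pixels are pushed toward the clean prediction.

```python
# One plausible form of the adversarial softmax loss (Equation 2); a sketch, not
# the exact implementation.
import torch
import torch.nn.functional as F

def adversarial_softmax_loss(dirty_logits, clean_logits, semantic_target, target_mask, alpha=1.0):
    """dirty_logits, clean_logits: (B, C, H, W); semantic_target, target_mask: (B, H, W)."""
    desired = torch.where(target_mask, semantic_target, clean_logits.argmax(dim=1))
    per_pixel = F.cross_entropy(dirty_logits, desired, reduction="none")       # (B, H, W)
    weights = torch.where(target_mask, torch.full_like(per_pixel, alpha),
                          torch.ones_like(per_pixel))
    return (weights * per_pixel).mean()
```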

Our method falls into the general category of adversarial attacks, so the corresponding defense is adversarial training. In other words, adversarial examples should be incorporated into the training cycle to force a segmentation model to learn to be robust. Since generalization usually degrades across datasets and models, generating adversarial examples from diverse datasets and models would be a good choice.

4 Experiments

Dataset       Type #1   Type #2   Type #3
Cityscapes    96.34%    96.56%    96.09%
BDD100K       97.86%    96.67%    95.95%
Mapillary     96.40%    94.51%    93.35%
Table 1: The success rate of our attacking model on Cityscapes, BDD100K, and Mapillary. The target model is FCN-8s.
              --------- Manipulated ---------   ---------- Preserved ----------
Dataset       Type #1   Type #2   Type #3       Type #1   Type #2   Type #3     Overall
Cityscapes    66.18%    91.27%    67.73%        91.40%    92.76%    88.74%      83.01%
BDD100K       66.56%    66.54%    68.69%        97.09%    92.46%    90.58%      80.32%
Mapillary     53.58%    78.67%    56.07%        96.41%    88.62%    85.42%      76.46%
Table 2: Manipulated and preserved success rates when attacking FCN-8s, broken down by attacking type.
Figure 4: Type #1 attacking on Cityscapes. Top left: original image. Top middle: the perturbation generated by our model. Top right: the image after applying the perturbation to the original image. Lower left: the predicted labels on the original image. Lower right: the predicted labels on the perturbed image.

4.1 Datasets & Quantitative analysis

Since our attack is set in the context of self-driving, we choose three street-scene datasets, namely Cityscapes (2975 training images, 1525 testing images) [6], the Mapillary Vistas Dataset (18K training images, 5K testing images) [16], and BDD100K (7K training images, 2K testing images) [30]. During training we resize each input image to 160×320 to save memory. $\alpha$ is set to 1. The learning rate is 1e-4. The weight of the regularizer is 1e-2.

We evaluate our model using the "success rate", defined as the average per-pixel agreement between the targeted labels and the labels predicted on the adversarial examples. Specifically, the success rate contains two parts: the percentage of labels that are successfully manipulated for target pixels, and the percentage of labels that are preserved for non-target pixels. Table 2 shows the success rate of attacking the FCN-8s model [12].
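The success rate can be computed directly from the label maps. The sketch below separates the manipulated and preserved components, assuming a boolean mask marks the target pixels.

```python
# Sketch: manipulated / preserved components of the success rate.
import torch

def success_rates(adversarial_pred, clean_pred, semantic_target, target_mask):
    """All arguments are (B, H, W) label maps except target_mask, which is boolean."""
    manipulated = (adversarial_pred[target_mask] == semantic_target[target_mask]).float().mean()
    preserved = (adversarial_pred[~target_mask] == clean_pred[~target_mask]).float().mean()
    return manipulated.item(), preserved.item()
```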

Trained on    Cityscapes   BDD100K   Mapillary
Cityscapes    96.34%       95.44%    92.96%
BDD100K       97.24%       97.86%    96.56%
Mapillary     96.86%       97.41%    96.40%
Table 3: Type #1 generalization across datasets. Each row gives the dataset an attacking model was trained on; each column gives the dataset it is evaluated on.
Model type            Model size   Cityscapes   BDD100K   Mapillary
Normal                3.02M        74.71%       55.08%    68.39%
Regularizer-removed   3.00M        62.87%       54.88%    59.98%
Table 4: Comparison between the attacking model in Figure 3 and its corresponding regularizer-removed variant.
Attacking model size   Target model size   Ratio   Cityscapes   BDD100K   Mapillary
Model #1 (531M)        269M                1.97    96.34%       96.56%    96.09%
Model #2 (10.69M)      269M                0.040   74.71%       55.08%    68.39%
Model #3 (2.69M)       269M                0.010   64.47%       53.63%    60.39%
Model #4 (680K)        269M                0.003   60.63%       50.99%    45.53%
Table 5: The parameter-wise efficiency of Type #1 attacking. Model #1 adopts the generator in [20]. Model #2 uses the generator in Figure 3. Model #3 is the same as Model #2 except that the number of feature maps in each layer is cut in half. Similarly, Model #4 is obtained from Model #3 by again cutting the number of feature maps in half.
Figure 5: Type #2 attacking on Cityscapes. Top left: original image. Top middle: the perturbation generated by our model. Top right: the image after applying the perturbation to the original image. Lower left: the predicted labels on the original image. Lower right: the predicted labels on the perturbed image.
Figure 6: Type #3 attacking on Cityscapes. Top left: original image. Top middle: the perturbation generated by our model. Top right: the image after applying the perturbation to the original image. Lower left: the predicted labels on the original image. Lower right: the predicted labels on the perturbed image.

4.1.1 Generalization across datasets.

It is common that different cities have different street scenes, so it is desirable to check the generalization ability of our model across different datasets. In other words, we want to find out whether our model can achieve good performance on unseen data. Specifically, we train three attacking models on three datasets (Cityscapes, Mapillary, and BDD100K), then apply each trained model to the other two unseen datasets. As Table 3 shows, our attacking model generalizes well across datasets.

Apart from generalization across datasets, it also seems desirable to evaluate how our model generalizes across different target models. However, the target models are usually treated as black boxes, and their differences are largely reflected in their sizes. Thus it makes more sense to compare the size of a target model with the size of an attacking model; see Section 4.1.3 for details.

4.1.2 Ablation study.

The regularizer in our model plays a vital role in attacking segmentation models with dynamic targets. The underlying assumption is that if a generator can generate perturbations for dynamic target labels, then the embedded features that come from the generator should also have a good knowledge of the spatial structure within each image. As a result, we can use these embedded features to predict the labels that come from the target model. To explore the necessity of the regularizer, we compare the performance (success rate) of our model and the corresponding regularizer-removed one on the three datasets. As Table 4 shows, even though the increase in the number of parameters is negligible, adding a separate regularizer still makes a big difference.

4.1.3 Parameter-wise efficiency of attacking models

The parameter-wise efficiency of attacking models has been largely ignored by previous works. Here we propose a new metric, defined as the ratio between the number of parameters of the attacking model and the number of parameters of the target model. For the same success rate, smaller attacking models are more efficient. Table 5 shows that the success rate drops as the attacking model shrinks, but the degradation is gradual: a generator with only about 1% of the target model's parameters (Model #3) still achieves a success rate above 50% on all three datasets.
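The metric itself is straightforward to compute; the sketch below assumes both models are PyTorch modules.

```python
# Sketch: parameter-wise efficiency as the ratio of attacking-model parameters to
# target-model parameters (e.g. 10.69M / 269M ≈ 0.040 for Model #2 in Table 5).
import torch.nn as nn

def parameter_ratio(attacking_model: nn.Module, target_model: nn.Module) -> float:
    count = lambda m: sum(p.numel() for p in m.parameters())
    return count(attacking_model) / count(target_model)
```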

5 Conclusion

We propose a new framework for attacking semantic segmentation models with dynamic targets. The attack is assumed to happen in the context of self-driving, so it is important to make the whole process stealthy to avoid detection. We test our model on three basic semantic attacking types as well as three street-scene datasets, and the experiments show that our model achieves relatively good performance. We also evaluate how the regularizer affects performance. Finally, we propose a metric, equal to the ratio of an attacking model's size to a target model's size, to evaluate parameter-wise efficiency.

References