Deep neural networks have shown remarkable performance in vision tasks such as image classification, object detection, and semantic segmentation. Owing to this performance, deep learning has started to be applied in practical areas such as self-driving cars and health care. However, recent studies show that deep learning models are vulnerable to carefully designed perturbations of the input. These perturbations are hard for the human eye to detect, so a human can still recognize the objects correctly, yet a deep neural network can produce results completely different from what we expect. The adversary can even craft the perturbation toward a chosen outcome. For instance, they can change an image so that a deep neural network misclassifies it as a wrong target class set by them. This phenomenon is an important issue for the security and safety of artificial intelligence. For instance, a very dangerous scenario can occur in self-driving cars, which can misread a stop sign or misperceive the road area [sitawarin2018darts, eykholt2018robust]. In health care, a medical diagnosis system based on deep learning can misjudge the status of a patient [finlayson2018adversarial].
This perturbed image is called an adversarial example, and generating it is called an adversarial attack. It is typically generated using the parameters and loss function of the victim model. This is called a white-box attack, since it requires information about the model. Even without access to the model, it is still possible to create an adversarial example because of a property called transferability: an adversarial example generated from one specific model also works on other models. An attack in this setting is called a black-box attack. It is known that an adversarial attack transfers better between models built for a similar task. For instance, an adversarial attack on a particular neural network can fool other neural networks with different architectures [papernot2016transferability]. With this transferability, an adversary can more easily fool diverse models.
Many studies have tried to generate stronger adversarial examples to attack state-of-the-art models [papernot2016limitations, moosavi2016deepfool, carlini2017towards]. In response, several approaches have been proposed to make models robust against adversarial attack [papernot2016distillation, meng2017magnet]. There have also been competitions on adversarial attack and defense in the field of image classification [kurakin2018adversarial], in which several attack and defense methods were proposed. However, most attack and defense work on adversarial examples has been limited to image classification. Although there are related works that deal with semantic segmentation and object detection [thys2019fooling, 8237562, xie2017adversarial], they address only attack scenarios, and studies of defense scenarios for complex tasks such as semantic segmentation are insufficient. Moreover, defense models for the classification task are often evaluated on simple, low-resolution datasets such as MNIST and CIFAR-10. It is therefore necessary to experiment with high-resolution images, since more complicated images are used in the real world. In addition, the defense scenario for semantic segmentation needs to be studied because it is more practical. For instance, in autonomous driving or medical intelligence, most scene understanding is performed through semantic segmentation rather than classification.
In this paper, we aim to provide a robust mechanism to secure semantic segmentation models against adversarial attack. To achieve this, we propose DAPAS, a denoise autoencoder to prevent adversarial attack in semantic segmentation, which effectively removes adversarial perturbation. Since semantic segmentation involves the classification of every pixel, it is important to restore the original image at the pixel level so that the restored image gives the correct segmentation result. We train the autoencoder with random noise that follows a particular distribution: the Gaussian, Uniform, or Bimodal distribution. Since an adversarial attack changes the pixel values of the input X only slightly, random noise can cover a variety of attack methods. For the dataset, we use PASCAL VOC 2012 [everingham2010pascal], and we test on DeepLab V3 Plus [chen2018encoder], which is one of the state-of-the-art models for semantic segmentation. We first generate adversarial examples for DeepLab V3 Plus, and we then verify that our approach is effective against adversarial attack on semantic segmentation. On clean images, the performance of our proposed model is around 97 % of that of the original DeepLab V3 Plus. Under adversarial attack, the performance of DeepLab V3 Plus drops to about 13 % of the original performance, but when the input first passes through our denoise autoencoder, it recovers up to 68 % of the original performance. This confirms that the proposed method effectively defends against the attack while minimizing performance degradation. In addition, we do not have to retrain the segmentation model: we leave the model we want to defend as it is and place DAPAS in front of it.
The remainder of the paper is organized as follows. We review related work in Section 2. We explain our method and present our architecture in Section 3. In Section 4, we evaluate our defense method against adversarial examples. The conclusion and discussion are provided in Section 5.
2 Related Work
Szegedy et al. found the existence of perturbations that break a classifier. A simple and effective attack called the Fast Gradient Sign Method (FGSM) was subsequently presented, showing that a small perturbation is enough to fool the classifier. [moosavi2016deepfool] measures the minimum perturbation size required for the attack, which gives better intuition about the existence of adversarial examples by calculating a sufficient magnitude of the perturbation. Beyond classification problems, [8237562, xie2017adversarial] demonstrate adversarial attacks on segmentation and object detection models. [arnab2018robustness] experiments with and analyzes the effect of adversarial attacks on various semantic segmentation models such as DeepLab V2 [chen2017deeplab] and PSPNet [zhao2017pyramid].
To counter adversarial attacks, some works train the model with both normal examples and adversarial examples [kurakin2016adversarial, tramer2017ensemble], which is called adversarial training. During training, they generate adversarial examples to add to the training data. Although this works, it depends on the particular adversarial data used in the training process. For instance, [kurakin2016adversarial] shows that their approach is robust against a simple attack but not against a more sophisticated attack. In addition, it carries an engineering penalty, since it requires retraining the model: the longer it takes to create an adversarial example, the more time it takes to retrain the model. Instead of using data augmentation, methods that change the model itself have also been proposed [elsayed2018large, cisse2017parseval]. They change the objective function of the problem to obtain robustness. However, this approach also requires retraining the model, so it costs time. [samangouei2018defense, meng2017magnet] preprocess the image before feeding it to the model. This approach is similar to our work, but they experiment with low-resolution images such as those in MNIST and the CIFAR series.
Currently, there is no general defense method against adversarial attack. In addition, to the best of our knowledge, no defense scenario has been studied for semantic segmentation in the context of adversarial attack. We verify that our approach is effective against adversarial attack on semantic segmentation.
3 Adversarial Attack
Essentially, all of these attacks use the gradient of the victim model's loss function with respect to the input data. In this section, we briefly review the basic methods of adversarial attack.
Fast Gradient Sign Method (FGSM). FGSM is a simple and effective attack method. It computes the adversarial example as

$$X^{adv} = X + \epsilon \cdot \mathrm{sign}\left(\nabla_X J(X, y_{true})\right)$$

where $\epsilon$ is the magnitude of the noise and $J(X, y_{true})$ is the loss with respect to the true label $y_{true}$ of the image. It adjusts the input $X$ by adding the sign of the gradient of the loss with respect to $X$, scaled by $\epsilon$. This increases the loss of the victim model so that the model misjudges the adjusted input. Since it updates the input $X$ only once, it is also called a single-step method. Adversaries can also update the input in a direction of their choosing, by decreasing the loss of the victim model with respect to a target label $y_{target}$ set by the adversary:

$$X^{adv} = X - \epsilon \cdot \mathrm{sign}\left(\nabla_X J(X, y_{target})\right)$$

If the input is modified enough, the model predicts the target that the adversary wants. In both cases, $\epsilon$ sets the scale of the perturbation. We call the first method untargeted FGSM and the second method targeted FGSM.
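The single-step update can be illustrated with a minimal sketch. It assumes a toy logistic-regression "victim" with hand-picked weights (hypothetical, not the segmentation model used in this paper), because its input gradient has a closed form; the names `loss_and_input_grad` and `fgsm` are illustrative only.

```python
import math

def loss_and_input_grad(x, y, w, b):
    """Binary cross-entropy of a toy logistic 'victim' and its gradient w.r.t. the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad = [(p - y) * wi for wi in w]  # closed-form dJ/dx for logistic regression
    return loss, grad

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, label, w, b, eps, targeted=False):
    """Single-step FGSM.

    Untargeted: move along +sign(grad) to increase the loss of the true label.
    Targeted:   move along -sign(grad) to decrease the loss of the target label.
    """
    _, grad = loss_and_input_grad(x, label, w, b)
    direction = -1.0 if targeted else 1.0
    return [xi + direction * eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy weights and a 4-pixel "image" whose true label is 1.
w, b = [2.0, -1.5, 0.5, 1.0], -0.2
x, y_true = [0.6, 0.2, 0.5, 0.4], 1

x_adv = fgsm(x, y_true, w, b, eps=0.032)
loss_clean, _ = loss_and_input_grad(x, y_true, w, b)
loss_adv, _ = loss_and_input_grad(x_adv, y_true, w, b)
print(loss_clean, loss_adv)  # the adversarial loss is the larger of the two
```

Every pixel moves by exactly the perturbation magnitude, so the change is bounded per pixel regardless of how large the gradient is; for a deep network the closed-form gradient would be replaced by backpropagation to the input.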
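Iterative FGSM (I-FGSM). The iterative FGSM applies FGSM repeatedly and is a more powerful attack method than FGSM. It uses the following equations:

$$X^{adv}_{0} = X, \qquad X^{adv}_{N+1} = \mathrm{Clip}_{X,\epsilon}\left\{ X^{adv}_{N} + \alpha \cdot \mathrm{sign}\left(\nabla_X J(X^{adv}_{N}, y_{true})\right) \right\}$$

where $\alpha$ is the step size for adjusting $X^{adv}_{N}$, and the clip function ensures that $X^{adv}_{N} \in [X - \epsilon, X + \epsilon]$ for all $N$. The number of steps is chosen according to $\epsilon$, where $\epsilon$ is expressed on the scale of 0 to 255 as in the original paper. It is also called a multi-step method. We use 0.25 for the step size $\alpha$.

As in the case of targeted FGSM, the adversary can modify the data toward a target label of their choosing:

$$X^{adv}_{0} = X, \qquad X^{adv}_{N+1} = \mathrm{Clip}_{X,\epsilon}\left\{ X^{adv}_{N} - \alpha \cdot \mathrm{sign}\left(\nabla_X J(X^{adv}_{N}, y_{target})\right) \right\}$$

We denote this algorithm I-FGSM in this paper. We call the first method untargeted I-FGSM and the second method targeted I-FGSM. Although these attack methods were introduced in the context of image classification, the same methods can be applied to semantic segmentation. We use untargeted FGSM and untargeted I-FGSM in this paper.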
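The multi-step variant can be sketched in the same spirit: take a small signed-gradient step, then clip the result back into the neighbourhood of the original input, and repeat. As before, the toy logistic "victim", its weights, and the step size and step count below are hypothetical choices for illustration, not the settings used in the experiments.

```python
import math

def input_grad(x, y, w, b):
    """Gradient of the binary cross-entropy of a toy logistic model w.r.t. the input x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def clip(x_new, x_orig, eps):
    """Keep every pixel within eps of the original pixel and inside [0, 1]."""
    return [min(max(n, o - eps, 0.0), o + eps, 1.0) for n, o in zip(x_new, x_orig)]

def i_fgsm(x, label, w, b, eps, alpha, steps, targeted=False):
    """Iterative FGSM: repeated signed-gradient steps of size alpha, clipped to the eps-ball."""
    direction = -1.0 if targeted else 1.0
    adv = list(x)
    for _ in range(steps):
        grad = input_grad(adv, label, w, b)
        stepped = [a + direction * alpha * sign(g) for a, g in zip(adv, grad)]
        adv = clip(stepped, x, eps)
    return adv

# Hypothetical toy setup: 4-pixel "image" in [0, 1] with true label 1.
w, b = [2.0, -1.5, 0.5, 1.0], -0.2
x, y_true = [0.6, 0.2, 0.5, 0.4], 1
x_adv = i_fgsm(x, y_true, w, b, eps=0.032, alpha=0.008, steps=10)
print(x_adv)
```

The clipping step is what keeps the accumulated perturbation bounded: no matter how many steps are taken, no pixel ends up further from the original than the chosen magnitude.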
Our mechanism does not modify the semantic segmentation model. We train a denoise autoencoder as a preprocessor and place it in front of the semantic segmentation model. For the semantic segmentation model, we use a pre-trained DeepLab V3 Plus, which is a state-of-the-art model. In this section, we show the architecture of the denoise autoencoder, detail the training setup, and demonstrate how the denoise autoencoder is deployed to construct a robust semantic segmentation model.
We train on the PASCAL VOC 2012 dataset, which is widely used for semantic segmentation. There are a total of 1424 training images and 1424 validation images. Pixel values, originally in the range 0 to 255, are rescaled to the range 0 to 1, and all images are resized to a fixed resolution.
4.2 Architecture of Denoise Autoencoder
The overall structure of the model is shown in Figure 2. The denoise autoencoder is divided into two parts: the orange encoder and the blue decoder. The encoder extracts features from the input image, and the decoder restores the input image from the extracted features. The encoder consists of five convolutional layers, and the resolution of the feature map is halved at each layer. We do not use max pooling or average pooling to decrease the resolution; instead, we adjust the stride of the kernel. The decoder likewise consists of five deconvolutional layers, and the resolution of the feature map is doubled at each layer. We use skip connections to restore the spatial details of the feature maps. In other words, the features of the encoder are symmetrically linked to the features of the decoder, and features that have the same resolution are added. We do not connect the first feature, i.e., the input image, through a skip connection, since the input image carries noisy information. For the activation function, we use ELU instead of ReLU in each layer, and we use a sigmoid as the last activation function.
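The resolution bookkeeping behind the symmetric skip connections can be checked with a short sketch. It assumes a hypothetical square input of 512 pixels per side (divisible by 32); the helper `autoencoder_resolutions` is illustrative and only tracks feature-map sizes, not the layers themselves.

```python
def autoencoder_resolutions(input_size, depth=5):
    """Track feature-map resolutions through a stride-2 encoder and a x2 decoder.

    Returns (encoder sizes, decoder sizes, skips), where skips maps a decoder
    layer index to the encoder layer index whose output has the same resolution.
    """
    enc = [input_size // 2 ** (i + 1) for i in range(depth)]  # halved at each layer
    dec = [enc[-1] * 2 ** (i + 1) for i in range(depth)]      # doubled at each layer
    skips = {j: enc.index(r) for j, r in enumerate(dec) if r in enc}
    return enc, dec, skips

# Hypothetical input resolution, chosen only for illustration.
enc, dec, skips = autoencoder_resolutions(512)
print(enc)    # encoder feature-map sizes
print(dec)    # decoder feature-map sizes
print(skips)  # decoder layer -> encoder layer with matching resolution
```

Under this assumption the first four decoder outputs each find an encoder feature of the same size, while the last decoder output matches only the input image itself, which is the one connection the architecture deliberately leaves out.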
We use the Adam optimizer with a learning rate of 0.0005. We add random noise to the training set, and we use either clean or noisy input. This differs slightly from the original framework of the denoise autoencoder [vincent2010stacked]: since our purpose is also to maintain the original performance when there is no adversarial attack, we also feed clean input. For the random noise, we use the Gaussian, Uniform, and Bimodal distributions. For the Gaussian distribution, we set the mean to zero and the standard deviation to 0.004. For the Uniform distribution, we set the range from -0.035 to 0.035. For the Bimodal distribution, we use a mixture of two Gaussian distributions with means of -0.024 and 0.024, each with a standard deviation of 0.004.
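The three noise sources and the mix of clean and noisy inputs can be sketched as follows. The distribution parameters are the ones stated above; the equal mixing weights of the Bimodal distribution, the 50 % share of clean inputs (`p_clean`), and the clamping to [0, 1] are assumptions made for illustration, and the function names are hypothetical.

```python
import random

def sample_noise(kind, rng):
    """Draw one per-pixel noise value from the chosen distribution."""
    if kind == "gaussian":
        return rng.gauss(0.0, 0.004)
    if kind == "uniform":
        return rng.uniform(-0.035, 0.035)
    if kind == "bimodal":
        # Mixture of two Gaussians; equal mixing weights are an assumption.
        mean = rng.choice([-0.024, 0.024])
        return rng.gauss(mean, 0.004)
    raise ValueError(f"unknown noise kind: {kind}")

def make_training_pair(pixels, kind, rng, p_clean=0.5):
    """Return (input, target) for the autoencoder; the target is always the clean image.

    With probability p_clean (an assumed value) the input is left clean, so the
    autoencoder also learns to pass unperturbed images through unchanged.
    """
    target = list(pixels)
    if rng.random() < p_clean:
        return list(pixels), target
    # Clamping to the valid pixel range [0, 1] is an added safeguard.
    noisy = [min(1.0, max(0.0, p + sample_noise(kind, rng))) for p in pixels]
    return noisy, target

rng = random.Random(0)
image = [0.2, 0.5, 0.8]  # a tiny stand-in for an image with pixels in [0, 1]
noisy_input, clean_target = make_training_pair(image, "bimodal", rng, p_clean=0.0)
print(noisy_input, clean_target)
```

The Bimodal samples concentrate near two values of fixed magnitude and opposite sign, which resembles the fixed-size signed steps that gradient-sign attacks apply to each pixel.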
4.4 Combining with semantic segmentation model
The trained denoise autoencoder is connected to a general model that performs semantic segmentation, as in Figure 4. The denoise autoencoder preprocesses the image before it enters the segmentation model. Since the denoise autoencoder is independent of the segmentation model, it can be placed in front of any model. Therefore, it serves as a general defense mechanism.
Hence, we do not have to retrain the model we want to defend. Moreover, since the random noise used in training the denoise autoencoder is independent of any particular adversarial attack, it can defend against a variety of attacks.
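The deployment amounts to function composition, which a minimal sketch makes concrete. The `denoise` and `segment` functions below are hypothetical stand-ins (a quantizer and a per-pixel threshold), not the trained autoencoder or DeepLab V3 Plus; only the wrapping pattern is the point.

```python
def defend(segment, denoise):
    """Wrap an unchanged segmentation function with a denoising preprocessor."""
    def defended(image):
        return segment(denoise(image))
    return defended

# Hypothetical stand-ins for illustration only.
def denoise(image):
    # Toy "restoration": snap each pixel to the nearest multiple of 0.1.
    return [round(p * 10) / 10 for p in image]

def segment(image):
    # Toy per-pixel classifier: label 1 if the pixel is at least 0.5, else 0.
    return [1 if p >= 0.5 else 0 for p in image]

clean = [0.5, 0.2, 0.9]
perturbed = [0.49, 0.2, 0.9]  # a small change that flips the first pixel's label

robust_segment = defend(segment, denoise)
print(segment(clean), segment(perturbed), robust_segment(perturbed))
```

Because the wrapper only calls the segmentation function, the defended model can be swapped for any other segmentation model without retraining it.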
In this section, we look at the ability of the denoise autoencoder to restore images and then measure how the restored images perform in the segmentation model DeepLab V3 Plus. We test the segmentation results of DeepLab V3 Plus using test data from PASCAL VOC with additional annotations from SBD [hariharan2011semantic]. For the adversarial attack, we assume that the noise is not large, because the purpose of an adversarial attack is to deceive models without being noticed by people. We therefore limit the magnitude of the noise to 0.032 at the pixel level, i.e., each pixel changes by at most 3.2 % of the pixel value range. We adjust the standard deviation of the Gaussian and Bimodal distributions and the range of the Uniform distribution accordingly. The details of the distributions, the evaluation metric, and the experimental results are summarized below.
We use a pre-trained model that was trained on normalized data, so we apply the same normalization at test time: the mean is subtracted and the result is divided by the standard deviation. We use the PASCAL VOC validation set, which is not used in training the denoise autoencoder, and we resize the images to a fixed resolution.
5.2 Denoise Autoencoder as a restoration
We visualize the output of the denoise autoencoder before measuring the robustness against adversarial attack. Although we use three different noise distributions, we show only the case of Gaussian noise for simplicity. Since the noise levels do not differ much, the results for the other distributions are similar. We check that a clean image is restored well by the denoise autoencoder, just as a noisy image is, since the performance on clean images should not be compromised. Figure 3-(a) shows the original image, Figure 3-(b) shows the noisy image, Figure 3-(c) shows the original image after the denoise autoencoder, and Figure 3-(d) shows the noisy image after the denoise autoencoder. It is easy to see that Figure 3-(c) is clearer than Figure 3-(d). Therefore, we can intuitively expect that the performance reduction on clean images will be small.
[Tables 3 and 4: mIoU and IoU ratio of robust (%) on FGSM and I-FGSM. Only the column headers are recoverable: mIoU | IoU ratio of robust (%), repeated three times.]
5.3 Evaluation metric
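The mean Intersection over Union (mIoU) is widely used for evaluating the performance of semantic segmentation [everingham2010pascal]. We adopt a relative metric, the IoU ratio, for measuring robustness [arnab2018robustness]. The IoU ratios under attack are defined as follows.

$$\text{IoU ratio of attack} = \frac{\mathrm{mIoU}^{orig}_{adv}}{\mathrm{mIoU}^{orig}_{clean}} \times 100, \qquad \text{IoU ratio of robust} = \frac{\mathrm{mIoU}^{prop}_{adv}}{\mathrm{mIoU}^{orig}_{clean}} \times 100$$

$\mathrm{mIoU}^{orig}_{clean}$: mIoU of a clean image on the original model
$\mathrm{mIoU}^{orig}_{adv}$: mIoU of an adversarial image on the original model
$\mathrm{mIoU}^{prop}_{clean}$: mIoU of a clean image on the proposed model
$\mathrm{mIoU}^{prop}_{adv}$: mIoU of an adversarial image on the proposed model

These metrics express performance relative to the performance of the original model on clean images. We also measure the ratio of the mIoU of the proposed model to that of the original model to calculate the reduction on clean images:

$$\text{IoU ratio of reduction} = \frac{\mathrm{mIoU}^{prop}_{clean}}{\mathrm{mIoU}^{orig}_{clean}} \times 100$$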
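The relative metrics can be made concrete with a short sketch. It assumes that every ratio is taken against the mIoU of the original model on clean images, which is the reading consistent with the percentages reported in this paper. The `miou` helper is a simplified per-sample version (benchmark evaluation accumulates over the whole dataset), and the mIoU values passed to `iou_ratios` are hypothetical numbers, not results from the experiments.

```python
def miou(pred, truth, num_classes):
    """Mean IoU over the classes that appear in the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)

def iou_ratios(clean_orig, adv_orig, clean_prop, adv_prop):
    """Relative metrics in percent, all measured against the clean original model."""
    return {
        "attack": 100.0 * adv_orig / clean_orig,       # original model under attack
        "robust": 100.0 * adv_prop / clean_orig,       # defended model under attack
        "reduction": 100.0 * clean_prop / clean_orig,  # defended model on clean images
    }

# Flattened label maps of a tiny 4-pixel example with two classes.
example_miou = miou(pred=[0, 0, 1, 1], truth=[0, 1, 1, 1], num_classes=2)

# Hypothetical mIoU values, chosen only to illustrate the arithmetic.
ratios = iou_ratios(clean_orig=0.80, adv_orig=0.10, clean_prop=0.776, adv_prop=0.544)
print(example_miou, ratios)
```

A high reduction ratio means the preprocessor costs little on clean images, while the gap between the attack and robust ratios shows how much performance the defense recovers.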
5.4 Analysis of results
The performance reduction due to the denoise autoencoder is low, around 3 %, for all three distributions. In the attack scenario, we verify that the denoise autoencoders are effective against adversarial attack. Among the three denoise autoencoders, the one trained with the Bimodal distribution performs best under attack. Figures 11 and 21 show the results. The segmentation output is severely distorted when FGSM and I-FGSM are applied. In addition, the I-FGSM image looks less noisy than the FGSM image, yet the segmentation results show that the attack is more effective with I-FGSM (Fig 14, Fig 18). After the input passes through the denoise autoencoder, the model is successfully defended in both cases. Table 2 shows the IoU ratio of attack. I-FGSM is more powerful than FGSM. When $\epsilon$ is 0.008, 0.016, and 0.032, the IoU ratios of attack on FGSM are similar, but for I-FGSM the IoU ratio of attack drops significantly, to 24.2 %, 21.9 %, and 12.0 %. Table 3 and Table 4 show the IoU ratio of robust on FGSM and I-FGSM. Comparing the two tables, the IoU ratio of robust on I-FGSM is larger than that on FGSM, although the attack is more effective with I-FGSM.
Gaussian distribution. We use a small standard deviation for the Gaussian distribution because we want to check whether the performance depends on the noise distribution, so we use a mean of 0 and a standard deviation of 0.004. Although it is also effective, its performance ranks in the middle among the three denoise autoencoders.
Uniform distribution. We set the range of the Uniform distribution from -0.035 to 0.035, since the maximum magnitude of the adversarial perturbation is 0.032. Among the three distributions, the denoise autoencoder trained with the Uniform distribution performs best in terms of reduction. It shows an IoU ratio of reduction of 96.9 %, i.e., its performance decreases by only 3.1 % compared to the original performance. In the attack scenario, it shows the worst performance among the three denoise autoencoders.
Bimodal distribution. In most cases, the denoise autoencoder trained with Bimodal noise shows the best performance among the three distributions. Since an adversarial attack adds or subtracts noise of a fixed size, we use noise that follows a Bimodal distribution built from two Gaussian distributions, as in Figure 8. The two Gaussian distributions have means of -0.024 and 0.024, each with a standard deviation of 0.004.
The IoU ratio of reduction is 97.2 %, which is the worst among the three distributions, but the difference is small. In the attack scenario, its performance is the best.
We verify that the denoise autoencoder is effective in defending against adversarial attack in the context of semantic segmentation. We also confirm that the performance varies slightly depending on the noise distribution with which the denoise autoencoder is trained. Since the design of our denoise autoencoder is independent of any particular attack, we believe that this approach is applicable not only to semantic segmentation but also to classification and object detection. The design of a more detailed and careful denoise autoencoder against adversarial attack remains for future study.