Reversible Adversarial Example based on Reversible Image Transformation

11/06/2019, by Zhaoxia Yin, et al.

At present, many companies use state-of-the-art Deep Neural Networks (DNNs) to classify and analyze the photos we upload to social networks or the cloud. To prevent users' privacy from being leaked, the attack characteristic of adversarial examples can be exploited to make these models misjudge. In this paper, we take advantage of reversible image transformation to construct reversible adversarial examples, which are still adversarial to DNNs: they not only cause DNNs to extract wrong information, but can also be recovered to the original images without any distortion. Experimental results show that the reversible adversarial examples obtained by our method achieve higher attack success rates while keeping the image quality high. Moreover, the proposed method is easy to operate and suitable for practical applications.


1 Introduction

With the rapid development of the Internet and the popularity of smart devices, a large amount of multimedia data has emerged, presented in everyday life in the form of images, videos and other media. More and more people like to share multimedia data, especially images, on social networks or back them up to the cloud. However, such personal data has become one of the most precious commodities of the digital age, and some powerful companies use it to conduct large-scale transactions. In 2014, many private photos of Hollywood actresses stored on iCloud were leaked. In 2018, Facebook was exposed to the largest data breach in its history, and the Cambridge Analytica firm involved was linked to the investigation of Trump's campaign. Cambridge Analytica was alleged to have used the personal data of 50 million Facebook users, obtained without their consent, to build profiles and target them during the 2016 presidential election. It is clear that disclosure of personal privacy has a significant impact on individuals and society.

Deep Neural Networks (DNNs) have achieved extraordinary success in many fields, ranging from image classification [9] [5] and text analysis [2] to speech recognition [3]. However, recent works show that DNNs are extremely vulnerable to adversarial examples, which are generated by imposing subtle perturbations on inputs and lead a model to predict incorrect outputs. The existence of adversarial examples poses a great danger to the deployment of security-related applications [18] [11]. On the other hand, we can employ adversarial examples to protect the privacy of image content. Many companies use state-of-the-art algorithms to classify images and analyze user information, and at present the most advanced algorithms are based on deep neural networks. To protect the privacy of users, we can therefore take advantage of the attack characteristic of adversarial examples. Specifically, we can use attack algorithms to generate adversarial images and then share them on social platforms or back them up to the cloud, which causes unauthorized organizations to misclassify the images and extract wrong information, thereby protecting our privacy.

There is a problem with this method. To save storage space, local images are usually deleted after the user uploads the adversarial examples to the cloud, but then the original images can no longer be retrieved. Therefore, people hope to develop a kind of example that can fool DNNs like an adversarial example while remaining under the user's control [7], i.e., the original image can be recovered from it. Reversibility is an ideal feature of cyber weapons, and Liu et al. first proposed reversible adversarial examples to solve this problem [12]. Their method mainly uses reversible data hiding (RDH) [16] to hide the adversarial perturbation image as secret information inside the adversarial example, obtaining a reversible adversarial example. This method is introduced in detail in the related work. However, due to the limited embedding capacity of RDH, when the adversarial perturbation is even slightly enhanced it becomes difficult to fully embed the perturbation image into the adversarial example, or the image quality becomes poor. In addition, the method involves many operation steps and its implementation is complex.

Based on this, we look for a way to hide the original image directly inside the adversarial example, so that the original image can be recovered without distortion from the reversible adversarial example. Reversible image transformation [6] is a special data hiding algorithm with a very large payload, which can hide the original image inside an adversarial image of the same size so that the original image can be reversibly recovered. Therefore, we take advantage of Reversible Image Transformation (RIT) to construct reversible adversarial examples. Specifically, we use reversible image transformation to disguise the original image as its adversarial example, obtaining a reversible adversarial example, and the original image can then be recovered from it without distortion by the RIT recovery algorithm.

The experimental results show that, under the same conditions, the visual quality of the reversible adversarial examples obtained by our method is better, and the attack success rates on IFGSM [10] and C&W_L2 [1] reach 99.24% and 94.74%, respectively. Moreover, the operation process of the proposed scheme is simpler and can be applied in many practical scenarios, such as military imagery and image privacy protection for social platforms and the cloud.

2 Related work

In this section, we first briefly summarize several ways of classifying adversarial attacks as well as existing adversarial attack algorithms, then describe the reversible adversarial example framework proposed by Liu et al., and finally introduce the reversible image transformation algorithm used in our proposed scheme.

2.1 Adversarial Example

2.1.1 Attack classification

  • Based on the adversarial goal, attacks can be classified into two categories: targeted and non-targeted attacks [15]. A targeted attack changes the input so that the classifier produces a specific output that is different from the true label. A non-targeted attack modifies the input so that the classifier produces an arbitrary output that is different from the true label.

  • Based on the adversarial capabilities, attacks can be categorized as white-box and black-box attacks [15]. A white-box attack has prior knowledge of the target model, such as the network structure, parameters, hyperparameters, training methods and training data. A black-box attack has only limited knowledge of the model (for example, its training process or architecture), and the model parameters are never known.

  • Based on how adversarial perturbations are measured, the distortion can be calculated under three norms: $L_0$, $L_2$ and $L_\infty$ [1]. The $L_0$ distance measures the number of pixels changed in the image, rather than perturbing the whole image to fool the classifier. The $L_2$ distance measures the standard Euclidean distance of the changed pixels; it can stay small even when there are many small changes to many pixels. The $L_\infty$ distance measures the maximum change to any pixel, with no limit on the number of pixels modified. (A small computational sketch of these measures is given after this list.)
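
As a concrete illustration of the three measures, the following minimal sketch (our own, not code from the paper) computes the $L_0$, $L_2$ and $L_\infty$ distortions between an image and its adversarial counterpart, assuming both are given as NumPy arrays of the same shape.

```python
import numpy as np

def distortion_norms(x: np.ndarray, x_adv: np.ndarray):
    """Return the L0, L2 and L-infinity distortion between x and x_adv."""
    delta = (x_adv.astype(np.float64) - x.astype(np.float64)).ravel()
    l0 = int(np.count_nonzero(delta))      # number of changed pixels
    l2 = float(np.linalg.norm(delta))      # Euclidean distance of the change
    linf = float(np.max(np.abs(delta)))    # largest change to any single pixel
    return l0, l2, linf
```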

Method | Black/White box | Targeted/Non-targeted | Perturbation norm | Strength
FGSM | White box | Targeted | $L_\infty$ | ***
IFGSM | White box | Targeted | $L_\infty$ | ***
C&W | White box | Targeted | $L_0$, $L_2$, $L_\infty$ | *****
Table 1: Summary of the attributes of these attack methods. More asterisks indicate a stronger attack.

In this work, we focus on the white-box setting to generate adversarial examples and verify the attack effect of reversible adversarial examples. Table 1 summarizes several of the most advanced attack methods; in the following, we introduce the principles of these attack methods.

2.1.2 Attack methods

  • Fast Gradient Sign Method (FGSM) [4] Goodfellow argues that even a small perturbation in a linear high-dimensional space can have a very large impact on the output. Let $x$ denote the original image, $\eta$ the perturbation, and $\tilde{x} = x + \eta$ the adversarial example. When $\eta$ is small enough that $\|\eta\|_\infty \leq \epsilon$, we expect the classifier to agree on the classification results of the two samples; however, consider the product with a weight vector $w$:

    $w^{T}\tilde{x} = w^{T}x + w^{T}\eta$   (1)

    When the dimension of $w$ is high, the adversarial perturbation shifts the activation by $w^{T}\eta$, which can be large enough to make the classification result wrong.

    Based on this linearization of the classifier, Goodfellow proposed a very simple method to generate adversarial examples, called FGSM. For an input image, the model is misled by adding an adversarial perturbation in the direction in which the gradient of the DNN changes the most. The perturbation $\eta$ in (1) is computed as

    $\eta = \epsilon \cdot \mathrm{sign}\left(\nabla_{x} J(\theta, x, y)\right)$   (2)

    where $J(\theta, x, y)$ denotes the cross-entropy cost function. In short, the loss function of the model is differentiated with respect to the input, the sign function is applied, and the result is scaled by the perturbation magnitude $\epsilon$ to obtain the adversarial example. (A code sketch of FGSM and IFGSM is given after this list.)

  • IFGSM [10] IFGSM was proposed as an iterative version of FGSM. It applies FGSM multiple times with a small step size instead of adding one large perturbation, and the pixels are appropriately clipped after each iteration to ensure that the result remains in the $\epsilon$-neighborhood of the input image $x$:

    $x^{adv}_{0} = x, \qquad x^{adv}_{N+1} = \mathrm{Clip}_{x,\epsilon}\left\{x^{adv}_{N} + \alpha \cdot \mathrm{sign}\left(\nabla_{x} J(\theta, x^{adv}_{N}, y)\right)\right\}$   (3)
  • Carlini and Wagner (C&W) [1] Carlini and Wagner proposed a stronger iterative attack method called C&W. It is an optimization-based attack that makes the perturbation hard to detect by limiting its $L_0$, $L_2$ or $L_\infty$ norm. The advantage of this method is that the generated perturbations are small; the disadvantage is that it takes a long time to generate an adversarial example. The C&W_L2 algorithm obtains adversarial examples by solving the following optimization problem:

    $\min_{\delta}\ \|\delta\|_{2}^{2} + c \cdot f(x+\delta), \qquad f(x') = \max\left(\max_{i \neq t} Z(x')_{i} - Z(x')_{t},\ -\kappa\right)$   (4)

    where $\kappa$ controls the confidence with which the image is misclassified by the model, i.e., the confidence gap between the adversarial category and the real category, and $Z(x')_{i}$ is the logit output for category $i$.
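
As mentioned in the FGSM item above, the following PyTorch sketch illustrates the non-targeted forms of Eqs. (2) and (3). It is our own illustrative code rather than the authors' implementation; `model` is assumed to be any classifier returning logits, `x` a batch of images with values in [0, 1], `y` the true labels, and `eps`, `alpha`, `steps` are hypothetical parameter names.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    # single-step attack: x_adv = x + eps * sign(grad_x J(theta, x, y))
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0, 1).detach()

def ifgsm(model, x, y, eps, alpha, steps):
    # iterative attack: repeat small FGSM steps, clipping back into the
    # eps-ball around x and into the valid pixel range after each step
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()
            x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0, 1)
        x_adv = x_adv.detach()
    return x_adv
```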

2.2 Reversible Adversarial Example

The framework of the algorithm proposed by Liu et al. is shown in Fig. 1. It relies on the fact that an adversarial example is formed by adding a slight adversarial perturbation to the original image. First, the perturbation image is obtained as the pixel-wise difference between the adversarial example and the original image, and then the Recursive Histogram Modification (RHM) [16] embedding algorithm is used to hide the perturbation image as secret information inside the adversarial example, producing the reversible adversarial example. To reverse the process, the RHM extraction algorithm is executed to extract the embedded information and restore the perturbation image. Finally, the original image is obtained by subtracting the perturbation image from the corresponding pixel values of the adversarial example. (A sketch of this difference-and-recovery arithmetic is given below Fig. 1.)

Figure 1: The overall framework of reversible adversarial examples in [12].
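
To make the arithmetic explicit, the minimal sketch below (ours, with the RHM embedding and extraction steps deliberately omitted) shows how the perturbation image is formed and how the original image is recovered from it.

```python
import numpy as np

def perturbation_image(original: np.ndarray, adv: np.ndarray) -> np.ndarray:
    # Signed pixel-wise difference (int16 avoids uint8 wrap-around); this is
    # the "secret information" that RHM would embed into the adversarial example.
    return adv.astype(np.int16) - original.astype(np.int16)

def recover_original(adv: np.ndarray, perturbation: np.ndarray) -> np.ndarray:
    # Subtracting the extracted perturbation from the adversarial example
    # restores the original image exactly.
    return (adv.astype(np.int16) - perturbation).astype(np.uint8)
```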

2.3 Reversible Image Transformation

Reversible image transformation reversibly transforms an original image into an arbitrarily chosen target image of the same size, producing a camouflage image similar to the target image; it is a new type of data hiding method with a very large payload. As shown in Fig. 2, it consists of two stages. In the transformation phase, a visual transformation of the image is performed first, and then the visual transformation information is embedded into the transformed image to obtain the camouflage image. In the recovery phase, a reversible data hiding algorithm is used to extract the visual transformation information from the camouflage image, and this information is then used to recover the original image without distortion. (A sketch of the two stages follows Fig. 2.)

Figure 2: Process of the reversible image transformation algorithm. A is an original image and B is a target image; the camouflage image resembles B, and the transformed image is recovered from the camouflage image.
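
The two stages can be summarized by the following sketch (ours, under the assumption that a visual transformation routine and an RDH embed/extract pair are supplied as callables; none of these names come from [6]).

```python
def rit_disguise(original, target, visual_transform, rdh_embed):
    # Transformation phase: make the original image look like the target,
    # then hide the information needed to undo the transformation inside it.
    transformed, info = visual_transform(original, target)
    camouflage = rdh_embed(transformed, info)
    return camouflage

def rit_recover(camouflage, rdh_extract, inverse_transform):
    # Recovery phase: extract the transformation information and invert the
    # visual transformation to restore the original image without distortion.
    transformed, info = rdh_extract(camouflage)
    return inverse_transform(transformed, info)
```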

Hou et al. first proposed the concept of reversible image transformation [8]. Before the image transformation, they use a non-uniform clustering algorithm to match the original blocks with the target blocks, which greatly reduces the amount of auxiliary information (AAI) required to record the indexes of the original blocks. Not only can the visual quality of the camouflage images be kept good, but the original image can also be recovered without loss. This method has been applied to reversible data hiding in encrypted images (RDH-EI) [17]. In [6], Hou et al. propose a new reversible image transformation technique for color images. By exploring and exploiting the correlation among the three channels of a color image and compressing the transformation parameters, the AAI needed to restore the original image is greatly reduced, so original images and target images can be divided into smaller blocks and the visual quality of the camouflage images is further improved.

3 Reversible Adversarial Examples based on Reversible Image Transformation

In this paper, we take advantage of the reversible image transformation algorithm to hide the original image inside its adversarial example and obtain a reversible adversarial example. The architecture of the proposed scheme is depicted in Fig. 3, and the model's classification result for the image at each stage is shown below the corresponding image.

Figure 3: Architecture of the proposed scheme.

The whole procedure of using reversible adversarial examples to protect image privacy is as follows. First, we apply an existing attack algorithm to the original image to generate its adversarial example. Then, in order to obtain a reversible adversarial example, we regard the adversarial example as the target image and disguise the original image as the adversarial example with the reversible image transformation algorithm. The reversible adversarial example is reversible because the image transformation information, i.e., the auxiliary information used to recover the original image, is embedded in it. Finally, we can upload the reversible adversarial example to a social platform or the cloud to fool deep neural networks while human eyes can still correctly extract the semantic information. Since the reversible adversarial example carries the embedded auxiliary information, we can extract it and recover the original image losslessly with the RIT recovery algorithm.
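
The following sketch strings these steps together (our own illustrative Python; `attack`, `rit_disguise` and `rit_recover` are caller-supplied callables, for example an IFGSM attack and wrappers around the RIT routines sketched in Section 2.3, and their names are assumptions rather than an interface from the paper).

```python
def protect_and_recover(original, label, model, attack, rit_disguise, rit_recover):
    # 1. Generate the adversarial example and use it as the target image.
    adv_example = attack(model, original, label)
    # 2. Disguise the original image as its adversarial example; the auxiliary
    #    information needed for recovery is embedded inside the result.
    reversible_ae = rit_disguise(original, adv_example)
    # 3. reversible_ae is what gets uploaded; later, the original image is
    #    recovered from it losslessly.
    recovered = rit_recover(reversible_ae)
    return reversible_ae, recovered
```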

4 Evaluation and Analysis

4.1 Experimental Setup

4.1.1 Dataset

The ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2012) dataset (http://www.image-net.org/challenges/LSVRC/2012/nonpub-downloads).

4.1.2 Model

We use the Inception_v3 model pre-trained on ImageNet [14].

4.1.3 Attack Methods

We generate adversarial examples as target images with two white-box methods: IFGSM and C&W [13].

4.1.4 Reversible Image Transformation

We use the method in [6]. The block size is set to 1×1, which gives the best visual quality.

4.2 Performance Evaluation

In this section, in order to evaluate the performance of the generated reversible adversarial examples (RAEs), we test the attack success rates of the RAEs on ImageNet.

On the ImageNet dataset, we choose 1000 images that are correctly classified by the model and use the two attack algorithms to generate adversarial examples. Then, we select the samples that successfully attack the model and use the reversible image transformation algorithm to transform the original images into the target adversarial images, obtaining reversible adversarial examples. Finally, we use the generated reversible adversarial images to attack the model and record the attack success rates. As Table 2 shows, the final attack success rate reaches 99.24% on IFGSM and 94.74% on C&W_L2, a better attack effect than that of Liu et al. [12]. We can also see from the experimental results that the attack effect on IFGSM is relatively better than on C&W_L2. As shown in Fig. 4, the reversible adversarial examples generated by the proposed method keep good visual quality, and there is essentially no visual difference between a reversible adversarial example and its adversarial example.
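
For reference, the evaluation protocol just described can be summarized as follows (our own sketch; `predict`, `attack` and `rit_disguise` are assumed callables, and `dataset` yields (image, label) pairs).

```python
def rae_attack_success_rate(predict, attack, rit_disguise, dataset):
    fooled, total = 0, 0
    for image, label in dataset:
        if predict(image) != label:
            continue                          # keep only correctly classified images
        adv = attack(image, label)
        if predict(adv) == label:
            continue                          # keep only successful adversarial examples
        rae = rit_disguise(image, adv)        # disguise the original as its AE
        total += 1
        fooled += int(predict(rae) != label)  # does the reversible AE still fool the model?
    return fooled / total if total else 0.0
```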

Method | IFGSM ($\epsilon$=8/225) | IFGSM ($\epsilon$=16/225) | C&W_L2 ($\kappa$=50) | C&W_L2 ($\kappa$=100)
The method in [12] | 86.43% | 98.09% | 46.34% | 63.16%
Proposed method | 96.54% | 99.24% | 80.49% | 94.74%
Table 2: Attack success rates of reversible adversarial examples.
Attack method | Scheme | RAE/Ori | RAE/AE | Ori/AE
IFGSM ($\epsilon$=8/225) | The method in [12] | 21.77 | 23.33 | 32.34
IFGSM ($\epsilon$=8/225) | Proposed method | 27.61 | 32.23 | 32.34
IFGSM ($\epsilon$=16/225) | The method in [12] | 20.35 | 23.59 | 27.30
IFGSM ($\epsilon$=16/225) | Proposed method | 23.84 | 31.03 | 27.30
C&W_L2 ($\kappa$=50) | The method in [12] | 26.20 | 26.59 | 44.40
C&W_L2 ($\kappa$=50) | Proposed method | 34.64 | 36.16 | 44.40
C&W_L2 ($\kappa$=100) | The method in [12] | 22.45 | 23.18 | 36.77
C&W_L2 ($\kappa$=100) | Proposed method | 30.46 | 33.41 | 36.77
Table 3: Using PSNR (dB) to measure the image quality of reversible adversarial examples. RAE: reversible adversarial examples, AE: adversarial examples, Ori: original images. The Ori/AE value depends only on the attack, not on the reversibility scheme, so it is shared by both rows of each attack setting.
Figure 4: Image quality of the adversarial examples and reversible adversarial examples obtained by different attack methods.

Next, in order to quantitatively evaluate the image quality of reversible adversarial examples, we measure three sets of PSNR values: between reversible adversarial examples and original images, between reversible adversarial examples and adversarial examples, and between original images and adversarial examples. The results are shown in Table 3. From Table 3 and Fig. 4 we can see that the visual quality of the reversible adversarial examples obtained by our method is better than that of Liu et al. From the third column of Table 3, we find that with our method the difference between the reversible adversarial examples and the original images is difficult for the human eye to detect on C&W_L2, and only subtly perceptible on IFGSM. From the fourth column, we see that after the original image is disguised as its adversarial example by RIT, there is essentially no visual difference between the reversible adversarial image and the adversarial image, so the attack characteristics of the adversarial example are maintained. From the last column and Table 2, we see that the larger the perturbation used to generate the adversarial examples, the better the attack effect of the resulting reversible adversarial examples. This is because the amount of auxiliary information embedded in the reversible adversarial example to restore the original image is independent of the magnitude of the perturbation in the adversarial example. The adversarial examples obtained by the IFGSM attack (Fig. 4(b)) have larger perturbations than those of C&W_L2 (Fig. 4(f)), so the auxiliary information embedded in the reversible adversarial example interferes less with the perturbation structure for IFGSM, and its attack effect remains good.
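
The PSNR values in Table 3 can be reproduced with the standard formula for 8-bit images, for example with the following helper (ours, not code from the paper).

```python
import numpy as np

def psnr(a: np.ndarray, b: np.ndarray) -> float:
    """Peak signal-to-noise ratio (dB) between two 8-bit images of equal size."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(255.0 ** 2 / mse)
```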

During the experiments, we found that the attack effect is affected to some extent by the amount of auxiliary information used to recover the original image and by the block size of the reversible image transformation. In our experiments, we set the block size to 1×1; in this case the visual difference between a reversible adversarial example and its adversarial example is minimal, and the generated reversible adversarial examples have the best attack effect. In general, adding the adversarial perturbation affects the image quality more than embedding the auxiliary information does. We can reduce the influence of the embedding on the reversible adversarial examples by increasing the perturbation amount, i.e., we can raise the attack success rate of the generated reversible adversarial examples by increasing the perturbation used when generating the adversarial images.

5 Conclusion

In this paper, we propose an efficient image privacy protection scheme for social platforms and the cloud. We take advantage of reversible image transformation to construct reversible adversarial examples, which aim to fool the deep neural networks used to analyze user-uploaded image content. In this work, we regard the adversarial example as the target image and disguise the original image as its adversarial example to obtain the reversible adversarial example, from which the original image can be recovered without distortion. The experimental results show that reversible adversarial examples provide excellent attack efficiency to achieve the desired privacy protection goals while keeping the image quality good. Our future work includes reducing the amount of embedded auxiliary information and improving RDH methods to enhance the reversible image transformation algorithm, so that the attack success rate of reversible adversarial examples can be further improved.

References

  • [1] N. Carlini and D. Wagner (2017) Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 39–57.
  • [2] R. Collobert and J. Weston (2008) A unified architecture for natural language processing: deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, pp. 160–167.
  • [3] G. Hinton, L. Deng, D. Yu, G. E. Dahl, and A. Mohamed (2012) Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Processing Magazine 29 (6), pp. 82–97.
  • [4] I. J. Goodfellow, J. Shlens, and C. Szegedy (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.
  • [5] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
  • [6] D. Hou, C. Qin, N. Yu, and W. Zhang (2018) Reversible visual transformation via exploring the correlations within color images. Journal of Visual Communication and Image Representation 53, pp. 134–145.
  • [7] D. Hou, W. Zhang, J. Liu, S. Zhou, D. Chen, and N. Yu (2019) Emerging applications of reversible data hiding. In Proceedings of the 2nd International Conference on Image and Graphics Processing, pp. 105–109.
  • [8] D. Hou, W. Zhang, and N. Yu (2016) Image camouflage by reversible image transformation. Journal of Visual Communication and Image Representation 40, pp. 225–236.
  • [9] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105.
  • [10] A. Kurakin, I. Goodfellow, and S. Bengio (2016) Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533.
  • [11] B. Li and Y. Vorobeychik (2015) Scalable optimization of randomized operational decisions in adversarial classification settings. In Artificial Intelligence and Statistics, pp. 599–607.
  • [12] J. Liu, D. Hou, W. Zhang, and N. Yu (2018) Reversible adversarial examples. arXiv preprint arXiv:1811.00189.
  • [13] N. Papernot, F. Faghri, N. Carlini, I. Goodfellow, R. Feinman, A. Kurakin, C. Xie, Y. Sharma, T. Brown, A. Roy, et al. (2016) Technical report on the CleverHans v2.1.0 adversarial examples library. arXiv preprint arXiv:1610.00768.
  • [14] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna (2016) Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826.
  • [15] C. Xiao, J. Zhu, B. Li, W. He, M. Liu, and D. Song (2018) Spatially transformed adversarial examples. arXiv preprint arXiv:1801.02612.
  • [16] W. Zhang, X. Hu, X. Li, and N. Yu (2013) Recursive histogram modification: establishing equivalency between reversible data hiding and lossless data compression. IEEE Transactions on Image Processing 22 (7), pp. 2775–2785.
  • [17] W. Zhang, H. Wang, D. Hou, and N. Yu (2016) Reversible data hiding in encrypted images by reversible image transformation. IEEE Transactions on Multimedia 18 (8), pp. 1469–1479.
  • [18] Z. Zhong, M. Lei, D. Cao, J. Fan, and S. Li (2017) Class-specific object proposals re-ranking for object detection in automatic driving. Neurocomputing 242, pp. 187–194.