Reversible Adversarial Examples based on Reversible Image Transformation

11/06/2019 ∙ by Hua Wang, et al.

Recent studies show that widely used deep neural networks (DNNs) are vulnerable to carefully crafted adversarial examples, which inevitably raises security challenges. However, the attack characteristic of adversarial examples can be exploited for privacy-preserving image research. In this paper, we use reversible image transformation to construct reversible adversarial examples, which are still misclassified by the DNNs that illegal organizations use to steal private information from the images we upload to the cloud or social platforms. Most importantly, the proposed method can recover the original images from downloaded reversible adversarial examples with no distortion. The experimental results show that the attack success rate of the reversible adversarial examples obtained by this method can reach more than 95




1 Introduction

Deep Neural Networks (DNNs) have achieved extraordinary success in different fields, ranging from image classification [8] [5] and text analysis [2] to speech recognition [3]. However, recent works show that DNNs are extremely vulnerable to adversarial examples, which are generated by imposing subtle perturbations on inputs and lead a model to predict incorrect outputs. Adversarial examples have two characteristics: first, the adversarial example looks like the original image; second, it causes the machine learning model to give a wrong classification. The existence of adversarial examples poses a great danger to the deployment of security-related applications [18] [10].

Although adversarial examples pose a serious threat to the safety of machine learning models, we can also employ them to protect the privacy of image content. The rapid development of intelligent devices and the Internet has produced a large amount of multimedia data, which appears in everyday life as images, videos and other forms. More and more people share multimedia data, especially images, on social networks or back it up to the cloud. However, many companies use state-of-the-art algorithms to classify these images and analyze user information, and at present the most advanced algorithms are based on deep neural networks. To protect the privacy of users, we can take advantage of the attack characteristic of adversarial examples. Specifically, we can use attack algorithms to generate adversarial example images and then share them on a social platform or back them up to the cloud. This strategy causes illegal organizations to misclassify the images and extract wrong information, which protects our privacy. Furthermore, since the difference between the adversarial examples and the original images is invisible to human eyes, uploading the adversarial example images to the social platform or the cloud does not prevent humans from correctly extracting the semantics of the images.

But there is a problem with this method. When users upload images to the cloud for backup, the local images are usually deleted to save storage space. If users upload adversarial examples for image privacy protection using the above method and then delete the local images, the original image cannot be obtained after downloading from the cloud. Therefore, we want to hide the original image in the adversarial example so that the original image can be recovered without distortion from the marked adversarial example. However, traditional reversible data hiding methods [16] cannot hide one image in another image of the same size.

Reversible image transformation (RIT) [6] is a data hiding algorithm with a very large payload, which can hide an image in another image of the same size such that the original image can be reversibly recovered. Specifically, we use reversible image transformation to transform the original image toward the adversarial example and obtain the marked adversarial example, which we call a reversible adversarial example (RAE). An RAE has two functions. First, it makes illegal models classify incorrectly and prevents image privacy from leaking. Since the RAE is visually indistinguishable from the original image, human eyes can still correctly extract the semantic information of the image after it is uploaded to the cloud or a social platform, while the illegal model is led to extract wrong information, which protects user privacy. Second, because the RAE is embedded with auxiliary information, we can extract this information and recover the original image without any distortion using the RIT recovery algorithm.

The experimental results show that the reversible adversarial example is still an adversarial example to the neural network; that is, the neural network still misclassifies the reversible adversarial example and produces the same wrong classification result as for the corresponding adversarial example. Therefore, we can upload reversible adversarial examples to a social platform or the cloud when we want to share or save our photos. Our method can recover the original image from the reversible adversarial example without distortion, and the proposed scheme can be applied to privacy-preserving image research for social platforms and the cloud.

2 Related work

In this section, we first briefly summarize several classification methods of the adversarial attacks as well as the existing adversarial attack algorithms. Then we introduce reversible image transformation algorithm used in our method.

2.1 Adversarial Example

2.1.1 Attack classification

  • Based on the adversarial goal, attacks can be classified into two categories: targeted and non-targeted attacks [15]. A targeted attack changes the input so that the classifier produces a specific output that is different from the true label. A non-targeted attack modifies the input to produce an arbitrary output that is different from the true label.

  • Based on the adversarial capabilities, attacks can be categorized as white-box and black-box attacks [15]. A white-box attack has a priori knowledge of the target model, such as the network structure, parameters, hyperparameters, training methods and training data. A black-box attack has limited knowledge of the model (for example, its training process or architecture), but the model parameters are never known.

  • Based on how the adversarial perturbation is measured, the distortion can be calculated by three measures: $L_0$, $L_2$ and $L_\infty$ [1]. The $L_0$ distance measures the number of pixels changed in the image, so the attack alters a few pixels rather than perturbing the whole image to fool the classifier. The $L_2$ distance measures the standard Euclidean distance between the original and perturbed pixels; it can remain small when there are many small changes to many pixels. The $L_\infty$ distance measures the maximum change to any pixel, with no limit on the number of pixels modified.
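The three distance measures can be illustrated with a minimal sketch. The function name `perturbation_norms` and the flat-list image representation are assumptions made for illustration only; they do not come from the paper.

```python
import math

def perturbation_norms(original, adversarial):
    """Return the (L0, L2, Linf) size of the perturbation between two images.

    Both images are flat sequences of pixel values with the same length
    (an illustrative representation, not the paper's data format).
    """
    if len(original) != len(adversarial):
        raise ValueError("images must have the same number of pixels")
    diff = [a - o for o, a in zip(original, adversarial)]
    l0 = sum(1 for d in diff if d != 0)            # number of changed pixels
    l2 = math.sqrt(sum(d * d for d in diff))       # Euclidean distance
    linf = max((abs(d) for d in diff), default=0)  # largest single-pixel change
    return l0, l2, linf

# Two of the four pixels change, by +3 and -4.
original = [10, 20, 30, 40]
adversarial = [10, 23, 30, 36]
print(perturbation_norms(original, adversarial))  # (2, 5.0, 4)
```

An attack that limits $L_0$ touches few pixels but may change them strongly, while an attack that limits $L_\infty$ may touch every pixel as long as each change stays small.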

In this work, we focus on the white-box setting to generate adversarial examples and verify the attack effect of reversible adversarial examples. Table 1 summarizes several of the most advanced attack methods. In the following subsection, we introduce the principles of these attack methods.

Method     Black/White box   Targeted/Non-targeted   Perturbation norm            Strength
FGSM       White box         Targeted                $L_\infty$                   ***
IFGSM      White box         Targeted                $L_\infty$                   ***
DeepFool   White box         Non-targeted            $L_2$, $L_\infty$            ****
C&W        White box         Targeted                $L_0$, $L_2$, $L_\infty$     *****
Table 1: Attributes of the attack methods. More asterisks indicate higher strength.

2.1.2 Attack methods

  • Fast Gradient Sign Method (FGSM) [4] Goodfellow et al. argue that even a small perturbation in a linear high-dimensional space can have a very large impact on the output. Let $x$ be the original image, $\eta$ the perturbation, and $\tilde{x} = x + \eta$ the adversarial example. When $\eta$ is small enough to be negligible, i.e. it satisfies $\|\eta\|_\infty < \epsilon$, we expect the classifier to assign the same class to both samples. Now consider the product with a weight vector $w$:

    $$w^\top \tilde{x} = w^\top x + w^\top \eta$$

    When the dimension of $w$ is high, the adversarial perturbation changes the activation by $w^\top \eta$, which can be large enough to make the classification result wrong.

    Based on this linear view of the classifier, Goodfellow et al. proposed a very simple method to generate adversarial examples, called FGSM. For an input image, the model is made to misclassify by adding a perturbation in the direction in which the loss of the DNN increases fastest. The perturbation is calculated as in (1):

    $$\eta = \epsilon \cdot \mathrm{sign}\left(\nabla_x J(\theta, x, y)\right) \qquad (1)$$

    where $J$ denotes the cross-entropy cost function, $\theta$ the model parameters, and $y$ the label of $x$. In short, the gradient of the loss function with respect to the input is computed first, then its sign is taken and multiplied by the perturbation amplitude $\epsilon$; adding the result to the input gives the adversarial example.

  • IFGSM [9] IFGSM was proposed as an iterative version of FGSM. It applies FGSM multiple times with a small perturbation instead of adding one large perturbation. The pixels are clipped after each iteration to ensure that the result remains in the neighborhood of the input image $x$.

  • DeepFool [11] DeepFool is a non-targeted attack algorithm that generates an adversarial example by iteratively perturbing the image. Using a linearization of the model, it searches at each iteration for the smallest perturbation that crosses the decision boundary and makes the classification result wrong. The algorithm produces a smaller perturbation than FGSM when generating an adversarial example.

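  • Carlini and Wagner (C&W) [1] Carlini and Wagner proposed a stronger iterative attack method called C&W. It is an optimization-based attack that makes perturbations hard to detect by limiting their $L_0$, $L_2$ or $L_\infty$ norm. The advantage of this method is that the generated perturbations are small; the disadvantage is that it takes a long time to generate an adversarial example. The CW_L2 algorithm obtains adversarial examples by solving the following optimization problem:

    $$\min_{w} \; \left\| \tfrac{1}{2}(\tanh(w) + 1) - x \right\|_2^2 + c \cdot f\left(\tfrac{1}{2}(\tanh(w) + 1)\right)$$

    $$f(x') = \max\left(\max\{Z(x')_i : i \neq t\} - Z(x')_t,\; -\kappa\right)$$

    where $\kappa$ controls the confidence with which the image is misclassified by the model, i.e., the confidence gap between the adversarial category and the real category, and $Z(x')_i$ is the logit output for category $i$. Here $w$ is the optimization variable (not the weight vector used in the FGSM discussion), $c$ balances the two terms, and $t$ is the target category.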
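The FGSM and IFGSM updates described above can be sketched on a toy model. This is a minimal illustration under stated assumptions, not the setup used in the experiments: the classifier is a hypothetical linear softmax model with weights `W` and bias `b`, the gradient is derived analytically for that model, pixel values are assumed to lie in [0, 1], and all function names are invented for the example.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, b, x):
    # Linear classifier: one logit per class.
    return [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]

def predict(W, b, x):
    z = logits(W, b, x)
    return max(range(len(z)), key=z.__getitem__)

def loss_gradient(W, b, x, y):
    """Gradient of the cross-entropy loss J with respect to the input x."""
    p = softmax(logits(W, b, x))
    p[y] -= 1.0  # softmax(z) - onehot(y)
    return [sum(W[k][j] * p[k] for k in range(len(W))) for j in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def fgsm(W, b, x, y, eps):
    """One step of size eps along the sign of the loss gradient."""
    g = loss_gradient(W, b, x, y)
    return [clip(xj + eps * sign(gj), 0.0, 1.0) for xj, gj in zip(x, g)]

def ifgsm(W, b, x, y, eps, alpha, steps):
    """Repeated small steps of size alpha, clipped to the eps-neighborhood of x."""
    adv = list(x)
    for _ in range(steps):
        g = loss_gradient(W, b, adv, y)
        adv = [
            clip(clip(aj + alpha * sign(gj), xj - eps, xj + eps), 0.0, 1.0)
            for aj, gj, xj in zip(adv, g, x)
        ]
    return adv

# Hypothetical two-pixel, two-class example.
W = [[1.0, -1.0], [-1.0, 1.0]]
b = [0.0, 0.0]
x, y = [0.6, 0.4], 0
print(predict(W, b, x))
print(predict(W, b, fgsm(W, b, x, y, eps=0.2)))
print(predict(W, b, ifgsm(W, b, x, y, eps=0.2, alpha=0.05, steps=10)))
```

With a real DNN the gradient would come from automatic differentiation rather than the closed form used here, but the sign step and the per-iteration clipping are the same.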

2.2 Reversible Image Transformation

Reversible image transformation reversibly transforms an original image into an arbitrarily chosen target image of the same size and produces a camouflage image similar to the target image; it is a new type of data hiding method with a very large payload. As shown in Fig. 1, it includes two phases. In the transformation phase, the image visual transformation is performed first, and then the visual transformation information is embedded into the transformed image to obtain the camouflage image. In the recovery phase, the reversible data hiding algorithm is used to extract the visual transformation information from the camouflage image, and this information is then used to recover the original image without distortion.

Figure 1: A is the original image and B is the target image. The camouflage image is produced in the transformation phase, and the transformed image is recovered from the camouflage image.

Hou et al. first proposed the concept of reversible image transformation [7]. Before image transformation, a non-uniform clustering algorithm is used to match the original blocks with the target blocks, which greatly reduces the amount of auxiliary information (AAI) needed to record the indexes of the original blocks. As a result, the visual quality of the camouflage image increases considerably and the original image can still be recovered without loss. This method has been applied to reversible data hiding in encrypted images (RDH-EI) [17]. In [6], Hou et al. proposed a new reversible image transformation technique for color images. By exploiting the correlation among the three channels of the color image and compressing the transformation parameters, the AAI for restoring the original image is greatly reduced, so the original and target images can be divided into smaller blocks and the visual quality of the camouflage image is further improved.
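The transform-then-recover idea can be illustrated with a deliberately simplified toy. It is not the algorithm of Hou et al.: it treats every pixel as a 1×1 block, rearranges the original pixels so that their rank order follows the target, and returns the rearrangement as separate auxiliary information instead of embedding it with reversible data hiding. The function names `transform` and `recover` are hypothetical.

```python
def transform(original, target):
    """Rearrange the original's pixels so that their ordering mimics the target.

    Returns the rearranged image and the auxiliary information needed to undo it.
    Images are flat lists of pixel values with the same length.
    """
    if len(original) != len(target):
        raise ValueError("images must have the same number of pixels")
    # Positions sorted by pixel value (ties broken by position).
    src_order = sorted(range(len(original)), key=lambda i: (original[i], i))
    dst_order = sorted(range(len(target)), key=lambda i: (target[i], i))
    transformed = [0] * len(original)
    source_of = [0] * len(original)  # source_of[d]: original position placed at d
    for s, d in zip(src_order, dst_order):
        transformed[d] = original[s]
        source_of[d] = s
    return transformed, source_of

def recover(transformed, source_of):
    """Invert the rearrangement exactly, using the auxiliary information."""
    original = [0] * len(transformed)
    for d, s in enumerate(source_of):
        original[s] = transformed[d]
    return original

original = [12, 200, 90, 45]
target = [210, 30, 60, 100]
camouflage, aux = transform(original, target)
print(camouflage)                # [200, 12, 45, 90]
print(recover(camouflage, aux))  # [12, 200, 90, 45]
```

The toy shows why the auxiliary information matters: recovery is lossless only because the rearrangement is recorded, and the amount that has to be recorded grows as the blocks get smaller.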

3 Reversible Adversarial Examples based on Reversible Image Transformation

In this paper, we use the reversible image transformation algorithm to hide the original image in an adversarial example and obtain a reversible adversarial example. Specifically, we first generate an adversarial example with a current advanced attack algorithm. Then, with the adversarial example as the target, the original image is transformed into a reversible adversarial example by reversible image transformation. We can upload the reversible adversarial example to a social platform or the cloud. After downloading the uploaded image, the original image can be restored from the RAE with the reversible image recovery algorithm. The architecture of the system is described in Fig. 2, where the model classification result of each image is shown below the corresponding image.

Figure 2: Architecture of the proposed scheme.

4 Evaluation and Analysis

4.1 Experimental Setup

4.1.1 Dataset

We use the MNIST database of handwritten digits and the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset.

4.1.2 Model

We use the model in [12] on MNIST and the Inception_v3 model [14] on ImageNet.

4.1.3 Attack Methods

Here we generate adversarial examples as target images with four advanced white-box methods: IFGSM, JSMA [13], DeepFool and C&W.

4.1.4 Reversible Image Transformation

On grayscale images, we use the method in [17]; on color images, we use the method in [6]. The block size is set to 1×1, which gives the best visual quality.

4.2 Performance Evaluation

In this section, to evaluate the performance of the generated RAEs, we test their attack success rates on MNIST and ImageNet, respectively.
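The metric can be sketched as follows. The paper does not spell out its formula, so this assumes a common definition: among inputs that the model classifies correctly, the attack success rate is the fraction whose adversarial (or reversible adversarial) version is misclassified. The function name and the label lists are hypothetical.

```python
def attack_success_rate(true_labels, clean_preds, adv_preds):
    """Fraction of correctly classified inputs whose adversarial version is misclassified."""
    # Only inputs that the model gets right on the clean image count as attacked.
    attacked = [
        (y, a) for y, c, a in zip(true_labels, clean_preds, adv_preds) if c == y
    ]
    if not attacked:
        raise ValueError("no correctly classified inputs to attack")
    fooled = sum(1 for y, a in attacked if a != y)
    return fooled / len(attacked)

# Made-up labels: four of five clean images are classified correctly,
# and three of those four are misclassified after the attack.
true_labels = [3, 1, 4, 1, 5]
clean_preds = [3, 1, 4, 7, 5]
adv_preds = [8, 1, 9, 2, 6]
print(attack_success_rate(true_labels, clean_preds, adv_preds))  # 0.75
```

For the RAE rows, the same function would be applied with the model's predictions on the reversible adversarial examples in place of `adv_preds`.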

As shown in Table 2, on the MNIST dataset the classification accuracies of the model are above 0.99, and the attack success rates of the generated adversarial examples are above 0.97. The attack success rates of the reversible adversarial examples generated by our method are 0.95 or higher. Therefore, on grayscale images, the reversible adversarial examples obtained by reversible image transformation have an attack effect similar to that of the adversarial examples. The framework of reversible image transformation on MNIST is shown in Fig. 3.

Figure 3: The reversible image transformation framework on MNIST. The adversarial example is generated with DeepFool.

Model Accuracy             0.9909   0.9920   0.9904   0.9911
AE Attack Success Rate     0.9886   0.9914   0.9705   0.973
RAE Attack Success Rate    0.9563   0.9887   0.9613   0.970
Table 2: Attack success rates of reversible adversarial examples obtained by reversible image transformation on the MNIST dataset. AE: Adversarial Examples, RAE: Reversible Adversarial Examples.

On the ImageNet dataset, we choose 2000 images that are correctly classified by the model and use three attack algorithms to generate adversarial examples. Then, we select the samples that successfully attack the model and use the reversible image transformation algorithm to transform the original images toward the target adversarial images to obtain the reversible adversarial examples. Finally, we use the generated reversible adversarial examples to attack the model and measure the attack success rate. From Table 3 we can see that the final attack success rate reaches 0.95 on IFGSM, while the attack success rates on DeepFool and C&W are relatively low, 0.65 and 0.60 respectively. The experimental results also show that the transformation works better on grayscale images than on color images. As shown in Fig. 4, the reversible adversarial examples (Fig. 4(d)) generated by the proposed method still keep good visual quality, and there is basically no visual difference between the camouflage image and the target image.

Figure 4: (a) Original image. (b) Target image, i.e. AE. (c) Transformed image. (d) Camouflage image, i.e. RAE.

                           FGSM   DeepFool   CW_L2
RAE Attack Success Rate    0.95   0.65       0.60
Table 3: Attack success rates of reversible adversarial examples obtained by reversible image transformation on the ImageNet dataset. RAE: Reversible Adversarial Examples.

The attack effect is affected to some extent by the amount of auxiliary information used to recover the original image and by the block size of the reversible image transformation. In our experiment, we set the block size to 1×1. In general, adding adversarial perturbations affects image quality more than embedding auxiliary information does. We can therefore reduce the influence of embedding on the reversible adversarial examples by increasing the perturbation, i.e., we can raise the attack success rate of the generated reversible adversarial examples by increasing the perturbation amplitude when generating the adversarial images.

5 Conclusion

In this work, we propose an efficient privacy-preserving image scheme that generates reversible adversarial examples based on reversible image transformation. The scheme aims to make illegal models misclassify the image while keeping it indistinguishable from the original image to human eyes. Most importantly, original images can be recovered from reversible adversarial examples without any distortion. The experimental results show that the reversible adversarial examples provide strong attack performance and achieve the desired privacy protection goals. Our future work includes reducing the amount of embedded auxiliary information and improving RDH methods to enhance the reversible image transformation algorithm, so that the attack success rate of reversible adversarial examples is further improved.


  • [1] Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: 2017 IEEE Symposium on Security and Privacy (SP). pp. 39–57. IEEE (2017)
  • [2] Collobert, R., Weston, J.: A unified architecture for natural language processing: Deep neural networks with multitask learning. In: Proceedings of the 25th international conference on Machine learning. pp. 160–167. ACM (2008)
  • [3] Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.r., et al.: Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29(6), 82–97 (2012)
  • [4] Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
  • [5] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 770–778 (2016)
  • [6] Hou, D., Qin, C., Yu, N., Zhang, W.: Reversible visual transformation via exploring the correlations within color images. Journal of Visual Communication and Image Representation 53, 134–145 (2018)
  • [7] Hou, D., Zhang, W., Yu, N.: Image camouflage by reversible image transformation. Journal of Visual Communication and Image Representation 40, 225–236 (2016)
  • [8] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems. pp. 1097–1105 (2012)
  • [9] Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533 (2016)
  • [10] Li, B., Vorobeychik, Y.: Scalable optimization of randomized operational decisions in adversarial classification settings. In: Artificial Intelligence and Statistics. pp. 599–607 (2015)
  • [11] Moosavi-Dezfooli, S.M., Fawzi, A., Frossard, P.: Deepfool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2574–2582 (2016)
  • [12] Papernot, N., Faghri, F., Carlini, N., Goodfellow, I., Feinman, R., Kurakin, A., Xie, C., Sharma, Y., Brown, T., Roy, A., et al.: Technical report on the cleverhans v2.1.0 adversarial examples library. arXiv preprint arXiv:1610.00768 (2016)
  • [13] Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z.B., Swami, A.: The limitations of deep learning in adversarial settings. In: 2016 IEEE European Symposium on Security and Privacy (EuroS&P). pp. 372–387. IEEE (2016)
  • [14] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 2818–2826 (2016)
  • [15] Xiao, C., Zhu, J.Y., Li, B., He, W., Liu, M., Song, D.: Spatially transformed adversarial examples. arXiv preprint arXiv:1801.02612 (2018)
  • [16] Zhang, W., Hu, X., Li, X., Yu, N.: Recursive histogram modification: establishing equivalency between reversible data hiding and lossless data compression. IEEE transactions on image processing 22(7), 2775–2785 (2013)
  • [17] Zhang, W., Wang, H., Hou, D., Yu, N.: Reversible data hiding in encrypted images by reversible image transformation. IEEE Transactions on multimedia 18(8), 1469–1479 (2016)
  • [18] Zhong, Z., Lei, M., Cao, D., Fan, J., Li, S.: Class-specific object proposals re-ranking for object detection in automatic driving. Neurocomputing 242, 187–194 (2017)