Efficiently Constructing Adversarial Examples by Feature Watermarking

As deep learning models attract increasing attention, attacks against such models are also emerging. For example, an attacker may carefully construct images (also referred to as adversarial examples) in specific ways, aiming to mislead deep learning models into outputting incorrect classification results. Similarly, many efforts have been made to detect and mitigate adversarial examples, usually for certain dedicated attacks. In this paper, we propose a novel digital watermark based method to generate adversarial examples for deep learning models. Specifically, partial main features of the watermark image are embedded into the host image invisibly, aiming to tamper with and degrade the recognition capability of the deep learning models. We devise an efficient mechanism to select host images and watermark images, and utilize an improved discrete wavelet transform (DWT) based Patchwork watermarking algorithm and a modified discrete cosine transform (DCT) based Patchwork watermarking algorithm. The experimental results show that our scheme is able to generate a large number of adversarial examples efficiently. In addition, we find that using the extracted features of an image as the watermark image can increase the success rate of an attack under certain conditions with minimal changes to the host image. To ensure repeatability, reproducibility, and code sharing, the source code is available on GitHub.


1 Introduction

Deep learning is increasingly being used in image recognition for many different applications such as crowd monitoring, surveillance at key installations, and clinical diagnosis. As with any technology, there will be attempts to find vulnerabilities and weaknesses in order to carry out malicious or nefarious activities. Szegedy et al. [szegedy2013intriguing] revealed that minor perturbations in an image could lead deep learning models to produce incorrect identification results, and introduced the concept of adversarial examples. The latter refers to images that are purposefully crafted, for example via small perturbations, to cause deep learning models to misclassify them.

Since the seminal work of Szegedy et al. [szegedy2013intriguing], many other researchers have proposed different schemes to generate adversarial examples, such as those using gradient variations in deep learning models or elaborate perturbations on images. Such approaches can be categorized into white-box approaches and black-box approaches. Examples of white-box attack approaches include the fast gradient sign method (FGSM) [goodfellow2014explaining] and the DeepFool method [moosavi2016deepfool], both of which are gradient-based. Some approaches (e.g., the Jacobian-based saliency map attack (JSMA) method [wiyatno2018maximal]) attempt to reduce the number of modified pixels in an image with minimal interference to the entire image. Examples of black-box approaches include the one-pixel attack and the local search attack [narodytska2016simple], neither of which requires the attacker to know the parameters or gradient changes of the targeted deep learning model.

Existing popular methods for generating adversarial examples generally rely on gradient variations or carefully crafted perturbations, and keep the perturbations subtle by limiting their norm. We take a different approach in this paper, as described below:

  1. We present two novel and efficient methods for adversarial example generation based on improved watermarking algorithms. One uses an improved discrete wavelet transform (DWT) based watermarking algorithm, and the other applies a modified discrete cosine transform (DCT) based watermarking algorithm.

  2. We provide a set of embedding parameters with excellent performance, determined through extensive experiments, to generate adversarial examples. We also propose a scheme that uses the confidence scores of the host images to select watermark images, with the aim of making the experiments more efficient.

  3. We explore whether the effectiveness of our scheme can be enhanced by using the features extracted from the watermark image as the watermark. The findings suggest that using features as watermarks can yield a higher success rate and reduce the perturbations caused by embedding.

2 Related Work

2.1 Adversarial Example Generation

Since Szegedy et al. [szegedy2013intriguing] proposed the concept of adversarial examples, this domain has attracted the attention of the research community, partly evidenced by the number of methods (e.g., gradient-based ones) proposed to generate adversarial examples. For example, Goodfellow et al. [goodfellow2014explaining] showed that FGSM can be used to mount both directed and undirected gradient-based attacks against deep learning models. DeepFool [moosavi2016deepfool] was also utilized as an undirected attack, which reportedly introduces fewer perturbations than FGSM. Narodytska et al. [narodytska2016simple] proposed the single-pixel attack, which can mislead deep learning models by changing only one pixel in the image. The authors later improved their method into the local search attack, which works well when the total number of pixels in an image is larger.

There have also been other efforts to generate adversarial examples [akhtar2018threat]. For example, Baluja et al. [baluja2017adversarial] proposed adversarial transformation networks (ATN) to promptly generate adversarial examples using self-supervised feed-forward neural networks. Such a method can provide a variety of different types of adversarial examples targeting deep learning models. There are two ways to generate adversarial examples using ATN: the first generates only a perturbation by reconstructing the ATN in some way, and the second refactors the input based on regularization (e.g., adding a noise signal).

Xiao et al. [xiao2018generating] presented a model based on the generative adversarial network (GAN) to generate high-quality adversarial examples in a relatively short time. Simply put, a GAN consists of a generator and a discriminator, both of which are deep learning systems. In their approach, the generator produces the adversarial examples, and the discriminator encourages the generated adversarial examples to be indistinguishable from the original data. In other words, both the examples produced by the generator and the original data used to train the target model are fed to the discriminator. A generator that passes the discriminator's test can then produce adversarial examples that the target model, trained on the original data, cannot classify correctly.

Furthermore, Deng et al. [deng2019generate] proposed an attention-based spatially transformed attack method, which searches for significant regions to be targeted and subjects them to spatial transformation. The approach is generally attack-efficient while reducing the perturbations. In separate work, Qiu et al. [qiu2019semanticadv] proposed SemanticAdv to generate a new genre of semantically realistic adversarial examples through attribute-conditioned image editing. Their approach reportedly has a high success rate in both black-box and white-box attacks, and can circumvent detection approaches based on pixels and attributes.

2.2 Applications of Digital Watermarking

Digital watermarking has also been applied to deep learning, such as protecting deep learning models (e.g., see the scheme of Zhang et al. [zhang2018protecting]). In another work, Le et al. [le2019adversarial] proposed a remote watermark extraction scheme to ensure flexibility in watermarking-enabled deep learning model protection. Deep learning models can also be applied in the watermarking domain. For example, Quiring et al. [quiring2018adversarial] used deep learning models to extract digital watermarks from images without knowing the underpinning watermarking scheme, and Wen et al. [wen2019romark] proposed using deep learning technology to build a robust watermarking system.

Another research direction is to use digital watermarking to mislead the deep learning system. For example, Shafahi et al. [shafahi2018poison] proposed using watermarking to facilitate poisoning attacks against deep learning models. Although most of the adversarial example attacks using digital watermarking focused on images, it has been suggested that watermarking can also be utilized in other types of deep learning models. For example, Chen et al. [chen2020attacking] proposed a scheme to utilize digital watermarks in text, so that text recognition produces incorrect results.

Building on the existing literature, we explore the potential of generating adversarial examples based on the workings of deep learning models. The basic principle by which a deep neural network recognizes images is to extract image features layer by layer, and we hypothesize that these features can be influenced in a way that interferes with the model's recognition. Embedding digital watermark information into an image adds features to the image that are difficult for human vision to perceive. We introduce the related watermarking algorithms in the next section.

2.3 Digital Watermarking Algorithms

Bender et al. [bender1996techniques] proposed a statistical method (i.e., Patchwork) to hide information. In Patchwork, two sets of pixels, P and Q, are selected in the image according to a key, and the digital watermark information is carried by increasing the brightness value of each pixel in P by k and decreasing the brightness value of each pixel in Q by k. However, because Patchwork embeds the watermark by changing the image's statistical characteristics, it hides only a very small amount of information in the image (one bit).
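
For concreteness, the following is a minimal sketch of the classic Patchwork embedding and detection described above. It assumes a grayscale image held in a NumPy array and uses the key only to seed the pseudo-random choice of the P and Q pixel sets; the set size and the strength k are illustrative values, not those of the original paper.

```python
import numpy as np

def patchwork_embed(image, key, n_pixels=5000, k=2):
    """Classic Patchwork sketch: raise the brightness of set P by k and
    lower that of set Q by k.  `image` is a 2-D uint8 luminance array and
    `key` seeds the pseudo-random choice of the two pixel sets."""
    rng = np.random.default_rng(key)
    h, w = image.shape
    p_idx = (rng.integers(0, h, n_pixels), rng.integers(0, w, n_pixels))
    q_idx = (rng.integers(0, h, n_pixels), rng.integers(0, w, n_pixels))
    marked = image.astype(np.int16)
    marked[p_idx] += k   # set P: brightness + k
    marked[q_idx] -= k   # set Q: brightness - k
    return np.clip(marked, 0, 255).astype(np.uint8)

def patchwork_detect(image, key, n_pixels=5000):
    """The difference of the two set means is about 2k when the (one-bit)
    watermark is present and close to zero otherwise."""
    rng = np.random.default_rng(key)
    h, w = image.shape
    p_idx = (rng.integers(0, h, n_pixels), rng.integers(0, w, n_pixels))
    q_idx = (rng.integers(0, h, n_pixels), rng.integers(0, w, n_pixels))
    return image[p_idx].astype(float).mean() - image[q_idx].astype(float).mean()
```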

In order to increase the capacity of hidden information, Mei et al. [jiansheng2009digital] proposed a method based on the discrete cosine transform and the discrete wavelet transform for embedding an image-type watermark into the host image. Specifically, their method transforms the host image with a discrete wavelet transform while applying a two-dimensional discrete cosine transform to the watermark image. The result of the watermark image transformation is then added to the high-frequency component of the host image transformation, and the image is restored using an inverse discrete wavelet transform. Their method also exploits the human visual system (HVS): the human eye is sensitive to changes in smooth areas of an image but insensitive to small changes along contours and edges, making it difficult to perceive the embedded watermark information.

This motivates us to explore whether other characteristics of the HVS can be leveraged to change how sets P and Q are selected, thereby improving the traditional Patchwork algorithm so that image-type data can be embedded into the host image as the watermark. Following the work of Zhou et al. [zhou2009blind], the sensitivity of the HVS to the RGB primary colors can be captured by the standard luminance weighting

Y = 0.299R + 0.587G + 0.114B,

which illustrates that human vision is very sensitive to the green component but not particularly sensitive to the red and blue components; moreover, the combined effect of the red and blue components on perceived brightness is comparable to that of the green component alone. Therefore, we choose the green component as set P of the Patchwork algorithm, and the red and blue components together as set Q. The improved watermarking algorithms can embed more of the watermark image's information into the host image; in other words, for color images, the information of a watermark image can be embedded into the host image.
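
As a toy illustration of this channel-based selection (and only of the set selection, not of the full image-type embedding developed in Section 3), the sketch below applies the Patchwork-style adjustment with the green channel as set P and the red and blue channels as set Q; the strength k and the height-width-channel image layout are assumptions.

```python
import numpy as np

def channel_patchwork_embed(rgb_image, k=2):
    """Treat the green channel as set P (+k) and the red and blue channels
    together as set Q (-k), following the HVS weighting discussed above."""
    marked = rgb_image.astype(np.int16)
    marked[..., 1] += k   # green channel -> set P
    marked[..., 0] -= k   # red channel   -> set Q
    marked[..., 2] -= k   # blue channel  -> set Q
    return np.clip(marked, 0, 255).astype(np.uint8)
```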

3 Proposed Scheme

In this section, we will describe our proposed scheme designed to generate adversarial examples using digital watermarking. Specifically, we use the improved Patchwork algorithm based on the discrete wavelet transform and the modified Patchwork algorithm based on discrete cosine transform respectively to generate adversarial examples. We control two adjustable parameters in the embedding process to embed the watermark image into the host image, and propose a simple search method to find suitable watermark images. We remark that our approach for generating adversarial examples focuses on black-box attacks.

We will now give three definitions, as follows:

Embedding strength. The embedding strength is a combination of three per-channel (RGB) parameters used in both the discrete wavelet transform based Patchwork algorithm and the discrete cosine transform based Patchwork algorithm. It controls the extent to which the host image is changed when the selected watermark image is embedded into it.

Embedding frequency. The number of times the watermark image is embedded into the host image.

Confidence. A measure of the probability or certainty that an image will be recognized as a certain class by a particular deep learning model.

3.1 Embedding Watermark Process

The improved discrete wavelet transform based Patchwork algorithm reads the host image and the watermark image, separates the RGB components of both images, and processes the separated red, green, and blue components as follows (a minimal code sketch follows the numbered steps). We also remark that the process of the modified discrete cosine transform based Patchwork algorithm is similar to the following steps.

  1. Transform the red, green, and blue components respectively by the discrete wavelet transform.

  2. Multiply the watermark information by the embedding strength and add the result to the discrete wavelet transform matrix of the corresponding host image component.

  3. Transform the red, green, and blue components of the host image, which now carry the watermark information, respectively by the inverse discrete wavelet transform.

  4. Finally, combine the red, green, and blue components to obtain the image with the embedded watermark information.
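
A minimal sketch of these four steps using PyWavelets is given below. The Haar wavelet, the three-level decomposition, the choice of the third-level approximation subband as the target of the addition, and the per-channel strengths are all assumptions for illustration (the watermark channel is assumed to have been resized beforehand to the size of that subband).

```python
import numpy as np
import pywt

def embed_dwt_channel(host_ch, wm_ch, strength, wavelet="haar"):
    """Embed one colour channel of the watermark into one colour channel of
    the host following steps 1-3 above."""
    # Step 1: three-level DWT of the host channel.
    coeffs = pywt.wavedec2(host_ch.astype(float), wavelet, level=3)
    approx = coeffs[0]
    # Step 2: scale the watermark and add it to the approximation subband.
    approx = approx + strength * wm_ch[: approx.shape[0], : approx.shape[1]]
    coeffs[0] = approx
    # Step 3: inverse DWT restores the watermarked channel.
    return np.clip(pywt.waverec2(coeffs, wavelet), 0, 255)

def embed_dwt_rgb(host, watermark, strength_rgb=(0.02, 0.01, 0.02)):
    """Step 4: run the per-channel embedding on R, G and B and recombine.
    The per-channel strengths are illustrative placeholders."""
    channels = [
        embed_dwt_channel(host[..., c], watermark[..., c].astype(float), s)
        for c, s in enumerate(strength_rgb)
    ]
    return np.stack(channels, axis=-1).astype(np.uint8)
```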

3.2 Choose Host and Watermark Images

In this part, we select the host image and the watermark image, and record data about these images. The entire selection process comprises the following three steps (see also Fig 1).

  1. In the first step, we randomly select host images from the testing dataset, use a specific deep learning model (e.g., VGG16) to identify these host images, and record the classification result and the confidence of each class for every host image. We select only correctly classified images as host images, so that the validity of our scheme can be demonstrated below.

  2. In the next step, we choose, for each host image, the class from which its watermark image will be selected; we refer to this as the watermark class. We devised a simple method for this choice: for a host image, the class with the second-highest confidence is taken as the watermark class. For example, if the confidence for a host image is 90% for cars, 9% for aircraft, and 1% for the remaining categories combined, then the aircraft class is selected as the watermark class.

  3. In the third and final step, we select the watermark image for the host image from its watermark class, as described below.

The above steps are then repeated for all other selected host images.

Based on how deep learning models recognize images, images from the watermark class potentially contain more partial features that the model associates with that class than images from other classes. Thus, embedding images from the watermark class affects the model's recognition process more than embedding images from other categories. Similarly, using higher-confidence images as watermark images potentially results in a greater likelihood of causing the model to make classification errors.

Figure 1: Choose host images and watermark classes

Recall that in the third step, we select the appropriate watermark image for each host image from its watermark class. To improve the efficiency of this selection, we design a mechanism to choose watermark images quickly. We use a specific deep learning model to identify all the images in the watermark class, record the confidence of each image, and sort the images by confidence from largest to smallest. Finally, the ten images with the highest confidence are selected as the watermark images. We do the same for the other host images. It is also possible to select more or fewer than ten images as watermark images; however, selecting too many images decreases the efficiency of adversarial example generation, while too few watermark images lead to a lower attack success rate. The number of watermark images therefore depends on the actual demand or application context. The concrete steps are shown in Fig 2.

Figure 2: Choose watermark images
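
The selection mechanism in Fig 1 and Fig 2 can be sketched as follows; `model` stands for any Keras classifier that outputs per-class confidences, and `class_images`, `host_x`, and the function name are illustrative rather than taken from our implementation.

```python
import numpy as np

def pick_watermark_candidates(model, host_x, class_images, top_k=10):
    """For one preprocessed host image batch `host_x` (shape (1, H, W, 3)),
    choose the watermark class and the top-k watermark images from it.
    `class_images` maps a class index to a preprocessed image batch."""
    conf = model.predict(host_x, verbose=0)[0]
    # The watermark class is the class with the second-highest confidence.
    wm_class = int(np.argsort(conf)[-2])
    # Rank that class's images by their confidence for the class and keep
    # the ten highest-confidence images as watermark candidates.
    pool = class_images[wm_class]
    pool_conf = model.predict(pool, verbose=0)[:, wm_class]
    order = np.argsort(pool_conf)[::-1][:top_k]
    return wm_class, pool[order]
```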

3.3 Set Parameters and Embed Watermarks

In this part, our goal is to set the appropriate embedding parameters and then embed the watermark image into the host image to generate the adversarial example. Before embedding the watermark image into the host image through the watermarking algorithm, two parameters need to be set, namely the embedding strength and the embedding frequency.

To generate adversarial examples using the improved discrete wavelet transform based Patchwork algorithm, after setting the embedding strength, we embed the watermark image into the host image using this watermarking algorithm. Specifically, we perform a total of ten rounds of embedding: the first round embeds the watermark 5 times into the host image, the tenth round embeds it 50 times, and the embedding frequency increases by 5 between consecutive rounds. We repeat this process for the other selected host images and their corresponding watermark images. For the modified discrete cosine transform based Patchwork algorithm, the only difference is that the first round embeds the watermark once, the tenth round embeds it 10 times, and the embedding frequency increases by 1 between rounds. The specific algorithms for generating adversarial examples are shown in Algorithm 1 and Algorithm 2.

Input: host images, watermark images, the embedding strength, the embedding frequency, the total number of host images, the name of the host image, the name of the watermark image, the three-level discrete wavelet transform operation, the one-level discrete wavelet transform operation, the three-level inverse discrete wavelet transform operation, the three-level components, and the third low-frequency component
Output: watermarked images

1: set the embedding strength
2: for each selected host image do
3:     for each corresponding watermark image and embedding frequency do
4:         read the host image and resize it
5:         read the watermark image and resize it
6:         separate the R, G, and B channels of the host image and the watermark image
7:         for each of the R, G, and B channels do
8:             apply the three-level discrete wavelet transform to the host channel
9:             apply the one-level discrete wavelet transform to the watermark channel
10:            multiply the watermark coefficients by the embedding strength and add them to the third low-frequency component of the host channel
11:            apply the three-level inverse discrete wavelet transform to recover the watermarked channel
12:        combine the R, G, and B channels and save the watermarked image
13: return the watermarked images
Algorithm 1 Improved DWT Based Watermarking
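
To make the round schedule of Section 3.3 concrete, the sketch below produces the ten DWT-based candidate images for one host/watermark pair by reusing the embed_dwt_rgb sketch shown earlier. Re-embedding the already watermarked image is our reading of "embedding n times"; the original implementation may realise the schedule differently.

```python
def generate_dwt_candidates(host, watermark, strength_rgb):
    """Round r embeds the watermark 5*r times (5, 10, ..., 50) and keeps the
    resulting candidate image for each round."""
    candidates = {}
    for r in range(1, 11):
        n_embed = 5 * r
        marked = host.copy()
        for _ in range(n_embed):
            marked = embed_dwt_rgb(marked, watermark, strength_rgb)
        candidates[n_embed] = marked
    return candidates
```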

Input: host images, watermark images, the embedding strength, the embedding frequency, the total number of host images, the name of the host image, the name of the watermark image, the two-dimensional discrete cosine transform operation, the two-dimensional inverse discrete cosine transform operation, the result matrix of transforming the host image, the result matrix of transforming the watermark image, and the result matrix of their combination
Output: watermarked images

1: set the embedding strength
2: for each selected host image do
3:     for each corresponding watermark image and embedding frequency do
4:         read the host image and resize it
5:         read the watermark image and resize it
6:         for each of the R, G, and B channels do
7:             apply the two-dimensional discrete cosine transform to the host channel
8:             apply the two-dimensional discrete cosine transform to the watermark channel
9:             multiply the watermark coefficients by the embedding strength and add them to the host coefficients
10:            apply the two-dimensional inverse discrete cosine transform to recover the watermarked channel
11:        combine the R, G, and B channels and save the watermarked image
12: return the watermarked images
Algorithm 2 Modified DCT Based Watermarking
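
Analogously, a minimal sketch of the per-channel DCT embedding mirroring Algorithm 2 is shown below, using SciPy's 2-D DCT. It assumes the host and watermark images have already been resized to the same shape; the norm choice and the per-channel strengths are illustrative, not the values used in our experiments.

```python
import numpy as np
from scipy.fft import dctn, idctn

def embed_dct_channel(host_ch, wm_ch, strength):
    """One colour channel: 2-D DCT of host and watermark, scaled addition of
    the watermark coefficients, then the inverse 2-D DCT."""
    host_coeffs = dctn(host_ch.astype(float), norm="ortho")
    wm_coeffs = dctn(wm_ch.astype(float), norm="ortho")
    marked_coeffs = host_coeffs + strength * wm_coeffs
    return np.clip(idctn(marked_coeffs, norm="ortho"), 0, 255)

def embed_dct_rgb(host, watermark, strength_rgb=(0.02, 0.01, 0.02)):
    """Run the per-channel DCT embedding on R, G and B and recombine."""
    channels = [
        embed_dct_channel(host[..., c], watermark[..., c], s)
        for c, s in enumerate(strength_rgb)
    ]
    return np.stack(channels, axis=-1).astype(np.uint8)
```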

4 Experimental Setup and Results

To demonstrate that our solution is sufficiently fast and scalable in processing images in the datasets, we implement a prototype of our proposed scheme and evaluate its performance in this section. Prior to the recognition and embedding operations, we resize both the original images and the watermark images to suit batch operation.

In the evaluation, we use pre-trained models in Keras for image recognition. We present findings of the evaluation on the Kaggle cat-dog dataset and the CIFAR-10 dataset for five popular deep learning models (i.e., VGG16, ResNet50, Xception, InceptionV3, and CNN). The findings include, for several single host images, the variation trend of the confidence with increasing embedding frequency, as well as the success rates of our attacks on these five deep learning models over the Kaggle dataset and the CIFAR-10 dataset.

We present the results for adversarial example generation of the DWT based method and the DCT based method on the Kaggle dataset and the CIFAR-10 dataset in Sections 4.1 and 4.2, respectively. Additionally, based on the extensive experiments we performed, we found that fixing the embedding strength and then adjusting the embedding frequency is efficient and yields a higher success rate.

4.1 Results on DWT Based Method

We set the embedding strength in the discrete wavelet transform based Patchwork algorithm used to generate the adversarial examples to a fixed value. Then, we test the watermarking method for generating adversarial examples based on DWT on the Kaggle dataset and the CIFAR-10 dataset (see Sections 4.1.1 and 4.1.2).

4.1.1 Kaggle Dataset

On the ResNet50 model, we choose six successful attacks as examples (i.e., three cat and three dog images) to show the variation trend of the confidence with increasing embedding frequency for single host images. The results are shown in Fig 3 and Fig 5. The watermark images are selected according to the processes in Fig 1 and Fig 2. Since there are two classes in this dataset, a confidence below 50% for the host image's true class indicates that a model identification error has occurred.

Figure 3: The variation trend in the confidence of three host images with increasing embedding frequency using the DWT based method on ResNet50 in the Kaggle dataset (host images of cats as examples).

We use the blue polyline in Fig 3 and its corresponding host images with different embedding frequencies in Fig 4 as an example. From the coordinates of the points in Fig 3, the blue polyline shows that embedding the watermark image 5 and 10 times into this host image results in errors for ResNet50.

Figure 4: The results of the single image after embedding the watermark image different numbers of times, corresponding to the blue polyline in Fig 3.

In addition, we choose the red polyline in Fig 5 and the corresponding results of its host image in Fig 6 to visualize the variation trend. We can see from the coordinates of the points in Fig 5 that embedding the watermark image 20, 25, 30, 35, 40, 45, and 50 times into the host image corresponding to the red polyline leads to erroneous recognition by ResNet50.

Figure 5: The variation trend in the confidence of three host images with increasing embedding frequency using the DWT based method on ResNet50 in the Kaggle dataset (host images of dogs as examples).
Figure 6: The results of the single image after embedding the watermark image different numbers of times, corresponding to the red polyline in Fig 5.

We also evaluate our method on VGG16, InceptionV3, and Xception on the Kaggle dataset; the findings are shown in Fig 7 and Table 1. The results on the Kaggle dataset illustrate that the success rate of adversarial example generation improves with an increase in embedding frequency. We also see from the results that the total success rate of the attack is highest for ResNet50 and lowest for VGG16.

Figure 7: The success rate variation trend in the Kaggle dataset with different models applying the DWT based method.

The total success rate here means that if the host image embedded with the watermark image has misled the deep learning model at least once during the ten rounds, we record it as a successful example. We observe from Fig 3 that embedding the watermark image into the host image for more rounds may nevertheless fail to generate an adversarial example (e.g., the blue polyline and the red polyline in Fig 3), which is also why the total success rate is higher than the tenth round's success rate.
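
The bookkeeping behind the total success rate can be sketched as follows; the names are illustrative, with one batch of ten candidate images (one per round) supplied per host image.

```python
import numpy as np

def total_success_rate(model, candidates_per_host, true_labels):
    """A host image counts as a success if at least one of its candidate
    images is misclassified by the model."""
    successes = 0
    for candidates, label in zip(candidates_per_host, true_labels):
        preds = np.argmax(model.predict(candidates, verbose=0), axis=1)
        if np.any(preds != label):
            successes += 1
    return successes / len(candidates_per_host)
```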

4.1.2 CIFAR-10 Dataset

We now present the evaluation findings for ResNet50, VGG16, and CNN using the CIFAR-10 dataset. The results for single images in the CIFAR-10 dataset are shown in Fig 8 and Fig 10. For the variation trend of single images, we select two successful examples in CIFAR-10 (one from ResNet50 and the other from VGG16). The watermark images are selected according to the processes in Fig 1 and Fig 2.

Figure 8: The variation trend in the confidence of the host image with increasing embedding frequency using the DWT based method on ResNet50 in the CIFAR-10 dataset.

For ResNet50, the host image in Fig 9 is recognized as the second class (automobile) with the highest confidence. The class with the second-highest confidence for this host image on ResNet50 is the tenth class (truck).

Figure 9: The results of the single image after embedding the watermark image different numbers of times, corresponding to Fig 8.

The variation trend of the confidence of the host image with increasing embedding frequency on ResNet50 is shown in Fig 8. It can be seen from Fig 8 that when the confidence of the second class (automobile) is no longer the maximum among all classes, the attack is successful. Thus, the host images embedded 15, 20, 25, and 30 times with the watermark image are successful examples. The corresponding images are shown in Fig 9.

We now describe the result of a single host image on VGG16 in the CIFAR-10 dataset. The host image is recognized as the ninth class (ship) with a confidence of 97.1%, and its watermark class is the first class (airplane), whose confidence is 1.3%.

The variation trend of the confidence of the host image with increasing embedding frequency on VGG16 is shown in Fig 10. When the confidence of the ninth class (ship) is no longer the highest among the ten categories, the attack is successful. Therefore, the host images embedded 30, 35, 40, 45, and 50 times with the watermark image are successful examples (see Fig 11).

Figure 10: The variation trend in the confidence of a host image with increasing embedding frequency using the DWT based method on VGG16 in the CIFAR-10 dataset.
Figure 11: The results of the single image after embedding the watermark image different numbers of times, corresponding to Fig 10.

We test the variation trend of the success rate with different models using the DWT based method, and the corresponding total success rate, on the CIFAR-10 dataset. The experimental results of the DWT based method on CIFAR-10 are shown in Fig 12 and Table 1.

Figure 12: The success rate variation trend in the CIFAR-10 dataset with different models applying the DWT based method.
Dataset     ResNet50   VGG16   InceptionV3   Xception   CNN
Kaggle      48.5%      38.1%   42.3%         41.0%      -
CIFAR-10    72.4%      70.3%   -             -          88.2%
Table 1: Total success rate in the Kaggle and the CIFAR-10 datasets (DWT)

The results on CIFAR-10 indicate that, similar to the results on the Kaggle dataset, the success rate of generating adversarial examples goes up as the embedding frequency increases, and the total success rate of the attack is highest for CNN and lowest for VGG16.

4.2 Results on DCT Based Method

We set the embedding strength in the discrete cosine transform based Patchwork algorithm for adversarial example generation to a fixed value.

We test the DCT based watermarking method for adversarial example generation using both the Kaggle and CIFAR-10 datasets. There are ten classes in the CIFAR-10 dataset; when the confidence of the host image's true class is no longer the highest among the ten classes, the deep learning model outputs a false identification result.

Our test results of the single images in the CIFAR-10 dataset echo those of the Kaggle dataset. Thus, we show only the variation trend of the success rate with different models of the DCT based method, and the corresponding total success rate next.

4.2.1 Kaggle Dataset

The results on the Kaggle dataset in Fig 13 and Table 2 illustrate that the success rate of adversarial example generation increases as the embedding frequency increases. We also see from the results that the total success rate of the attack is highest for ResNet50 and lowest for VGG16.

Figure 13: The success rate variation trend in the Kaggle dataset with different models applying the DCT based method.
Figure 14: The success rate variation trend in the CIFAR-10 dataset with different models applying the DCT based method.

4.2.2 CIFAR-10 Dataset

The results on CIFAR-10 in Fig 14 and Table 2 indicate that the success rate of generating adversarial examples increases with the embedding frequency. The total success rate of the attack is also highest for CNN and lowest for VGG16.

Dataset     ResNet50   VGG16   InceptionV3   Xception   CNN
Kaggle      34.0%      17.6%   20.4%         30.1%      -
CIFAR-10    39.0%      36.8%   -             -          69.9%
Table 2: Total success rate in the Kaggle and the CIFAR-10 datasets (DCT)

5 Analysis and Evaluation

5.1 DWT and DCT Methods Comparison

Through extensive experiments, we determined that the DCT based Patchwork watermarking algorithm generates candidate adversarial examples quickly and requires fewer iterations to embed the watermark image into the host image. In contrast, the DWT based Patchwork watermarking algorithm is slower in generating adversarial examples, and the successful generation of adversarial examples requires embedding the watermark image into the host image more times. Specifically, the DCT and the DWT algorithms produce a candidate image in, on average, 0.01 to 0.10 seconds and 0.50 to 2.00 seconds, respectively.

However, on the datasets and corresponding deep learning models we tested, the success rate of the DWT based watermarking algorithm is significantly higher than that of the DCT based watermarking algorithm. Also, for some combinations of the host image and the watermark image, the DCT based watermarking algorithm did not successfully generate an adversarial example, whereas the DWT based watermarking algorithm succeeded with the same host image and watermark image.

Therefore, we can combine these two algorithms into one system. Specifically, we first use the DCT based watermarking algorithm to generate an adversarial example, and if that fails, we will use the DWT based watermarking algorithm to generate the adversarial example. This new combined system will improve the overall success rate of our scheme (i.e., generating adversarial examples), at the expense of time and computing overheads.
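
A sketch of this combined pipeline, reusing the embed_dct_rgb and generate_dwt_candidates sketches from above, is given below; the stopping criterion, the argument names, and the cumulative re-embedding are our assumptions rather than the exact implementation.

```python
import numpy as np

def generate_adversarial(model, host, watermark, true_label, strength_rgb):
    """Try the cheaper DCT method first; fall back to the DWT method only if
    no DCT candidate fools the model.  `host` is assumed to be already
    preprocessed for `model` (otherwise add preprocessing inside `fools`)."""
    def fools(img):
        pred = np.argmax(model.predict(img[np.newaxis], verbose=0), axis=1)[0]
        return pred != true_label

    # DCT rounds: 1 to 10 embeddings (Section 3.3).
    marked = host.copy()
    for _ in range(10):
        marked = embed_dct_rgb(marked, watermark, strength_rgb)
        if fools(marked):
            return marked
    # Fallback: DWT rounds of 5, 10, ..., 50 embeddings.
    for n_embed, candidate in generate_dwt_candidates(host, watermark, strength_rgb).items():
        if fools(candidate):
            return candidate
    return None  # no adversarial example found for this host/watermark pair
```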

Figure 15: Extract feature from the watermark image.
                                   Original RGB ratio   Adjusted RGB ratio
Embedding the watermark image      17.6%                61.6%
Embedding the extracted feature    14.7%                34.8%
Table 3: Total success rate on VGG16 in the Kaggle dataset when embedding the watermark images and the extracted features with different RGB ratios, based on the DCT based watermarking algorithm
Figure 16: Results of embedding the image and the feature under one setting of the embedding strength.
Figure 17: Results of embedding the image and the feature under the other setting of the embedding strength.

5.2 Enhancement by Embedding Features

Based on a large number of experiments, we also found that, for both the DWT based and the DCT based watermarking algorithms, when the three RGB parameters of the embedding strength are set in a particular proportion, the success rate of generating adversarial examples is relatively high and the embedding has relatively little effect on the original host image.

In order to enhance the success rate of the DCT based watermarking algorithm, we attempted to adjust the embedding strength in the algorithm, setting it to a new value. We tested the new setting of the DCT based watermarking algorithm on VGG16 using the Kaggle dataset (since, from the evaluation findings, our designed attack has the weakest performance on VGG16).

Fig 15 illustrates how we extract the feature from the watermark image (we used functions in Keras to extract the features from the middle layer of deep learning models). From Table 3, we observe that the success rate becomes higher than before when we only change the embedding strength. However, as Fig 16 shows, the embedding then has a more noticeable effect on the host image. To enhance our scheme, we considered embedding the feature images of the watermark images into the host images. This is because, although feature images contain less information in total, the information they contain is more useful. As Fig 17 shows, this preserves the success of our attacks while reducing the effect on the host images when the embedding strength is changed (the four examples in Fig 16 and Fig 17 were embedded with the watermark image ten times).
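
A sketch of how such a "feature image" might be obtained from a middle layer with Keras is shown below. The paper only states that Keras functions were used to extract middle-layer features, so the choice of VGG16, the layer name block3_conv3, and the channel-averaging and rescaling steps are all assumptions.

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model

def extract_feature_image(wm_batch, layer_name="block3_conv3"):
    """Turn a watermark image batch (shape (1, H, W, 3)) into a single-channel
    'feature image' by reading out an intermediate layer of a truncated model."""
    base = VGG16(weights="imagenet", include_top=False)
    feat_model = Model(inputs=base.input, outputs=base.get_layer(layer_name).output)
    fmap = feat_model.predict(preprocess_input(wm_batch.astype(float)), verbose=0)
    # Collapse the channel dimension and rescale to an image-like 0-255 range.
    feat = fmap[0].mean(axis=-1)
    feat = 255.0 * (feat - feat.min()) / (feat.max() - feat.min() + 1e-8)
    return feat.astype(np.uint8)
```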

Table 3 and Fig 17 show that the results of embedding the feature images into the host images are clearly better than the original experiments in the sections above: the success rate of our attack improves even though the changes to the image are minimized.

In summary, the key point of our method is to balance the embedding strength, the embedding frequency, and the type of embedded content, in order to maintain a high success rate and keep the watermarking algorithms effective in generating adversarial examples. One potential future research direction is to explore other alternative parameters or parameter settings.

6 Conclusion

In this paper, we presented a novel approach for generating adversarial examples. The approach utilizes the discrete wavelet transform based Patchwork watermarking algorithm and the discrete cosine transform based Patchwork watermarking algorithm to embed watermark images into host images to generate adversarial examples. Such an approach has applications in many real-world settings, including adversarial scenarios such as battlefields. Specifically, we designed a simple scheme to filter the watermark images and corresponding algorithms to embed the watermark images into the host images.

The experimental results showed that our approach can generate adversarial examples quickly and effectively. The highest success rates achieved using the discrete wavelet transform based watermarking method are 48.5% on the Kaggle dataset and 88.2% on the CIFAR-10 dataset. The discrete cosine transform based watermarking approach achieves highest success rates of 34.0% and 69.9% on the Kaggle dataset and the CIFAR-10 dataset, respectively.

We also examined the effect on the host image, relative to the original scheme, of replacing the watermark image with its feature image. Future research includes extending our approach to generate adversarial examples for other content such as audio and video.

References