
Go Wide or Go Deep: Levering Watermarking Performance with Computational Cost for Specific Images

by   Zhaoyang Jia, et al.

Digital watermarking has been widely studied for the protection of intellectual property. Traditional watermarking schemes are often designed in a "wider" manner, applying one general embedding mechanism to all images. This locks the scheme into a robustness-invisibility trade-off, where robustness can only be improved by increasing the embedding intensity, which in turn degrades visual quality. However, a new scenario has emerged in which many businesses wish to give a high level of protection to specific valuable images, requiring high robustness and high visual quality at the same time. Such a scenario calls for watermarking schemes designed in a "deeper" way, with the embedding mechanism customized to specific images. To achieve this, we break the robustness-invisibility trade-off by introducing computational cost as a third factor, and propose a novel auto-decoder-like image-specified watermarking framework (ISMark). With ISMark, strong robustness and high visual quality can both be achieved for specific images. In detail, we apply an optimization procedure (OPT) to replace the traditional embedding mechanism. Unlike existing schemes that embed watermarks using a learned encoder, OPT regards the cover image as the optimizable parameters and minimizes the extraction error of the decoder, so the features of each specified image can be effectively exploited to achieve superior performance. Extensive experiments indicate that ISMark outperforms state-of-the-art methods by a large margin, improving the average bit error rate by 4.64% (from 4.86% to 0.22%) and the PSNR by 2.20 dB (from 32.50 dB to 34.70 dB).



1. Introduction

1.1. Digital Watermarking: Go Wide or Go Deep?

Figure 1. We break the robustness-invisibility trade-off by introducing computational cost, and the proposed image-specified watermarking framework (ISMark) can greatly lever watermarking performance with computational cost. On the right we show an example, where both the visual quality (PSNR) and the robustness (bit error rate, BER) are enhanced by the image-specified optimization. Best viewed zoomed in.

Digital watermarking is an important technology for intellectual property protection. By embedding identification information (i.e., a watermark) into the cover image, the ownership of the image can be confirmed. In this process, there are two basic requirements: 1) invisibility, which requires that no serious visual distortion occurs during watermark embedding so that the watermark cannot be easily observed, and 2) robustness, which ensures the accuracy of watermark extraction even if the image is severely distorted.

Recently, a growing number of schemes have been proposed to embed watermarks using either handcrafted modules (Van Schyndel et al., 1994; Tirkel et al., 1995; Bao and Ma, 2005; Karybali and Berberidis, 2006; Nasir et al., 2007; Bi et al., 2007; Makbol and Khoo, 2013; Wang et al., 1998; Ma et al., 2021) or learning-based neural networks (Zhu et al., 2018; Liu et al., 2019; Jia et al., 2021; Fang et al., 2022), and have achieved impressive results. These schemes aim to "go wide" and design a universal rule for watermarking. That is, they study a set of images or possible attacks to design one general embedding mechanism for all images. But this locks the designed watermarking scheme into a robustness-invisibility trade-off, since under such general embedding mechanisms, robustness can only be improved by increasing the embedding intensity, which reduces visual quality.

However, a new scenario has emerged in which many businesses wish to give a high level of protection to specific valuable images (e.g., carefully designed promotional pictures), which requires high robustness and high visual quality at the same time. Apparently, existing watermarking schemes cannot be well applied to such a scenario due to the limitation of the robustness-invisibility trade-off.

To meet the needs of such scenarios, watermarking schemes should "go deep" and be customized to specific images. That is, we should exploit image-specified features and design watermarks according to the characteristics of each image to achieve better performance. To this end, we propose a novel image-specified watermarking framework (ISMark), which introduces computational cost as a new dimension to break the robustness-invisibility trade-off of digital watermarking. As shown in Fig.1, with the proposed ISMark, both high invisibility and strong robustness can be achieved.

1.2. Image-Specified Watermarking Framework

Unlike previous schemes that apply a general embedding mechanism to all images, ISMark exploits image-specified features to design watermarks for each image. In detail, we apply an optimization procedure (OPT) in embedding, which regards the cover image as optimizable parameters and minimizes the extraction error of the decoder. With OPT embedding, we can enhance the watermarking performance by increasing the number of optimization steps, so that both strong robustness and high invisibility are achieved at the price of increased computational cost.

Based on the OPT embedding algorithm, ISMark adopts an auto-decoder-like structure instead of the previous auto-encoder-like structure, as shown in Fig.2. In ISMark, the decoder can be any extraction mechanism, for example, a decoder pre-trained in previous schemes (Zhu et al., 2018; Liu et al., 2019). It is worth noting that a good decoder may yield better performance not only in extraction but also in embedding. To fully exploit the potential of OPT embedding, we propose a decoder enhancing algorithm to finetune the pre-trained decoder.

Experiments show that ISMark achieves superior watermarking performance, exceeding the state-of-the-art scheme (Fang et al., 2022) by 4.64% in terms of robustness (average bit error rate from 4.86% to 0.22%) and by 2.20 dB in terms of invisibility (PSNR from 32.50 dB to 34.70 dB). Compared with the baseline (Liu et al., 2019), ISMark reduces the error rate from 3.95% to 0.11% and improves PSNR from 27.74 dB to 37.85 dB. In addition, ISMark can embed longer watermarks (bpp from 1.83×10⁻³ to 3.91×10⁻³) while keeping high performance (average bit error rate of 0.22% and PSNR of 34.70 dB).

In summary, our technical contributions in this paper are:

  1. We introduce computational cost in watermarking to lever the watermarking performance, and propose a novel auto-decoder-like image-specified framework (ISMark), which can effectively exploit image-specified features to achieve high invisibility and robustness at the same time.

  2. We design an optimization-based embedding algorithm to embed robust, invisible and large-capacity watermarks.

  3. We propose a decoder enhancing algorithm to further enhance the extraction capability of the decoder.

  4. ISMark achieves state-of-the-art performance on the COCO dataset, with an average bit error rate of 0.22% and PSNR of 34.70 dB.

Figure 2. Framework structure comparison. Previous methods adopt an auto-encoder-like structure and embed watermarks using an encoder, while our auto-decoder-like structure embeds watermarks through optimization.

2. Related Works

Figure 3. An overview of the optimization-based (OPT) embedding algorithm (upper) and the corresponding extraction process (lower). OPT embedding aims to embed the watermark W into the cover image I_co and generate the encoded image I_en as output. Given the simulated noise layers N and the decoder D, the encoded image is optimized to minimize the objective function Eq.7. The optimization is performed on each given cover image and watermark, resulting in full exploitation of image-specified features and high performance.

2.1. Traditional watermarking schemes

Watermarking technology was first studied by Ron van Schyndel in 1994 (Van Schyndel et al., 1994). They propose to embed the watermark in the least significant bit (LSB) of the image pixels, which guarantees invisibility but cannot survive distortions. To improve robustness, many algorithms have been proposed to embed watermarks in the frequency domain. They focus on specific transform coefficients of a frequency transform, such as the DCT domain (Fang et al., 2018; Kang et al., 2010), the DFT domain (Hamidi et al., 2018) and the wavelet domain (Bao and Ma, 2005; Bi et al., 2007; Makbol and Khoo, 2013; Wang et al., 1998). Recently, a symmetry-based watermark synchronization process has been proposed to resist local geometric distortions in the spatial domain (Ma et al., 2021). These methods greatly enhance robustness, but their performance is still limited by shallow handcrafted features.

2.2. DNN-based watermarking schemes

In recent years, with the development of deep learning algorithms, many DNN-based watermarking schemes have been proposed (Zhu et al., 2018; Ahmadi et al., 2020; Liu et al., 2019; Zhang et al., 2021; Tancik et al., 2020). Zhu et al. (Zhu et al., 2018) propose a DNN-based auto-encoder-like architecture that jointly trains the encoder and decoder with a noise layer, and many training strategies were later proposed to solve problems in such DNN-based watermarking schemes. Liu et al. (Liu et al., 2019) proposed a two-stage separable algorithm for robustness against arbitrary noises, and Jia et al. (Jia et al., 2021) use real and simulated JPEG compression in different mini-batches to enhance robustness to JPEG compression. Recently, an encoded feature enhancement scheme was proposed by Fang et al. (Fang et al., 2022), which enhances the watermark signal in the Fourier-transform domain and greatly improves robustness. However, all these schemes are limited by the robustness-invisibility trade-off and cannot adapt to scenarios where both high visual quality and robustness are required. This motivates us to go deep and exploit image-specified features to break the robustness-invisibility trade-off.

2.3. Learning-based Algorithm

With the development of deep learning, many learning-based algorithms have been proposed. Some of them are designed to solve classification problems such as image classification (He et al., 2016; Liu et al., 2021) and semantic segmentation (Chen et al., 2018), while others are proposed to solve regression problems such as denoising (Zhang et al., 2017) and super-resolution (Dong et al., 2015). These learning-based schemes aim to fit a function by training a neural network on a large-scale dataset; once training is complete, the network can predict the result with a single feed-forward pass. Previous learning-based watermark embedding schemes (Zhu et al., 2018; Liu et al., 2019; Jia et al., 2021; Fang et al., 2022) aim to fit a function mapping the cover image domain to the encoded image domain, so they can be categorized as regression problems.

2.4. Optimization-based Algorithm

Unlike learning-based algorithms, optimization-based methods rely on iterative optimization to solve complex problems and achieve excellent performance. Optimization-based methods are widely adopted in adversarial attacks (Goodfellow et al., 2014; Kurakin et al., 2018; Chakraborty et al., 2021), where adversarial examples are generated by optimizing the input images to manipulate the predictions of DNN-based classifiers. Recently, optimization-based algorithms have achieved remarkable performance in image style transfer (Gatys et al., 2015), novel view synthesis (Mildenhall et al., 2020), 3D shape modeling (Park et al., 2019) and image compression (Zhao et al., 2021), which shows their potential in deep learning. In this paper, we likewise introduce an optimization-based embedding algorithm for digital watermarking.
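The recipe shared by these methods, freeze the network and descend on the input, can be illustrated with a minimal sketch (the toy logistic "classifier" and all names here are ours, not from any cited work):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=16)               # frozen weights of a toy logistic "classifier"
x = rng.normal(size=16)               # the input we are allowed to modify

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

p0 = sigmoid(w @ x)                   # original prediction
for _ in range(100):
    p = sigmoid(w @ x)
    # gradient of -log p with respect to x; descending it pushes p toward 1
    x -= 0.1 * (p - 1.0) * w
p1 = sigmoid(w @ x)                   # prediction after optimizing the input
```

The same loop structure reappears in OPT embedding, with the classifier replaced by a watermark decoder and the confidence objective replaced by an extraction error plus an invisibility penalty.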

3. Methods

Aiming to break the robustness-invisibility trade-off and exploit the features of each specified image, we propose an image-specified watermarking framework (ISMark). The framework adopts an optimization-based (OPT) embedding algorithm and an enhanced DNN-based decoder.

3.1. Optimization-based Embedding

Previous watermarking schemes model the watermark embedding process as a feed-forward function E that maps the cover image I_co and the watermark W to the encoded image I_en:

I_en = E(I_co, W).    (1)
For learning-based schemes, such functions are modeled by neural networks and then optimized by learning on large-scale datasets. It means that they have to solve a regression problem by learning from the distribution of the datasets, but the feature of each specified image is not fully exploited. To address such limitations, we regard embedding as an optimization process and propose a novel optimization-based (OPT) algorithm.

Given a decoder D with parameters θ, a cover image I_co, and a watermark W, the OPT embedding algorithm searches over all possible images for the optimal encoded image:

I_en = argmin_I L(I, I_co, W; θ),    (2)

where I ranges over all possible pixel-value assignments, the watermark W is represented as a bit string of length C, and L is the objective function. An overview of OPT embedding is depicted in Fig.3.

3.1.1. Watermark Extraction Process

OPT embedding is built on optimization against the extraction process, so we first define the extraction process of the decoder D. In ISMark, D is a set of n neural networks {D_1, …, D_n} with parameters θ = {θ_1, …, θ_n}, each of which extracts a part W_i of the watermark. The whole watermark can be written as W = W_1 ∥ W_2 ∥ … ∥ W_n, so the total extraction function can be formulated as:

W' = D(I; θ) = D_1(I; θ_1) ∥ D_2(I; θ_2) ∥ … ∥ D_n(I; θ_n),    (3)

where "∥" denotes concatenation of bits.
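As a minimal illustration of this multi-decoder concatenation (the decoders below are toy placeholders, not the paper's networks, and each returns a fixed 2-bit segment):

```python
def extract(image, decoders):
    """Concatenate the bit segments recovered by each sub-decoder, in order."""
    bits = []
    for d in decoders:
        bits.extend(d(image))
    return bits

# toy sub-decoders standing in for trained networks; the "image" is unused here
decoders = [lambda img: [0, 1], lambda img: [1, 1], lambda img: [0, 0]]
watermark = extract(None, decoders)
```

In the real framework each sub-decoder is a trained network operating on the same encoded image, which keeps per-decoder memory small while scaling the total capacity with the number of decoders.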

In practice, an image may be distorted during transmission or processing, resulting in a noised image I_no, so the decoder must extract the watermark robustly from I_no. We can simulate different kinds of distortions by a set of noise layers N = {N_1, …, N_m}, so the noised images can be simulated as I_no = N_j(I_en). With the above definitions, we can formulate the extraction process in OPT embedding as:

W'_j = D(N_j(I); θ),    (4)

where W'_j denotes the watermark extracted from image I distorted by noise layer N_j. We demonstrate our settings of the noise layers in Section 4.1.

3.1.2. Optimization Objective Function

The basic requirements of watermarking algorithms are robustness, invisibility and capacity. In OPT embedding, the capacity C is determined by the number of decoders n and the capacity of each decoder. The optimization objective is to maximize robustness and invisibility given a watermark capacity C.

The robustness of the watermark can be measured by the extraction error under different distortions. As discussed in Section 3.1.1, we simulate m kinds of distortions with noise layers N_1, …, N_m. For each noise layer N_j, the extraction error is defined as the distance between the watermark W and the extracted watermark W'_j, so the total extraction error can be defined as the average extraction error over these distortions:

L_W(I) = (1/m) Σ_{j=1..m} dist(W, D(N_j(I); θ)).    (5)
The invisibility of the watermark can be measured by the distance between the cover image I_co and the image I:

L_I(I) = dist(I, I_co).    (6)
So the optimization objective function is:

L(I) = λ_W · L_W(I) + λ_I · L_I(I),    (7)
where λ_W and λ_I are weight factors that balance the trade-off between robustness and invisibility. We can reformulate Eq.2 with Eq.7 for the final optimization objective:

I_en = argmin_I [ λ_W · L_W(I) + λ_I · L_I(I) ].    (8)
3.1.3. Optimization Strategy

Since the decoder and noise layers are differentiable, we can optimize I_en through the standard back-propagation algorithm (Rumelhart et al., 1986). The detailed process is given in Algorithm 1. Given the cover image I_co and watermark W, I_en is initialised as I_co to speed up convergence (line 1). The objective function is computed by Eq.7 (lines 3-8), and the back-propagation algorithm is then adopted to compute the gradient and update I_en (line 9). Following these steps, we update I_en for k iterations and output the final I_en (line 11). I_en can be updated by SGD (Cherry et al., 1998) or more advanced momentum-based algorithms such as Adam (Kingma and Ba, 2014). For simplicity, we only present SGD in Algorithm 1.

0:  Input: the cover image I_co; the watermark W;
0:  Output: the encoded image I_en;
1:  I_en ← I_co;
2:  for k iterations do
3:        L_W ← 0;
4:        for j in {1, 2, …, m} do
5:              W'_j ← D(N_j(I_en); θ);
6:              add dist(W, W'_j)/m to L_W;
7:        end for
8:        L ← λ_W · L_W + λ_I · dist(I_en, I_co);
9:        I_en ← I_en − α · ∇_{I_en} L;
10:  end for
11:  return I_en;
Algorithm 1 OPT embedding. The number of iterations k and the optimization step length α are pre-defined hyperparameters. m is the number of noise layers. The loss terms L_W and L_I and their weight factors λ_W and λ_I are described in Section 3.1.2.
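The loop above can be sketched end to end in numpy under simplifying assumptions: a toy linear-sigmoid decoder, two linear "noise layers" (identity plus a crude smoothing operator), and a cross-entropy extraction error. All of these are our stand-ins for the paper's networks, not its actual components.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pix, n_bits = 64, 8
A = 0.1 * rng.normal(size=(n_bits, n_pix))   # frozen toy decoder parameters (theta)
I_co = rng.normal(size=n_pix)                # cover image, flattened
W = rng.integers(0, 2, size=n_bits).astype(float)

identity = np.eye(n_pix)                     # the "no distortion" identity layer
smooth = 0.8 * np.eye(n_pix) + 0.2 / n_pix   # crude differentiable "blur"
noise_layers = [identity, smooth]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def opt_embed(I_co, W, k=300, alpha=2.0, lam_w=1.0, lam_i=0.01):
    I_en = I_co.copy()                       # initialise at the cover image (line 1)
    for _ in range(k):
        grad = lam_i * 2.0 * (I_en - I_co) / n_pix        # invisibility term
        for N in noise_layers:                            # average error over noises
            p = sigmoid(A @ (N @ I_en))                   # decoded bit probabilities
            grad += lam_w * (N.T @ A.T @ (p - W)) / (n_bits * len(noise_layers))
        I_en -= alpha * grad                              # gradient step on the image
    return I_en

I_en = opt_embed(I_co, W)
decoded = (sigmoid(A @ I_en) > 0.5).astype(float)         # clean extraction
```

Because the image has far more free variables than there are watermark bits, the optimizer can drive the extraction error to zero while the invisibility term keeps I_en close to I_co, which is the mechanism the paper exploits.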

3.2. Decoder Enhancing

As demonstrated above, the performance of OPT embedding strongly depends on the decoder D. So the key point is: how do we learn the parameters θ?

A straightforward idea is to follow existing learning-based schemes such as (Zhu et al., 2018; Liu et al., 2019) and train D in an auto-encoder-like structure. That is, we can directly adopt the decoders pre-trained in (Zhu et al., 2018; Liu et al., 2019) for OPT embedding. However, two problems arise when using such pre-trained decoders:

  1. The extraction capabilities are not powerful enough to fully exploit the potential of OPT embedding. In previous schemes, the watermark embedding methods are not robust enough, so the embedded watermark features are not well preserved after distortion. As a result, the decoders are trained to extract watermarks from distorted watermark features, and they lack the capability to distinguish different watermarks, which lowers the optimization upper bound. We visualize this with PCA (Pearson, 1901) (Fig.4 left); the results show that a decoder pre-trained with noise layers cannot discriminate different watermarks well (e.g., the blue and brown points in the figure), which results in poor performance in OPT embedding. When we instead pre-train the decoder with complete information by removing the noise layers (Fig.4 middle), the decoder discriminates watermarks better, leading to a slight performance improvement. But such a decoder is not trained to resist distortions, so it still suffers from poor robustness.

  2. Mismatch between simulated noise layers and real-scenario distortions. In OPT embedding, we optimize the extraction error against simulated noise layers, but a mismatch occurs when the simulated distortion differs from the real one. For example, we simulate JPEG compression with a differentiable function (Shin and Song, 2017), which differs from JPEG compression in real scenarios and leads to a performance drop for the decoder. Typically, OPT embedding obtains strong robustness against the simulated JPEG (with a bit error rate of 0%) but almost fails on real JPEG compression (with a bit error rate of 31%).
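One common way to make rounding (the non-differentiable core of JPEG quantization) optimizable is a cubic approximation. The sketch below uses the form we believe Shin and Song propose, but treat the exact formula as illustrative rather than the paper's implementation:

```python
import numpy as np

def approx_round(x):
    """Differentiable stand-in for rounding: round(x) + (x - round(x))**3.
    Its value stays within 0.125 of true rounding, while its derivative,
    3*(x - round(x))**2, is nonzero almost everywhere, so gradients can
    flow through the quantization step during optimization."""
    r = np.round(x)
    return r + (x - r) ** 3

xs = np.linspace(-2.0, 2.0, 401)
gap = np.max(np.abs(approx_round(xs) - np.round(xs)))   # worst-case value mismatch
```

The mismatch illustrated by `gap` is exactly the kind of simulation-versus-reality discrepancy that motivates the decoder enhancing algorithm below: an image optimized against the smooth surrogate is not guaranteed to survive the true, discontinuous quantizer.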

To eliminate these limitations, a decoder enhancing algorithm is proposed to train a powerful decoder for OPT embedding. In the decoder enhancing scheme, OPT embedding is adopted to generate training data with complete watermark information, and the decoder is trained on images distorted by real distortions for strong robustness. As a result, the effect of the mismatch is partially removed, and the extraction capability of the decoder is well enhanced. As shown on the right of Fig.4, through decoder enhancing, the decoder can fully distinguish different watermarks, and the performance is much better than with the pre-trained decoders.

Figure 4. Feature visualisation with PCA (Pearson, 1901). We embed three kinds of watermarks (shown in different colors) into 100 images from the COCO dataset (Lin et al., 2014) through OPT embedding, and visualize the extracted features of different decoders to compare their discriminative capability. We find that the decoder pre-trained with noise layers (left) cannot discriminate different watermarks, which leads to worse extraction accuracy than pre-training without noise layers (middle). Through decoder enhancing (right), the decoder can distinguish watermarks well and achieves the best performance.

Denoting the distortions in real scenarios as N^r = {N^r_1, …, N^r_s}, the extraction process in decoder enhancing can be formulated as:

W' = D(N^r_i(I_en); θ),    (9)
and the loss function is defined as the extraction error:

L_D = dist(W, W').    (10)
The details of decoder enhancing are shown in Algorithm 2. θ is initialized with a pre-trained decoder (line 1), then refined for T iterations. In each training iteration, a batch of cover images and watermarks is given to generate the encoded images by OPT embedding with the current θ (lines 3-4). θ is then updated for t sub-iterations to minimize the loss function in Eq.10 (lines 6-9); in each sub-iteration a real-scenario distortion N^r_i is randomly selected from all distortions to generate the noised images.

Once decoder enhancing finishes, the parameters θ are fixed. Then, for any given image, we use the same decoder to embed and extract watermarks. This means the decoder enhancing algorithm does not increase the computational cost of embedding.

0:  Input: all cover images in the dataset; an initial parameter θ_0;
0:  Output: the enhanced parameter θ;
1:  Initialize θ ← θ_0;
2:  for T iterations do
3:        randomly select I_co and W;
4:        I_en ← OPT embedding(I_co, W; θ);
5:        for t sub-iterations do
6:              randomly select N^r_i;
7:              I_no ← N^r_i(I_en);
8:              W' ← D(I_no; θ);
9:              θ ← θ − α · ∇_θ dist(W, W');
10:        end for
11:  end for
12:  return θ;
Algorithm 2 Decoder enhancing algorithm. The number of iterations T and the number of sub-iterations t are pre-defined hyperparameters. The loss function is defined in Eq.10.
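The alternation in Algorithm 2 can be sketched compactly. Everything below is a toy simplification we introduce for illustration: a linear-sigmoid decoder, a single identity "simulated" noise layer inside embedding, and coarse quantization as the stand-in non-differentiable "real" distortion.

```python
import numpy as np

rng = np.random.default_rng(2)
n_pix, n_bits = 32, 4

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def opt_embed(I_co, W, theta, k=200, alpha=2.0, lam_i=0.01):
    """OPT embedding with the decoder frozen (Algorithm 1, identity noise only)."""
    I_en = I_co.copy()
    for _ in range(k):
        p = sigmoid(theta @ I_en)
        g = theta.T @ (p - W) / n_bits + lam_i * 2.0 * (I_en - I_co) / n_pix
        I_en -= alpha * g
    return I_en

def real_distortion(img):
    """Stand-in for a real, non-differentiable channel: coarse quantization."""
    return np.round(img * 4) / 4

theta = 0.1 * rng.normal(size=(n_bits, n_pix))     # initial decoder parameters
for _ in range(20):                                # outer iterations (line 2)
    I_co = rng.normal(size=n_pix)
    W = rng.integers(0, 2, size=n_bits).astype(float)
    I_en = opt_embed(I_co, W, theta)               # lines 3-4: generate training data
    for _ in range(5):                             # lines 5-10: decoder sub-iterations
        x = real_distortion(I_en)                  # the real, not simulated, distortion
        theta -= 0.05 * np.outer(sigmoid(theta @ x) - W, x) / n_bits
```

The key design choice mirrored here is that the gradient step on θ sees only really-distorted images, so the decoder is never asked to fit the smooth surrogate it was optimized against during embedding.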
Figure 5. The testing results of our IS-HiDDeN model. We show the visual quality and the distortions on 8 images randomly selected from the COCO dataset. The cover images, the encoded images (generated by the combined-noise model), the noised images and the residuals are displayed in the upper part of the figure. We show testing results for both the combined and the specified cases. Bit error rate (BER) and peak signal-to-noise ratio (PSNR) are displayed at the bottom of the figure; both are average values measured on the test dataset.

4. Experiments

Model bpp(×10⁻³) PSNR(dB) Average BER Crop Resize Gaussian Noise
(noise parameter) 3.5% 16% 36% 64% 0.6 0.8 1.2 1.4 0.01 0.005 0.001
HiDDeN(Zhu et al., 2018) 1.83 27.81 6.59 7.4 2.5 2.1 2.0 1.8 2.2 1.8 1.8 2.8 2.1 1.7
TSDL(Liu et al., 2019) 1.83 27.74 3.95 6.2 0.3 0.2 0.1 0.2 0.1 0.1 0.1 1.3 0.3 0.1
IS-HiDDeN 1.83 37.68 0.15 2.0 0 0 0 0 0 0 0 0.05 0 0.02
IS-TSDL 1.83 37.85 0.11 1.5 0 0 0 0 0 0 0 0.2 0.03 0
LGDR(Ma et al., 2021) 3.91 32.60 12.76 33.8 22.9 12.9 9.6 6.3 4.6 3.8 3.8 7.2 7.6 7.3
EFE(Fang et al., 2022) 3.91 32.50 4.86 - - - - 1.6 1.6 0.9 1.3 9.5 6.7 7.2
IS-HiDDeN 3.91 34.60 0.18 2.9 0 0 0 0 0 0 0 0.1 0 0.1
IS-TSDL 3.91 34.70 0.22 2.4 0 0 0 0.1 0 0 0 0.7 0 0
Model bpp(×10⁻³) Gaussian Blur Median Blur Salt&Pepper JPEG
(noise parameter) 1 0.5 0.2 7 5 3 8% 5% 2% 50 70 90
HiDDeN(Zhu et al., 2018) 1.83 2.6 1.9 2.1 16.0 8.7 1.9 7.5 3.0 1.7 33 30 15
TSDL(Liu et al., 2019) 1.83 0.5 0.1 0.1 13.6 3.7 0.2 4.7 0.9 0.1 28 22 8
IS-HiDDeN 1.83 0 0 0 0.01 0 0 1.3 0.06 0 0.02 0.01 0.01
IS-TSDL 1.83 0 0 0 0 0 0 0.6 0.08 0 0.04 0.03 0.03
LGDR(Ma et al., 2021) 3.91 16.2 9.8 4.6 47.3 30.3 12.7 7.9 10.5 7.1 10.0 9.6 7.8
EFE(Fang et al., 2022) 3.91 7.9 9.6 1.2 6.0 5.5 4.7 3.1 3.0 2.7 8.5 6.3 5.0
IS-HiDDeN 3.91 0 0 0 0 0 0 1.2 0.1 0 0 0 0
IS-TSDL 3.91 0 0 0 0 0 0 1.3 0.3 0 0.3 0.1 0.1
Table 1. Comparison with SOTA. BER (%) and PSNR are shown in the table. EFE (Fang et al., 2022) is not tested on the crop attack because it is not designed for such robustness.

4.1. Implementation Details

In OPT embedding, we optimize I_en for a fixed number of iterations k by default. We simulate 7 kinds of common distortions as the noise layers N, including crop, resize, Gaussian noise, Gaussian blur, median blur, salt & pepper, and JPEG compression. Definitions and parameters of these noises are given in the supplementary material. An additional identity layer is also adopted, which applies no distortion to the encoded image. The weight factors λ_W and λ_I of the objective function Eq.7 are fixed. We use the Adam algorithm (Kingma and Ba, 2014) with a fixed learning rate to update I_en by default.
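For concreteness, a few of these simulated distortions can be sketched as plain image operations on a grayscale image in [0, 1]. The parameters and the exact form of each operation are illustrative choices of ours, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(3)

def crop(img, keep=0.64):
    """Keep a random rectangle covering `keep` of the area, zero out the rest."""
    h, w = img.shape
    ch, cw = int(h * keep ** 0.5), int(w * keep ** 0.5)
    y, x = rng.integers(0, h - ch + 1), rng.integers(0, w - cw + 1)
    out = np.zeros_like(img)
    out[y:y + ch, x:x + cw] = img[y:y + ch, x:x + cw]
    return out

def gaussian_noise(img, var=0.01):
    """Additive zero-mean Gaussian noise with the given variance."""
    return img + rng.normal(0.0, var ** 0.5, size=img.shape)

def salt_and_pepper(img, ratio=0.05):
    """Flip a random `ratio` of pixels to pure black or pure white."""
    out = img.copy()
    mask = rng.random(img.shape) < ratio
    out[mask] = rng.integers(0, 2, size=mask.sum())
    return out

img = rng.random((8, 8))
noised = [f(img) for f in (crop, gaussian_noise, salt_and_pepper)]
```

Note that crop and salt & pepper replace pixel values outright rather than perturbing them, which is consistent with the paper's later observation that gradients through such distortions vanish on the affected pixels.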

In decoder enhancing, the decoder is trained on 10,000 images from the ImageNet dataset (Deng et al., 2009). Watermark bits are sampled randomly. The n decoders are trained separately to save memory. We use the Adam algorithm with the same learning rates as in pre-training (one for IS-HiDDeN and one for IS-TSDL, as introduced in Section 4.3) to optimize the decoder. The batch size is set to 6. The decoder is optimized for T iterations, each with t sub-iterations, which takes about 15 days on a single NVIDIA RTX 2080Ti GPU. More training details can be found in the supplementary material.

4.2. Evaluation

ISMark is evaluated on 500 images from the COCO dataset (Lin et al., 2014). We use the bit error rate (BER) of the extracted watermarks to measure robustness, PSNR to measure invisibility, and bits per pixel (bpp) to measure the watermark capacity. The execution time of OPT embedding is also measured and reported in Section 4.6.4.
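These three metrics can be computed as follows (a straightforward sketch; the `peak` default assumes 8-bit images):

```python
import numpy as np

def ber(w_true, w_pred):
    """Bit error rate: fraction of mismatched bits, in percent."""
    w_true, w_pred = np.asarray(w_true), np.asarray(w_pred)
    return 100.0 * np.mean(w_true != w_pred)

def psnr(cover, encoded, peak=255.0):
    """Peak signal-to-noise ratio in dB between cover and encoded images."""
    mse = np.mean((np.asarray(cover, float) - np.asarray(encoded, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def bpp(n_bits, height, width):
    """Watermark capacity in bits per pixel."""
    return n_bits / (height * width)
```

For example, 30 bits in a 128×128 image give bpp(30, 128, 128) ≈ 1.83×10⁻³, which would be consistent with the 1.83 entries in Table 1 under the assumption (ours) that the test images are 128×128 with 30-bit watermarks.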

4.3. Baseline

The baselines for comparison are HiDDeN (Zhu et al., 2018), TSDL (Liu et al., 2019), LGDR (Ma et al., 2021) and EFE (Fang et al., 2022). LGDR is the SOTA traditional watermarking scheme, and EFE is the SOTA learning-based scheme. HiDDeN and TSDL are two learning-based watermarking schemes, and we train their decoders without noise layers for 300 epochs to obtain the initial parameters θ_0 of ISMark. According to the backbone of the decoder, we call the models trained by our scheme IS-HiDDeN and IS-TSDL, respectively.

To verify the capability of ISMark against non-differentiable distortions, we also compare with ASL (Zhang et al., 2021), StegaStamp (Tancik et al., 2020) and MBRS (Jia et al., 2021), which achieve state-of-the-art robustness to JPEG compression. We also adopt MBRS as the backbone to train an IS-MBRS for comparison.

4.4. Quantitative Results

First, we present quantitative results to evaluate the performance of our method. As described in Section 4.1, we use 7 kinds of distortions and an additional identity layer for both OPT embedding and decoder enhancing. Similar to (Liu et al., 2019), we train 8 specified models, each adopting one noise layer to obtain specified robustness, and one combined model adopting all 8 layers for robustness against all distortions. Fig.5 shows examples of high-intensity noises together with the robustness and invisibility of the IS-HiDDeN model.

As shown in Fig.5, we achieve both high invisibility and strong robustness for all distortions. In particular, our model achieves near-zero BER with high average PSNR for all specified noises as well as for the combined noises, which shows the high performance of our method. Benefiting from the decoder enhancing algorithm, our model obtains strong robustness against non-differentiable JPEG compression in the combined case. We also note that the robustness is slightly weaker for the crop attack and salt & pepper noise. This is partly because these noises are independent of the pixel values of the image, so the gradient computed on the distorted pixels is zero, which makes it difficult to optimize the extraction error. Still, they are much stronger than previous methods, as shown in the following section.

We further perform an experiment on levering watermarking performance with computational cost. We test IS-TSDL under Gaussian noise with several settings of the optimization step count k, and for each k we vary the optimization step length α to plot the PSNR-accuracy curve. As shown in Fig.6, the watermarking performance is well enhanced as k increases. This means our method can achieve better performance by simply increasing the optimization step count k.

Figure 6. Levering watermarking performance with computational cost.

4.5. Comparison with SOTA

In this section, we compare with the SOTA methods. We first compare ISMark with HiDDeN (Zhu et al., 2018), TSDL (Liu et al., 2019), LGDR (Ma et al., 2021) and EFE (Fang et al., 2022) under the combined-noise setting. A larger optimization step count k is used for the higher-capacity setting, since it is more difficult to embed larger watermarks.

As shown in Table 1, our models exceed the SOTA in both robustness and invisibility by a large margin. We show an average improvement of 4.64% in terms of BER (from 4.86% for EFE to 0.22% for IS-TSDL), while the PSNR increases by approximately 2.20 dB (from 32.50 dB for EFE to 34.70 dB for IS-TSDL). The improvement mainly comes from the fact that our method exploits image-specified features for watermark embedding, while the baselines only utilize statistical clues.

Model bpp(×10⁻³) PSNR(dB) BER(%)
ASL(Zhang et al., 2021) 1.83 27.84 9.27
StegaStamp(Tancik et al., 2020) 0.63 32.36 0.24
MBRS(Jia et al., 2021) 3.91 36.50 0.0092
IS-HiDDeN 1.83 37.68 0.02
IS-TSDL 1.83 37.86 0.04
IS-MBRS 3.91 37.43 0.0031
Table 2. Comparison with SOTA under JPEG compression with a fixed quality factor.

We also compare with ASL (Zhang et al., 2021), StegaStamp (Tancik et al., 2020) and MBRS (Jia et al., 2021) under the JPEG-compression-specified setting. As shown in Table 2, our IS-HiDDeN and IS-TSDL achieve better performance than ASL and StegaStamp but are slightly weaker than MBRS in terms of robustness. However, our IS-MBRS, which utilizes MBRS as the decoder backbone, even outperforms MBRS by 0.0061% in terms of BER and 0.97 dB in terms of PSNR, which shows the superiority of ISMark.

4.6. Ablation Study

ISMark has several hyperparameters, so we perform an ablation study to show their effects. We study four factors: the decoder enhancing algorithm, the number of sub-iterations t in decoder enhancing, the watermark capacity C, and the computational cost.

Figure 7. Ablation study on the number of sub-iterations t in decoder enhancing (left) and the watermark capacity C (right).

4.6.1. Decoder enhancing algorithm

In decoder enhancing, we optimize the decoder initialized with pre-trained parameters θ_0. To see the effect, we first remove decoder enhancing and test the performance of OPT embedding with the pre-trained θ_0. As shown in Table 3, ISMark achieves stronger robustness with decoder enhancing, which illustrates the necessity of enhancing the pre-trained parameters. We also train the decoder from scratch; the results show that a randomly initialized decoder can also achieve high robustness and invisibility, while starting from pre-trained parameters achieves the best performance.

4.6.2. Sub-optimization iteration in decoder enhancing

In decoder enhancing, the decoder is optimized for t sub-iterations for each batch. We vary t from 20 to 140 with an interval of 20 and plot the PSNR and average BER in the left of Fig.7. We find that with the increase of t, the robustness first increases and then decreases, with the best performance reached around the middle of this range. There is no obvious change in PSNR, which varies from 37.74 dB to 38.36 dB, meaning the number of sub-iterations has no obvious effect on invisibility.

4.6.3. Watermark capacity

As demonstrated in Section 3.1.1, we can use a set of n decoders to embed multiple watermarks, leading to different capacities C. We test OPT embedding with several capacities, and the results are shown in the right of Fig.7. We find that ISMark achieves high robustness and invisibility even with a large capacity. Both metrics decrease as C increases, because it is difficult to guarantee the performance of large-capacity watermarks.

w/o decoder enhancing: 37.73 / 11.76%   36.10 / 10.26%
w/o pre-trained: 37.68 / 0.47%   38.12 / 0.46%
baseline: 37.68 / 0.15%   37.85 / 0.11%
Table 3. Ablation study of the decoder enhancing algorithm. Each item reports PSNR (dB) / average BER.
k: 50   100   150   200
embedding time (s): 2.9   5.2   7.9   10.6
extracting time (s): 0.005   0.005   0.005   0.005
Table 4. Runtime with different numbers of optimization steps k in OPT embedding for IS-TSDL.

4.6.4. Computation cost

The computation cost (measured by runtime) can be controlled through the number of optimization steps k in OPT embedding. We report the embedding and extracting time for different k in Table 4. The embedding time increases approximately linearly with k, while the extracting time is unaffected.
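The linear trend has a structural explanation: each of the k optimization steps performs a fixed amount of decoder work, while extraction is a single forward pass regardless of k. A toy counter makes this visible (an illustrative assumption: `CountingDecoder` is a hypothetical stand-in for the real network, counting evaluations instead of measuring wall-clock time):

```python
class CountingDecoder:
    """Stand-in linear decoder that counts how many times it is
    evaluated (a proxy for forward-pass cost)."""
    def __init__(self, w):
        self.w = w
        self.calls = 0

    def __call__(self, img):
        self.calls += 1
        return sum(wi * xi for wi, xi in zip(self.w, img))

def opt_embed_steps(cover, dec, target, k, lr=0.05):
    """k optimization steps on the cover; each step evaluates the
    decoder exactly once, so embedding cost is proportional to k."""
    img = list(cover)
    for _ in range(k):
        err = dec(img) - target
        img = [xi - lr * 2.0 * err * wi for xi, wi in zip(img, dec.w)]
    return img
```

Doubling k doubles the number of decoder evaluations during embedding, while extraction always costs a single evaluation, matching the constant extracting time in Table 4.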

5. Limitations

While ISMark enjoys high robustness under various digital editing operations such as cropping, resizing, and JPEG compression, achieving robustness against physical attacks (e.g., printing or photographing) remains an open problem. Many recent works (Liu et al., 2019; Fang et al., 2022) have focused on such robustness for learning-based watermarking schemes, and we will draw on these methods to further improve ISMark's robustness to physical attacks.

6. Conclusion

In this paper, we introduce computational cost into watermarking to lever the watermarking performance, and propose a novel auto-decoder-like image-specified watermarking framework. We regard watermark embedding as an optimization process and propose an optimization-based embedding algorithm to fully exploit image-specified features. Together with the proposed decoder enhancing algorithm, our method achieves state-of-the-art performance in terms of robustness, invisibility, and watermark capacity. We believe ISMark can provide a new perspective for practical digital watermarking.


  • M. Ahmadi, A. Norouzi, N. Karimi, S. Samavi, and A. Emami (2020) ReDMark: framework for residual diffusion watermarking based on deep networks. Expert Systems with Applications. Cited by: §2.2.
  • P. Bao and X. Ma (2005) Image adaptive watermarking using wavelet domain singular value decomposition. IEEE Transactions on Circuits and Systems for Video Technology. Cited by: §1.1, §2.1.
  • N. Bi, Q. Sun, D. Huang, Z. Yang, and J. Huang (2007) Robust image watermarking based on multiband wavelets and empirical mode decomposition. IEEE Transactions on Image Processing. Cited by: §1.1, §2.1.
  • A. Chakraborty, M. Alam, V. Dey, A. Chattopadhyay, and D. Mukhopadhyay (2021) A survey on adversarial attacks and defences. CAAI Transactions on Intelligence Technology. Cited by: §2.4.
  • L. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Cited by: §2.3.
  • J. M. Cherry, C. Adler, C. Ball, S. A. Chervitz, S. S. Dwight, E. T. Hester, Y. Jia, G. Juvik, T. Roe, M. Schroeder, et al. (1998) SGD: saccharomyces genome database. Nucleic acids research. Cited by: §3.1.3.
  • J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, Cited by: §4.1.
  • C. Dong, C. C. Loy, K. He, and X. Tang (2015) Image super-resolution using deep convolutional networks. IEEE transactions on pattern analysis and machine intelligence. Cited by: §2.3.
  • H. Fang, Z. Jia, H. Zhou, Z. Ma, and W. Zhang (2022) Encoded feature enhancement in watermarking network for distortion in real scenes. IEEE Transactions on Multimedia. Cited by: §1.1, §1.2, §2.2, §2.3, §4.3, §4.5, Table 1, §5.
  • H. Fang, W. Zhang, H. Zhou, H. Cui, and N. Yu (2018) Screen-shooting resilient watermarking. IEEE Transactions on Information Forensics and Security. Cited by: §2.1.
  • L. A. Gatys, A. S. Ecker, and M. Bethge (2015) A neural algorithm of artistic style. arXiv preprint arXiv:1508.06576. Cited by: §2.4.
  • I. J. Goodfellow, J. Shlens, and C. Szegedy (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Cited by: §2.4.
  • M. Hamidi, M. El Haziti, H. Cherifi, and M. El Hassouni (2018) Hybrid blind robust image watermarking technique based on dft-dct and arnold transform. Multimedia Tools and Applications. Cited by: §2.1.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, Cited by: §2.3.
  • Z. Jia, H. Fang, and W. Zhang (2021) MBRS: enhancing robustness of dnn-based watermarking by mini-batch of real and simulated jpeg compression. In Proceedings of the 29th ACM International Conference on Multimedia, Cited by: §1.1, §2.2, §2.3, §4.3, §4.5, Table 2.
  • X. Kang, R. Yang, and J. Huang (2010) Geometric invariant audio watermarking based on an lcm feature. IEEE Transactions on Multimedia. Cited by: §2.1.
  • I. G. Karybali and K. Berberidis (2006) Efficient spatial image watermarking via new perceptual masking and blind detection schemes. IEEE Transactions on Information Forensics and security. Cited by: §1.1.
  • D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §3.1.3, §4.1.
  • A. Kurakin, I. J. Goodfellow, and S. Bengio (2018) Adversarial examples in the physical world. In Artificial intelligence safety and security, Cited by: §2.4.
  • T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick (2014) Microsoft coco: common objects in context. In European conference on computer vision, Cited by: Figure 4, §4.2.
  • Y. Liu, M. Guo, J. Zhang, Y. Zhu, and X. Xie (2019) A novel two-stage separable deep learning framework for practical blind watermarking. In Proceedings of the 27th ACM International Conference on Multimedia, Cited by: §1.1, §1.2, §1.2, §2.2, §2.3, §3.2, §4.3, §4.4, §4.5, Table 1, §5.
  • Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo (2021) Swin transformer: hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Cited by: §2.3.
  • Z. Ma, W. Zhang, H. Fang, X. Dong, L. Geng, and N. Yu (2021) Local geometric distortions resilient watermarking scheme based on symmetry. IEEE Transactions on Circuits and Systems for Video Technology. Cited by: §1.1, §2.1, §4.3, §4.5, Table 1.
  • N. M. Makbol and B. E. Khoo (2013) Robust blind image watermarking scheme based on redundant discrete wavelet transform and singular value decomposition. AEU-International Journal of Electronics and Communications. Cited by: §1.1, §2.1.
  • B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng (2020) Nerf: representing scenes as neural radiance fields for view synthesis. In European conference on computer vision, Cited by: §2.4.
  • I. Nasir, Y. Weng, and J. Jiang (2007) A new robust watermarking scheme for color image in spatial domain. In 2007 Third International IEEE Conference on Signal-Image Technologies and Internet-Based System, Cited by: §1.1.
  • J. J. Park, P. Florence, J. Straub, R. Newcombe, and S. Lovegrove (2019) Deepsdf: learning continuous signed distance functions for shape representation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Cited by: §2.4.
  • K. Pearson (1901) LIII. on lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin philosophical magazine and journal of science. Cited by: Figure 4, item 1.
  • D. E. Rumelhart, G. E. Hinton, and R. J. Williams (1986) Learning representations by back-propagating errors. nature. Cited by: §3.1.3.
  • R. Shin and D. Song (2017) JPEG-resistant adversarial images. In NIPS 2017 Workshop on Machine Learning and Computer Security, Cited by: item 2.
  • M. Tancik, B. Mildenhall, and R. Ng (2020) Stegastamp: invisible hyperlinks in physical photographs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Cited by: §2.2, §4.3, §4.5, Table 2.
  • A. Z. Tirkel, R. G. van Schyndel, and C. Osborne (1995) A two-dimensional digital watermark. In Dicta, Cited by: §1.1.
  • R. G. Van Schyndel, A. Z. Tirkel, and C. F. Osborne (1994) A digital watermark. In Proceedings of 1st international conference on image processing, Cited by: §1.1, §2.1.
  • H. M. Wang, P. Su, and C. J. Kuo (1998) Wavelet-based digital image watermarking. Optics Express. Cited by: §1.1, §2.1.
  • C. Zhang, A. Karjauv, P. Benz, and I. S. Kweon (2021) Towards robust deep hiding under non-differentiable distortions for practical blind watermarking. In Proceedings of the 29th ACM International Conference on Multimedia, Cited by: §2.2, §4.3, §4.5, Table 2.
  • K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang (2017) Beyond a gaussian denoiser: residual learning of deep cnn for image denoising. IEEE transactions on image processing. Cited by: §2.3.
  • J. Zhao, B. Li, J. Li, R. Xiong, and Y. Lu (2021) A universal encoder rate distortion optimization framework for learned compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Cited by: §2.4.
  • J. Zhu, R. Kaplan, J. Johnson, and L. Fei-Fei (2018) Hidden: hiding data with deep networks. In Proceedings of the European conference on computer vision (ECCV), Cited by: §1.1, §1.2, §2.2, §2.3, §3.2, §4.3, §4.5, Table 1.