Distortion Agnostic Deep Watermarking

01/14/2020 ∙ by Xiyang Luo, et al.

Watermarking is the process of embedding information into an image that can survive under distortions, while requiring the encoded image to have little or no perceptual difference from the original image. Recently, deep learning-based methods achieved impressive results in both visual quality and message payload under a wide variety of image distortions. However, these methods all require differentiable models for the image distortions at training time, and may generalize poorly to unknown distortions. This is undesirable since the types of distortions applied to watermarked images are usually unknown and non-differentiable. In this paper, we propose a new framework for distortion-agnostic watermarking, where the image distortion is not explicitly modeled during training. Instead, the robustness of our system comes from two sources: adversarial training and channel coding. Compared to training on a fixed set of distortions and noise levels, our method achieves comparable or better results on distortions available during training, and better performance on unknown distortions.




1 Introduction

Digital watermarking [cox2002digital] is the task of embedding information into an image in a visually imperceptible fashion, such that the message can be reliably extracted under image distortions. Two key factors measure the performance of a digital watermarking system: imperceptibility and robustness. Given an image and a message, a good watermarking system produces an encoded image that is nearly identical to the original image, while carrying a message payload that survives a variety of distortions such as cropping, blurring, or JPEG compression. Traditional approaches found creative ways of hiding information in texture-rich areas [bender1996techniques] or the frequency domain. More recently, convolutional neural networks (CNNs) have been used to provide an end-to-end solution to the watermarking problem. Zhu et al. [zhu2018hidden] proposed HiDDeN, a unified framework for digital watermarking and image steganography.

Figure 1: Bit accuracy of our model compared to models trained with explicit image distortions. Each column corresponds to a type of image distortion at test time, and each row corresponds to the image distortion used to train the watermarking model (with the exception of our model, which requires no distortion model). The left half of the columns (separated by the black line) are known distortions, i.e., distortions included in training for the HiDDeN combined model [zhu2018hidden], and the right half of the columns are unknown distortions, i.e., a held-out set of commonly used distortions not used to train the HiDDeN combined model. See Section 4.1 for more details.
Figure 2: Example of original image, encoded image and difference between the two images from our model.
Figure 3: Overview of the proposed architecture. The input message is first fed through the channel encoder to produce a redundant message, which the watermark encoder combines with the input image to generate the encoded image. The watermark decoder produces a decoded message, which is further processed by the channel decoder to produce the final message. The attack network generates adversarial examples, which are fed to the watermark decoder to obtain the decoded adversarial message. The watermarking encoder and decoder are trained on a combination of the image loss, which includes both proximity to the cover image and perceptual quality (Equation 5), the message loss (Equation 6), and the message loss on the decoded adversarial message (Equation 8). The attack network is trained to minimize the adversarial loss (Equation 7). Training updates the attack network and the watermarking networks in an alternating fashion.

Most CNN-based watermarking methods explicitly model the image distortions during training. However, training on a specific image distortion can easily lead to overfitting and poor generalization to other types of distortions [zhu2018hidden]. This is undesirable since a practical watermarking system should be robust to a wide variety of image distortions, not only the ones included in training. The problem can be mitigated by including a combination of distortions during training [zhu2018hidden], but this requires carefully tuning the type and magnitude of the distortions in order to reach good performance. Moreover, poor generalization persists if the distortions at test time are far from those seen in training.

To this end, we propose a framework for adding robustness to a watermarking system without any prior knowledge of the type of image distortions during training. We achieve this by applying differentiable adversarial training with CNN-generated perturbations, and by using channel coding to inject redundancy into the encoded message. To our knowledge, this is the first paper to explore distortion-agnostic methods for deep watermarking. Empirically, our model achieves comparable performance on distortions known during training, and generalizes better to unknown distortions.

Our main contributions are the following.

  • We apply adversarial training to improve model robustness in a distortion-agnostic fashion. In particular, our CNN-generated adversarial examples implicitly incorporate a rich collection of image distortions that co-adapt with training.

  • We propose augmenting the watermarking system with channel coding, adding an additional layer of robustness through channel redundancy.

  • We combine the two ideas above and achieve comparable results to models trained with explicit distortion, and better generalization to unknown distortions.

2 Related Work

There are three main areas of research relevant to this work: watermarking, adversarial training, and channel coding. We give a brief review for each topic in the subsections below.

Figure 4: Visualization of adversarial examples generated by the attack network. Top: encoded images. Bottom: adversarial examples generated by the attack network from those encoded images. We observe a diverse set of image manipulations, consisting of a combination of blur, color change, and other types of distortions.

2.1 Watermarking

Digital watermarking [cox2002digital, katzenbeisser2000digital, bender1996techniques, jiansheng2009digital, o1997rotation, singh2013survey, tanaka1990embedding, hamidi2015blind] has been an active research area with many important applications such as content copyright protection. More recently, deep learning-based approaches have been applied to train end-to-end watermarking systems [zhu2018hidden, mun2017robust, ahmadi2018redmark, liu2019novel, zhang2019steganogan] with impressive results. HiDDeN [zhu2018hidden] was one of the first deep learning solutions for image watermarking. RedMark [ahmadi2018redmark] introduced residual connections with a strength factor for embedding binary images in the transform domain. Deep watermarking has since been generalized to both video [weng2019high, zhang2019robust] and audio [tegendal2019watermarking]. Modeling more complex and realistic image distortions has also broadened the scope of applications [wengrowski2019light, tancik2019stegastamp].

There are several works that applied attacks to the encoded image when training the watermarking system. Mun et al. [mun2017robust] iteratively simulated attacks to the watermarking system. RedMark [ahmadi2018redmark] introduced an attack layer which consists of random combinations of a fixed set of distortions. However, these attacks are not adversarial since they do not adapt with the watermarking model during training. Recently, ROMark [wen2019romark] applied a simple form of adversarial training where the distortion type and distortion strength are adaptively selected to minimize the decoding accuracy.

One key distinction of our method from the above is that we do not generate our attacks from a fixed pool of common distortions. Instead, the adversarial examples are generated from a trained CNN. This also has the benefit that the watermarking training is end-to-end differentiable, which is not true for ROMark [wen2019romark].

2.2 Adversarial Training

Deep neural networks are susceptible to certain tiny perturbations in the input space. Since the discovery of adversarial examples by Szegedy et al. [szegedy2013intriguing], a variety of methods have been proposed for both adversarial attack [athalye2018obfuscated, kurakin2016adversarial, papernot2017practical] and adversarial defense [goodfellow2014explaining, hosseini2017blocking, shafahi2019adversarial, xie2017mitigating]. One of the earliest and most effective defense mechanisms against adversarial attacks is adversarial training [goodfellow2014explaining], but it is computationally expensive on large datasets. Many attempts have since been made to reduce the cost of adversarial training, e.g., using approximations to the optimization step [shafahi2019adversarial], or using generative models in place of iterative optimization [baluja2017adversarial, lee2017generative].

2.3 Channel coding

Channel coding is a mechanism for detecting and correcting errors during signal transmission [bossert1999channel]. Shannon’s capacity theorem [shannon1948mathematical] gives the theoretical limit to transfer data through a noisy channel, and channel coding is designed to approach this limit. In implementation, various classical methods such as the Reed-Solomon (RS) codes [wicker1999reed], low-density parity-check (LDPC) codes [ryan2004introduction], turbo codes [sklar1997primer], and polar codes [trifonov2012efficient], have been widely applied in the field of telecommunication. More recently, learning based solutions have gained attention in this field as well [aoudia2019model, choi2018necst, fritschek2019deep].

3 Proposed Method

3.1 Motivation

In designing a general purpose watermarking model, the distortions at test time could be any image manipulation that still preserves some image content. A typical solution would involve identifying a set of representative distortions, and applying a carefully tuned combination of distortions during training.

Motivated by the recent success of using CNNs to perform various image manipulation tasks, e.g., style transfer [gatys2016image] and HDRNet [li2019hdrnet], we propose automating the distortion tuning process by training a CNN to generate distortions that exploit the weakest link in the current watermarking model. Figure 4 shows some samples of distorted images generated by our attack CNN, which contain a rich and complex combination of distortions.

The use of channel coding is motivated by the idea of injecting extra redundancy to the system. Shannon’s capacity theorem tells us that redundancy is necessary in order to achieve robustness. In the HiDDeN architecture, spatially repeating the input message is an example of adding redundancy. Channel coding simply provides another alternative on top of the current methods.

3.2 Method Overview

Figure 3 gives an overview of our overall architecture. Our method adds two key components on top of the watermarking encoder and decoder networks of HiDDeN [zhu2018hidden]:

  • We replace the distorted image with an adversarial example generated by a convolutional neural network trained to maximize the message loss.

  • We replace the input message with a longer, redundant binary message generated by channel coding.

3.3 Adversarial Training

Adversarial training generates distortions that co-adapt with the training of our watermarking model, actively strengthening the weakest point of the current model. Adversarial training was first introduced by Goodfellow et al. [goodfellow2014explaining] as a method to defend against adversarial attacks. In our context, adversarial training equates to minimizing the message loss under the worst-case distortion in an ε-ball around the encoded image. This is expressed as a min-max problem (Equation 1): the inner maximization is over norm-bounded perturbations to the encoded image, and the outer minimization is over the parameters of the watermarking encoder and decoder networks, with the message loss measured against the input message. More semantically meaningful measures, such as distance on VGG [simonyan2014very] activations, could also be used in place of the norm constraint.

A direct optimization of Equation 1 is both computationally expensive and overly restrictive for the watermarking model. Instead, we relax Equation 1 by restricting the set of distortions to those generated by a class of convolutional neural networks, which we call the attack network (Equation 2).
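The min-max problem and its relaxation described above can be sketched in LaTeX as follows. The notation is an assumption made for illustration, since the original symbols are not available here: $\theta$ stands for the parameters of the watermarking encoder and decoder, $F_{dec}$ for the watermark decoder, $I_{en}$ for the encoded image, $\delta$ for the perturbation, $\epsilon$ for the radius of the ball, $X$ for the message given to the watermarking encoder, $L_M$ for the message loss, and $G_{adv}$ for the attack network. The choice of norm is left unspecified.

```latex
% Assumed notation; a sketch of Equations 1 and 2, not the authors' exact form.
\begin{align}
  \min_{\theta}\; \max_{\|\delta\| \le \epsilon}\;
    & L_M\big(F_{dec}(I_{en} + \delta),\, X\big) \\
  \min_{\theta}\; \max_{G_{adv}}\;
    & L_M\big(F_{dec}(G_{adv}(I_{en})),\, X\big)
\end{align}
```

In the second line the explicit norm constraint is dropped; as the text goes on to explain, the strength of the attack is instead governed by the capacity of the attack network and the weights in its training loss.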


Using CNN-generated adversarial examples has the benefit of retaining the ability to generate a diverse set of image distortions, as shown in Figure 4. An alternative is to generate the adversarial samples via the Fast Gradient Sign Method (FGSM) as in [goodfellow2014explaining]. However, we found that this yielded less diverse examples than the CNN-generated ones, and resulted in poorer overall robustness against distortion.
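For readers unfamiliar with FGSM, the following is a minimal sketch of a single FGSM step. It assumes a toy linear decoder and a squared-error message loss so that the gradient can be written by hand; the names `decode`, `message_loss`, `loss_gradient`, and `fgsm` are hypothetical and are not part of the watermarking model described here.

```python
def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def decode(weights, image):
    """Toy linear decoder (an assumption): one dot product per message bit."""
    return [sum(w * p for w, p in zip(row, image)) for row in weights]


def message_loss(weights, image, message):
    """Squared error between the decoded message and the target message."""
    return sum((d - m) ** 2 for d, m in zip(decode(weights, image), message))


def loss_gradient(weights, image, message):
    """Analytic gradient of message_loss with respect to the image pixels."""
    residual = [d - m for d, m in zip(decode(weights, image), message)]
    return [
        2.0 * sum(weights[k][j] * residual[k] for k in range(len(weights)))
        for j in range(len(image))
    ]


def fgsm(weights, image, message, eps):
    """One FGSM step: move every pixel by eps in the direction that raises the loss."""
    grad = loss_gradient(weights, image, message)
    return [p + eps * sign(g) for p, g in zip(image, grad)]


weights = [[0.5, -0.25, 0.0], [0.0, 0.5, 0.5]]
image = [0.8, 0.4, 0.6]
message = [1.0, 0.0]
adversarial = fgsm(weights, image, message, eps=0.05)
```

Every pixel moves by exactly `eps` or stays put, which is one way to see why FGSM perturbations are less varied than the output of a trained attack network.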

To train the attack network, we minimize an adversarial training loss (Equation 3) with two weighted terms, evaluated on the adversarial example that the attack network produces from the encoded image. The message loss measures the discrepancy between the decoded message and the input message. The first scalar weight controls the strength of the distortion generated by the attack network, while the second controls the strength of the message loss for the attack network.
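A minimal sketch of such a two-term attack loss is given below. It assumes squared error for both the image distance and the message loss, and it assumes that the message term enters with a negative sign, so that minimizing the loss keeps the adversarial example close to the encoded image while increasing the decoding error. The function and parameter names (`attack_loss`, `w_dist`, `w_msg`) are hypothetical.

```python
def squared_distance(a, b):
    """Sum of squared element-wise differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def attack_loss(adv_image, encoded_image, adv_decoded, message, w_dist, w_msg):
    """Loss minimized by the attack network (assumed form).

    The first term penalizes departing from the encoded image; the second term
    rewards a large message loss on the adversarial example.
    """
    distortion = squared_distance(adv_image, encoded_image)
    message_error = squared_distance(adv_decoded, message)
    return w_dist * distortion - w_msg * message_error


encoded = [0.5, 0.5]
adversarial = [0.6, 0.4]
message = [1.0, 0.0]

mild = attack_loss(adversarial, encoded, [0.5, 0.5], message, w_dist=1.0, w_msg=1.0)
severe = attack_loss(adversarial, encoded, [0.0, 1.0], message, w_dist=1.0, w_msg=1.0)
```

Raising `w_dist` relative to `w_msg` makes the attack more conservative, which matches the role the text assigns to the ratio between the two weights.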

For the attack network, we use a two-layer CNN.


In general, we find that striking the right balance of attack strength, controlled by the complexity of the attack network and the ratio between the two loss weights, is important for training. An overly strong attack results in slow training and a failure of the watermarking network to adapt to the adversarial examples, while an overly simple attack results in less robustness of the trained model. A detailed analysis can be found in Section 4.3.

3.4 Channel coding

Figure 5: Illustration of channel coding. Given an input message, the channel encoder produces a redundant message of longer length. The redundant message is transmitted through a noisy channel and received by the decoder in corrupted form. Finally, the decoder recovers the input message from the corrupted redundant message.

Channel coding provides an additional layer of robustness by injecting redundancy into the system. Given a binary input message, a channel encoder produces a longer redundant message, from which the channel decoder can recover the input message under reasonable amounts of channel distortion to the redundant message, as shown in Figure 5.
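As a concrete illustration of redundancy, the sketch below uses a simple repetition code with majority-vote decoding. This is a stand-in chosen for brevity and is not the channel code used in this paper; the function names are hypothetical.

```python
def repetition_encode(bits, repeat):
    """Repeat every message bit `repeat` times to form the redundant message."""
    return [b for b in bits for _ in range(repeat)]


def repetition_decode(received, repeat):
    """Recover each message bit by majority vote over its block of copies."""
    decoded = []
    for start in range(0, len(received), repeat):
        block = received[start:start + repeat]
        decoded.append(1 if 2 * sum(block) > len(block) else 0)
    return decoded


message = [1, 0, 1, 1]
codeword = repetition_encode(message, repeat=3)  # 12 bits for a 4-bit message

# Corrupt one bit in each of the first two blocks.
corrupted = list(codeword)
corrupted[0] ^= 1
corrupted[4] ^= 1

recovered = repetition_decode(corrupted, repeat=3)
```

With three copies per bit, any single flip inside a block is corrected, at the cost of tripling the message length.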

In this paper, we generate a channel code from the input message before passing it to the watermarking encoder, as shown in Figure 3. The channel distortions in this context are the errors introduced by the watermarking model, i.e., the differences between the redundant message and its decoded version. Since we do not explicitly model the image distortions, it is impossible to know the true channel distortion model. Instead, we use a binary symmetric channel (BSC) to approximate the channel distortion. BSC is a standard channel model which assumes each bit is independently and randomly flipped with a fixed probability. Even though this assumption is not strictly satisfied in our case, we find that using BSC works well in this application.
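The BSC itself is straightforward to simulate. The sketch below flips each bit independently with a given probability; the function name and the specific flip probability of 0.1 are illustrative choices, not values taken from this paper.

```python
import random


def binary_symmetric_channel(bits, flip_prob, rng):
    """Flip each bit independently with probability flip_prob."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]


rng = random.Random(0)  # fixed seed so the run is repeatable
sent = [rng.randint(0, 1) for _ in range(10000)]
received = binary_symmetric_channel(sent, 0.1, rng)

# Empirical fraction of flipped bits; it should be close to flip_prob.
flip_rate = sum(s != r for s, r in zip(sent, received)) / len(sent)
```

Drawing `flip_prob` afresh for each training example is one way to expose a learned channel decoder to a range of noise strengths.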

Figure 6: Channel noise strength versus decoder bit accuracy for various redundant message lengths. The input message length is fixed while the redundant message length is varied. All models are trained on random binary input with BSC noise, with the training noise level sampled uniformly at random.

Conceptually, any standard error correcting code such as low-density parity-check (LDPC) codes [ryan2004introduction] can be used to generate the redundant message. However, traditional codes such as LDPC require the decoder to have an estimate of the channel noise strength, which is impractical in our application since the noise strength can vary greatly from image to image. Therefore, we use NECST [choi2018necst], a learning-based solution for joint source and channel coding, to cover a broad range of channel distortion strengths. We use BSC for training the channel model, where the input message is randomly sampled, and the channel noise strength is chosen from a fixed interval uniformly at random. Figure 6 shows the bit accuracy of the NECST model on a range of BSC channel noise.

We emphasize here that the channel coding model is not jointly trained with the rest of the watermarking model. This decoupling prevents the channel models from co-adapting with the image models during training, which would otherwise result in overfitting and reduced robustness across a wide spectrum of image distortions.

3.5 Watermarking Training and Losses

We give a detailed description of the algorithms for training the watermarking models. We first define the training losses, using the same notations as in Figure 3.

Image loss (Equation 5)

Message loss (Equation 6)

Attack network training loss (Equation 7)

Watermarking training loss (Equation 8)


The image loss in Equation 5 consists of a proximity term to the cover image and a GAN loss with spectral normalization [miyato2018spectral] to control the perceptual quality of the encoded image. This is similar to the adversarial loss defined in the HiDDeN network [zhu2018hidden]. The message loss in Equation 6 measures the discrepancy between the decoded message and the input. Equation 7 defines the loss used to train the attack network. Finally, Equation 8 defines the overall loss for training the watermarking encoder and decoder. Each loss term carries its own scalar weight. Training alternates between updating the attack network and the watermarking networks, as detailed in Algorithm 1.
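One possible way to write the four losses, consistent with the description above, is sketched below. All symbols are assumptions introduced for illustration: $I_{co}$, $I_{en}$, and $I_{adv}$ denote the cover, encoded, and adversarial images; $X$ is the message given to the watermarking encoder; $X_{dec}$ and $X_{adv}$ are the decoder outputs on the encoded and adversarial images; $d(\cdot,\cdot)$ is an unspecified distance; $L_G$ is the GAN loss; and the $\alpha$ terms are scalar weights. The four lines correspond to Equations 5 to 8 in that order.

```latex
% Assumed notation and functional forms; a sketch, not the authors' exact losses.
\begin{align}
  L_I     &= \alpha^{I}_{1}\, d(I_{co}, I_{en}) + \alpha^{I}_{2}\, L_G(I_{en}) \\
  L_M     &= \alpha^{M}\, d(X_{dec}, X) \\
  L_{adv} &= \alpha^{adv}_{1}\, d(I_{adv}, I_{en}) - \alpha^{adv}_{2}\, d(X_{adv}, X) \\
  L_W     &= L_I + L_M + \alpha^{W}\, d(X_{adv}, X)
\end{align}
```

Note the opposite signs of the adversarial message term: the attack network is rewarded for a large decoding error on the adversarial example, while the watermarking networks are penalized for it.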

1:procedure Watermarking Train
Input: , .
Output: Trained networks .
Training Variables: .
2:     while Step  do
3:         Compute
4:         for i = 1 to  do
5:              Compute
6:              Update          
7:         Update
8:         Update      
Algorithm 1 Watermarking Training
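The alternating schedule of Algorithm 1 can be sketched as a plain loop. The sketch only captures the order of updates, with several attack-network updates followed by one update of the watermarking networks per outer step; the callables `update_attack` and `update_watermark` are hypothetical placeholders for the actual gradient steps, and the step counts are arbitrary.

```python
def alternating_training(num_steps, attack_iters, update_attack, update_watermark):
    """Run the alternating schedule: inner attack updates, then a watermark update."""
    for step in range(num_steps):
        for _ in range(attack_iters):
            # The attack network adapts to the current watermarking networks.
            update_attack(step)
        # The encoder and decoder are then updated against the refreshed attack.
        update_watermark(step)


log = []
alternating_training(
    num_steps=2,
    attack_iters=3,
    update_attack=lambda step: log.append(("attack", step)),
    update_watermark=lambda step: log.append(("watermark", step)),
)
```

In a real implementation each callable would compute the relevant loss on a fresh batch and apply an optimizer step to only that network's parameters.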

4 Experiments

For comparison, we train two versions of HiDDeN [zhu2018hidden] as baselines: one without image distortion, which we name the identity model, and another trained on a combination of standard image distortions, which we name the combined model. We note here that our methodology is agnostic to the specific architecture of the watermarking networks. We use the original HiDDeN architecture throughout the experiments since it is a well-studied model and a commonly used benchmark, but other architectures such as RedMark [ahmadi2018redmark] could also be used.

We compare the bit accuracy on distortions seen during training and on those not seen, and also report the peak signal-to-noise ratio (PSNR) of the encoded images. All models are trained and evaluated on the MS COCO dataset [lin2014microsoft], with images resized to a fixed resolution; a random selection of 3000 images is used for evaluation. Unless otherwise stated, the encoded message size and the redundant message size are fixed across experiments. For the watermarking networks, we use the same architecture as HiDDeN, except that the embedded message size equals the redundant message size rather than the original message size, due to the increased message length from channel coding. Detailed training parameters can be found in the supplementary materials.
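The two evaluation metrics can be computed as in the sketch below. It uses the standard definitions of bit accuracy and PSNR on flat lists of values and assumes pixel intensities in the range [0, 1]; the function names and the `max_value` default are illustrative choices.

```python
import math


def bit_accuracy(decoded_bits, message_bits):
    """Fraction of message bits that were decoded correctly."""
    matches = sum(d == m for d, m in zip(decoded_bits, message_bits))
    return matches / len(message_bits)


def psnr(original, encoded, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((o - e) ** 2 for o, e in zip(original, encoded)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


accuracy = bit_accuracy([1, 0, 1, 1], [1, 0, 0, 1])
quality = psnr([0.5, 0.5, 0.5, 0.5], [0.6, 0.6, 0.6, 0.6])
```

A higher PSNR means the encoded image is closer to the cover image, while a bit accuracy near 0.5 means the decoder does no better than guessing.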

Figure 7: Comparison of our model with HiDDeN identity model and combined noise model for different types of image distortions.
Model Identity Combined Ours
RGB-PSNR 40.3 32.3 33.7
Y-PSNR 47.5 34.2 35.7
U-PSNR 44.5 39.7 40.7
V-PSNR 43.1 39.5 40.1
Table 1: Comparison of encoded image quality. The PSNR values (in dB) in both RGB and YUV are reported for our model, as well as for the HiDDeN identity and combined distortion models.

4.1 Comparison with HiDDeN

We compare our method with both the HiDDeN identity and combined models. For the combined model, we use JPEG, dropout, crop, and Gaussian blur, where the JPEG distortion is approximated by the differentiable JPEG function [shin2017jpeg]. We also compare with specialized models trained only on a single type of distortion, with the noise levels in Figure 1. For a fair comparison, we tune our model to obtain a slightly higher PSNR than the combined model, as shown in Table 1.

Figure 1 shows the bit accuracy of our model and those trained with explicit image distortion. Each row corresponds to a different watermarking model, and each column a specific type of distortion applied at evaluation time. The top five rows (specialized models) clearly show a diagonal pattern, indicating poor generalization to other types of image distortions. From the bottom two rows, we see that both the combined model and our adversarially trained model are robust to distortions used to train the combined model (first five columns).

Figure 7 gives a more comprehensive comparison across a range of distortion levels. Our model reaches comparable performance on crop and dropout, outperforms the combined model on JPEG, and underperforms on Gaussian blur. For small distortion strengths, our accuracy is nearly identical to the combined model. On all noise levels and distortion types, we outperform the identity model by a wide margin.

In terms of visual quality, our model is less prone to small artifacts in flat regions of an image. A qualitative comparison can be found in Figure 11.

4.2 Generalization to Unknown Distortions

A practical watermarking system must be robust to a wide range of image distortions, not just the distortions seen during training. Therefore, we compare the performance of our model and the combined model on a held-out set of commonly used image distortions. To attain better coverage, we choose six types of distortions, namely saturation, hue, resize, Gaussian noise, salt and pepper noise, and GIF encoding, drawn from four broad categories: color adjustment, pixel-wise noise, geometric transformations, and compression. For each type of distortion, we evaluate the models at three different distortion strengths. Figure 8 gives a visualization of the additional distortions. We choose a range of distortion strengths strong enough to differentiate the performance of the models, but in a regime where the distorted image still resembles the original.

Figure 8: Visualization of additional image distortions.
Method Identity Combined Ours
Gaussian Noise (0.06) 74.6 93.5 95.6
Gaussian Noise (0.08) 67.7 87.2 93.5
Gaussian Noise (0.10) 63.2 80.4 89.5
Salt and Pepper (0.05) 99.1 97.2 95.7
Salt and Pepper (0.10) 93.1 89.4 85.0
Salt and Pepper (0.15) 83.4 79.6 77.1
Adjust Hue (0.2) 65.1 70.8 94.0
Adjust Hue (0.4) 34.0 45.3 70.7
Adjust Hue (0.6) 18.1 28.8 42.4
Adjust Saturation (5.0) 96.3 98.1 99.9
Adjust Saturation (10.0) 94.8 96.0 99.6
Adjust Saturation (15.0) 93.4 94.2 98.5
GIF (64) 87.1 96.5 97.6
GIF (32) 76.8 93.4 95.7
GIF (16) 65.0 88.6 91.7
Resize Width (0.9) 99.3 99.7 99.9
Resize Width (0.7) 85.3 84.9 88.4
Resize Width (0.5) 66.5 67.3 67.1
Average 78.37 84.26 88.30
Table 2: Comparison of our model with the HiDDeN identity and combined models on additional image distortions, reported as bit accuracy (%). When computing the average, results lower than 50% are truncated to 50%, since they are no better than random chance.
Figure 9: Samples of encoded and cover images for the watermarking algorithm. First row: Cover image with no embedded message. Second row: Encoded image from HiDDeN combined distortion model. Third row: Encoded images from our model. Fourth row: Normalized difference of the encoded image and cover image for the HiDDeN combined model. Fifth row: Normalized difference for our model.

Table 2 reports the bit accuracy of our model on these additional distortions. Overall, our model performs better on the unknown distortions, especially in the category of color change. We also note that the variance of bit accuracy across distortions is lower than for both the identity and combined models, indicating more stable performance across different types of distortions. Furthermore, we see that the performance gap between the combined model and the identity model shrinks on these unknown distortions, which aligns with the intuition that the generalization issue persists even when training with a combination of distortions.
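The averages in the last row of Table 2 can be reproduced from the per-distortion entries. The sketch below copies the three columns of Table 2 and applies a floor of 50%, the chance level for binary messages, before averaging; the function name `truncated_average` is a hypothetical helper.

```python
# Bit accuracies (%) copied from Table 2, in row order.
TABLE_2 = {
    "Identity": [74.6, 67.7, 63.2, 99.1, 93.1, 83.4, 65.1, 34.0, 18.1,
                 96.3, 94.8, 93.4, 87.1, 76.8, 65.0, 99.3, 85.3, 66.5],
    "Combined": [93.5, 87.2, 80.4, 97.2, 89.4, 79.6, 70.8, 45.3, 28.8,
                 98.1, 96.0, 94.2, 96.5, 93.4, 88.6, 99.7, 84.9, 67.3],
    "Ours": [95.6, 93.5, 89.5, 95.7, 85.0, 77.1, 94.0, 70.7, 42.4,
             99.9, 99.6, 98.5, 97.6, 95.7, 91.7, 99.9, 88.4, 67.1],
}


def truncated_average(accuracies, floor=50.0):
    """Average after raising every value below `floor` up to `floor`."""
    return sum(max(a, floor) for a in accuracies) / len(accuracies)


averages = {name: truncated_average(values) for name, values in TABLE_2.items()}
```

The resulting values agree with the reported averages to within 0.01, which is consistent with the table entries having been rounded to one decimal place.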

4.3 Detailed Analysis

4.3.1 Ablation Study

Table 3 reports the individual effects of channel coding and adversarial training. We see that adversarial training accounts for a large portion of the model robustness, while channel coding further boosts performance in terms of accuracy. Table 3 also shows that channel coding alone does not provide enough robustness without a robust watermarking model. However, combined with adversarial training, channel coding further boosts the performance of the watermarking system, especially when the bit accuracy is already high.

Model JPEG (Q=50) Crop (p=0.09) Blur (σ=1.0) Dropout (p=0.3)
Identity 50.2 53.0 59.6 81.3
Channel 51.3 60.5 50.2 90.3
Adv. 85.0 90.6 86.2 95.0
Both 81.7 93.5 92.8 97.9
Table 3: Model ablation study. We report the bit accuracy (%) for models trained with only channel coding, only adversarial training, both, and the identity model. For models trained with only adversarial training, the input message length to the watermarking model is the original message length rather than the redundant message length.

4.3.2 Attack Complexity

We study the effect of varying the complexity and architecture of the attack network. In addition to adjusting the network size and depth, we consider two variants of the attack network: a residual network (Res), where we add a skip connection from the input, and a capped network (Capped), where we limit the maximum pixel difference between the adversarial example and the encoded image to a fixed bound. We also report the results from the fast gradient sign method (FGSM) for completeness.

Attack network JPEG (Q=50) Dropout (p=0.3) Blur (σ=1.0) Acc. (adv.)
Conv (3,16) 81.7 97.9 92.8 90.6
Conv (3,32) 80.5 98.0 84.9 78.7
Conv (3,32,32) 75.0 95.3 81.5 72.0
Res (3,16) 84.5 96.3 86.3 95.0
Capped (0.03) 57.3 93.9 77.3 96.5
Capped (0.06) 53.2 94.6 78.1 99.5
FGSM 50.1 86.2 50.1 98.0
Table 4: Performance when varying attack network complexity. Each row corresponds to a model trained with a different configuration of the attack network. The first three columns show the bit accuracy on various image distortions. The last column shows the bit accuracy on the decoded adversarial message.

From Table 4, we observe that the bit accuracy on the adversarial example decreases as the attack network complexity increases, causing a slight degradation in the final result. Capping the attack network yielded poor results on JPEG and Gaussian blur, indicating that this approach over-restricts the attack network. The residual network yielded performance very similar to that of the regular convolutional model, slightly underperforming on Gaussian blur. Finally, FGSM yielded poor results on all of the distortions, since the image networks quickly overfit to this specific type of distortion.

5 Conclusion

We propose a distortion-agnostic watermarking method that does not explicitly model the image distortion at training time. Our method consists of two core components, adversarial training and channel coding, which together improve the robustness of our system. Through empirical evaluations, we validate that our model reaches performance comparable to the combined distortion model on distortions seen during training, and generalizes better to unseen distortions. In future work, we would like to improve upon our current methodology to further increase model robustness, and to explore more deeply the connections between watermarking and adversarial attacks.


6 Appendix

6.1 Training Details

We list the hyper-parameters used for training the watermarking model. For our model, we set , and . For the HiDDeN combined model and identity model, we set . The message size for our watermarking model is the redundant message length rather than the original message length, due to the addition of the channel coding layer. We use the same network architecture as in HiDDeN. Namely, the input image is first processed by 4 Conv-BN-ReLU blocks with 64 units per layer. The result is then concatenated along the channel dimension with a spatial repetition of the input message. The combined blocks are then passed to two additional Conv-BN-ReLU blocks to produce the encoded image. For the encoder, we symmetrically pad the input image and use 'VALID' padding for all convolution operations to reduce boundary artifacts in the encoded image. The encoded image is clipped before being passed to the decoder. The decoder consists of seven Conv-BN-ReLU layers, where the last two layers have stride 2. A global pooling operation followed by a fully-connected layer is used to produce the decoded message.

For both our model and the combined model, the training warm-starts from a pre-trained HiDDeN identity model and stops at k iterations. We use ADAM with a learning rate of for all models.

For the channel model, we use two fully connected layers with 512 units each, and train with BSC noise where the noise strength is sampled uniformly at random.

6.2 Encoded Image Samples

Figure 10: Samples of encoded image from HiDDeN and our model.
Figure 11: More samples of encoded image from HiDDeN and our model.

6.3 Adversarial Example Samples

Figure 12: Samples of adversarial examples generated by the attack network.