A Compact Neural Network-based Algorithm for Robust Image Watermarking

by Hong-Bo Xu, et al.
Nankai University

Digital image watermarking seeks to protect digital media from unauthorized use: a message is embedded into a digital image and later extracted from it, even after noises or distortions are applied by various data-processing operations, including lossy image compression and interactive content editing. Traditional image watermarking solutions tend to lose robustness once their specific prior constraints are violated, while recent deep learning-based watermarking methods, built on separate feature encoder and decoder pipelines, do not handle the information loss problem well. In this paper, we propose a novel digital image watermarking solution with a compact neural network, named Invertible Watermarking Network (IWN). Our IWN architecture is based on a single Invertible Neural Network (INN); this bijective propagation framework enables us to solve message embedding and extraction simultaneously, by treating them as a pair of mutually inverse problems and learning a stable invertible mapping. To enhance the robustness of our watermarking solution, we introduce a simple but effective bit message normalization module to condense the bit message to be embedded, and we design a noise layer to simulate various practical attacks under our IWN framework. Extensive experiments demonstrate the superiority of our solution under various distortions.





I Introduction

With the rapid development of digital information processing technologies, digital media content has been widely used in many areas. Because digital media is easy to propagate, copy and modify, protecting its copyright has become a crucial and practical problem. Digital watermarking aims to solve this problem by embedding extra information into the digital media and extracting that data for authorized access. Nowadays, digital watermarking is widely used in many applications, including broadcast monitoring [11], copy control [21], and device control [9]. In this paper, we focus on digital image watermarking. In particular, an image watermarking algorithm embeds a message (i.e. the watermark) into a cover image (i.e. the image requiring authorized access) to obtain the watermarked image, and it must then recover the original message as faithfully as possible from the watermarked image.

Although digital image watermarking has been widely studied in the academic community, it remains challenging. Three key factors measure the performance of a digital image watermarking algorithm: robustness, imperceptibility, and capacity. Robustness requires the message embedded in the image to survive malicious and non-malicious attacks. Imperceptibility requires the watermarked image to be as close as possible to the original, so that the changes made to the original image are barely detectable. Capacity refers to the amount of message that can be embedded. Beyond these, security [42] and complexity [48] are also considered under certain conditions, although in many cases they are given much lower priority. Moreover, these key factors conflict with one another, and it is impossible to satisfy all of them simultaneously [57]. Existing watermarking applications usually emphasize particular features or trade them off against each other. For instance, watermarking for copyright protection prioritizes robustness, while watermarking for broadcast monitoring demands larger capacity. For most existing deep learning-based robust image watermarking systems [72, 56, 44], robustness and imperceptibility matter most, but how to balance these conflicting requirements is still one of the main challenges in this research domain.

Fig. 1: Illustration of traditional Encoder-Decoder solutions and our compact architecture.

Traditional image watermarking techniques usually embed watermarks in the spatial domain or the frequency domain [49]. Spatial domain-based techniques are computationally efficient, since they directly change pixel values of the image, but their robustness is easily compromised. Frequency domain-based solutions, in contrast, obtain higher robustness by manipulating frequency coefficients of the image, usually at higher computational cost. The main drawback of traditional watermarking methods is that they are built on specific prior constraints or targets, which makes them hard to generalize to novel types of attacks [10] and restricts them to limited applications. In recent years, deep neural networks have been applied to digital image watermarking [32, 72, 60, 61, 56, 31, 66, 71]. Thanks to the strong representation abilities of deep neural networks, these approaches achieve better robustness and imperceptibility than traditional methods. In addition, neural networks can be retrained to resist novel types of attacks, or to focus on particular properties such as robustness and imperceptibility, without designing a new specialized algorithm; this makes it possible to develop an adaptable, generalized framework for various watermarking applications [72]. However, most of these methods use the Encoder-Noiser-Decoder framework [72, 61, 56, 31, 66, 71], as shown in Fig. 1 (a). In general, this framework employs a separate encoder and decoder to embed and extract the watermark, respectively. This requires careful construction for both message embedding and extraction, and training two separate neural networks needs complicated parameter tuning.

In this paper, we propose a novel digital image watermarking scheme named the invertible watermarking network (IWN), built on an invertible neural network (INN). Inspired by the observation that, from the perspective of reversible image conversion (RIC), INN alleviates the information loss problem better than classic neural network architectures [13], we treat watermark embedding and extraction as a pair of mutually inverse problems and solve them effectively with an INN. Different from existing Encoder-Decoder based deep watermarking networks, our compact IWN performs watermark embedding and extraction in the forward and reverse processes of a single INN that shares all network parameters, as shown in Fig. 1 (b). Since INN has been demonstrated to be an effective tool for embedding and extracting large amounts of information [43], our IWN achieves high imperceptibility, benefiting from the strictly invertible property of INN [3]. To enhance robustness, we introduce a well-designed bit message normalization module and a noise layer. The former also ensures that different bit message lengths can be easily accommodated with high recovery accuracy. With the noise layer simulating various attacks, the strong fitting ability of our IWN enables us to effectively learn robustness against practical distortions. Extensive experiments show that our method outperforms the most commonly used baseline. In addition, we are, to our knowledge, the first to introduce INN into the field of watermarking, and we hope to enlighten follow-up research.

In summary, the main contributions of this paper are:

  • To our knowledge, we are the first to introduce invertible neural networks into digital watermarking, and we propose an invertible watermarking network (IWN) for robust and blind digital image watermarking.

  • We introduce a bit message normalization module for condensing the messages and a noise layer for simulating various attacks, respectively, with which the watermarking robustness is significantly improved.

  • We provide extensive experiments to demonstrate the superiority of our method under a variety of distortions.

The rest of this paper is organized as follows. We review the related work in Sec. II. The proposed method is described in Sec. III, followed by extensive experiments in Sec. IV. Finally, the conclusion and future work are given in Sec. V.

II Related Work

Since the term digital watermarking first appeared in [59], it has been an active research area [34, 14, 15] with many applications such as copyright protection and owner identification. Besides natural images, digital watermarking has also been used in other fields such as medical image watermarking [26], video watermarking [5], dynamic software watermarking [45], 3D watermarking [22, 28], audio watermarking [41], and neural network watermarking [62]. In this paper, we focus on robust digital image watermarking, and in this section we briefly review the two research areas most relevant to our work, i.e. digital image watermarking and invertible neural networks.

II-A Digital Image Watermarking

Traditional digital image watermarking techniques usually embed messages in the spatial domain or the frequency domain [49]. In general, spatial-domain methods embed watermarks directly by manipulating bitstreams or pixel values [55, 37, 17]. Among them, Least-Significant-Bit (LSB) [59] is a representative work of this subcategory, but it suffers from low capacity and sensitivity to various image processing attacks [10]. Frequency domain-based watermarking techniques, on the other hand, modify frequency coefficients when embedding watermarks. Compared with spatial-domain methods, these solutions improve robustness, imperceptibility, capacity, fidelity, and security at the cost of higher computational complexity [16, 36]. In this class of methods, commonly used frequency domains include the Discrete Cosine Transform (DCT) domain [29], the Discrete Fourier Transform (DFT) domain [27], the Discrete Wavelet Transform (DWT) domain [25] and the contourlet domain [6, 8]. For instance, Kang et al. [33] embed a spread-spectrum watermark in the coefficients of the LL subband in the DWT domain, and Sadreazami et al. [52] embed the watermark in the contourlet domain; they exploit the robustness against JPEG compression of the low-frequency component in the wavelet domain and of the contour component in the contourlet domain, respectively. This idea of finding invariants that are robust under attack has also been used to resist geometric distortions including translation, rotation, and cropping [68, 47, 63, 58]. The main drawback of these traditional watermarking methods is that they are built on specific prior constraints or targets, which makes them hard to generalize to novel types of attacks [10]; in other words, these techniques can only handle limited tasks.

Recently, many researchers have applied neural networks to digital image watermarking, and some of these methods indeed bring superior robustness and imperceptibility over traditional methods. For example, Kandi et al. [32] first introduce convolutional neural networks (CNNs) to non-blind watermarking. Mun et al. [46] further propose a blind CNN-based watermarking architecture to embed and extract watermarks. Zhu et al. [72] propose an end-to-end neural network with adversarial training for both steganography and robust blind watermarking. ROMark [60] simplifies adversarial training by using a min-max formulation for robust optimization. After that, RedMark [2] uses two Fully Convolutional Neural Networks (FCNs) with residual connections to embed watermarks in the frequency domain without adversarial training. Different from dependent deep hiding methods (DDH) [61, 56, 31, 66, 71], which adapt the watermark to the original cover image, UDH [67] proposes a universal deep hiding method that embeds the watermark independently of the cover image. These works demonstrate that a variety of neural network structures can effectively realize message embedding and extraction, ensuring that the watermarked image and the cover image have little or even no perceptual difference.

For existing deep learning-based watermarking methods, a noise layer is usually introduced into the network to deal with various distortions. However, in order to train the entire network end to end, the noise layer must be differentiable. For non-differentiable distortions such as JPEG compression, some methods [72, 2, 56] simulate them with a differentiable approximation, allowing end-to-end training. In [44] and [61], some distortions are generated by a trained CNN instead of being explicitly modeled from a fixed pool during training, which is another way to deal with non-differentiable and hard-to-model distortions. In addition, Liu et al. [39] design a redundant two-stage separable deep learning framework to address the problems of one-stage end-to-end training, such as image quality degradation and the difficulty of simulating noise attacks with differentiable layers. Although many strategies have been proposed to deal with various distortions, ensuring the robustness of a digital watermark across diverse situations remains an open problem.

Besides the noise layer, most existing deep learning-based robust image watermarking systems use Encoder-Noiser-Decoder frameworks [72, 61, 56, 31, 66, 71], where the encoder embeds the watermark into the cover image imperceptibly and the decoder recovers the watermark message from the distorted watermarked image. This kind of architecture usually requires sophisticated design of both the encoder and the decoder, resulting in complex training with careful parameter tuning. Different from those previous works, where the encoder and decoder are two independent networks, we adopt a single bijective INN for both watermark embedding and extraction.

II-B Invertible Neural Network (INN)

In recent years, INN has attracted much attention because of its efficient inversion. INN was originally proposed for flow-based generative models, where a stable invertible mapping is learned between a complex data distribution and a simple latent distribution. NICE [18] and RealNVP [19] propose the additive and affine coupling layers, respectively. These coupling layers are the basic components of INN, satisfying the requirements of efficient inversion and a tractable Jacobian determinant. The invertibility of such architectures is further explored in [23, 3]. In [54], flexible INNs are constructed with masked convolutions under certain composition rules, and an unbiased flow-based generative model is introduced in [12]. Besides, Glow [35], FFJORD [24], i-RevNet [30] and i-ResNet [7] achieve better generation results by continuously improving network representation capacity.

Fig. 2: Overview of our IWN, which contains a bit message normalization module, an invertible neural network (INN) and a noise layer. The preprocessed bit message T and the cover image I_co are fed into the bijective INN to obtain the watermarked image I_wm. After various noises and distortions are introduced, the noised image I_no serves as the input of the INN's reverse mapping, from which the revealed message tensor T' and the image I_re are restored simultaneously. The watermark message M' is finally obtained through postprocessing.

In this context, INN has been used for a variety of challenging tasks due to its powerful fitting ability. For example, a conditional invertible neural network (cINN) is introduced for guided image generation [4], including MNIST digit generation and image colorization. cINN is also used for network-to-network translation [50] and image-to-video synthesis [20]. In addition, there are solutions specialized for image scaling [64], image compression [65], image or video super-resolution [73], image denoising [40], underexposed image enhancement [69] and image color adjustment [70]. As the latest work, Cheng et al. [13] propose a generic framework for reversible image conversion, namely IICNet, which aims to encode a series of input images into a single image and decode them back. In particular, Lu et al. [43] first introduce INN into large-capacity image steganography, where up to 5 images are successfully embedded into a host image of the same spatial and color resolution. These advances demonstrate that INN has great potential for data embedding and extraction. However, they all ignore the robustness issue that the embedded image may be manipulated by image compression and other distortions. In contrast, our approach focuses on solving this robustness challenge.

III Proposed Method

III-A Overview

Instead of employing the cascading Encoder-Noiser-Decoder architecture widely used in existing methods, we propose the invertible watermarking network (IWN), where a bijective INN both embeds and extracts the message. As shown in Fig. 2, our compact IWN contains three components: 1) the invertible neural network, 2) the bit message normalization module, which includes preprocessing and postprocessing sub-modules, and 3) the noise layer. To efficiently represent the bit message, the bit message normalization module converts the original bit sequence M into a normalized tensor T. The INN backbone then performs message embedding and extraction: its forward process takes the cover image I_co and the preprocessed message T as input and generates the watermarked image I_wm, which is as similar as possible to the original image, together with the lost information z. The noise layer then simulates the noises and distortions produced by practical image operations: it applies different noises to the watermarked image and produces the simulated noised image I_no. To extract the watermark, the noised image I_no is fed into the reverse process of the INN to generate the output message tensor T' and the revealed cover image I_re. With the postprocessing sub-module of bit message normalization, we finally extract the bit message sequence M' from T'. The introduced notations are summarized in Tab. I.

Notation  Description
I_co      Cover image
I_wm      Watermarked image
I_no      Noised image produced by distortion simulation
I_re      Revealed cover image
M         Original bit message
M'        Extracted bit message
T         Preprocessed message tensor for INN's forward input
T'        Revealed message tensor of INN's reverse output
z         Lost information
C         Constant matrix for recovering message
TABLE I: Introduced notations.

III-B Invertible Neural Network (INN)

INN is powerful and effective for reversible problems, especially for data hiding and recovery [43]. Intuitively, watermarking is a special case of image steganography in which a bit sequence is the hidden data. In this sense, the original bit message M is first converted by the preprocessing sub-module of our bit message normalization into a tensor T with the same spatial resolution as the cover image. After that, we use the forward process of the INN for message embedding and its reverse process for message extraction. As shown in Fig. 2, in the forward process, the message tensor T and the cover image I_co serve as inputs, and the corresponding outputs are the watermarked image I_wm and a matrix z, which merely satisfies the structural consistency of the INN and is not used in the reverse mapping. In the reverse process, the noised image I_no and a predefined constant matrix C are fed in, and the revealed cover image I_re and the message tensor T' are extracted. Finally, the message M' is obtained by the postprocessing sub-module of our bit message normalization.

As shown in Fig. 2, our INN consists of several invertible blocks with the same structure, and each block includes three sub-modules φ(·), ρ(·) and η(·). The INN contains two branches, corresponding to the hidden message and the cover image, respectively. For the i-th invertible block, the input is [T^i, I^i] and its output is [T^{i+1}, I^{i+1}], where [·, ·] is the concatenation operator in the channel dimension. Formally, the forward process is calculated as follows:

I^{i+1} = I^i + φ(T^i),
T^{i+1} = T^i ⊙ exp(ρ(I^{i+1})) + η(I^{i+1}),    (1)

where φ(·), ρ(·) and η(·) are convolution operations, exp(·) is the exponential function and ⊙ is the Hadamard product. Accordingly, the reverse process in the i-th invertible block is calculated as follows:

T^i = (T^{i+1} − η(I^{i+1})) ⊙ exp(−ρ(I^{i+1})),
I^i = I^{i+1} − φ(T^i).    (2)

In other words, given [T^{i+1}, I^{i+1}], we can accurately calculate [T^i, I^i] according to Eq. (2). By cascading, given the reverse input of the last block, the original input can be solved. It is worth noticing that the three sub-modules φ, ρ and η, which contain the learnable parameters, appear both in Eq. (1) of the forward process and in Eq. (2) of the reverse process. That is to say, the INN shares all parameters between its forward and reverse mappings. Benefiting from this architecture, the INN performs stable and efficient inversion, which is exactly what we need in the watermarking task.
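A minimal NumPy sketch of one such coupling block may help illustrate the shared-parameter inversion. Here small dense matrices stand in for the learned convolution sub-modules; the names `phi`, `rho`, `eta` and the toy parameterization are illustrative assumptions, not the paper's actual layers. The assertion checks that the reverse pass reconstructs the inputs exactly.

```python
import numpy as np

# Illustrative stand-ins for the learned convolution sub-modules.
# Any functions work here: invertibility comes from the coupling structure.
rng = np.random.default_rng(0)
W1, W2, W3 = rng.standard_normal((3, 4, 4)) * 0.1

def phi(x): return x @ W1
def rho(x): return np.tanh(x @ W2)   # bounded so exp() stays stable
def eta(x): return x @ W3

def forward(T, I):
    """One affine-coupling block: mixes message branch T into image branch I."""
    I_next = I + phi(T)
    T_next = T * np.exp(rho(I_next)) + eta(I_next)
    return T_next, I_next

def reverse(T_next, I_next):
    """Exact inverse of forward(), sharing the same phi/rho/eta parameters."""
    T = (T_next - eta(I_next)) * np.exp(-rho(I_next))
    I = I_next - phi(T)
    return T, I

T0 = rng.standard_normal((2, 4))
I0 = rng.standard_normal((2, 4))
T1, I1 = forward(T0, I0)
T_rec, I_rec = reverse(T1, I1)
assert np.allclose(T_rec, T0) and np.allclose(I_rec, I0)  # bijective by construction
```

Because the inverse is obtained algebraically rather than by training a second network, no separate decoder parameters exist to tune.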

To optimize the network, we calculate a loss function for each of the four output items. One of our goals is that there is no visual difference between the watermarked image I_wm and the original cover image I_co, so we introduce the loss L_wm:

L_wm = ℓ(I_wm, I_co),

where ℓ(·, ·) is the combination of the ℓ1 norm and the ℓ2 norm. Similarly, we introduce L_msg to ensure that T and T' are as close as possible:

L_msg = ℓ(T, T').

In addition, we add the constraints L_z and L_re for the matrix z and the revealed cover image I_re, respectively:

L_z = ℓ(z, C),    L_re = ℓ(I_re, I_co).

Finally, the total loss of our system is formulated as:

L_total = λ_wm L_wm + λ_msg L_msg + λ_z L_z + λ_re L_re,

where λ_wm, λ_msg, λ_z and λ_re are the weights of the corresponding losses. Note that under the Crop and Cropout attacks, we only calculate the loss in the cropped region, multiplied by the corresponding ratio of the original shape to the cropped region shape. Please refer to Sec. III-D for more details on how we deal with the Crop and Cropout attacks.
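The loss computation can be sketched as follows. The equal 1:1 weighting of the ℓ1 and ℓ2 terms inside `l12`, the default weight values, and the helper names are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def l12(a, b):
    """Combination of the l1 and l2 norms between two tensors.
    The 1:1 weighting here is an assumption, not the paper's setting."""
    d = a - b
    return np.abs(d).mean() + np.square(d).mean()

def total_loss(I_wm, I_co, T, T_rev, z, C, I_re, w=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four per-output losses (weights w are placeholders)."""
    terms = (l12(I_wm, I_co),   # imperceptibility of the watermarked image
             l12(T, T_rev),     # message tensor recovery
             l12(z, C),         # keep the auxiliary matrix near the constant C
             l12(I_re, I_co))   # revealed cover image fidelity
    return sum(wi * t for wi, t in zip(w, terms))

a = np.ones((2, 3))
print(total_loss(a, a, a, a, a, a, a))  # -> 0.0 when every output is perfect
```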

III-C Bit Message Normalization

In general, there is a conflict between the robustness and the capacity of watermarking: when more information is embedded, the watermarking scheme becomes more vulnerable to attacks. Therefore, we introduce a novel bit message normalization module, containing preprocessing and postprocessing sub-modules, which normalizes the bit message in a simple but effective way and significantly improves robustness. Specifically, different from HiDDeN [72], which uses one channel to represent each bit of the message, our bit message normalization module can represent multiple bits with just one channel. Moreover, this module allows us to flexibly adjust the watermarking capacity for different practical applications without changing the network architecture. Here we introduce the details of our bit message normalization module.

Fig. 3: An example of our bit message normalization, where (a) and (b) are handled by the preprocessing and postprocessing sub-modules, respectively. Here the binary number is composed of 3 bits {0, 1, 0} from a message when q = 3. Through left shifting by (8 − q) bits and adding the offset, we obtain a single decimal number to represent {0, 1, 0} in one channel. Through our postprocessing sub-module, this number can be reversibly converted back to the 3 bits {0, 1, 0}.

Preprocessing. The main purposes of this sub-module are to improve robustness and to convert a sequence of bits into a tensor usable by convolutional networks, via two steps: bit transformation and broadcasting. For a bit message sequence of length L, we divide it into L/q groups. Each group contains q bits, which are treated as one binary number. For the convenience of training, we convert the grouped binary numbers into corresponding 8-bit integers, aligned with the most common 8-bit color depth; in other words, we transform the grouped binary numbers into their corresponding decimal numbers. In order to encode the bit information into the highest bit positions for higher error tolerance, rather than into the lower ones, we left shift each binary number by (8 − q) bits. Then, treating the shifted binary numbers as 8-bit integers, we add an offset to them, ensuring that the mean value of all integers generated from a random bit sequence equals 128, the median of color pixel values between 0 and 255. In Fig. 3 (a) we show an example of encoding 3 bits into one channel, i.e. q = 3. In order to spread the watermark message over all image pixels, the message of shape 1 × 1 × (L/q) is then broadcast to the input message tensor T of shape H × W × (L/q), where H and W are the height and width of the cover image I_co, respectively. Interestingly, the bit message processing in HiDDeN [72] can be regarded as a special case of ours with q = 1.
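The two preprocessing steps can be sketched in NumPy as follows. `preprocess_bits` is a hypothetical helper name, and the offset value 2^(7−q) is derived from the stated mean-128 condition rather than quoted from the paper: shifted q-bit values average 128 − 2^(7−q), so adding 2^(7−q) centers a random message at 128.

```python
import numpy as np

def preprocess_bits(bits, q, H, W):
    """Group a bit sequence into q-bit integers, shift them into the high
    bits of the 8-bit range, add an offset so a random message averages 128,
    then broadcast each integer over an H x W channel."""
    assert len(bits) % q == 0
    groups = np.asarray(bits).reshape(-1, q)
    # binary -> decimal, e.g. [0, 1, 0] -> 2
    values = groups @ (1 << np.arange(q - 1, -1, -1))
    shifted = values << (8 - q)   # put the information in the highest bits
    offset = 1 << (7 - q)         # derived from the mean-128 condition
    ints = shifted + offset
    # broadcast: one constant H x W channel per q-bit group
    return np.broadcast_to(ints[:, None, None], (len(ints), H, W)).astype(np.float32)

T = preprocess_bits([0, 1, 0], q=3, H=4, W=4)
print(T.shape, T[0, 0, 0])  # -> (1, 4, 4) 80.0, since (2 << 5) + 16 = 80
```

With q = 3 the channel value 80 carries the bits {0, 1, 0}, matching the Fig. 3 setup.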

Postprocessing. Our goal is to obtain the bit message sequence M' from the tensor T' of shape H × W × (L/q), so we first need to recover L/q integers. For the H × W elements in each channel, we need to map them to a single decimal number. As shown in Fig. 4 (a), the original data distribution of one channel of T' presents a single peak. To eliminate the interference of outliers, we convert every element of each channel to its nearest ground-truth value among the 2^q candidates shown in Fig. 4 (b), and then take the mode of the converted numbers as the final extracted integer. After that, the integers are converted back into a bit message sequence by inverting the preprocessing, as shown in Fig. 3 (b); specifically, this involves offset subtraction, right shift, and binary conversion.
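The per-channel recovery can be sketched as follows. The function name and the synthetic noisy channel are illustrative; the candidate grid (2^q values of step 2^(8−q) starting at the offset 2^(7−q)) follows from the preprocessing description under our derived-offset assumption.

```python
import numpy as np

def postprocess_channel(channel, q):
    """Recover the q bits of one channel of the revealed tensor T':
    snap every element to its nearest of the 2**q candidate values,
    take the mode, then undo offset, shift, and binary conversion."""
    offset = 1 << (7 - q)
    step = 1 << (8 - q)
    candidates = np.arange(2 ** q) * step + offset  # e.g. 16, 48, ..., 240 for q=3
    nearest = candidates[np.abs(channel.ravel()[:, None] - candidates).argmin(axis=1)]
    mode = np.bincount(nearest).argmax()            # majority vote over pixels
    value = (int(mode) - offset) >> (8 - q)         # undo offset and left shift
    return [(value >> i) & 1 for i in range(q - 1, -1, -1)]

# a channel whose ground-truth value is 80, perturbed by simulated noise
noisy = 80 + np.random.default_rng(1).normal(0, 6, size=(4, 4))
print(postprocess_channel(noisy, q=3))  # -> [0, 1, 0]
```

Snapping to candidates before taking the mode is what suppresses outlier pixels: a few badly distorted elements cannot outvote the single-peak majority.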

Fig. 4: Data distribution statistics for a single channel of the revealed message tensor T'. (a) is the original data distribution in [0, 255], where the red line represents the ground truth. (b) is the recovered result after quantizing the values into the 2^q candidate categories.

III-D Noise Layer

Fig. 5: Illustration of different noises. The first row is the watermarked image I_wm, the second row is the noised image I_no. The third row is the difference between them, magnified for visualization.
Model        Identity  Crop       Cropout  Dropout  Gaussian  JPEG    Combined
                       (p=0.035)  (p=0.3)  (p=0.3)  (σ=2)     (Q=50)
HiDDeN [72]  36.74     32.70      31.94    34.39    30.38     32.64   32.92
Ours         37.88     34.30      32.26    30.31    37.03     36.16   32.99
TABLE II: Objective comparison between HiDDeN [72] and our model trained with different noise layers. We list the average PSNR between the watermarked image I_wm and the cover image I_co. The last column refers to the combined model trained for all noises; the remaining columns refer to specialized models for specific noises.
Fig. 6: Robustness comparison against different distortions. Each cluster corresponds to a special distortion. ’combined’ refers to training and testing on all 6 types of distortion, and ’specialized’ means training and testing on the specific distortion type.
Fig. 7: The accuracy of bit message recovery under five common distortions with various intensities. Here the compared Identity model is trained without noise, while the combined model is trained on all distortion types.
Fig. 8: Visual comparison of some watermarked images. The first row is the cover image, regarded as ground truth (GT). The second and fourth rows are the watermarked images generated by HiDDeN [72] and our method, respectively. The third and fifth rows are the differences between each watermarked image and its GT, magnified for visualization.

In practice, watermarked images suffer various distortions during compression, transmission and interactive editing. Robustness against these attacks (or noises), which may destroy the embedded watermark in the real world, is one of the important issues for a digital image watermarking algorithm. To improve robustness, we introduce a noise layer to simulate various image distortions, including Identity, Crop, Cropout, Dropout, Gaussian and JPEG operations; see Fig. 5 for examples. As a component of IWN, the proposed noise layer should also be differentiable so that the whole network can be trained end to end. Next, we discuss these different distortions in detail, in terms of whether they are differentiable.

Differentiable Noises. Most watermarking noises, including Identity, Crop, Cropout, Dropout, and Gaussian, are inherently differentiable, so we add them to our framework directly. The Identity noise leaves the watermarked image unchanged. Crop produces a rectangle by randomly cropping the watermarked image, where the percentage p controls the remaining ratio of the watermarked image. Cropout randomly replaces a rectangular region of the cover image with its counterpart from the watermarked image, with the ratio p defined similarly. Dropout also replaces some cover image pixels with watermarked image pixels, but instead of replacing a whole area, it randomly selects individual pixels for replacement according to the remaining ratio p. Finally, Gaussian blurs the watermarked image with a Gaussian kernel of a given width σ.
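Two of the listed distortions can be sketched as follows; the function names, the per-pixel Bernoulli formulation of Dropout, and the square-rectangle choice for Cropout are our own simplifications of the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_noise(I_co, I_wm, p):
    """Keep each watermarked pixel with probability p; otherwise fall
    back to the corresponding cover-image pixel."""
    keep = rng.random(I_wm.shape[:2]) < p
    return np.where(keep[..., None], I_wm, I_co)

def cropout_noise(I_co, I_wm, p):
    """Keep one rectangle (area ratio ~ p) from the watermarked image and
    replace everything outside it with the cover image."""
    H, W = I_wm.shape[:2]
    h, w = max(1, int(H * p ** 0.5)), max(1, int(W * p ** 0.5))
    y, x = rng.integers(0, H - h + 1), rng.integers(0, W - w + 1)
    out = I_co.copy()
    out[y:y + h, x:x + w] = I_wm[y:y + h, x:x + w]
    return out

I_co = np.zeros((16, 16, 3))   # toy cover image
I_wm = np.ones((16, 16, 3))    # toy watermarked image
noised = dropout_noise(I_co, I_wm, p=0.3)  # roughly 30% watermarked pixels survive
```

Both operations are simple pixel selections, so gradients with respect to the surviving watermarked pixels pass through unchanged, which is why these noises need no special approximation.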

Quantization. Watermarked images are widely used in storage and transmission, so they must be converted into common image formats such as 8-bit RGB (i.e. 8 bits for each color channel). In the practical implementation, we therefore need a differentiable quantization module to convert the floating-point outputs of the INN to 8-bit unsigned integers. To keep gradients flowing during training, the rounding operation is used as the quantization module and the Straight-Through Estimator [7] is adopted when calculating gradients. In our solution, we include this quantization noise in all training and testing experiments, so that our watermarking system can effectively deal with quantization error.
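The straight-through trick can be illustrated as follows. This NumPy version only demonstrates the forward value; the comment marks where an autograd framework (e.g. PyTorch's `detach`) would cut the gradient so the backward pass sees an identity, which is the straight-through estimator.

```python
import numpy as np

def quantize_ste(x):
    """Round float pixels in [0, 1] to 8-bit levels. The value equals hard
    rounding, but written as x + (round(x) - x): in an autograd framework
    the residual term would be detached, so the backward pass sees
    d(out)/d(x) = 1 -- the straight-through estimator."""
    residual = np.round(x * 255.0) / 255.0 - x   # would be detached in training
    return x + residual

x = np.array([0.1234, 0.5, 0.9999])
q = quantize_ste(x)
assert np.allclose(q, np.round(x * 255) / 255)   # forward pass: exact 8-bit levels
```

The design choice here is that the non-differentiable rounding is kept in the forward pass (so the decoder trains on truly quantized pixels) while the gradient path pretends it is the identity.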

JPEG compression. Similar to the quantization operator, the noise produced by JPEG compression is non-differentiable due to the quantization step in the compression pipeline. To solve the gradient back-propagation problem during training, we follow [53] and simulate the quantization step of standard JPEG compression through the following equation:

r̃(x) = r(x) + (x − r(x))^3,

where r(·) is the rounding function and r̃(·) is the differentiable approximation of the rounding function, which has non-zero derivatives nearly everywhere. Note that in our solution we use real JPEG compression instead of the JPEG simulator during testing. By transforming the non-differentiable part into this derivable approximation, we construct a completely differentiable noise layer, with which our IWN can be efficiently trained end to end.
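The cubic rounding surrogate can be checked numerically. The closed form below is a commonly used differentiable rounding approximation consistent with the description (non-zero derivative nearly everywhere); that it matches the exact form in [53] is our assumption.

```python
import numpy as np

def approx_round(x):
    """Differentiable stand-in for rounding: the value stays within 0.125
    of true rounding (worst case at half-integers, since 0.5**3 = 0.125),
    and the derivative 3*(x - round(x))**2 vanishes only at integers."""
    return np.round(x) + (x - np.round(x)) ** 3

x = np.linspace(-2, 2, 9)
print(np.max(np.abs(approx_round(x) - np.round(x))))  # -> 0.125
```

Because the approximation error never exceeds 0.125 of a quantization step, the simulated JPEG stays close to the real codec while remaining trainable; real JPEG is then substituted back at test time.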

IV Experiments

IV-A Experimental Setup

We implement our IWN in PyTorch and train it with the Adam optimizer. The learning rate is set to 2e-4 and the batch size to 6. We use 16 invertible blocks for embedding a 30-bit message, and the message is divided into 10 groups. In addition, all elements of the constant matrix are set to 0.5. Our IWN is trained on the DIV2K [1] and Flickr2K [38] datasets, and it is tested on a subset of ImageNet [51] containing 1000 images. All training images are cropped into 480×480 patches and resized to 128×128 during training, while the test images are all resized to 128×128. Flipping and rotation are randomly applied for data augmentation. The quality factor Q of the JPEG simulator is uniformly sampled from {50, 60, 70, 80, 90}.

In our system, the loss weights are fixed, except for one weight that varies according to the training stage. When the noise layer is the Identity layer, meaning that no distortions are applied to the watermarked image, this weight is set to 32. In other cases, it is first set to 0.1 until the system converges, and is then raised to 48.0 until convergence again. In other words, we first train for robustness against various distortions when extracting the watermark, and then for the imperceptibility of our watermarking solution. All experiments are conducted on two Nvidia RTX 2080Ti GPUs.

Block number | PSNR (dB) | Identity | Crop (p=0.035) | Cropout (p=0.3) | Dropout (p=0.3) | Gaussian (σ=2) | JPEG (Q=50)
4  | 30.80 | 0.8621 | 0.7987 | 0.8181 | 0.6428 | 0.7523 | 0.5433
8  | 30.21 | 0.9762 | 0.8187 | 0.9143 | 0.7329 | 0.8449 | 0.6306
12 | 30.97 | 0.9949 | 0.8594 | 0.9655 | 0.7138 | 0.9364 | 0.6604
16 | 32.99 | 0.9994 | 0.8331 | 0.9471 | 0.7529 | 0.8611 | 0.7687
TABLE III: Ablation experiments for the number of invertible blocks. We list the PSNR of the watermarked images (the second column) and the accuracy of bit message recovery under 6 distortions (the last 6 columns).
Bits & Groups | PSNR (dB) | Identity | Crop (p=0.035) | Cropout (p=0.3) | Dropout (p=0.3) | Gaussian (σ=2) | JPEG (Q=50)
30 & 10 | 32.99 | 0.9994 | 0.8331 | 0.9471 | 0.7529 | 0.8611 | 0.7687
40 & 10 | 31.34 | 0.9187 | 0.7055 | 0.8211 | 0.6297 | 0.8087 | 0.7313
50 & 10 | 31.96 | 0.7494 | 0.6708 | 0.7342 | 0.6129 | 0.6879 | 0.6585
60 & 10 | 30.50 | 0.7212 | 0.6275 | 0.6749 | 0.5913 | 0.6256 | 0.6403
30 & 30 | 30.36 | 0.7470 | 0.7271 | 0.7415 | 0.6306 | 0.7455 | 0.5788
TABLE IV: Ablation experiments for the proposed bit message normalization module, varying the message bit length and the number of groups.

IV-B Metrics

We evaluate our method mainly on robustness and imperceptibility, which are generally more important than capacity for a watermarking algorithm. Specifically, we measure imperceptibility using the peak signal-to-noise ratio (PSNR) between the cover image and the watermarked image, and we measure robustness using bit accuracy, i.e., the percentage of bits of the extracted message that match the original message.
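Both metrics are straightforward to compute; a small sketch (the helper names are ours):

```python
import torch

def psnr(cover, marked, max_val=1.0):
    """Peak signal-to-noise ratio in dB between cover and watermarked image."""
    mse = torch.mean((cover - marked) ** 2)
    return 10 * torch.log10(max_val ** 2 / mse)

def bit_accuracy(msg, decoded):
    """Fraction of bits recovered correctly; decoded soft values
    are thresholded at 0.5 before comparison with the 0/1 message."""
    return ((decoded > 0.5).float() == msg).float().mean()
```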

IV-C Comparison

We take HiDDeN [72] as the baseline for comparison since it is a well-studied model and a commonly used benchmark. We re-run the experiments of HiDDeN [72] with its open-source code. Following HiDDeN [72], watermarked images are exposed to the following 6 distortions: Identity, Crop, Cropout, Dropout, Gaussian, and JPEG compression. We control the intensity of the distortions with the following scalars: the remaining ratio p for Crop, Cropout, and Dropout, the kernel width σ for Gaussian, and the quality factor Q for JPEG compression. These scalars are identical to those adopted in HiDDeN [72] during testing. Specialized models are optimized to resist the specific distortions listed above, and the final combined model is trained to be robust against all of them. To further improve robustness against real-world JPEG compression, the noise layer is randomly sampled from the set {Identity, Crop, Cropout, Dropout, Gaussian, JPEG} with a probability distribution of {0.05, 0.05, 0.1, 0.15, 0.65} for each mini-batch when training the combined model. We report both bit accuracy and PSNR when various distortions are applied to the watermarked images. It is worth noting that these two metrics may conflict: a higher PSNR usually means the embedded message changes the cover image less, which makes it harder to accurately recover the bit sequence, so the corresponding bit accuracy decreases. A well-designed watermarking algorithm should avoid pairing a high PSNR with a low bit accuracy, or vice versa; we therefore deliberately avoid this conflicting situation for a fair comparison.
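The per-mini-batch noise sampling can be sketched as follows; note that pairing the five quoted probabilities with the five non-Identity noises is our assumption:

```python
import random

# Hypothetical per-mini-batch noise selection. The weights follow the
# distribution quoted above; JPEG is sampled most often to harden the
# combined model against real-world compression.
NOISES = ["Crop", "Cropout", "Dropout", "Gaussian", "JPEG"]
WEIGHTS = [0.05, 0.05, 0.1, 0.15, 0.65]

def sample_noise(rng=random):
    """Draw one noise type for the current mini-batch."""
    return rng.choices(NOISES, weights=WEIGHTS, k=1)[0]
```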

IV-C1 Quantitative Results

Tab. II reports the PSNR between cover images and watermarked images produced by the 6 specialized models (each against its corresponding noise) and the 1 combined model, and Fig. 6 illustrates the bit accuracy of the different models against the 6 distortions. Compared with the specialized baselines, 5 of our specialized models, i.e., Identity, Crop, Cropout, Gaussian, and JPEG, achieve higher PSNR values, while our solution also obtains higher bit accuracy than the baseline. In particular, under JPEG compression our method achieves a +3.52 dB gain over the baseline in imperceptibility together with 18.4% higher bit accuracy in robustness. For the combined model, the PSNR is almost the same as the baseline's, but the robustness of our algorithm is much better against the Identity, Cropout, Gaussian, and JPEG compression distortions, with 18.6% higher bit accuracy under JPEG compression. Together, the PSNR and bit accuracy results demonstrate that our method achieves a better balance between imperceptibility and robustness than HiDDeN [72].

Fig. 7 provides a more comprehensive comparison of bit accuracy across various distortion intensities. In general, our combined model resists distortions better than HiDDeN [72]. Although our method falls behind on Dropout (p=0.3) in Fig. 6, the Dropout curve in Fig. 7 shows that our method surpasses HiDDeN [72] at larger remaining ratios. Comparing the identity model with the combined model, we can see that the noise layer clearly benefits robustness to the Gaussian and JPEG compression distortions, as the identity model generally fails under these two attacks.

IV-C2 Qualitative Results

Fig. 8 provides a visual comparison of watermarked images produced by the combined models. Both the watermarked images and their magnified differences from the original images show that the bit message is embedded imperceptibly. The watermarked images generated by our model and by HiDDeN [72] look very similar to the corresponding cover images, which is consistent with the PSNR metric reported in Tab. II.

IV-D Ablation Study

Here we discuss how two hyper-parameters, the number of invertible blocks and the length of the bit message, affect the performance of our method. Because our goal is a robust watermarking model, the experiments in this part are conducted on the combined model rather than any specialized one.

Firstly, we study the number of invertible blocks in our INN module and report the performance in Tab. III. In general, our solution performs better with more blocks, which is reasonable since more invertible blocks imply more trainable parameters. It is particularly worth noting that when the number of blocks increases from 12 to 16, the bit accuracy under JPEG compression rises significantly by about 10% (see the last two rows of the last column of Tab. III), and the PSNR of the watermarked images also increases by 2.02 dB. The bottleneck of our method is robustness against JPEG compression: with 8 or 12 blocks, the bit accuracy stays above 0.7 for all distortions except JPEG compression. Although the 16-block model does not always perform best, it achieves the highest PSNR and the best robustness against JPEG compression, so our final model uses 16 invertible blocks.

Secondly, we verify the effectiveness of our bit message normalization module. To this end, we carry out experiments varying two quantities: the bit length of the embedded message and the number of groups into which the message is divided. The detailed results are shown in Tab. IV, which covers well-trained models with the number of groups set to 10 and 30, respectively. To study how performance fluctuates with the bit message length, we test models with 10 message groups and the bit length set to 30, 40, 50, and 60, i.e., each channel represents 3, 4, 5, and 6 bits, respectively. The other experimental settings are the same as in Sec. IV-A. As expected, the bit accuracy decreases as the message becomes longer, reflecting the trade-off between the capacity and robustness of a watermarking algorithm. We further carry out an experiment without the bit message normalization module: we directly treat the binary bit message as float numbers (0.0 or 1.0) as in HiDDeN [72], so both the bit length and the number of groups are set to 30. Without our bit message normalization, all the evaluation metrics drop dramatically, as shown in the last row of Tab. IV.
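To illustrate how such a normalization might work (a hypothetical sketch, not the paper's exact module): each group of bits is read as a binary integer and scaled into [0, 1], so a 30-bit message with 10 groups becomes 10 float values, and the inverse map recovers the bits:

```python
import torch

def normalize_bits(bits, num_groups=10):
    """Hypothetical bit message normalization: split an L-bit message
    into num_groups groups, read each group as an MSB-first binary
    integer, and scale it into [0, 1] as one float channel."""
    groups = bits.reshape(num_groups, -1)           # shape (N, L/N)
    k = groups.shape[1]
    weights = 2 ** torch.arange(k - 1, -1, -1)      # MSB-first weights
    ints = (groups * weights).sum(dim=1)
    return ints.float() / (2 ** k - 1)

def denormalize_bits(values, bits_per_group):
    """Inverse mapping: recover the bit string from normalized values."""
    ints = torch.round(values * (2 ** bits_per_group - 1)).long()
    bits = [(ints >> s) & 1 for s in range(bits_per_group - 1, -1, -1)]
    return torch.stack(bits, dim=1).reshape(-1)
```

Under this formulation, small perturbations of a channel value can still round back to the correct integer, which is one plausible reason the condensed representation is more robust than raw 0/1 floats.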


IV-E Limitations and Future Work

Fig. 9: Limitations of our solution. The watermarked images produced by our method may contain some visual artifacts in smooth regions.

To resist practical distortions, especially those introduced by JPEG compression, our watermarked images may show visual artifacts when the cover image contains large smooth regions. In Fig. 9, the background (case 1) and the sky (case 2) present this phenomenon. Embedding information in smooth areas remains inherently difficult. This issue might be mitigated by paying more attention to edges or richly textured areas when embedding the watermark. Besides, introducing a smoothing loss for the watermarked image during training, such as a Fourier transform loss or a structural similarity index measure (SSIM) loss, may also help alleviate this problem. We will explore how to remove these visual artifacts in future work.

V Conclusion

In this paper, we have presented an invertible watermarking network (IWN) for robust blind digital image watermarking. Our compact IWN utilizes an invertible neural network (INN) to embed and extract the watermark message in an end-to-end trainable manner. To promote robustness against various practical distortions, we specifically introduce a noise layer that simulates various attacks. Moreover, we propose a simple but effective bit message normalization module to further enhance the watermarking robustness. Extensive experiments demonstrate the superiority of our method over the commonly used baseline. In the future, we will also explore applications of our framework to cross-media channels, such as print-and-photograph and screen-photograph scenarios, as well as audio-visual watermarking in other multimedia domains.


  • [1] E. Agustsson and R. Timofte (2017) Ntire challenge on single image super-resolution: methods and results. In IEEE Conf. Comput. Vis. Pattern Recog. Worksh., pp. 114–125. Cited by: §IV-A.
  • [2] M. Ahmadi, A. Norouzi, N. Karimi, S. Samavi, and A. Emami (2020) ReDMark: framework for residual diffusion watermarking based on deep networks. Expert Systems with Applications 146, pp. 113157. Cited by: §II-A, §II-A.
  • [3] L. Ardizzone, J. Kruse, C. Rother, and U. Köthe (2018) Analyzing inverse problems with invertible neural networks. In Int. Conf. Learn. Represent., Cited by: §I, §II-B.
  • [4] L. Ardizzone, C. Lüth, J. Kruse, C. Rother, and U. Köthe (2019) Guided image generation with conditional invertible neural networks. arXiv preprint arXiv:1907.02392. Cited by: §II-B.
  • [5] M. Asikuzzaman, M. J. Alam, A. J. Lambert, and M. R. Pickering (2014) Imperceptible and robust blind video watermarking using chrominance embedding: a set of approaches in the dt cwt domain. IEEE Trans. Inf. Forensics Security. 9 (9), pp. 1502–1517. Cited by: §II.
  • [6] P. Bao and X. Ma (2005) Image adaptive watermarking using wavelet domain singular value decomposition. IEEE Trans. Circuits Syst. Video Technol. 15 (1), pp. 96–102. Cited by: §II-A.
  • [7] J. Behrmann, W. Grathwohl, R. T. Chen, D. Duvenaud, and J. Jacobsen (2019) Invertible residual networks. In ICML, pp. 573–582. Cited by: §II-B, §III-D.
  • [8] N. Bi, Q. Sun, D. Huang, Z. Yang, and J. Huang (2007) Robust image watermarking based on multiband wavelets and empirical mode decomposition. IEEE Trans. Image Process. 16 (8), pp. 1956–1966. Cited by: §II-A.
  • [9] R. S. Broughton and W. C. Laumeister (1989-February 21) Interactive video method and apparatus. Google Patents. Note: US Patent 4,807,031 Cited by: §I.
  • [10] O. Byrnes, W. La, H. Wang, C. Ma, M. Xue, and Q. Wu (2021) Data hiding with deep learning: a survey unifying digital watermarking and steganography. arXiv preprint arXiv:2107.09287. Cited by: §I, §II-A.
  • [11] B. Chen and G. W. Wornell (2001) Quantization index modulation: a class of provably good methods for digital watermarking and information embedding. IEEE Trans. Inf. Theory 47 (4), pp. 1423–1443. Cited by: §I.
  • [12] R. T. Chen, J. Behrmann, D. K. Duvenaud, and J. Jacobsen (2019) Residual flows for invertible generative modeling. In Adv. Neural Inform. Process. Syst., pp. 9916–9926. Cited by: §II-B.
  • [13] K. L. Cheng, Y. Xie, and Q. Chen (2021) IICNet: a generic framework for reversible image conversion. In Int. Conf. Comput. Vis., pp. 1991–2000. Cited by: §I, §II-B.
  • [14] I. J. Cox, M. L. Miller, J. A. Bloom, and C. Honsinger (2002) Digital watermarking. Vol. 53, Springer. Cited by: §II.
  • [15] I. Cox, M. Miller, J. Bloom, J. Fridrich, and T. Kalker (2007) Digital watermarking and steganography. Morgan kaufmann. Cited by: §II.
  • [16] P. Dabas and K. Khanna (2013-06) A study on spatial and transform domain watermarking techniques. International Journal of Computer Applications 71, pp. 38–41. Cited by: §II-A.
  • [17] C. Deng, X. Gao, X. Li, and D. Tao (2010) Local histogram based geometric invariant image watermarking. Signal Processing 90 (12), pp. 3256–3264. Cited by: §II-A.
  • [18] L. Dinh, D. Krueger, and Y. Bengio (2014) NICE: non-linear independent components estimation. arXiv preprint arXiv:1410.8516. Cited by: §II-B.
  • [19] L. Dinh, J. Sohl-Dickstein, and S. Bengio (2016) Density estimation using real NVP. arXiv preprint arXiv:1605.08803. Cited by: §II-B.
  • [20] M. Dorkenwald, T. Milbich, A. Blattmann, R. Rombach, K. G. Derpanis, and B. Ommer (2021-06) Stochastic image-to-video synthesis using cinns. In IEEE Conf. Comput. Vis. Pattern Recog., pp. 3742–3753. Cited by: §II-B.
  • [21] M. Faundez-Zanuy, M. Hagmüller, and G. Kubin (2007) Speaker identification security improvement by means of speech watermarking. Pattern Recognition 40 (11), pp. 3027–3034. Cited by: §I.
  • [22] Y. Gao, W. Wang, Y. Jin, C. Zhou, W. Xu, and Z. Jin (2021) ThermoTag: a hidden id of 3d printers for fingerprinting and watermarking. IEEE Trans. Inf. Forensics Security. 16, pp. 2805–2820. Cited by: §II.
  • [23] A. C. Gilbert, Y. Zhang, K. Lee, Y. Zhang, and H. Lee (2017) Towards understanding the invertibility of convolutional neural networks. In IJCAI, pp. 1703–1710. Cited by: §II-B.
  • [24] W. Grathwohl, R. T. Chen, J. Bettencourt, I. Sutskever, and D. Duvenaud (2018) FFJORD: free-form continuous dynamics for scalable reversible generative models. arXiv preprint arXiv:1810.01367. Cited by: §II-B.
  • [25] H. Guo and N. D. Georganas (2002) Digital image watermarking for joint ownership. In ACM Int. Conf. Multimedia, pp. 362–371. Cited by: §II-A.
  • [26] S. Haddad, G. Coatrieux, A. Moreau-Gaudry, and M. Cozic (2020) Joint watermarking-encryption-jpeg-ls for medical image reliability control in encrypted and compressed domains. IEEE Trans. Inf. Forensics Security. 15, pp. 2556–2569. Cited by: §II.
  • [27] M. Hamidi, M. El Haziti, H. Cherifi, and M. El Hassouni (2018) Hybrid blind robust image watermarking technique based on dft-dct and arnold transform. Multimedia Tools Appl. 77 (20), pp. 27181–27214. Cited by: §II-A.
  • [28] J. Hou, D. Kim, and H. Lee (2017) Blind 3d mesh watermarking for 3d printed model by analyzing layering artifact. IEEE Trans. Inf. Forensics Security. 12 (11), pp. 2712–2725. Cited by: §II.
  • [29] Y. Huang, B. Niu, H. Guan, and S. Zhang (2019) Enhancing image watermarking with adaptive embedding parameter and psnr guarantee. IEEE Trans. Multimedia 21 (10), pp. 2447–2460. Cited by: §II-A.
  • [30] J. Jacobsen, A. Smeulders, and E. Oyallon (2018) i-RevNet: deep invertible networks. In Int. Conf. Learn. Represent., Cited by: §II-B.
  • [31] J. Jia, Z. Gao, K. Chen, M. Hu, X. Min, G. Zhai, and X. Yang (2020) RIHOOP: robust invisible hyperlinks in offline and online photographs. IEEE Trans. Cybern.. Cited by: §I, §II-A, §II-A.
  • [32] H. Kandi, D. Mishra, and S. R.K. S. Gorthi (2017) Exploring the learning capabilities of convolutional neural networks for robust image watermarking. Computers & Security 65, pp. 247–268. External Links: ISSN 0167-4048, Document Cited by: §I, §II-A.
  • [33] X. Kang, J. Huang, Y. Q. Shi, and Y. Lin (2003) A dwt-dft composite watermarking scheme robust to both affine transform and jpeg compression. IEEE Trans. Circuits Syst. Video Technol. 13 (8), pp. 776–786. Cited by: §II-A.
  • [34] S. Katzenbeisser and F. Petitcolas (2000) Digital watermarking. Artech House, London 2. Cited by: §II.
  • [35] D. P. Kingma and P. Dhariwal (2018) Glow: generative flow with invertible 1x1 convolutions. In Adv. Neural Inform. Process. Syst., pp. 10215–10224. Cited by: §II-B.
  • [36] C. Kumar, A. K. Singh, and P. Kumar (2018) A recent survey on image watermarking techniques and its application in e-governance. Multimedia Tools Appl. 77 (3), pp. 3597–3622. Cited by: §II-A.
  • [37] X. Li, B. Li, B. Yang, and T. Zeng (2013) General framework to histogram-shifting-based reversible data hiding. IEEE Trans. Image Process. 22 (6), pp. 2181–2191. Cited by: §II-A.
  • [38] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee (2017) Enhanced deep residual networks for single image super-resolution. In IEEE Conf. Comput. Vis. Pattern Recog. Worksh., pp. 136–144. Cited by: §IV-A.
  • [39] Y. Liu, M. Guo, J. Zhang, Y. Zhu, and X. Xie (2019) A novel two-stage separable deep learning framework for practical blind watermarking. In ACM Int. Conf. Multimedia, pp. 1509–1517. Cited by: §II-A.
  • [40] Y. Liu, Z. Qin, S. Anwar, P. Ji, D. Kim, S. Caldwell, and T. Gedeon (2021) Invertible denoising network: a light solution for real noise removal. In IEEE Conf. Comput. Vis. Pattern Recog., pp. 13365–13374. Cited by: §II-B.
  • [41] Z. Liu, Y. Huang, and J. Huang (2018) Patchwork-based audio watermarking robust against de-synchronization and recapturing attacks. IEEE Trans. Inf. Forensics Security. 14 (5), pp. 1171–1180. Cited by: §II.
  • [42] N. A. Loan, N. N. Hurrah, S. A. Parah, J. W. Lee, J. A. Sheikh, and G. M. Bhat (2018) Secure and robust digital image watermarking using coefficient differencing and chaotic encryption. IEEE Access 6, pp. 19876–19897. Cited by: §I.
  • [43] S. Lu, R. Wang, T. Zhong, and P. L. Rosin (2021) Large-capacity image steganography based on invertible neural networks. In IEEE Conf. Comput. Vis. Pattern Recog., pp. 10816–10825. Cited by: §I, §II-B, §III-B.
  • [44] X. Luo, R. Zhan, H. Chang, F. Yang, and P. Milanfar (2020) Distortion agnostic deep watermarking. In IEEE Conf. Comput. Vis. Pattern Recog., pp. 13548–13557. Cited by: §I, §II-A.
  • [45] H. Ma, C. Jia, S. Li, W. Zheng, and D. Wu (2019) Xmark: dynamic software watermarking using collatz conjecture. IEEE Trans. Inf. Forensics Security. 14 (11), pp. 2859–2874. Cited by: §II.
  • [46] S. Mun, S. Nam, H. Jang, D. Kim, and H. Lee (2017) A robust blind watermarking using convolutional neural network. arXiv preprint arXiv:1704.03248. Cited by: §II-A.
  • [47] S. Pereira and T. Pun (2000) Robust template matching for affine resistant image watermarks. IEEE Trans. Image Process. 9 (6), pp. 1123–1129. Cited by: §II-A.
  • [48] K. Pexaras, I. G. Karybali, and E. Kalligeros (2019) Optimization and hardware implementation of image and video watermarking for low-cost applications. IEEE Trans. Circuits Syst. I 66, pp. 2088–2101. Cited by: §I.
  • [49] A. Ray and S. Roy (2020) Recent trends in image watermarking techniques for copyright protection: a survey. International Journal of Multimedia Information Retrieval, pp. 1–22. Cited by: §I, §II-A.
  • [50] R. Rombach, P. Esser, and B. Ommer (2020) Network-to-Network Translation with Conditional Invertible Neural Networks. In Adv. Neural Inform. Process. Syst., Cited by: §II-B.
  • [51] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. (2015) Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115 (3), pp. 211–252. Cited by: §IV-A.
  • [52] H. Sadreazami and M. Amini (2018) A robust image watermarking scheme using local statistical distribution in the contourlet domain. IEEE Trans. Circuits Syst. II 66 (1), pp. 151–155. Cited by: §II-A.
  • [53] R. Shin and D. Song (2017) Jpeg-resistant adversarial images. In NIPS 2017 Workshop on Machine Learning and Computer Security, Vol. 1. Cited by: §III-D.
  • [54] Y. Song, C. Meng, and S. Ermon (2019) Mintnet: building invertible neural networks with masked convolutions. In Adv. Neural Inform. Process. Syst., pp. 11004–11014. Cited by: §II-B.
  • [55] Q. Su and B. Chen (2018) Robust color image watermarking technique in the spatial domain. Soft Computing 22 (1), pp. 91–106. Cited by: §II-A.
  • [56] M. Tancik, B. Mildenhall, and R. Ng (2020) Stegastamp: invisible hyperlinks in physical photographs. In IEEE Conf. Comput. Vis. Pattern Recog., pp. 2117–2126. Cited by: §I, §I, §II-A, §II-A, §II-A.
  • [57] H. Tao, L. Chongmin, J. M. Zain, and A. N. Abdalla (2014) Robust image watermarking theories and techniques: a review. Journal of applied research and technology 12 (1), pp. 122–138. Cited by: §I.
  • [58] H. Tian, Y. Zhao, R. Ni, L. Qin, and X. Li (2013) LDFT-based watermarking resilient to local desynchronization attacks. IEEE Trans. Cybern. 43 (6), pp. 2190–2201. Cited by: §II-A.
  • [59] R. G. Van Schyndel, A. Z. Tirkel, and C. F. Osborne (1994) A digital watermark. In Proceedings of 1st international conference on image processing, Vol. 2, pp. 86–90. Cited by: §II-A, §II.
  • [60] B. Wen and S. Aydöre (2019) ROMark: a robust watermarking system using adversarial training. ArXiv abs/1910.01221. Cited by: §I, §II-A.
  • [61] E. Wengrowski and K. Dana (2019) Light field messaging with deep photographic steganography. In IEEE Conf. Comput. Vis. Pattern Recog., pp. 1515–1524. Cited by: §I, §II-A, §II-A, §II-A.
  • [62] H. Wu, G. Liu, Y. Yao, and X. Zhang (2020) Watermarking neural networks with watermarked images. IEEE Trans. Circuits Syst. Video Technol.. Cited by: §II.
  • [63] S. Xiang, H. J. Kim, and J. Huang (2008) Invariant image watermarking based on statistical features in the low-frequency domain. IEEE Trans. Circuits Syst. Video Technol. 18 (6), pp. 777–790. Cited by: §II-A.
  • [64] M. Xiao, S. Zheng, C. Liu, Y. Wang, D. He, G. Ke, J. Bian, Z. Lin, and T. Liu (2020) Invertible image rescaling. Eur. Conf. Comput. Vis.. Cited by: §II-B.
  • [65] Y. Xie, K. L. Cheng, and Q. Chen (2021) Enhanced invertible encoding for learned image compression. In ACM Int. Conf. Multimedia, pp. 162–170. Cited by: §II-B.
  • [66] C. Yu (2020) Attention based data hiding with generative adversarial networks. In AAAI, Vol. 34, pp. 1120–1128. Cited by: §I, §II-A, §II-A.
  • [67] C. Zhang, P. Benz, A. Karjauv, G. Sun, and I. S. Kweon (2020) Udh: universal deep hiding for steganography, watermarking, and light field messaging. Adv. Neural Inform. Process. Syst. 33, pp. 10223–10234. Cited by: §II-A.
  • [68] H. Zhang, H. Shu, G. Coatrieux, J. Zhu, Q. J. Wu, Y. Zhang, H. Zhu, and L. Luo (2011) Affine legendre moment invariants for image watermarking robust to geometric distortions. IEEE Trans. Image Process. 20 (8), pp. 2189–2199. Cited by: §II-A.
  • [69] L. Zhao, S. Lu, T. Chen, Z. Yang, and A. Shamir (2021) Deep symmetric network for underexposed image enhancement with recurrent attentional learning. In Int. Conf. Comput. Vis., pp. 12075–12084. Cited by: §II-B.
  • [70] R. Zhao, T. Liu, J. Xiao, D. P. Lun, and K. Lam (2021) Invertible image decolorization. IEEE Trans. Image Process. 30, pp. 6081–6095. Cited by: §II-B.
  • [71] X. Zhong, P. C. Huang, S. Mastorakis, and F. Y. Shih (2020) An automated and robust image watermarking scheme based on deep neural networks. IEEE Trans. Multimedia. Cited by: §I, §II-A, §II-A.
  • [72] J. Zhu, R. Kaplan, J. Johnson, and L. Fei-Fei (2018) Hidden: hiding data with deep networks. In Eur. Conf. Comput. Vis., pp. 657–672. Cited by: §I, §I, §II-A, §II-A, §II-A, Fig. 8, §III-C, §III-C, TABLE II, §IV-C1, §IV-C1, §IV-C2, §IV-C, §IV-D.
  • [73] X. Zhu, Z. Li, X. Zhang, C. Li, Y. Liu, and Z. Xue (2019) Residual invertible spatio-temporal network for video super-resolution. In AAAI, Vol. 33, pp. 5981–5988. Cited by: §II-B.