
Generative Steganography Network

by Ping Wei, et al.
Fudan University

Steganography usually modifies cover media to embed secret data. A new steganographic approach called generative steganography (GS) has emerged recently, in which stego images (images containing secret data) are generated from secret data directly, without cover media. However, existing GS schemes are often criticized for their poor performance. In this paper, we propose an advanced generative steganography network (GSN) that can generate realistic stego images without using cover images. We first introduce the mutual information mechanism into GS, which helps to achieve high secret extraction accuracy. Our model contains four sub-networks, i.e., an image generator (G), a discriminator (D), a steganalyzer (S), and a data extractor (E). D and S act as two adversarial discriminators to ensure the visual quality and security of generated stego images. E extracts the hidden secret from generated stego images. The generator G is flexibly constructed to synthesize either cover or stego images with different inputs. It facilitates covert communication by concealing the function of generating stego images in a normal generator. A module named the secret block is designed to hide secret data in the feature maps during image generation, with which high hiding capacity and image fidelity are achieved. In addition, a novel hierarchical gradient decay (HGD) skill is developed to resist steganalysis detection. Experiments demonstrate the superiority of our work over existing methods.





1. Introduction

Steganography is a technique that hides secret data in cover media for covert communication (Fridrich, 2009). Various types of cover media have been investigated for steganography (Hussain et al., 2018), including digital audio (Yi et al., 2019), image (Tao et al., 2019), video (Xu et al., 2014) and text (Borges et al., 2008), among which the image is the most popular medium for data hiding. Early steganographic methods often modify the pixel values of the cover image's least significant bits to hide secret data (Provos and Honeyman, 2003). Later, researchers paid more attention to the syndrome trellis coding (STC) framework (Filler et al., 2011; Li et al., 2014), which aims to minimize the distortion caused by modifying cover images. These schemes can be called carrier-modified-based methods, as cover images are modified to hide secret data.

Recently, a few deep learning (DL) based steganographic methods have been proposed (Subramanian et al., 2021). Tang (Tang et al., 2017) applies a network to learn the probability map of pixel alteration for data embedding. In (Hayes and Danezis, 2017), an adversarial training strategy is employed according to a 3-player game. Chu (Chu et al., 2017) studies how CycleGAN (Zhu et al., 2017) hides a source image into the generated images in an imperceptible way. Zhu (Zhu et al., 2018) and Zhang (Zhang et al., 2019a) propose two novel networks to hide random binary secret data in cover images. Baluja (Baluja, 2020) presents a system to hide color images inside another with minimal quality loss. Zhang (Zhang et al., 2020a) proposes a universal deep hiding architecture (UDH) to disentangle the encoding of secret images from the cover image. Yu (Yu, 2020) introduces the attention mechanism into the data hiding process. Jing (Jing et al., 2021) and Lu (Lu et al., 2021) propose two reversible networks to hide secret images in a cover image. The schemes above all require cover images for data embedding. However, modifying the cover images causes visual or statistical distortions, making the stego images easily detectable by steganalysis tools (Goljan et al., 2014; Ye et al., 2017; Boroumand et al., 2018). Once detected, the covert communication fails.

To solve this problem, a new steganography manner called generative steganography (GS) has emerged (Qin et al., 2019). Instead of modifying a cover image to embed secret data, it aims to synthesize stego images directly from the secret data, as illustrated in Fig.1. Cover images are not required in GS; thus, steganalysis tools become ineffective. Several approaches (Otori and Kuriyama, 2009; Qian et al., 2017; Xu et al., 2015; Li and Zhang, 2018) have been proposed to synthesize particular types of stego images to hide secret data, such as texture images (Otori and Kuriyama, 2009; Wu and Wang, 2014) and fingerprint images (Li and Zhang, 2018). They can be called tailored GS methods for short. Nevertheless, only some special image types can be synthesized by them, and their data hiding capacities are much lower than those of traditional carrier-modified-based works.

To make GS more practical, researchers now use neural networks to generate natural stego images (Qin et al., 2020). They often map the secret data to the input labels (Liu et al., 2017; Zhang et al., 2020c) or noise vectors (Hu et al., 2018; Yu et al., 2021) of GANs by a pre-built mapping rule. Stego images can then be generated from the assigned noise vectors or labels according to the mapping rule, and data receivers can extract the hidden secret data from received stego images with a pre-trained extractor. For simplicity, we term such schemes deep learning (DL) based GS solutions. Compared with tailored GS schemes, these DL-based GS methods can generate more natural stego images. However, their data hiding capacities are much lower due to the limited label number and noise vector size. Meanwhile, stego images generated by these methods are often visually poor, and transmitting such images may arouse the suspicion of a monitor in covert communication.
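The bit-to-noise mapping used by these schemes can be illustrated with a toy rule (our own illustrative sketch, not the exact mapping of the cited works): each secret bit selects the sign of one noise dimension, with a dead zone around zero so small perturbations do not flip bits on extraction.

```python
import numpy as np

def bits_to_noise(bits, rng=None):
    """Map each secret bit to one noise dimension: bit 1 -> a value in
    [0.3, 1.0], bit 0 -> a value in [-1.0, -0.3]. The gap around zero
    gives the extractor a margin against small errors."""
    rng = rng or np.random.default_rng(0)
    mag = rng.uniform(0.3, 1.0, size=len(bits))
    return np.where(np.asarray(bits) == 1, mag, -mag)

def noise_to_bits(noise):
    """Recover bits from the (possibly perturbed) noise estimate."""
    return (np.asarray(noise) > 0).astype(int)

secret = [1, 0, 1, 1, 0, 0, 1, 0]
z = bits_to_noise(secret)
assert noise_to_bits(z).tolist() == secret
```

Because each bit consumes a whole noise dimension, the payload of such a rule is bounded by the GAN's input size, which is exactly the capacity limitation discussed above.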

In this paper, we propose a novel generative steganography network (GSN) integrated with the mutual information mechanism. To disguise the transmission of secret messages through stego images, we propose a flexible image generator that can synthesize either cover images or stego images according to the inputs. The main contributions of our work are:

  1. We propose a holistic steganography solution for covert communication, in which mutual information is first introduced in generative steganography.

  2. We hide the function of generating stego images in a flexibly constructed generator, which can generate either cover or stego images depending on the inputs.

  3. A novel technique called hierarchical gradient decay (HGD) is proposed to improve the steganalysis resistance.

  4. Proposed GSN achieves better performances than state-of-the-art works. It can synthesize realistic stego images with high hiding capacity, secret extraction accuracy, and security.

2. Related Works

Most of the published steganographic works are carrier-modified based solutions that require cover images(Subramanian et al., 2021; Baluja, 2020)

. But this paper focuses on generative steganography (GS) that doesn’t require cover images. Existing GS schemes can be roughly classified into two categories: tailored GS and deep learning (DL) based GS.

Tailored generative steganography In these methods, stego images are synthesized according to handcrafted procedures, where the secret data is encoded into specific textures or patterns. Otori (Otori and Kuriyama, 2009) first proposes encoding secret data into dotted patterns; stego texture images are then synthesized by painting these patterns. In (Wu and Wang, 2014), the authors propose a secret-oriented texture synthesis solution, where secret data is embedded by pasting proper source textures onto different locations of the synthesized images, referring to an index table. In (Xu et al., 2015), the authors suggest using marbling images to hide secret data: secret messages are printed on the background of an image, and this image is then deformed into different marbling patterns using reversible functions. In (Li and Zhang, 2018), secret data is encoded as the positions and polarities of minutia points in fingerprint images; stego fingerprint images are then constructed from these encoded minutiae positions using a phase demodulation model. The main disadvantage of these tailored GS schemes is that their generated stego image contents are unnatural, which is unsafe or even suspicious in covert communication.

DL-based generative steganography To produce natural stego images, some researchers propose to synthesize images with networks. The works (Chu et al., 2017; Duan and Song, 2018) use GANs to transform secret images into normal-looking stego images, and these stego images can be converted back with another image generator. But they cannot convey random secret data as other steganographic methods do, and transmitting arbitrary secret data is more practical in steganography. Therefore, Liu (Liu et al., 2017) and Zhang (Zhang et al., 2020c) propose to map binary secret data to the class labels of ACGAN (Odena et al., 2017), by which different stego images can be generated with the corresponding labels according to the given secret data and mapping rule. The labels of stego images can be extracted using a classifier, and the hidden secret data is then recovered according to the mapping rule. Similarly, Hu (Hu et al., 2018) establishes a mapping rule between the secret data and the input noise vector of DCGAN (Radford et al., 2015); stego images are then generated with the mapped noise values per the mapping rule and the given secret data. In (Zhang et al., 2019b), the authors propose an image-inpainting-based GS solution, where secret messages are embedded in the remaining region of a corrupted image with a Cardan grille, and this image is then fed into a pre-trained generator for stego image generation. Wang (Wang et al., 2018) proposes to generate stego images from the concatenation of binary secret data and a noise vector using GANs, in which secret data is input to the generator directly. Though these DL-based GS schemes can produce realistic stego images, they are rather rudimentary, with poor performance. The stego images generated by them are often of low visual quality. In addition, their hiding capacities are often limited to several hundred bits per image and cannot grow with the stego image size, due to the limitation of the GANs' input dimensions. For example, only 6 bits of secret data are conveyed by every 28×28 stego image in (Liu et al., 2017), and each 32×32 stego image carries 400 bits of secret data in (Wang et al., 2018).

3. Proposed Method GSN

The architecture of the proposed GSN is given in Fig.2; it consists of a generator (G), a discriminator (D), a steganalyzer (S) and an extractor (E). D and S are used as two discriminators in GANs, which ensure the visual quality and reduce the difference between generated cover/stego images, respectively. The inputs of GSN include a latent vector z, a noise matrix n or a three-dimensional matrix of secret data d. The generator produces either a cover image x̂_c or a stego image x̂_s, depending on whether (z, n) or (z, d) is input. Then, the real image x and the generated stego image x̂_s are sent to the discriminator to decide whether they are real or fake. Meanwhile, the generated cover/stego images are fed into the steganalyzer for differentiation. The generated stego images are input to the extractor, and d̂ is the predicted secret.

Figure 2. The overall framework of our proposed GSN. A cover/stego image can be generated when (z, n)/(z, d) is input. D and S act as dual discriminators to ensure the visual quality and statistical imperceptibility of cover/stego images. E aims to reveal the hidden secret from generated stego image.

3.1. Problem Formulation

In our scheme, a stego image can be generated with secret data d and latent z, i.e., x̂_s = G(z, d). The secret data influences the image content and should be recovered exactly from the generated stego image x̂_s. From the perspective of information theory, the mutual information between d and x̂_s is expected to be maximized, i.e., max I(d; x̂_s). That is to say, the input secret data and the generated stego images are closely related: different stego images should be generated when the input secret data is varied, and the hidden secret is expected to be extracted accurately from generated stego images. Therefore, we incorporate the mutual information into GANs for data hiding. The loss function can be defined as:

min_G max_D L = L_adv(G, D) − λ1 · I(d; x̂_s)

where L_adv(G, D) is the adversarial loss of GANs, i.e., L_adv(G, D) = E_{x∼P_data}[log D(x)] + E_{z∼P_z, d∼P_d}[log(1 − D(G(z, d)))]. Here, E stands for the expectation. I(d; x̂_s) is the mutual information between the secret data and the generated stego image. G wants to minimize this loss function while D expects to maximize it.

But the mutual information I(d; x̂_s) is hard to compute directly, as it requires the posterior distribution P(d|x̂_s). Inspired by InfoGAN (Chen et al., 2016), a variational lower bound L_I(G, E) is used to approximate I(d; x̂_s):

I(d; x̂_s) = H(d) − H(d|x̂_s)
          = E_{x̂_s∼G(z,d)}[ D_KL(P(·|x̂_s) ∥ Q(·|x̂_s)) + E_{d′∼P(d|x̂_s)}[log Q(d′|x̂_s)] ] + H(d)
          ≥ E_{x̂_s∼G(z,d)}[ E_{d′∼P(d|x̂_s)}[log Q(d′|x̂_s)] ] + H(d)

In fact, this lower bound L_I(G, E) can be computed by G and E:

L_I(G, E) = E_{d∼P(d), z∼N(0,1), x̂_s∼G(z,d)}[ log Q(d|x̂_s) ] + H(d)

where H(d) is the entropy of the secret data, which has a constant value; D_KL is the KL divergence; d∼P(d) means sampling secret data d from the distribution P(d); z∼N(0,1) means sampling the latent z from a normal distribution; x̂_s∼G(z,d) means sampling stego images from the generator. Q(d|x̂_s) is an auxiliary distribution used to approximate the true posterior probability P(d|x̂_s).

L_I(G, E) can be calculated with Monte Carlo simulation: randomly sample a secret tensor d and a noise tensor z to synthesize a stego image x̂_s using the generator G, and then extract the hidden secret with the extractor E. Both G and E hope to maximize the lower bound. E(x̂_s) means extracting the hidden secret from the generated stego image, the values of which fall in (0, 1) after a Sigmoid operation. Only when the extracted secret is equal to the input binary secret data d is the maximum value of L_I(G, E) reached, and thus the mutual information is maximized.

In our scheme, G can synthesize the cover image x̂_c = G(z, n) and the stego image x̂_s = G(z, d). The steganalysis algorithm SR-net (Boroumand et al., 2018) is used as the backbone of the steganalyzer S, which aims to minimize the statistical difference between generated cover/stego images. The adversarial loss between G and S is written as L_GS. Unlike L_adv, S hopes to output the correct predictions ([0,1] or [1,0]) under a binary cross-entropy loss, while G wants S to output [0.5, 0.5] for both cover and stego images, as described in Eq.7 and Eq.10. Both G and S aim to minimize these two losses.

To generate realistic stego images with a high secret extraction rate and good undetectability, we combine the loss functions above and set the overall optimization objective as:

min_{G, S, E} max_D L = L_adv(G, D) − λ1 · L_I(G, E) + λ2 · L_GS(G, S)

here, G, S and E all expect to minimize L, while D wants to maximize it. λ1 and λ2 are two hyper-parameters.
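The interplay of the three terms can be sketched as a single scalar objective (a minimal sketch; λ1 and λ2 follow the notation above, and the three component losses are assumed to be computed elsewhere):

```python
def total_objective(l_adv, l_mi_bound, l_gs, lam1=1.0, lam2=1.0):
    """L = L_adv - lam1 * L_I + lam2 * L_GS.
    G, S and E take gradient steps to decrease L; D to increase it.
    A larger mutual-information bound L_I lowers L, so minimizing L
    encourages accurate secret extraction."""
    return l_adv - lam1 * l_mi_bound + lam2 * l_gs

assert total_objective(1.0, 0.5, 0.25) == 0.75
```

Note the sign of L_I: because the bound enters with a minus sign, the same minimization that sharpens image realism also pushes the extractor toward perfect secret recovery.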

3.2. Loss Functions

In this section, we decompose the overall objective L into specific loss functions for each sub-network.

Generator’s loss The loss of the generator takes two adversarial training processes and a regularization item into consideration:

L_G = L_adv(G, D) + λ2 · L_GS(G, S) + γ_pl · E_{w, y}[ (‖J_w^T y‖_2 − a)^2 ]

here, L_adv(G, D) is the adversarial loss between G and D. L_GS(G, S) is the adversarial loss of G against S, which pushes the outputs of S close to 0.5 for both cover and stego images (i.e., S cannot distinguish the source of images). The last term is the path length regularization item used in (Karras et al., 2020) to improve training stability and disentangle the dlatents space. λ2 and γ_pl are two hyper-parameters; P_z, P_n and P_data refer to the distributions of the latent, the noise and the real images; y is a randomly generated noise image with normal distribution, and w is the dlatents variable shown in Fig.3; J_w is the Jacobian matrix of the generator output with respect to w, and a is a weighted value calculated with the running mean of ‖J_w^T y‖_2; ‖·‖_2 denotes the L2 norm.

Discriminator’s loss Discriminator’s loss is defined as:

L_D = −L_adv(G, D) + (γ/2) · E_{x∼P_data}[ ‖∇_x D(x)‖^2 ]

here, L_adv(G, D) is the adversarial loss as in Eq.5. γ is a constant hyper-parameter, and the second term is the R1 regularization item given by (Mescheder et al., 2018); ‖∇_x D(x)‖^2 means the squared gradient of the discriminator's output with respect to the input real image x.
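For a linear toy discriminator D(x) = w·x, the input gradient ∇_x D(x) is simply w, so the R1 term can be verified by hand (a sketch of the penalty alone; real implementations obtain the gradient via autograd):

```python
import numpy as np

def r1_penalty(w, xs):
    """R1 = E_x[ ||grad_x D(x)||^2 ] for the toy case D(x) = w . x,
    where the gradient w.r.t. x equals w for every real sample x."""
    grads = np.tile(w, (len(xs), 1))          # grad_x D(x) = w
    return np.mean(np.sum(grads ** 2, axis=1))

w = np.array([3.0, 4.0])                      # ||w||^2 = 25
xs = np.zeros((10, 2))                        # for a linear D the gradient ignores x
assert r1_penalty(w, xs) == 25.0
```

The penalty keeps the discriminator's gradients small on real data, which stabilizes the zero-sum game.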

Steganalyzer’s loss We adopt the binary cross-entropy loss in the steganalyzer (S). S outputs a two-dimensional vector rather than a scalar as in GANs, and is trained to output the correct predictions ([0, 1] or [1, 0]) for input cover/stego images:

L_S = E_{x̂_c}[ BCE(S(x̂_c), y_c) ] + E_{x̂_s}[ BCE(S(x̂_s), y_s) ]

here, y_c / y_s refers to the ground-truth label of the cover/stego image.
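This two-output cross-entropy can be sketched in a few lines (our simplified stand-in for the SR-net classification head): the steganalyzer is rewarded for one-hot predictions, while the generator's adversarial term L_GS instead pulls S's output toward [0.5, 0.5].

```python
import numpy as np

def bce_vec(pred, target, eps=1e-12):
    """Binary cross-entropy summed over the 2-dim prediction vector."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.sum(target * np.log(pred)
                         + (1 - target) * np.log(1 - pred)))

y_cover, y_stego = np.array([0.0, 1.0]), np.array([1.0, 0.0])
confident = np.array([0.05, 0.95])   # S correctly flags a cover image
undecided = np.array([0.5, 0.5])     # the output G wants to force
assert bce_vec(confident, y_cover) < bce_vec(undecided, y_cover)
```

When S is driven to the undecided output for both image sources, cover and stego images have become statistically indistinguishable to it.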

Secret extraction loss The loss of extracting the hidden secret is computed with the binary cross-entropy between the prediction result σ(E(x̂_s + n′)) (as shown in Fig.4), with added noise n′, and the input binary data d:

L_E = −E_{d∼B(0.5), z∼N(0,1)}[ d · log σ(E(x̂_s + n′)) + (1 − d) · log(1 − σ(E(x̂_s + n′))) ]

here, d∼B(0.5) denotes the Bernoulli distribution of the binary secret data, and the Sigmoid operation σ(·) forces the results to fall into (0, 1). We add random noise n′ to the generated stego images to improve robustness. Both G and E are optimized to minimize this loss.
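The extraction loss reduces to a standard binary cross-entropy between sigmoid outputs and the secret bits; a self-contained numpy sketch (the extractor network is mocked here by raw logits):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def extraction_loss(logits, d, eps=1e-12):
    """BCE between sigmoid(extractor logits) and the binary secret d."""
    p = np.clip(sigmoid(logits), eps, 1 - eps)
    return -np.mean(d * np.log(p) + (1 - d) * np.log(1 - p))

rng = np.random.default_rng(1)
d = rng.integers(0, 2, size=(1, 4, 4)).astype(float)   # secret tensor
good_logits = (2 * d - 1) * 5.0    # confident, correct extractor output
bad_logits = -(2 * d - 1) * 5.0    # confidently wrong output
assert extraction_loss(good_logits, d) < extraction_loss(bad_logits, d)
```

In training, the noise n′ would be added to the stego image before it enters E, so minimizing this loss also teaches the extractor to tolerate perturbations.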

3.3. Training Strategy

To effectively train our GSN, the sub-networks D, S, G and E are optimized sequentially, as illustrated in Algorithm 1. G and E are optimized simultaneously to improve the secret extraction accuracy, where a hierarchical gradient decay (HGD) skill (introduced in Sec.3.7) is applied to improve the resistance against steganalysis methods. We optimize G with D and S separately, mainly to decrease the differences between generated cover/stego images. The real images are only used to train D.

Input: A set of real images, secret data d, noise n
Output: The trained GSN model
1: for each step do
2:     Generate m pairs of cover and stego images with G;
3:     Optimize D to minimize L_D;
4:     Optimize S to minimize L_S;
5:     Optimize G to minimize L_G;
6:     Optimize G + E to minimize L_E, applying the HGD skill in G.
Algorithm 1 Training strategy
Figure 3. The architecture of the proposed generator. The generator synthesizes either a cover or a stego image, depending on whether the secret block is fed with n or d, respectively.

3.4. Structure of Generator

To facilitate covert communication, we propose a flexible image generator that can produce either cover or stego images. Fig.3 illustrates the architecture of the proposed generator. It is improved from stylegan2 (Karras et al., 2020) and consists of a mapping network and a synthesis network. The former maps the input latent vector z into an intermediate dlatents vector w using eight fully connected (FC) layers. Next, w is passed through a module called weight demodulation, which controls the image style. The synthesis network contains several general blocks and a newly designed secret block. The input of G is a 512×4×4 trainable tensor initialized with a normal distribution. Higher-resolution feature maps are generated after up-sampling. The general blocks are made up of an upsample operation and several convolution layers. A noise matrix n is added to the feature maps.

We design a module called the secret block to enable the generator to synthesize both cover and stego images, as shown in Fig.3. It contains three 1×1 convolution layers (each followed by a LeakyRelu operation), three 3×3 convolution layers (with LeakyRelu), two upsample operations, four data merging operations (marked in red in Fig.3) and a low pass filter. The weight demodulation module regulates each convolutional layer to adjust the image content. Here, we add a noise matrix n into the feature maps to generate cover images as in the original stylegan2, or add a three-dimensional secret matrix d to generate stego images. The generator can synthesize common cover images without secret data just like stylegan2, or it can generate stego images when necessary. After training, realistic cover/stego images are generated, which can hardly be distinguished by eyes or steganalysis tools. The sender can pretend to transmit synthetic cover images while sending secret messages through stego images, which improves covert communication security.

The function of the secret block lies in two aspects: 1) it enables the generator to synthesize cover or stego images when noise or secret data is input; 2) it removes defects on image contents and improves the quality of stego images. The input secret data can be written as d = {d_0, d_1, ..., d_{B−1}}. Here, the height/width of d is the same as that of the output stego images, and B refers to the channel number, which determines the payload of generated stego images. In the secret block, the convolutional feature maps of each layer can be expressed as F = {F_0, F_1, ..., F_{N−1}}, where N denotes the total channel number of the feature maps. The data merging operation adds d to the feature maps F, which can be described as:

F̂_i = F_i + α · d_{i mod B}

where d_{i mod B} is the (i mod B)-th (i mod B ∈ [0, B)) channel of the input secret data d, F_i is the i-th (i ∈ [0, N)) channel of the feature maps, and α is a parameter learned automatically to adjust the strength of merging (see Fig.3). "+" denotes the mathematical addition operation. d and F are of the same height and width as the output stego images. The subscripts i and i mod B are non-negative integers.

When random noise n is input, the data merging operation shrinks to a pixel-wise addition between n and each feature map F_i. As a result, a cover image without secret data is generated.
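The data merging operation can be sketched in a few lines (in the paper α is a learned parameter; here it is a fixed constant for illustration):

```python
import numpy as np

def merge_secret(F, d, alpha):
    """Add channel (i mod B) of the secret d to channel i of the
    feature maps F, i.e. F_hat[i] = F[i] + alpha * d[i % B]."""
    N, B = F.shape[0], d.shape[0]
    out = F.copy()
    for i in range(N):
        out[i] = F[i] + alpha * d[i % B]
    return out

F = np.zeros((6, 4, 4))    # N=6 feature channels of size 4x4
d = np.random.default_rng(0).integers(0, 2, size=(2, 4, 4)).astype(float)
merged = merge_secret(F, d, alpha=0.1)
assert np.allclose(merged[0], 0.1 * d[0]) and np.allclose(merged[3], 0.1 * d[1])
```

Because the secret channels are tiled across all N feature channels, every bit of d influences several feature maps, which is what lets the extractor recover it later.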

Secret data d and noise n may introduce mosaics or defects into the generated images. To mitigate their impacts, we apply a low pass filter in the secret block:

F̃_i = f · F̂_i · f^T

here, "·" denotes the mathematical matrix multiplication and f is the low pass filter matrix.
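The filtering step can be sketched with a simple [1, 2, 1]/4 smoothing matrix applied to rows and then columns (the exact kernel used in the paper is not specified; this tridiagonal matrix is an assumed example):

```python
import numpy as np

def lowpass_matrix(n):
    """Tridiagonal [1, 2, 1]/4 smoothing matrix: one assumed example
    of a low pass filter, not the paper's exact kernel."""
    f = np.zeros((n, n))
    for i in range(n):
        f[i, i] = 0.5
        if i > 0:
            f[i, i - 1] = 0.25
        if i < n - 1:
            f[i, i + 1] = 0.25
    return f

F = np.zeros((8, 8))
F[4, 4] = 1.0                    # a single-pixel "defect"
f = lowpass_matrix(8)
smoothed = f @ F @ f.T           # filter rows, then columns
assert smoothed.max() < F.max()  # the spike is spread out
assert np.isclose(smoothed.sum(), F.sum())  # interior rows sum to 1
```

Multiplying by f on the left smooths along one axis and by f^T on the right along the other, which suppresses the pixel-level mosaics that the merged secret data would otherwise leave behind.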

3.5. Structure of Extractor

We design a new data extractor to extract the hidden secret data from generated stego images. As shown in Fig.4, it contains several data extraction blocks, two 1×1 convolution layers, and a binarization operation. Each data extraction block includes some convolution/Lrelu operations. After n (n is set to 3 in our scheme) data extraction blocks, the output result is convolved with 1×1 kernels to produce a feature map F_out, which has the same size as the input secret data (B × H × W). A binarization operation is applied to F_out, and d̂ is the predicted secret data:

d̂ = round(σ(F_out))

where round(·) is the rounding operation and σ is the Sigmoid operation.
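The final step of the extractor is thus a threshold at probability 0.5; a one-line sketch:

```python
import numpy as np

def binarize(logits):
    """Predicted secret: sigmoid squashes the logits into (0, 1),
    rounding then maps them to {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return np.round(p).astype(int)

assert binarize(np.array([-3.2, 0.4, 2.5, -0.1])).tolist() == [0, 1, 1, 0]
```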

Figure 4. The architecture of proposed data extractor.

3.6. Steganalyzer and Discriminator

To ensure the steganalysis imperceptibility of the generated stego images, we incorporate D and S as two discriminators for adversarial training. Specifically, we adopt the original discriminator of stylegan2 (Karras et al., 2020) to ensure the image's visual quality. To generate statistically indistinguishable stego images under steganalysis detection, we use the steganalysis algorithm SR-net (Boroumand et al., 2018) as our steganalyzer S. As shown in Fig.2, the inputs of S are the synthetic cover and stego images, and the inputs of D are real images and synthetic stego images (or cover images; they give similar results).

3.7. Hierarchical Gradient Decay

Updating the gradients of the generator G automatically, without constraints, results in significant differences between generated cover/stego images (as shown in Fig.11), which leads to failure against steganalysis detection. Therefore, we propose the hierarchical gradient decay (HGD) skill to reduce the differences between generated cover/stego images. As illustrated in line 6 of Algorithm 1, when optimizing G and E to minimize L_E, the HGD skill is applied in G, which reduces the gradients of the generator hierarchically as the resolution of the feature maps decreases. As a result, the differences between generated cover/stego images are significantly reduced, and the ability to resist steganalysis detection is improved.

It works because the HGD skill forces G to hide secret data in high-frequency image details. Steganalysis tools often use the differences between cover/stego images for classification; once their differences are minor and untraceable enough, steganalysis algorithms can hardly detect the stego images. We find that the proposed secret block is responsible for generating image details, while the earlier layers tend to generate low-frequency signals, like the shapes and colors of images. Thus, reducing the earlier layers' gradients forces G to hide the secret in high-frequency image details. As a result, the differences between generated cover/stego images are diminished and more randomly distributed, so our generated cover/stego images have a strong ability to resist steganalysis detection.

The HGD skill gradually decreases the backward gradients from the final secret block to the first general block. Unlike a learning rate decay schedule that adjusts the gradients according to training steps, the HGD skill regulates the updated gradients of different convolution layers according to the size of their feature maps. In particular, for feature maps whose size is h×h, we replace their gradients by:

g′ = δ^{log2(H/h)} · g

where g refers to the original backward gradient and g′ is the updated gradient. H is the height/width of the output images, and δ is a hyper-parameter. g′ downgrades as the feature size h decreases.
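The decay can be sketched as a per-layer gradient rescaling (the exact schedule is our assumption, consistent with the description that gradients shrink as the feature size decreases, with δ < 1):

```python
import numpy as np

def hgd_scale(grad, h, H, delta=0.5):
    """Decay the backward gradient of a layer whose feature maps are
    h x h, more strongly the further the layer is below the output
    resolution H. With delta < 1, lower-resolution (earlier) layers
    receive exponentially smaller updates."""
    return grad * delta ** np.log2(H / h)

g = np.ones(3)
# gradients at 128 (output), 32 and 8 resolution, delta = 0.5
full, middle, low = (hgd_scale(g, h, 128) for h in (128, 32, 8))
assert full[0] == 1.0 and middle[0] == 0.25 and low[0] == 0.0625
```

In a real framework this rescaling would be registered as a gradient hook on each block, leaving the forward pass untouched.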

4. Experimental Results

Our GSN is trained on TensorFlow 1.14 with four Nvidia 1080Ti GPUs. It is evaluated on the datasets CelebA (Liu et al., 2015) and Lsun-bedroom (Yu et al., 2015). Adam is used as the optimizer. We set fixed hyper-parameters for the losses in Eq.5 and Eq.9 and for the HGD skill in Eq.15. The input noise's distribution is N(0, σ²), where σ is set to 1 for training and 0.1 for testing.

Fréchet inception distance (Fid) (Heusel et al., 2017), extraction accuracy (Acc) and detection error (Pe) are used to evaluate the visual quality of generated stego images, the secret extraction accuracy and the security of our work, respectively. A lower Fid means better image quality. Acc is calculated as:

Acc = 1 − (1/(B·H·W)) Σ (d ⊕ d̂)

where d and d̂ are the input and the extracted secret data, and ⊕ is the element-wise XOR operation. Pe is a common indicator to evaluate the undetectability of stego images, which is defined as:

Pe = (P_FA + P_MD) / 2

where P_FA and P_MD are the false alarm rate and the missed detection rate. Pe ranges in [0, 1], and its optimal value is 0.5. When Pe is equal to 0.5, the steganalysis tool cannot distinguish the source of images. All our generated images are saved in PNG format. If the length of the input secret data is less than 1 bpp, we pad it with zeros.
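Both metrics are simple to compute; a numpy sketch matching the definitions above:

```python
import numpy as np

def accuracy(d, d_hat):
    """Acc = 1 - mean(d XOR d_hat) over all B*H*W bits."""
    return 1.0 - np.mean(np.bitwise_xor(d, d_hat))

def detection_error(p_fa, p_md):
    """Pe = (P_FA + P_MD) / 2; 0.5 means the steganalyzer is guessing."""
    return 0.5 * (p_fa + p_md)

d = np.array([[1, 0, 1, 1], [0, 0, 1, 0]])
d_hat = np.array([[1, 0, 0, 1], [0, 1, 1, 0]])
assert accuracy(d, d_hat) == 0.75          # 6 of 8 bits match
assert detection_error(0.5, 0.5) == 0.5    # ideal undetectability
```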

4.1. Performance of Proposed GSN

We evaluate our GSN with the metrics Acc, Fid and Pe. Different secret payloads are obtained by changing the size of the input secret data d (B×H×W), i.e., varying the value of B (1 to 8) with H and W fixed, as shown in Fig.3. Different GSN models have been trained from scratch with the same settings apart from the payloads and datasets. The three metrics are then tested with each trained GSN model, and their best performances are recorded in Table 1. The Pe values are obtained by SR-net (Boroumand et al., 2018), trained individually for each configuration.

| Dataset          | Metric            | 1 bpp | 2 bpp | 4 bpp | 6 bpp | 8 bpp |
|------------------|-------------------|-------|-------|-------|-------|-------|
| CelebA 128×128   | Acc (%)           | 97.53 | 81.61 | 70.14 | 61.15 | 59.28 |
|                  | Fid               | 13.29 | 15.17 | 16.21 | 16.83 | 18.16 |
|                  | Pe (optimal 0.5)  | 0.479 | 0.499 | 0.501 | 0.502 | 0.498 |
| Bedroom 256×256  | Acc (%)           | 97.25 | 83.19 | 72.13 | 64.17 | 60.94 |
|                  | Fid               | 13.21 | 14.56 | 15.77 | 16.89 | 18.80 |
|                  | Pe (optimal 0.5)  | 0.500 | 0.499 | 0.502 | 0.488 | 0.499 |
Table 1. The performance of the proposed GSN.
Figure 5. 128×128 stego images of faces with various payloads. Images from top to bottom row are with the payload of 1 bpp (bits per pixel), 2bpp, 4bpp, 6bpp and 8bpp.
Figure 6. 256×256 stego images of bedrooms with various payloads. Images from the leftmost to the rightmost column are with the payload of 1 bpp, 2bpp, 4bpp, 6bpp and 8bpp.

As shown in Table 1, the secret data's extraction accuracy (Acc) decreases gradually as the payload increases. When the payload is 1 bpp, the Acc values are over 97% for both datasets; error correction codes can be employed to further increase the Acc values in real applications. Our scheme achieves excellent undetectability regardless of payload, with Pe values close to the optimal 0.5. Fig.5 and Fig.6 give some stego image samples. These stego images look real and are hard to distinguish from real ones.

Figure 7. ROC curves of different steganalysis algorithms.

4.2. Security and Steganalysis Resistance

To verify the security of the proposed method, we test our generated stego images with two advanced steganalysis methods, namely SR-net (Boroumand et al., 2018) and Ye-net (Ye et al., 2017). Firstly, we trained different GSN models with various payloads on the CelebA dataset. Secondly, we used each GSN model to generate 5000 random cover/stego images for steganalysis training and 1000 random cover/stego images for validation and testing. Thirdly, we trained SR-net and Ye-net on each training set from scratch, and each well-trained steganalysis model was tested on the corresponding test set. Finally, the detection ROC curves are plotted in Fig.7. For steganography, the optimal ROC curve is the counter-diagonal (the dashed line in the figure), and the optimal AUC value is 0.5. As we can see, our ROC curves and AUC values are close to the ideal results, which indicates that the proposed GSN is highly secure.

4.3. Influence of Different Inputs on Stego Image

Our stego images are generated from two inputs, i.e., the latent vector z and the secret data d (see Fig.2). In this section, we analyze their impacts on the generated stego images.

Influence of secret d   We generate two stego images (payload: 1 bpp) with different d but the same z, using a well-trained GSN model. As shown in Fig.8, when d is varied, these two stego images only differ in image details, such as texture and edges.

(a) stego 1
(b) stego 2
(c) color diff
(d) gray diff
Figure 8. Two stego images generated with different d but the same z. The right two images show their colored and gray-scaled differences.

Influence of latent z   Fig.9 gives four stego images (1 bpp) generated with different z but the same d. By modifying the input latent vector z, the appearances of stego images are changed drastically.

Figure 9. Four stego images containing the same secret d.

4.4. Ablation Study

| Configurations                 | Fid   | Acc (%) | Pe (optimal 0.5) |
|--------------------------------|-------|---------|------------------|
| Baseline                       | 19.45 | 99.23   | 0                |
| + Low pass filter              | 12.71 | 99.56   | 0                |
| + Steganalyzer                 | 13.03 | 99.67   | 0.035            |
| + Hierarchical gradient decay  | 13.29 | 97.53   | 0.479            |
Table 2. The performance of different GSN models (CelebA, 128×128).

We conduct an ablation study to demonstrate the effectiveness of the proposed modules and techniques, i.e., the secret block, the steganalyzer, and the hierarchical gradient decay (HGD) skill. We build a new GS model without the low pass filter, the steganalyzer and the HGD skill, named the baseline; to verify the function of the secret block, we remove the low pass filter (defined in Eq.13) from it. These three modules/techniques are then added to the baseline one by one cumulatively, and the performance of each new model is reported in Table 2. All the results are acquired from 128×128 stego images (1 bpp) on CelebA, and the Pe values are detected by SR-net (Boroumand et al., 2018).

Figure 10. Stego images without (left) / with (right) the low pass filter in secret block.
(a) cover images
(b) stego images
(c) difference × 5
Figure 11. Differences between the generated cover/stego images before (top) and after (bottom) using the hierarchical gradient decay skill. The mean absolute error of cover/stego image pairs is 6.32 (top) and 1.41 (bottom).

By incorporating the low pass filter in the secret block, defects on images are removed, and image quality is enhanced significantly; Fig.10 shows the image comparison. Adding the steganalyzer slightly improves the undetectability of stego images with a higher Pe, but still cannot resist steganalysis detection. The HGD skill improves the Pe value drastically to 0.479 (the ideal value is 0.5), at the cost of slightly worse Fid and Acc. Fig.11 compares the differences of cover/stego image pairs before and after using this skill. As we can see, the mean absolute error decreases from 6.32 to 1.41 after employing the HGD skill. Meanwhile, the differences become more randomly distributed over the image contents.

4.5. Comparison with State-of-the-art

We compare our work with two types of steganographic methods: 1) Traditional steganography (TS) schemes based on deep learning (DL), which need cover images for data embedding. They are termed DL-based TS for short; 2) Generative steganography (GS) works that can generate stego images without cover media, including tailored GS and DL-based GS methods.

Types         Methods                           Image type    Secret   Payload (bpp)   Acc (%)   Pe
DL-based TS   Hidden (Zhu et al., 2018)         natural       binary   1.83e-3         99.47     0.41
              SteganoGAN (Zhang et al., 2019a)  natural       binary   1               99.90     0.01
              Hiding-net (Baluja, 2020)         natural       image    24              7.16      0.03
              HiNet (Jing et al., 2021)         natural       image    24              56.84     0.02
              UDH (Zhang et al., 2020a)         natural       image    24              20.34     0.01
Tailored GS   Wu (Wu and Wang, 2014)            texture       binary   3.28e-2         100       –
              Li (Li and Zhang, 2018)           fingerprint   binary   1.34e-3         100       –
DL-based GS   Liu (Liu et al., 2017)            natural       binary   3.05e-4         70.56     0.48
              Zhang (Zhang et al., 2020b)       natural       binary   3.05e-4         71.85     0.48
              GSS (Zhang et al., 2019b)         natural       binary   8.8e-2          63.50     0.46
              Hu-1 (Hu et al., 2018)            natural       binary   1.83e-2         90.50     0.49
              Hu-2 (Yu et al., 2021)            natural       binary   7.32e-2         91.73     0.51
              Our GSN                           natural       binary   1               97.53     0.51
              Our GSN                           natural       binary   2               81.61     0.50
Table 3. Performance comparison with SOTA works.

4.5.1. Comparison with DL-based schemes that need cover images

We compare our GSN with SOTA DL-based TS methods, as shown in Tab. 3. All these works need cover images for data hiding. For a fair comparison, 128×128 cover images of CelebA faces generated by our GSN are used as the cover media, and the Pe values are obtained with the steganalysis algorithm Ye-net (Ye et al., 2017). In the table, Hidden (Zhu et al., 2018) embeds watermarks in cover images; SteganoGAN (Zhang et al., 2019a) is a popular scheme that hides binary secret data in cover images; Hiding-net (Baluja, 2020) hides a secret image inside a cover image; HiNet (Jing et al., 2021) conceals secret images in cover images using an invertible network; and UDH (Zhang et al., 2020a) is a recently published network for image concealment. In the experiments, we hide one 128×128 secret image in each cover image, carrying a payload of 24 bpp. For these image-hiding schemes, Acc is the rate of pixels whose values are exactly extracted in all 3 channels. Most of the DL-based TS schemes have low Pe values, which indicates that modifying cover images to hide secret data is easily detectable.
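The pixel-exact accuracy used for the image-hiding baselines counts a pixel as correct only when all three channel values are recovered exactly. A small sketch of that metric (the function name is ours, not the cited papers'):

```python
import numpy as np

def pixel_exact_acc(secret: np.ndarray, recovered: np.ndarray) -> float:
    """Fraction of pixels whose R, G and B values are all recovered exactly.
    Both arrays have shape (H, W, 3) with integer pixel values."""
    exact = np.all(secret == recovered, axis=-1)  # (H, W) boolean map
    return float(exact.mean())

secret = np.zeros((4, 4, 3), dtype=np.uint8)
recovered = secret.copy()
recovered[0, 0, 1] += 1  # one wrong channel spoils the whole pixel
print(pixel_exact_acc(secret, recovered) * 100)  # 93.75 (15 of 16 pixels)
```

This all-channels-exact criterion is strict, which helps explain the low Acc of the image-hiding schemes (e.g., 7.16% for Hiding-net) despite their visually plausible recoveries.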

Figure 12. Stego images generated by different GS methods: (a) Wu, 32.12; (b) Li, 156.58; (c) Liu, 56.27; (d) Zhang, 54.54; (e) GSS, 45.85; (f) Hu-1, 53.99; (g) Hu-2, 52.29; (h) Ours, 13.72. Methods' names and average Niqe scores are annotated; a lower Niqe score means better image quality. From (a) to (h), the methods carry absolute payloads of 537, 22, 5, 5, 1442, 300, 1200 and 16384 bits/image, respectively.

4.5.2. Comparison with GS schemes that don’t need cover images

We compare GSN with two SOTA tailored GS schemes, shown in the middle part of Tab. 3: Wu (Wu and Wang, 2014) produces texture stego images, and Li (Li and Zhang, 2018) constructs fingerprint stego images. They have low payloads (less than 1e-2 bpp) but high Acc values close to 100%. We also compare our scheme with five SOTA DL-based GS models, which we reimplement on CelebA to generate 128×128 stego images for evaluation; the results are shown in the last rows of Tab. 3 and in Fig. 12. Both Liu (Liu et al., 2017) and Zhang (Zhang et al., 2020b) map secret data to 32 class labels of CelebA, which are then input to GANs to generate stego images with a payload of 3.05e-4 bpp (5 bits/image). In GSS (Zhang et al., 2019b), secret data is embedded in corrupted images with a payload of 8.8e-2 bpp, and stego images are then generated by inpainting those images. Both Hu-1 (Hu et al., 2018) and Hu-2 (Yu et al., 2021) map secret data to the noise vectors of GANs, with payloads of 300 and 1200 bits/image, respectively.
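The per-image payloads above convert to bits per pixel (bpp) by dividing by the pixel count of a 128×128 image; a quick check of the Tab. 3 figures:

```python
def bpp(bits_per_image: int, h: int = 128, w: int = 128) -> float:
    """Convert an absolute per-image payload to bits per pixel."""
    return bits_per_image / (h * w)

print(round(bpp(5), 6))    # 0.000305 -> the 3.05e-4 bpp of Liu and Zhang
print(round(bpp(300), 4))  # 0.0183   -> Hu-1
print(round(bpp(1200), 4)) # 0.0732   -> Hu-2
print(bpp(16384))          # 1.0      -> our GSN at 1 bpp
```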

Most DL-based TS methods have poor steganographic security, while our GSN achieves better security with higher Pe values. Compared to tailored GS schemes, our GSN generates realistic natural images with a much higher payload. Among DL-based GS schemes, our GSN outperforms the other works overall: its payload is more than 11 times higher than theirs, and it obtains better Acc and Pe values. Moreover, our work generates stego images of higher quality. Fig. 12 gives a visual comparison among the GS methods, using the no-reference image quality assessor Niqe (Mittal et al., 2012); our stego images look more natural, achieving a lower Niqe score at a higher payload.

5. Conclusions

This paper proposes a novel GS solution that integrates the mutual information mechanism for stego image synthesis. Our generator is flexibly constructed to generate either cover or stego images, which improves the security of covert communication. We design a delicate secret block that hides secret data in the feature maps during stego image generation, achieving a high payload and high image fidelity. Moreover, a novel hierarchical gradient decay technique is developed to improve the resistance to steganalysis. Meanwhile, a discriminator and a steganalyzer are adopted to improve the visual quality and statistical imperceptibility of the generated cover/stego images, respectively. Extensive experiments have demonstrated the advantages of our GSN over existing works.

6. Acknowledgment

This work was supported by the National Natural Science Foundation of China (U20B2051, 62072114, U20A20178, U1936214).


References

  • S. Baluja (2020) Hiding images within images. IEEE Transactions on Pattern Analysis and Machine Intelligence 42 (7), pp. 1685–1697.
  • P. V. K. Borges, J. Mayer, and E. Izquierdo (2008) Robust and transparent color modulation for text data hiding. IEEE Transactions on Multimedia 10 (8), pp. 1479–1489.
  • M. Boroumand, M. Chen, and J. Fridrich (2018) Deep residual network for steganalysis of digital images. IEEE Transactions on Information Forensics and Security 14 (5), pp. 1181–1193.
  • X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel (2016) InfoGAN: interpretable representation learning by information maximizing generative adversarial nets. arXiv preprint arXiv:1606.03657.
  • C. Chu, A. Zhmoginov, and M. Sandler (2017) CycleGAN, a master of steganography. arXiv preprint arXiv:1712.02950.
  • X. Duan and H. Song (2018) Coverless information hiding based on generative model. arXiv preprint arXiv:1802.03528.
  • T. Filler, J. Judas, and J. Fridrich (2011) Minimizing additive distortion in steganography using syndrome-trellis codes. IEEE Transactions on Information Forensics and Security 6 (3), pp. 920–935.
  • J. Fridrich (2009) Steganography in digital media: principles, algorithms, and applications. Cambridge University Press.
  • M. Goljan, J. Fridrich, and R. Cogranne (2014) Rich model for steganalysis of color images. In 2014 IEEE International Workshop on Information Forensics and Security (WIFS), pp. 185–190.
  • J. Hayes and G. Danezis (2017) Generating steganographic images via adversarial training. In Advances in Neural Information Processing Systems, pp. 1954–1963.
  • M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter (2017) GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 6629–6640.
  • D. Hu, L. Wang, W. Jiang, S. Zheng, and B. Li (2018) A novel image steganography method via deep convolutional generative adversarial networks. IEEE Access 6, pp. 38303–38314.
  • M. Hussain, A. W. A. Wahab, Y. I. B. Idris, A. T. Ho, and K. Jung (2018) Image steganography in spatial domain: a survey. Signal Processing: Image Communication 65, pp. 46–66.
  • J. Jing, X. Deng, M. Xu, J. Wang, and Z. Guan (2021) HiNet: deep image hiding by invertible network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 4733–4742.
  • T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila (2020) Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8110–8119.
  • B. Li, M. Wang, J. Huang, and X. Li (2014) A new cost function for spatial image steganography. In 2014 IEEE International Conference on Image Processing (ICIP), pp. 4206–4210.
  • S. Li and X. Zhang (2018) Toward construction-based data hiding: from secrets to fingerprint images. IEEE Transactions on Image Processing 28 (3), pp. 1482–1497.
  • M. Liu, M. Zhang, J. Liu, Y. Zhang, and Y. Ke (2017) Coverless information hiding based on generative adversarial networks. arXiv preprint arXiv:1712.06951.
  • Z. Liu, P. Luo, X. Wang, and X. Tang (2015) Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision, pp. 3730–3738.
  • S. Lu, R. Wang, T. Zhong, and P. L. Rosin (2021) Large-capacity image steganography based on invertible neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10816–10825.
  • L. Mescheder, A. Geiger, and S. Nowozin (2018) Which training methods for GANs do actually converge?. arXiv preprint arXiv:1801.04406.
  • A. Mittal, R. Soundararajan, and A. C. Bovik (2012) Making a "completely blind" image quality analyzer. IEEE Signal Processing Letters 20 (3), pp. 209–212.
  • A. Odena, C. Olah, and J. Shlens (2017) Conditional image synthesis with auxiliary classifier GANs. In Proceedings of the 34th International Conference on Machine Learning, pp. 2642–2651.
  • H. Otori and S. Kuriyama (2009) Texture synthesis for mobile data communications. IEEE Computer Graphics and Applications 29 (6), pp. 74–81.
  • N. Provos and P. Honeyman (2003) Hide and seek: an introduction to steganography. IEEE Security & Privacy 1 (3), pp. 32–44.
  • Z. Qian, H. Zhou, W. Zhang, and X. Zhang (2017) Robust steganography using texture synthesis. In Advances in Intelligent Information Hiding and Multimedia Signal Processing, pp. 25–33.
  • J. Qin, Y. Luo, X. Xiang, Y. Tan, and H. Huang (2019) Coverless image steganography: a survey. IEEE Access 7, pp. 171372–171394.
  • J. Qin, J. Wang, Y. Tan, H. Huang, X. Xiang, and Z. He (2020) Coverless image steganography based on generative adversarial network. Mathematics 8 (9), pp. 1394.
  • A. Radford, L. Metz, and S. Chintala (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
  • N. Subramanian, O. Elharrouss, S. Al-Maadeed, and A. Bouridane (2021) Image steganography: a review of the recent advances. IEEE Access 9, pp. 23409–23423.
  • W. Tang, S. Tan, B. Li, and J. Huang (2017) Automatic steganographic distortion learning using a generative adversarial network. IEEE Signal Processing Letters 24 (10), pp. 1547–1551.
  • J. Tao, S. Li, X. Zhang, and Z. Wang (2019) Towards robust image steganography. IEEE Transactions on Circuits and Systems for Video Technology 29 (2), pp. 594–600.
  • Z. Wang, N. Gao, X. Wang, X. Qu, and L. Li (2018) SSteGAN: self-learning steganography based on generative adversarial networks. In International Conference on Neural Information Processing, pp. 253–264.
  • K. Wu and C. Wang (2014) Steganography using reversible texture synthesis. IEEE Transactions on Image Processing 24 (1), pp. 130–139.
  • D. Xu, R. Wang, and Y. Q. Shi (2014) Data hiding in encrypted H.264/AVC video streams by codeword substitution. IEEE Transactions on Information Forensics and Security 9 (4), pp. 596–606.
  • J. Xu, X. Mao, X. Jin, A. Jaffer, S. Lu, L. Li, and M. Toyoura (2015) Hidden message in a deformation-based texture. The Visual Computer 31 (12), pp. 1653–1669.
  • J. Ye, J. Ni, and Y. Yi (2017) Deep learning hierarchical representations for image steganalysis. IEEE Transactions on Information Forensics and Security 12 (11), pp. 2545–2557.
  • X. Yi, K. Yang, X. Zhao, Y. Wang, and H. Yu (2019) AHCM: adaptive Huffman code mapping for audio steganography based on psychoacoustic model. IEEE Transactions on Information Forensics and Security 14 (8), pp. 2217–2231.
  • C. Yu (2020) Attention based data hiding with generative adversarial networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, pp. 1120–1128.
  • C. Yu, D. Hu, S. Zheng, W. Jiang, M. Li, and Z. Zhao (2021) An improved steganography without embedding based on attention GAN. Peer-to-Peer Networking and Applications 14 (3), pp. 1446–1457.
  • F. Yu, A. Seff, Y. Zhang, S. Song, T. Funkhouser, and J. Xiao (2015) LSUN: construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365.
  • C. Zhang, P. Benz, A. Karjauv, G. Sun, and I. S. Kweon (2020a) UDH: universal deep hiding for steganography, watermarking, and light field messaging. Advances in Neural Information Processing Systems 33, pp. 10223–10234.
  • K. A. Zhang, A. Cuesta-Infante, L. Xu, and K. Veeramachaneni (2019a) SteganoGAN: high capacity image steganography with GANs. arXiv preprint arXiv:1901.03892.
  • Z. Zhang, G. Fu, R. Ni, J. Liu, and X. Yang (2020b) A generative method for steganography by cover synthesis with auxiliary semantics. Tsinghua Science and Technology 25 (4), pp. 516–527.
  • Z. Zhang, J. Liu, Y. Ke, Y. Lei, J. Li, M. Zhang, and X. Yang (2019b) Generative steganography by sampling. IEEE Access 7, pp. 118586–118597.
  • J. Zhu, R. Kaplan, J. Johnson, and L. Fei-Fei (2018) HiDDeN: hiding data with deep networks. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 657–672.
  • J. Zhu, T. Park, P. Isola, and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232.