I Introduction
Steganography is the art and science of communicating in such a way that the presence of a message cannot be detected. In his famous paper "Communication theory of secrecy systems", C. E. Shannon concluded that there are three general types of secrecy systems: concealment systems, privacy systems, and "true" secrecy systems. He created a complete information-theoretic security model for the third type, the "true" secrecy system [1]. Meanwhile, he also pointed out that the concealment system is primarily a psychological problem, which implies that it is not easy to establish an information-theoretic security model for concealment systems.
After years of development, steganography on multimedia has become the mainstream, and Cachin [2] proposed an information-theoretic definition of steganographic security for concealment systems on multimedia. The security of a steganographic system can be quantified in terms of the relative entropy $D(P_c \| P_s)$ between the distribution $P_c$ of covers and the distribution $P_s$ of stego objects, which yields bounds on the detection capability of any adversary. If the distributions of cover and stego are equal, namely, $P_c = P_s$, the stegosystem is perfectly secure, and the adversary can have no advantage over merely guessing whether the stego object contains a message or not. The security definition using the relative entropy promotes a new direction of steganography: model-based steganography.
Model-based steganography requires us to describe the distribution $P_c$ of covers, and then to make the distribution of stego objects indistinguishable from that of covers. Under this framework, many works have been proposed, such as MB [3], MG [4], MVG [5], and MiPOD [6]. However, the complexity and dimensionality of covers formed by digital media objects, such as natural audio, images and videos, prevent us from determining a complete distribution $P_c$. Therefore, these methods cannot achieve provable security. It is possible to realize this idea within a sufficiently simple model of covers, but steganalyzers can then detect the media from other perspectives.
In the same period, some heuristic methods based on minimal-distortion steganography [7] exhibit superior security performance, such as HILL [8], MSUERD [9], and YAO [10]. Specifically, these methods try to capture the character of $P_c$ with a distortion function, so their security cannot be proved theoretically. Some literature [11, 12, 13] tried to establish provably secure steganography in terms of computational indistinguishability by assuming that the cover is efficiently sampleable. However, this assumption of "efficient sampleability" does not hold for natural multimedia. In summary, the mentioned schemes cannot achieve provable security, owing to the incomplete model of natural media. Actually, selecting natural media as cover is not necessary, because the essence of steganographic security is behavioral security, which tries to disguise the behavior of covert communication as a popular behavior. The reason for choosing natural media as the cover is that they are popular on the Internet; namely, sending natural media is a normal behavior for humans. In the same way, any kind of media that is popular on the Internet can be selected as cover for message embedding.
Currently, generative media have emerged on the Internet with the rapid development of Artificial Intelligence (AI), especially generative models. In particular, many applications based on generative media have become popular, such as Amazon Echo, Google Home and Apple Siri for generative audio, and Prisma and style transfer for generative images. Generative media are widespread on the Internet, which indicates that they are suitable for information hiding.
A generative model describes how generative media are generated, in terms of learning a probabilistic model. Specifically, the generative model estimates the probability distribution of the training data, and then a random sampling from the cumulative distribution function (CDF) is adopted for drawing samples. Since the generative model exposes an explicit probability distribution, it can make up for the shortcoming of traditional steganographic methods based on natural media; namely, provably secure steganography can be implemented with generative media as covers. In this paper, we propose provably secure steganography on generative media. Firstly, we discuss the essence of steganographic security, which is identical to behavioral security. This essence implies that generative media (popular on the Internet) are suitable for information hiding. By virtue of the generative model, the probability distributions of the media are explicit to us. Furthermore, based on the duality of source coding and generating a discrete distribution from fair coins, provably secure steganography is proposed. Instead of random sampling from the CDF as ordinary generation does, we combine source decoding into the process of generation, which implements the sampling according to the probability distribution as well as embeds the encrypted message.
Adaptive Arithmetic Coding is selected as the source coding in the proposed framework for message embedding and extraction. It is proved theoretically that the proposed generative steganography framework using Adaptive Arithmetic Coding is asymptotically perfectly secure. It is worth mentioning that the adversary may easily recognize that the stego object is generative media, but she cannot discriminate whether the stego object is innocent or not, since generative media are popular on the Internet.
Taking text-to-speech based on WaveNet as an instance, we describe the process of embedding and extracting messages in detail, and the experimental results show that the proposed method is asymptotically perfectly secure when resisting state-of-the-art steganalysis.
The rest of this paper is organized as follows. After introducing notations, we review the essence of steganographic security. The framework of provably secure steganography on generative media is elaborated in Section III. The example adopting the framework and results of comparative experiments are presented in Section IV and Section V, respectively. Conclusion and future work are given in Section VI.
II Preliminaries and prior work
II-A The prisoners' problem
To improve the readability of this paper, we first introduce the prisoners' problem [14] formulated by Simmons, which is a simple model of the steganographic setting. Alice and Bob are imprisoned in separate cells and want to hatch an escape plan. They are allowed to communicate, but their communication is monitored by the warden Eve. If Eve finds out that the prisoners are secretly exchanging messages, she will cut the communication channel and throw them into solitary confinement.
II-B Notations
Throughout the paper, matrices, vectors and sets are written in bold face. Alice and Bob work with the set of all possible covers and the sets of keys and messages:

$\mathcal{C} = \{\text{all possible cover objects}\},$ (1)
$\mathcal{K} = \{\text{all secret keys}\},$ (2)
$\mathcal{M} = \{\text{all messages that can be communicated}\}.$ (3)
Following the diagram presented in Fig. 1, a steganographic scheme can be regarded as a pair of embedding and extraction functions Emb and Ext for Alice and Bob, respectively,

$\mathrm{Emb}: \mathcal{C} \times \mathcal{K} \times \mathcal{M} \to \mathcal{C},$ (4)
$\mathrm{Ext}: \mathcal{C} \times \mathcal{K} \to \mathcal{M},$ (5)

such that for all $c \in \mathcal{C}$, and all $k \in \mathcal{K}$, $m \in \mathcal{M}$,

$s = \mathrm{Emb}(c, k, m),$ (6)
$\mathrm{Ext}(\mathrm{Emb}(c, k, m), k) = m.$ (7)
Eve judges whether an object is innocent or not using all possible knowledge except the secret key, according to Kerckhoffs's principle.
II-C The essence of steganographic security
The information-theoretic definition of steganographic security is given by Cachin [2]. Assume the cover is drawn from $\mathcal{C}$ with probability distribution $P_c$, and the steganographic method generates stego objects that follow the distribution $P_s$. Given an object $x$, Eve must decide between two hypotheses: $H_0$, which represents the hypothesis that $x$ does not contain a hidden message, and $H_1$, which stands for the hypothesis that $x$ does contain a hidden message. Under hypothesis $H_0$, the observation is drawn from the distribution $P_c$, and under $H_1$, from $P_s$ [2]. The distance between the two distributions can be measured using the relative entropy:

$D(P_c \| P_s) = \sum_{x \in \mathcal{C}} P_c(x) \log \frac{P_c(x)}{P_s(x)}.$ (8)

When $D(P_c \| P_s) = 0$, the stegosystem is called perfectly secure, because in this case the distribution $P_s$ of the stego objects is identical to the cover distribution $P_c$. Thus it is impossible for Eve to distinguish between covers and stego objects.
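The security condition can be checked numerically. The short Python sketch below computes the relative entropy (in bits) between a toy cover distribution and two candidate stego distributions; the distributions are invented for illustration and are not from the paper:

```python
import math

def kl_divergence(p, q):
    """Relative entropy D(P_c || P_s) in bits between two pmfs on the same support."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy cover distribution and two candidate stego distributions.
p_cover = [0.5, 0.25, 0.25]
p_stego_equal = [0.5, 0.25, 0.25]   # identical -> perfectly secure
p_stego_off   = [0.4, 0.3, 0.3]     # deviates -> detectable in principle

print(kl_divergence(p_cover, p_stego_equal))  # 0.0
print(kl_divergence(p_cover, p_stego_off))    # strictly positive
```

A zero divergence corresponds to Cachin's perfect security; any deviation gives Eve a nonzero detection advantage.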
The security definition indicates that steganography should pursue a stego distribution indistinguishable from that of the cover. The mainstream steganographic methods first estimate the cover distribution $P_c$ with an approximate distribution $\hat{P}_c$, or describe the cost caused by deviating from $P_c$ with a cost function $\rho$. Stego objects are then generated according to $\hat{P}_c$ or $\rho$. However, the complexity, dimensionality and uncontrollable randomness of covers formed by digital equipment, such as natural audio, images and videos, prevent us from determining a complete distribution $P_c$. We do not even know the distance between $P_c$ and $\hat{P}_c$. That is why we cannot obtain provably secure steganography with natural media. Recent advances in steganalysis also show that the security level of state-of-the-art steganographic schemes is limited. Actually, selecting natural media as cover is not necessary, because the essence of steganographic security is behavioral security: the behavior of concealing a message should conform to the characteristics of a popular behavior. The reason natural media are chosen as the cover is that natural media are popular on the Internet; namely, sending natural media is a normal behavior for humans. In the same way, any kind of media that is popular on the Internet can be selected as cover for message embedding.
Currently, generative media have emerged on the Internet with the rapid development of Artificial Intelligence (AI), especially generative models. In particular, many generative applications have become popular, such as Microsoft Xiaobing for generative text, Amazon Echo, Google Home and Apple Siri for generative audio, and Prisma and style transfer for generative images. As a result, generative media are suitable for information hiding. Specifically, some generative models, such as NADE [15], MADE [16], PixelRNN/CNN [17] and Glow [18], expose the explicit probability distribution of the generative media to us, which favors the implementation of perfectly secure steganography.
For steganography in generative media, we try to produce stego objects that are difficult to distinguish from normal generative media. In other words, although the adversary Eve can recognize the stego objects as generative media, she cannot get any evidence of steganographic behavior as long as generative media are popular in the application scenario.
III The provably secure steganography on generative media
Before proposing the framework of provably secure steganography on generative media, the regular architecture of generative models is reviewed.
III-A Generative model
A generative model describes how media are generated, in terms of a probabilistic model. Generative models can be divided into two categories, i.e., explicit-density and implicit-density models. In this paper, generative models mainly refer to the first category, which has an explicit density, including tractable densities and approximate densities. For instance, NADE [15], MADE [16] and PixelRNN/CNN [17] are generative models with tractable densities, while the densities of the Variational Autoencoder and the Boltzmann Machine are approximate [19]. As shown in Fig. 2, the generative model calculates the probability distribution of the training data, and then a simple random sampling from the cumulative distribution function (CDF) of the generation distribution is adopted for drawing samples.
III-B Random sampling
To help readers understand random sampling, we illustrate the process in detail. $X$ is a random variable with probability mass function $p(x_i) = P(X = x_i)$ for any $x_i \in \Omega$, where $\Omega$ denotes the sample space and $n$ is the (possibly infinite) number of possible outcomes of the discrete variable $X$; suppose $x_1, x_2, \dots, x_n$ are in ascending order. Then the CDF for $X$ is

$F(x_i) = P(X \le x_i) = \sum_{j \le i} p(x_j).$ (9)

Discrete random variables can be generated by slicing up the interval $[0,1)$ into subintervals which define a partition of $[0,1)$:

$I_i = [F(x_{i-1}), F(x_i)), \quad i = 1, \dots, n, \quad F(x_0) = 0,$ (10)

generating a uniform random variable $U$ on $[0,1)$, and seeing which subinterval $U$ falls into. The symbol of that subinterval is selected as the current sample.
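The slicing procedure above is ordinary inverse-CDF sampling and can be sketched in a few lines of Python (the symbols and pmf here are hypothetical examples, not from the paper):

```python
import bisect
import itertools
import random

def sample_from_pmf(symbols, probs, rng=random.random):
    """Draw one symbol: slice [0,1) into subintervals of length p_i
    and see which one a uniform variate u falls into (Eq. (10))."""
    cdf = list(itertools.accumulate(probs))   # running sums F(x_i)
    u = rng()
    # min() guards against u landing past the last cdf entry due to
    # floating-point rounding of the final cumulative sum.
    return symbols[min(bisect.bisect_right(cdf, u), len(symbols) - 1)]

random.seed(0)
draws = [sample_from_pmf("abc", [0.2, 0.5, 0.3]) for _ in range(10000)]
print(draws.count("b") / len(draws))  # close to 0.5
```

Over many draws, the empirical frequencies approach the prescribed pmf, which is exactly the behavior the steganographic decoder must reproduce.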
III-C Motivations
Based on the review of the generative model, we can see that generative media are produced by random sampling according to the probability mass distribution. In Section 5.11 of the classical information-theory textbook "Elements of Information Theory" [20], the relationship between source coding and generating a discrete distribution from fair coins is fully explored: generating a discrete distribution from fair coins is the dual problem of source encoding. Source coding considers the problem of representing a random variable by a sequence of bits such that the expected length of the representation is minimized. If the source encoder has perfect performance, a variable obeying the distribution $P$ can be compressed into uniformly random bits. As for source decoding, random bits are decompressed into a variable following the distribution $P$, which is identical to generating a discrete distribution from fair coins. It is easy to associate information hiding with generating a discrete distribution from fair coins, for the message is always encrypted into random bits before embedding. Consequently, source decoding can replace the random sampling in the process of generating media, so that the encrypted message is decoded into the generative media. In addition, the decoding scheme rigorously follows the distribution of the generative media, so the adversary cannot determine whether a generated digital object has a hidden message or not.
III-D The framework of provably secure steganography
Since the generative model exposes the explicit probability distribution and source decoding can generate samples following a discrete distribution from encrypted message bits, the framework of provably secure steganography on generative media is proposed in this subsection. As depicted in Fig. 3, the framework is divided into two sides: the sender-end and the receiver-end. On the sender-end, the generative model can be trained by the senders themselves; alternatively, the senders can download the model from the Internet. The generative model is shared with the receivers by sending the model beforehand or telling them where to download it. Since the proposed scheme is provably secure, as shown in the subsequent subsection, the generative model can be accessible to the adversary. The security of the proposed scheme is guaranteed by the secret key, which is used for encrypting the message.
Then the probability mass distribution is predicted by the model, denoted by $P(x; \theta)$, where $\theta$ is the parameter of the model and $x$ represents the generative media. In ordinary generation, the generative media is obtained by random sampling as mentioned before and is regarded as the cover $c$, denoted by:

$c = \mathrm{Sample}(P(x; \theta)).$ (11)

When it comes to steganography, the message $m$ is embedded in the process of generation using source decoding instead of random sampling, and the generative media is seen as the stego $s$, denoted by

$s = \mathrm{SD}(m, P(x; \theta)),$ (12)

where SD means source decoding.
At the receiver-end, owning the same generative model, the probability distribution $P(x; \theta)$ can be obtained as well. Using the corresponding source encoding method, given $s$ and $P(x; \theta)$, the message can be extracted as follows:

$m = \mathrm{SE}(s, P(x; \theta)),$ (13)

where SE represents source encoding.
The perfect security of the proposed framework relies on the perfect compression performance of the source coding. There exist many kinds of source coding, such as Huffman codes, Lempel-Ziv codes and Arithmetic Coding. The former two have the problem that the complete codebook needs to be calculated offline, which is infeasible for large codebook sizes. Arithmetic coding is a near-optimal variable-length code whose encoding and decoding can be done online, i.e., no codebook needs to be stored, and its time complexity is linear in the sequence length. Thanks to these advantages, we select it as the source coding and explain the process of message embedding and extraction under the proposed framework.
III-E Message Embedding and Extraction
Given the distribution of the generative media, the process of embedding a message corresponds to source decoding, and extraction corresponds to source encoding. $\mathcal{X} = \{x_1, x_2, \dots, x_n\}$ is the alphabet for a generative cover in a certain order with probabilities $p(x_i)$. The cumulative probability of a symbol $x_i$ can be defined as

$F(x_i) = \sum_{j \le i} p(x_j), \quad F(x_0) = 0.$ (14)
With these notations, we introduce the process of message embedding and extraction.
1) Message embedding: Here, adaptive arithmetic decoding (AAD) is selected as the embedding function. Given the encrypted message $m = m_1 m_2 \dots m_L$, it can be interpreted as a fraction in the range $[0,1)$ by prepending "0." to it:

$f = 0.m_1 m_2 \dots m_L = \sum_{i=1}^{L} m_i 2^{-i}.$ (15)

Following the adaptive arithmetic decoding algorithm, we start from the interval $[0,1)$ and subdivide it into subintervals according to the probabilities of the symbols in $\mathcal{X}$, and then append the symbol corresponding to the subinterval in which the dyadic fraction $f$ lies to the stego $s$:

$s = s \,\|\, x_i,$ (16)

where $\|$ represents appending the subsequent symbol to the previous vector. Regularly, the probabilities of the symbols are updated. Then the subinterval is calculated according to the updated probabilities by

$l^{(k)} = l^{(k-1)} + \left(h^{(k-1)} - l^{(k-1)}\right) F(x_{i-1}),$ (17)
$h^{(k)} = l^{(k-1)} + \left(h^{(k-1)} - l^{(k-1)}\right) F(x_i),$ (18)

where $l^{(k)}$ and $h^{(k)}$ are the bounds of the subinterval in the $k$th step. The process is repeated until the interval satisfies the constraint:

$h^{(k)} - l^{(k)} \le 2^{-L}.$ (19)

The constraint guarantees that the dyadic fraction $f$ is the unique fraction of length $L$ in the interval $[l^{(k)}, h^{(k)})$, such that the message can be extracted correctly. The message length $L$ and the probabilities of the symbols are shared with the receiver.

To further clarify the scheme of message embedding using arithmetic decoding, Algorithm 1 provides pseudocode that describes the implementation of message embedding by adaptive arithmetic decoding.
2) Message extraction: Correspondingly, message extraction refers to adaptive arithmetic encoding (AAE). At the receiver-end, the interval starts from $[0,1)$ and is subdivided into subintervals of lengths proportional to the probabilities of the symbols. The subinterval is updated according to the received stego symbol $x_i$ as follows:

$l^{(k)} = l^{(k-1)} + \left(h^{(k-1)} - l^{(k-1)}\right) F(x_{i-1}),$ (20)
$h^{(k)} = l^{(k-1)} + \left(h^{(k-1)} - l^{(k-1)}\right) F(x_i).$ (21)

The process is repeated until the number of steps reaches the length of $s$. Finally, find the fraction $f = 0.m_1 m_2 \dots m_L \in [l^{(k)}, h^{(k)})$, where $m_i$ is the $i$th message bit and $L$ is the length of the message. Analogously, the pseudocode of message extraction is presented in Algorithm 2.
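As a hedged illustration of the embedding/extraction pair (a sketch in the spirit of Algorithms 1 and 2, not the authors' exact implementation), the following self-contained Python code embeds a bit string by adaptive arithmetic decoding and recovers it by the matching encoding. The per-step model `step_pmf` is an invented stand-in for the generative model's predicted distribution at each step, and exact rational arithmetic (`fractions.Fraction`) is used to sidestep floating-point issues:

```python
from fractions import Fraction

def step_pmf(step, history):
    """Stand-in for the generative model's per-step distribution P(x_t | x_<t).
    Any deterministic rule shared by sender and receiver works here."""
    base = [4, 3, 2, 1]
    shift = (step + sum(history)) % 4
    weights = base[shift:] + base[:shift]
    total = sum(weights)
    return [Fraction(w, total) for w in weights]

def embed(bits):
    """Adaptive arithmetic *decoding* of the message bits into stego symbols."""
    L = len(bits)
    f = Fraction(int(bits, 2), 2 ** L)       # message read as a dyadic fraction, Eq. (15)
    low, high, stego = Fraction(0), Fraction(1), []
    while high - low > Fraction(1, 2 ** L):  # stop when f is the unique L-bit fraction in [low, high)
        pmf = step_pmf(len(stego), stego)
        cum = Fraction(0)
        for sym, p in enumerate(pmf):        # find the subinterval containing f
            nlow = low + (high - low) * cum
            nhigh = nlow + (high - low) * p
            if nlow <= f < nhigh:
                low, high = nlow, nhigh
                stego.append(sym)            # Eq. (16): append the chosen symbol
                break
            cum += p
    return stego

def extract(stego, L):
    """Adaptive arithmetic *encoding* of the stego symbols back into L message bits."""
    low, high = Fraction(0), Fraction(1)
    for step, sym in enumerate(stego):       # replay the interval updates, Eqs. (20)-(21)
        pmf = step_pmf(step, stego[:step])
        cum = sum(pmf[:sym], Fraction(0))
        low, high = low + (high - low) * cum, low + (high - low) * (cum + pmf[sym])
    k = -(-low.numerator * 2 ** L // low.denominator)  # ceil(low * 2^L): the unique L-bit fraction
    return format(k, "0{}b".format(L))

msg = "1011001110001101"
print(extract(embed(msg), len(msg)) == msg)  # True
```

Because both sides replay identical interval updates under the shared per-step distributions, the final interval at the receiver contains exactly one dyadic fraction of length $L$, which is the embedded message.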
III-F Proof of asymptotically perfect security
As mentioned before, message embedding can be seen as the process of decompression and message extraction as the process of compression. If the compressor is perfect, then the proposed steganography is a provably perfectly secure scheme. Here, we present the proof of the asymptotically perfect security of the proposed scheme.
The arithmetic code is prefix-free: by taking the binary representation of the cumulative probability and truncating it to $l(x^n)$ bits [21], we obtain a uniquely decodable code. When encoding the entire sequence $x^n = (x_1, x_2, \dots, x_n)$, the number of bits required to represent it with enough accuracy such that the codes for different values of $x^n$ are distinct is

$l(x^n) = \left\lceil \log \frac{1}{p(x^n)} \right\rceil + 1.$ (22)
Remember that $l(x^n)$ is the number of bits required to encode the entire sequence $x^n$. Therefore, the average length of an arithmetic code for a sequence of length $n$ is given by

$l_A(n) = \sum_{x^n} p(x^n)\, l(x^n) = \sum_{x^n} p(x^n) \left( \left\lceil \log \frac{1}{p(x^n)} \right\rceil + 1 \right).$ (23)
Given that the average length is always greater than the entropy, the bounds on $l_A(n)$ are

$H(X^n) \le l_A(n) < H(X^n) + 2.$ (24)
The length per symbol, or rate, of the arithmetic code is $R = l_A(n)/n$. Therefore, the bounds on $R$ are

$\frac{H(X^n)}{n} \le R < \frac{H(X^n)}{n} + \frac{2}{n}.$ (25)
We also know that, for an i.i.d. sequence, the entropy of the sequence is nothing but the length of the sequence times the average entropy of each symbol [22]:

$H(X^n) = n H(X).$ (26)
Therefore,

$H(X) \le R < H(X) + \frac{2}{n}.$ (27)
In our framework, $P_s$ is the real distribution of the samples generated by the process of message embedding using AAD, and $P_c$ is the target distribution that we desire to approximate. According to [20, Theorem 5.4.3], using the wrong distribution $P_s$ for encoding when the true distribution is $P_c$ incurs a penalty of $D(P_c \| P_s)$; in other words, the increase in expected description length due to the approximate distribution rather than the true distribution is the relative entropy $D(P_c \| P_s)$. Directly extended from Eq. (27), $D(P_c \| P_s)$ has the upper bound

$D(P_c \| P_s) < \frac{2}{n},$ (28)

and if $n \to \infty$, then

$D(P_c \| P_s) \to 0.$ (29)
By increasing the length of the sequence, the relative entropy between $P_c$ and $P_s$ tends to 0, meaning that the proposed steganographic scheme can asymptotically achieve perfect security with sufficiently many cover elements using arithmetic coding.
IV An example of generative steganography
In this section, we present a steganographic scheme on a text-to-speech system based on WaveNet under the proposed framework, since WaveNet exposes the explicit probability distribution of each sample. Moreover, audio has a natural advantage: the semantic meaning of the audio is easily recognized by a speech-to-text system, which brings great convenience and greatly reduces the communication cost. In other words, there is no need to send the semantic meaning of the audio every time a message is sent.
IV-A Text-to-Speech using WaveNet
Fig. 4 shows a text-to-speech diagram, where the text is first transformed into mel-scale spectrograms using a spectrogram prediction network (SPN), followed by a modified WaveNet acting as a vocoder to synthesize time-domain waveforms from those spectrograms [23]. Under this architecture, some generative models achieve a mean opinion score (MOS) comparable to professionally recorded speech. Here, the SPN is regarded as a fixed tool, and we focus on the WaveNet vocoder, where the steganographic process can engage.
WaveNet is an audio generative model operating directly on the raw audio waveform with a sequential generation architecture, which can synthesize speech samples with natural segmental quality. The joint probability of a waveform $x = (x_1, \dots, x_T)$ is factorized as a product of conditional probabilities as follows:

$p(x) = \prod_{t=1}^{T} p(x_t \mid x_1, \dots, x_{t-1}).$ (30)
Each audio sample is therefore conditioned on the samples at all previous timesteps.
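The chain-rule factorization in Eq. (30) can be checked numerically with a toy autoregressive model (everything below is invented for illustration; a real WaveNet predicts a much larger distribution per audio sample):

```python
import math

def joint_log_prob(samples, cond_pmf):
    """Chain rule p(x) = prod_t p(x_t | x_1..x_{t-1}): sum the conditional
    log-probabilities of each sample given its prefix."""
    return sum(math.log(cond_pmf(samples[:t])[x_t]) for t, x_t in enumerate(samples))

# Toy binary "model": the next-sample distribution depends on the running parity.
def toy_pmf(prefix):
    return [0.7, 0.3] if sum(prefix) % 2 == 0 else [0.2, 0.8]

lp = joint_log_prob([0, 1, 1], toy_pmf)
print(math.exp(lp))  # p(0) * p(1|0) * p(1|0,1) = 0.7 * 0.3 * 0.8 = 0.168
```

The same left-to-right decomposition is what lets the steganographic decoder consume one predicted distribution per sample.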
For a text-to-speech system, a conditional WaveNet is the choice. Given an additional input $h$, such as text, WaveNet can model the conditional distribution of the audio given this input. Eq. (30) now becomes

$p(x \mid h) = \prod_{t=1}^{T} p(x_t \mid x_1, \dots, x_{t-1}, h).$ (31)
The generation of each audio sample can be divided into two steps. Firstly, the neural network predicts the distribution $p(x_t \mid x_1, \dots, x_{t-1}, h)$ of the current sample given all previous samples. Secondly, the sampling algorithm randomly selects a value according to this distribution.

IV-B Perfectly secure steganography using WaveNet
Following the framework designed in Section III, we sketch the perfectly secure steganography scheme based on the text-to-speech system including WaveNet. The whole diagram of the concealment system is shown in Fig. 5. As preparation, the sender should pretrain the text-to-speech model, including the SPN and the WaveNet vocoder, and then send the model to the receiver; alternatively, he can use public models and just tell the receiver where to download them. A speech-to-text tool is also needed for reducing the communication cost, and both sides should use the same tool so that the sender can ensure the receiver obtains the same text. The message embedding process is integrated with the waveform generation in the WaveNet vocoder.
At the sender-end, given the input text, the mel-spectrogram is predicted by the SPN. In ordinary waveform generation, each sample is randomly chosen according to the distribution $p(x_t \mid x_1, \dots, x_{t-1}, h)$ yielded by the WaveNet. In detail, the probability distribution of the first sample $x_1$ is produced by the WaveNet from the mel-spectrogram and a seed, and the value of the first sample is randomly chosen from it. Then the probability distribution of $x_2$ is predicted by the network as $p(x_2 \mid x_1, h)$, and a similar process is repeated until all samples are generated.
When it comes to steganography, we decompress the encrypted message into the stego samples using Arithmetic Coding. First of all, the message bits are transformed into the fraction $f$ according to Eq. (15), and the initial interval $[0,1)$ is divided into subintervals whose lengths are proportional to the probability distribution of the first sample in the predefined order. The predefined order here is the set of possible audio sample values in ascending order, such as $-128$ to $127$ for 8-bit audio. Afterwards, the symbol corresponding to the subinterval in which $f$ lies is chosen as $x_1$. Then the probability is updated as $p(x_2 \mid x_1, h)$. Analogously, the process is repeated until all samples are generated. Note that the message length should be shorter than the entropy of the generated audio. The whole process of message embedding using Algorithm 1 is named AAD-WaveNet, as labeled in Fig. 5.
After the generation of the stego audio is accomplished, it is sent to the receiver. Utilizing a speech-to-text tool such as Deepspeech [24], the audio is recognized into text. Since the receiver owns the same model, text and message length $L$, the probability distribution at every step is identical to that generated at the sender-end. As a result, the message can be extracted correctly using Algorithm 2. The described process of message extraction using arithmetic encoding is named AAE-WaveNet.
V Experiments
In this section, experimental results and analysis are presented to demonstrate the feasibility and effectiveness of the proposed schemes.
V-A Setups
V-A1 Cover Source
In this paper, we randomly collect 1,000 short text sentences and transform them into mel-spectrograms using the SPN in Tacotron2 [23] (the architecture of the spectrogram prediction network is available at https://github.com/Rayhane-mamah/Tacotron-2). Then the WaveNet vocoder is used for audio waveform generation (the architecture of the WaveNet vocoder is available at https://github.com/r9y9/wavenet_vocoder). The WaveNet vocoder is trained on the CMU ARCTIC dataset [25] for 100,000 steps. All the audio clips are stored in the uncompressed WAV format. The audio length ranges from 0.5 s to 3 s, and the sample rate is 16 kHz.
For message embedding, the process of random sampling is replaced with adaptive arithmetic decoding (AAD-WaveNet). The original WaveNet vocoder and the AAD-WaveNet steganographic scheme are then utilized to generate cover audio and stego audio, respectively. Deepspeech (available at https://github.com/mozilla/DeepSpeech) is selected for transferring audio to text.
V-A2 Steganalysis Features
Steganalysis is implemented to verify the security of generative audio steganography. There are several mainstream universal feature sets for audio steganalysis, such as 2D-Markov [26], DMC [27], and Time-Markov and Mel-Frequency [28]. Here, DMC and the combined version of Time-Markov and Mel-Frequency (abbreviated as CTM) are selected as the steganalysis features for their state-of-the-art performance.
V-A3 Classifier
The detectors are trained as binary classifiers implemented using the FLD ensemble with default settings [29]. A separate classifier is trained for each embedding algorithm and payload. The ensemble by default minimizes the total classification error probability under equal priors:

$P_E = \min_{P_{FA}} \frac{1}{2} \left( P_{FA} + P_{MD} \right),$ (32)

where $P_{FA}$ and $P_{MD}$ are the false-alarm probability and the missed-detection probability, respectively. The ultimate security is quantified by the average error rate $\bar{P}_E$ averaged over ten 500/500 database splits, and a larger $\bar{P}_E$ means stronger security.
V-B Visualization
Fig. 6 presents the text to mel-spectrogram to audio-wave process. The text "This is really awesome" is converted into a mel-spectrogram, and then the audio wave is generated using the WaveNet vocoder. Figs. 6(b) and (c) are normal audios obtained through random sampling, and (d) is the stego audio generated by the proposed steganographic algorithm. It can be seen that the differences between normal audio and stego audio are large, which is distinct from steganography by modification. However, the differences between two normal audios, like Figs. 6(b) and (c), are large as well, meaning that steganalyzers cannot distinguish normal audio from stego audio intuitively.
V-C Encoding performance
To assess the performance of the embedding algorithm with respect to distribution discrepancy, we run the embedding algorithm on a cover source obeying a Gaussian distribution. Gaussian functions are often used to represent the probability density function of a normally distributed random variable $X$ with expected value $\mu$ and variance $\sigma^2$:

$f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}.$ (33)

Here, the cover values range from 0 to 255, and the parameters $\mu$ and $\sigma$ of $f(x)$ are fixed in advance. The cover is randomly sampled according to the probability distribution, and the stegos with different payloads are generated by AAD. We embed random messages at 0.1 bps (bits per sample) and 0.4 bps, less than the information entropy bound (4.414 bps).
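Such a discretized Gaussian cover source and its entropy bound can be sketched as follows (the values $\mu = 128$ and $\sigma = 5$ below are assumptions for illustration, not the paper's parameters, so the printed entropy need not equal 4.414 bps):

```python
import math

def discrete_gaussian_pmf(mu, sigma, lo=0, hi=255):
    """Gaussian density evaluated on the integer grid lo..hi, renormalized to a pmf."""
    w = [math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) for x in range(lo, hi + 1)]
    z = sum(w)
    return [wi / z for wi in w]

def entropy_bits(pmf):
    """Shannon entropy in bits per sample: the ceiling on the embeddable payload."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

pmf = discrete_gaussian_pmf(mu=128, sigma=5)  # assumed parameters for illustration
h = entropy_bits(pmf)
print(round(h, 3))  # any payload (e.g. 0.1 or 0.4 bps) must stay below this bound
```

The entropy of the per-sample distribution is exactly the quantity the payload must not exceed for the decoding-based embedding to remain well defined.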
The histograms of cover and stego are shown in Fig. 7, and the red solid line is the curve of the preset Gaussian distribution. The first row of Fig. 7 corresponds to short carriers (1,000 samples) and the second row to long carriers (10,000 samples). We can observe that the long carriers fit the curve better than the short carriers, and that the distributions of the cover, the small-payload stego and the large-payload stego are very close and not easily distinguishable.
We have also calculated the distribution distance of the generated samples. Because the KL divergence can produce unstable calculations (division by zero), the Bhattacharyya distance [30] is adopted. The Bhattacharyya distance between two classes under the normal distribution can be calculated as follows [31]:

$D_B(p, q) = \frac{1}{4} \ln \left( \frac{1}{4} \left( \frac{\sigma_p^2}{\sigma_q^2} + \frac{\sigma_q^2}{\sigma_p^2} + 2 \right) \right) + \frac{1}{4} \frac{(\mu_p - \mu_q)^2}{\sigma_p^2 + \sigma_q^2},$ (34)

where $D_B(p, q)$ is the Bhattacharyya distance between the $p$ and $q$ distributions or classes, $\sigma_i^2$ is the variance of the $i$th distribution, and $\mu_i$ is the mean of the $i$th distribution. We calculate the distance between the preset Gaussian distribution and the generated distribution; here, $p$ is the preset Gaussian distribution and $q$ is the generated distribution. The results are shown in Table I. The distances of the normal samples and the stego samples show no consistent gap, meaning that they are indistinguishable. By increasing the length of the sequence, the distribution distance becomes smaller, which reveals that the samples generated by the proposed steganographic scheme asymptotically approach the target distribution; this implies the proposed scheme can asymptotically achieve perfect security with sufficiently many cover elements. Furthermore, the asymptotically perfect security of the proposed scheme is verified by strong steganalytic methods in the next subsection.
TABLE I: Bhattacharyya distance between the preset and generated distributions.

                Cover    Stego (0.1 bps)    Stego (0.4 bps)
Short sample    5.85     3.27               5.07
Long sample     3.15     7.72               5.87
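Eq. (34) is straightforward to compute; the following Python sketch evaluates it for two pairs of univariate normal distributions (the parameter values are illustrative, not the experiment's):

```python
import math

def bhattacharyya_normal(mu_p, var_p, mu_q, var_q):
    """Bhattacharyya distance between two univariate normal distributions,
    following the closed form of Eq. (34)."""
    return (0.25 * math.log(0.25 * (var_p / var_q + var_q / var_p + 2))
            + 0.25 * (mu_p - mu_q) ** 2 / (var_p + var_q))

print(bhattacharyya_normal(0.0, 1.0, 0.0, 1.0))  # 0.0 for identical distributions
print(bhattacharyya_normal(0.0, 1.0, 1.0, 1.0))  # grows with the mean gap
```

A distance of zero corresponds to identical distributions, which is the behavior the proposed scheme approaches as the carrier length grows.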
V-D Security performance
State-of-the-art steganalytic methods are utilized to verify the security of the proposed scheme. Since steganalysis performance varies with the cover source, we also implement other steganographic methods to show the effectiveness of the selected steganalysis features. The LSB matching [32] and AAC-based [33] algorithms are chosen, where the former is conventional and the latter is content-adaptive. LSB matching means that if the message bit does not match the LSB of the cover element, one is randomly either added to or subtracted from the value of the cover element; otherwise, no modification is needed. The AAC-based algorithm is simulated at its payload-distortion bound. The distortion of the AAC-based algorithm is defined as the reciprocal of the difference between the original audio and the audio reconstructed through compression and decompression by advanced audio coding.
Fig. 8 and Fig. 9 show the average detection error rate $\bar{P}_E$ as a function of the payload, ranging from 0.1 to 0.5 bps, against CTM and DMC, respectively. It can be observed in Fig. 8 and Fig. 9 that the $\bar{P}_E$ of the AAC-based method decreases with increasing payload and drops to nearly 0%, and that of LSB matching is always nearly 0%, showing that the steganalysis is effective on the generated audio. In contrast, the $\bar{P}_E$ of the proposed scheme is nearly 50%, which means the proposed scheme is nearly perfectly secure; in other words, the strong steganalyzer judges the stego objects nearly by random guessing. The experimental results verify the security performance proved in Section III-F.
VI Conclusions
Based on the relationship between source coding and generating a discrete distribution from fair coin flips, together with the explicit probability distribution yielded by generative models, a perfectly secure steganography on generative media is proposed. Instead of randomly sampling from the cumulative distribution function as ordinary generative models do, we integrate source decoding into the generation process, which performs sampling according to the model's probability distribution while simultaneously embedding the encrypted message.
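A single-step sketch of this idea (`embed_sample` is a hypothetical helper): the encrypted message bits are read as a binary fraction and "decoded" against the model's CDF, so that uniformly random bits make the emitted symbol follow the model distribution exactly. A full implementation would rescale the interval and carry state across steps, as in arithmetic decoding.

```python
def embed_sample(probs, bits):
    """Pick the symbol whose cumulative-probability interval contains the
    binary fraction encoded by the message bits. With uniformly random
    (encrypted) bits, the symbol is distributed exactly according to probs."""
    x = sum(b / (1 << (i + 1)) for i, b in enumerate(bits))  # bits -> [0, 1)
    cum = 0.0
    for sym, p in enumerate(probs):
        cum += p
        if x < cum:
            return sym
    return len(probs) - 1  # guard against floating-point round-off
```

The receiver, holding the same model, recovers the bits by locating the interval of each observed symbol, which is exactly arithmetic encoding of the symbol sequence.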
Arithmetic coding is selected as the source coding method in the proposed framework, and it is proved to asymptotically achieve perfect security. Taking a text-to-speech system as an instance, message embedding and extraction with the proposed scheme are illustrated in detail. Distribution distance and steganalysis are utilized to assess the statistical and anti-detection performance of the proposed scheme. The results show that the proposed steganographic method can achieve asymptotically perfect security.
In our future work, we will explore other effective source coding schemes and adapt them to generative steganographic encoding. Furthermore, other generative media, such as text and video, will be investigated under the proposed framework.
Acknowledgment
The authors would like to thank Prof. Weiqi Luo from Sun Yat-sen University for providing us the source codes of audio steganalysis. The authors also would like to thank Ryuichi Yamamoto for his valuable advice.
References
 [1] C. E. Shannon, “Communication theory of secrecy systems,” Bell system technical journal, vol. 28, no. 4, pp. 656–715, 1949.
 [2] C. Cachin, “An information-theoretic model for steganography,” in International Workshop on Information Hiding. Springer, 1998, pp. 306–318.
 [3] P. Sallee, “Model-based steganography,” in International Workshop on Digital Watermarking. Springer, 2003, pp. 154–167.
 [4] J. Fridrich and J. Kodovský, “Multivariate gaussian model for designing additive distortion for steganography,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013, pp. 2949–2953.
 [5] V. Sedighi, J. Fridrich, and R. Cogranne, “Content-adaptive pentary steganography using the multivariate generalized gaussian cover model,” in Media Watermarking, Security, and Forensics 2015, vol. 9409. International Society for Optics and Photonics, 2015, p. 94090H.
 [6] V. Sedighi, R. Cogranne, and J. Fridrich, “Content-adaptive steganography by minimizing statistical detectability,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 2, pp. 221–234, 2016.
 [7] T. Filler, J. Judas, and J. Fridrich, “Minimizing additive distortion in steganography using syndrome-trellis codes,” IEEE Transactions on Information Forensics and Security, vol. 6, no. 3, pp. 920–935, 2011.
 [8] B. Li, M. Wang, J. Huang, and X. Li, “A new cost function for spatial image steganography,” in 2014 IEEE International Conference on Image Processing (ICIP). IEEE, 2014, pp. 4206–4210.
 [9] K. Chen, H. Zhou, W. Zhou, W. Zhang, and N. Yu, “Defining cost functions for adaptive jpeg steganography at the microscale,” IEEE Transactions on Information Forensics and Security, pp. 1–1, 2018.
 [10] Y. Yao, W. Zhang, N. Yu, and X. Zhao, “Defining embedding distortion for motion vector-based video steganography,” Multimedia Tools and Applications, vol. 74, no. 24, pp. 11 163–11 186, 2015.
 [11] N. J. Hopper, J. Langford, and L. Von Ahn, “Provably secure steganography,” in Annual International Cryptology Conference. Springer, 2002, pp. 77–92.
 [12] A. Lysyanskaya and M. Meyerovich, “Provably secure steganography with imperfect sampling,” in International Workshop on Public Key Cryptography. Springer, 2006, pp. 123–139.
 [13] N. Hopper, L. von Ahn, and J. Langford, “Provably secure steganography,” IEEE Transactions on Computers, vol. 58, no. 5, pp. 662–676, 2009.
 [14] G. J. Simmons, “The prisoners’ problem and the subliminal channel,” in Advances in Cryptology. Springer, 1984, pp. 51–67.
 [15] H. Larochelle and I. Murray, “The neural autoregressive distribution estimator,” in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011, pp. 29–37.

 [16] M. Germain, K. Gregor, I. Murray, and H. Larochelle, “MADE: Masked autoencoder for distribution estimation,” in International Conference on Machine Learning, 2015, pp. 881–889.
 [17] A. v. d. Oord, N. Kalchbrenner, and K. Kavukcuoglu, “Pixel recurrent neural networks,” arXiv preprint arXiv:1601.06759, 2016.
 [18] D. P. Kingma and P. Dhariwal, “Glow: Generative flow with invertible 1x1 convolutions,” arXiv preprint arXiv:1807.03039, 2018.
 [19] I. Goodfellow, “Nips 2016 tutorial: Generative adversarial networks,” arXiv preprint arXiv:1701.00160, 2016.
 [20] T. M. Cover and J. A. Thomas, Elements of information theory. John Wiley & Sons, 2012.
 [21] K. Sayood, Introduction to data compression. Morgan Kaufmann, 2017.
 [22] A. Said, “Introduction to arithmetic coding - theory and practice,” Hewlett Packard Laboratories Report, pp. 1057–7149, 2004.
 [23] J. Shen, R. Pang, R. J. Weiss, M. Schuster, N. Jaitly, Z. Yang, Z. Chen, Y. Zhang, Y. Wang, R. Skerry-Ryan et al., “Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions,” arXiv preprint arXiv:1712.05884, 2017.
 [24] A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates et al., “Deep Speech: Scaling up end-to-end speech recognition,” arXiv preprint arXiv:1412.5567, 2014.
 [25] J. Kominek and A. W. Black, “The cmu arctic speech databases,” in Fifth ISCA workshop on speech synthesis, 2004.
 [26] Q. Liu, A. H. Sung, and M. Qiao, “Temporal derivative-based spectrum and mel-cepstrum audio steganalysis,” IEEE Transactions on Information Forensics and Security, vol. 4, no. 3, pp. 359–368, 2009.
 [27] Q. Liu, A. H. Sung, and M. Qiao, “Derivative-based audio steganalysis,” ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7, no. 3, p. 18, 2011.
 [28] W. Luo, H. Li, Q. Yan, R. Yang, and J. Huang, “Improved audio steganalytic feature and its applications in audio forensics,” ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 14, no. 2, p. 43, 2018.
 [29] J. Kodovský, J. J. Fridrich, and V. Holub, “Ensemble classifiers for steganalysis of digital media,” IEEE Transactions on Information Forensics and Security, vol. 7, no. 2, pp. 432–444, 2012.
 [30] A. Bhattacharyya, “On a measure of divergence between two statistical populations defined by their probability distributions,” Bull. Calcutta Math. Soc., vol. 35, pp. 99–109, 1943.
 [31] G. B. Coleman and H. C. Andrews, “Image segmentation by clustering,” Proceedings of the IEEE, vol. 67, no. 5, pp. 773–785, 1979.
 [32] J. Mielikainen, “LSB matching revisited,” IEEE Signal Processing Letters, vol. 13, no. 5, pp. 285–287, 2006.
 [33] W. Luo, Y. Zhang, and H. Li, “Adaptive audio steganography based on advanced audio coding and syndrome-trellis coding,” in International Workshop on Digital Watermarking. Springer, 2017, pp. 177–186.