Provably Secure Steganography on Generative Media

11/09/2018 ∙ by Kejiang Chen, et al. ∙ USTC

In this paper, we propose provably secure steganography on generative media. First, we discuss the essence of steganographic security, which is identical to behavioral security. Behavioral security implies that generative media are suitable for information hiding as well. Based on the duality between source coding and generating a discrete distribution from fair coins, together with the explicit probability distribution yielded by generative models, perfectly secure steganography on generative media is proposed. Instead of randomly sampling from the probability distribution as ordinary generative models do, we combine source decoding into the process of generation, which implements sampling according to the probability distribution while embedding the encrypted message. Adaptive Arithmetic Coding is selected as the source coding method, and it is proved theoretically that the proposed generative steganography framework using Adaptive Arithmetic Coding is asymptotically perfectly secure. Taking a text-to-speech system based on WaveNet as an instance, we describe the process of embedding and extracting messages in detail, and the experimental results show that the proposed method is nearly perfectly secure when resisting state-of-the-art steganalysis.


I Introduction

Steganography is the art and science of communicating in such a way that the presence of a message cannot be detected. C. E. Shannon concluded in his famous paper "Communication theory of secrecy systems" that there are three general types of secrecy systems: the first is the concealment system, the second is the privacy system, and the third is the "true" secrecy system. He created a complete information-theoretic security model for the third type, the "true" secrecy system[1]. Meanwhile, he also pointed out that the concealment system is primarily a psychological problem, which implies that it is not easy to establish an information-theoretic security model for concealment systems.

After years of development, steganography on multimedia has become the current mainstream, and Cachin[2] proposed an information-theoretic definition of steganographic security for the concealment system on multimedia. The security of a steganographic system can be quantified in terms of the relative entropy $D(P_c \| P_s)$ between the distributions $P_c$ and $P_s$ of cover and stego, which yields bounds on the detection capability of any adversary. If the distributions of cover and stego are equal, namely, $P_c = P_s$, the stegosystem is perfectly secure and the adversary can have no advantage over merely guessing whether the stego-object contains a message or not. The security definition using relative entropy promotes a new direction of steganography: model-based steganography.

Model-based steganography requires us to describe the distribution of the cover, and then to make the distribution of the stego indistinguishable from that of the cover. Under this framework, many works have been proposed, such as MB[3], MG[4], MVG[5], and MiPOD[6]. However, the complexity and dimensionality of covers formed by digital media objects, such as natural audios, images and videos, prevent us from determining a complete distribution $P_c$. Therefore, these methods cannot achieve provable security. It is possible to realize this idea within a sufficiently simple model of covers, but then steganalyzers can detect the media from other perspectives.

During the same period, some heuristic methods based on minimal distortion steganography[7] exhibit superior security performance, such as HILL[8], MSUERD[9], YAO[10]. Specifically, these methods try to capture the character of the cover distribution with a distortion function. It is obvious that their security cannot be proved theoretically. Some literature [11, 12, 13] has tried to establish provably secure steganography in terms of computational indistinguishability by assuming that the cover is efficiently sampleable. However, this assumption, "efficiently sampleable", does not hold for natural multimedia.

In summary, the aforementioned schemes cannot achieve provable security, owing to the incomplete model of natural media. Actually, selecting natural media as the cover is not necessary, because the essence of steganographic security is behavioral security, which tries to disguise the behavior of covert communication as a popular behavior. The reason for choosing natural media as the cover is that they are popular on the Internet; namely, sending natural media is a normal behavior for humans. In the same way, any kind of media that is popular on the Internet can be selected as the cover for message embedding.

Currently, generative media have emerged on the Internet with the rapid development of Artificial Intelligence (AI), especially generative models. In particular, many applications based on generative media have become popular, such as Amazon Echo, Google Home and Apple Siri for generative audio, and Prisma and style transfer for generative images. Generative media are widespread on the Internet, which indicates that they are suitable for information hiding.

A generative model describes how generative media are generated, in terms of learning a probabilistic model. Specifically, the generative model calculates the probability distribution $P(x)$ of the training data, and then random sampling from the cumulative distribution function (CDF) is adopted for drawing samples. Since the generative model exposes an explicit probability distribution $P(x)$, it can make up for the shortcoming of traditional steganographic methods based on natural media; namely, provably secure steganography can be implemented with generative media as covers.

In this paper, we propose provably secure steganography on generative media. First, we discuss the essence of steganographic security, which is identical to behavioral security. This essence implies that generative media (popular on the Internet) are suitable for information hiding. By virtue of the generative model, the probability distributions of media are explicit to us. Furthermore, based on the duality between source coding and generating a discrete distribution from fair coins, provably secure steganography is proposed. Instead of random sampling from the CDF as ordinary generation does, we combine source decoding into the process of generation, which implements sampling according to the probability distribution while embedding the encrypted message.

Adaptive Arithmetic Coding is selected as the source coding in the proposed framework for message embedding and extraction. It is proved theoretically that the proposed generative steganography framework using Adaptive Arithmetic Coding is asymptotically perfectly secure. It is worth mentioning that the adversary may easily recognize that the stego object is generative media, but he cannot determine whether the stego object is innocent or not, since generative media are popular on the Internet.

Taking a text-to-speech system based on WaveNet as an instance, we describe the process of embedding and extracting messages in detail, and the experimental results show that the proposed method is asymptotically perfectly secure when resisting state-of-the-art steganalysis.

The rest of this paper is organized as follows. After introducing notations, we review the essence of steganographic security. The framework of provably secure steganography on generative media is elaborated in Section III. The example adopting the framework and results of comparative experiments are presented in Section IV and Section V, respectively. Conclusion and future work are given in Section VI.

II Preliminaries and prior work

II-A The prisoners’ problem

In order to improve the readability of this paper, the prisoners’ problem[14] formulated by Simmons is introduced first, which is a simple abstraction of the steganographic setting. Alice and Bob are imprisoned in separate cells and want to hatch an escape plan. They are allowed to communicate, but their communication is monitored by the warden Eve. If Eve finds out that the prisoners are secretly exchanging messages, she will cut the communication channel and throw them into solitary confinement.

II-B Notations

Throughout the paper, matrices, vectors and sets are written in bold face. Alice and Bob work with the set of all possible covers $\mathcal{C}$ and the sets of keys $\mathcal{K}$ and messages $\mathcal{M}$:

$\mathcal{C} = \{\text{all possible cover objects}\},$ (1)
$\mathcal{K} = \{\text{secret keys}\},$ (2)
$\mathcal{M} = \{\text{messages}\}.$ (3)
Fig. 1: A diagram of steganographic communication.

Following the diagram presented in Fig. 1, a steganographic scheme can be regarded as a pair of embedding and extraction functions Emb and Ext for Alice and Bob, respectively,

$\mathrm{Emb}: \mathcal{C} \times \mathcal{K} \times \mathcal{M} \to \mathcal{C},$ (4)
$\mathrm{Ext}: \mathcal{C} \times \mathcal{K} \to \mathcal{M},$ (5)

such that for all $\mathbf{c} \in \mathcal{C}$, and all $k \in \mathcal{K}$, $m \in \mathcal{M}$,

$\mathbf{s} = \mathrm{Emb}(\mathbf{c}, k, m),$ (6)
$\mathrm{Ext}(\mathrm{Emb}(\mathbf{c}, k, m), k) = m.$ (7)

According to Kerckhoffs’s principle, Eve judges whether an object is innocent or not using all possible knowledge except the secret key.

II-C The essence of steganographic security

The information-theoretic definition of steganographic security is given by Cachin[2]. Assume the cover is drawn from $\mathcal{C}$ with probability distribution $P_c$ and the steganographic method generates stego-objects with distribution $P_s$. Given an object $\mathbf{x}$, Eve must decide between two hypotheses: $H_0$, which represents the hypothesis that $\mathbf{x}$ does not contain a hidden message, and $H_1$, which stands for the hypothesis that $\mathbf{x}$ does contain a hidden message. Under hypothesis $H_0$ the observation is drawn from the distribution $P_c$, and under $H_1$ from $P_s$[2]. The distance between the two distributions can be measured using relative entropy:

$D(P_c \| P_s) = \sum_{\mathbf{x}} P_c(\mathbf{x}) \log \frac{P_c(\mathbf{x})}{P_s(\mathbf{x})}.$ (8)

When $D(P_c \| P_s) = 0$, the stegosystem is called perfectly secure, because in this case the distribution $P_s$ of the stego objects is identical to the cover's distribution $P_c$. Thus it is impossible for Eve to distinguish between covers and stego objects.

The security definition indicates that steganography should pursue a stego distribution indistinguishable from that of the cover. The mainstream steganographic methods first estimate the cover distribution $P_c$ with an approximate distribution $\hat{P}_c$, or describe the cost caused by deviating from $P_c$ with a cost function. Stego objects are then generated according to $\hat{P}_c$ or the cost function. However, the complexity, dimensionality and uncontrollable randomness of covers formed by digital equipment, such as natural audios, images and videos, prevent us from determining a complete distribution $P_c$; we do not even know the distance between $P_c$ and $\hat{P}_c$. That is why we cannot get provably secure steganography with natural media. Recent advances in steganalysis also show that the security level of state-of-the-art steganographic schemes is limited.

Actually, selecting natural media as the cover is not necessary, because the essence of steganographic security is behavioral security: the behavior of concealing a message should conform to the characteristics of a popular behavior. The reason natural media are chosen as the cover is that natural media are popular on the Internet; namely, sending natural media is a normal behavior for humans. In the same way, any kind of media that is popular on the Internet can be selected as the cover for message embedding.

Currently, generative media have emerged on the Internet with the rapid development of Artificial Intelligence (AI), especially generative models. In particular, many generative applications have become popular, such as Microsoft Xiaobing for generative text, Amazon Echo, Google Home and Apple Siri for generative audio, and Prisma and style transfer for generative images. As a result, generative media are suitable for information hiding. Specifically, some generative models, such as NADE[15], MADE[16], PixelRNN/CNN[17] and Glow[18], expose the explicit probability distribution of generative media to us, which favors the implementation of perfectly secure steganography.

For steganography in generative media, we try to produce stego objects that are difficult to distinguish from normal generative media. In other words, although the adversary Eve can recognize the stego objects as generative media, she cannot obtain any evidence of steganographic behavior as long as generative media are popular in the given application scenario.

III The provably secure steganography on generative media

Before proposing the framework of provably secure steganography on generative media, the regular architecture of generative models is reviewed.

III-A Generative model

A generative model describes how media are generated, in terms of a probabilistic model. Generative models can be divided into two categories, i.e., explicit density and implicit density models. In this paper, the generative model mainly refers to the first category, which owns an explicit density, including tractable density and approximate density. For instance, NADE[15], MADE[16] and PixelRNN/CNN[17] are generative models with tractable density, while the densities of the Variational Autoencoder and the Boltzmann Machine are approximate[19].

As shown in Fig. 2, the generative model will calculate the probability distribution of the training data, and then a simple random sampling from the cumulative distribution function (CDF) of the generation distribution is adopted for drawing samples.

Fig. 2: The framework of sample generation using generative model.

III-B Random sampling

In order to help readers understand random sampling, we illustrate the process in detail. Let $X$ be a random variable with probability mass function $p(x_i) = P(X = x_i)$ for any $x_i \in \Omega$, where $\Omega$ denotes the sample space and $n$ is the (possibly infinite) number of possible outcomes of the discrete variable $X$, and suppose $x_1, x_2, \ldots, x_n$ are in ascending order. Then the CDF for $X$ is

$F(x_i) = P(X \le x_i) = \sum_{j=1}^{i} p(x_j).$ (9)

Discrete random variables can be generated by slicing up the interval $[0, 1)$ into subintervals which define a partition of $[0, 1)$:

$[0, F(x_1)),\ [F(x_1), F(x_2)),\ \ldots,\ [F(x_{n-1}), F(x_n)),$ (10)

generating a uniform random variable $U \sim \mathrm{U}[0, 1)$, and seeing which subinterval $U$ falls into. The symbol of that subinterval is then selected as the current sample.
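As a concrete illustration, the sampling procedure above can be sketched in a few lines of Python (a minimal sketch; the symbol list and probabilities below are toy values, not taken from the paper):

```python
import random
from bisect import bisect_right

def sample(symbols, probs, rng):
    """Inverse-transform sampling: build the CDF F(x_i) = p(x_1)+...+p(x_i),
    draw U ~ Uniform[0, 1), and return the symbol of the subinterval
    [F(x_{i-1}), F(x_i)) that U falls into."""
    cdf, total = [], 0.0
    for p in probs:
        total += p
        cdf.append(total)
    u = rng.random()
    return symbols[bisect_right(cdf, u)]

# Toy usage: draw repeatedly from a 3-symbol distribution
rng = random.Random(42)
draws = [sample(["a", "b", "c"], [0.2, 0.5, 0.3], rng) for _ in range(1000)]
```

Over many draws, the empirical frequencies approach the preset probabilities, which is exactly the behavior the generative model relies on when drawing samples from its predicted distribution.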

III-C Motivations

Based on the review of the generative model, we can find that generative media are produced by random sampling according to the probability mass distribution. In Chapter 5.11 of the classical information-theory textbook “Elements of Information Theory”[20], the relationship between source coding and generating a discrete distribution from fair coins is fully exploited: generating a discrete distribution from fair coins is the dual problem of source encoding. Source coding considers the problem of representing a random variable by a sequence of bits such that the expected length of the representation is minimized. If the source encoder has perfect performance, a variable obeying the distribution can be compressed into random bits. As for source decoding, random bits are decompressed into a variable following the distribution, which is identical to generating a discrete distribution from fair coins. It is easy to associate information hiding with generating a discrete distribution from fair coins, for the message is always encrypted into random bits before embedding. Consequently, source decoding can replace the random sampling in the process of generating media, so that the encrypted message is decoded into the generative media. In addition, the decoding scheme rigorously follows the distribution of generative media, so the adversary cannot determine whether a generative digital object has a hidden message or not.

Fig. 3: The framework of perfectly secure steganography on generative media. Sender and receiver share the same generative model, so the probability distributions of both sides are the same. Normal users generate the media by random sampling, which is regarded as cover. Steganographer will embed the message by source decoding according to the probability distribution, and the generative media is seen as stego. Receivers own the same probability distribution and stego, so the message can be extracted by source encoding.

III-D The framework of provably secure steganography

Since the generative model exposes the explicit probability distribution and source decoding can generate samples according to a discrete distribution from encrypted message bits, the framework of perfectly secure steganography on generative media is proposed in this subsection. As depicted in Fig. 3, the framework is divided into two sides: the sender-end and the receiver-end. On the sender-end, the generative model can be trained by the senders themselves; alternatively, they can download the model from the Internet. The generative model is shared with the receivers by sending the model in advance or telling them where to download it. Since the proposed scheme is provably secure, as will be shown in the subsequent subsection, the generative model can be accessible to the adversary. The security of the proposed scheme is guaranteed by the secret key used for encrypting the message.

Then the probability mass distribution is predicted by the model, denoted by $P(x; \theta)$, where $\theta$ is the parameter of the model and $x$ represents the generative media. For ordinary generation, the generative media is obtained by random sampling as mentioned before, and is regarded as the cover $\mathbf{c}$:

$\mathbf{c} = \mathrm{Sample}(P(x; \theta)).$ (11)

When it comes to steganography, the message $m$ is embedded in the process of generation using source decoding instead of random sampling, and the generative media is seen as the stego $\mathbf{s}$:

$\mathbf{s} = \mathrm{SD}(m, P(x; \theta)),$ (12)

where SD means source decoding.

At the receiver-end, owning the same generative model, the probability distribution can be obtained as well. Using the corresponding source encoding method, given $\mathbf{s}$ and $P(x; \theta)$, the message can be extracted as follows:

$m = \mathrm{SE}(\mathbf{s}, P(x; \theta)),$ (13)

where SE represents source encoding.

The perfect security of the proposed framework relies on the perfect compression performance of the source coding. There exist many kinds of source coding, such as Huffman codes, Lempel-Ziv codes and Arithmetic Coding. The former two coding methods have the problem that the complete codebook needs to be calculated offline, which is infeasible for large codebook sizes. Arithmetic Coding is a sub-optimal variable-length code where the encoding and decoding can be done online, i.e., no codebook needs to be stored, and its time complexity is linear in the sequence length. Thanks to these advantages, we select it as the source coding and explain the process of message embedding and extraction under the proposed framework.

III-E Message Embedding and Extraction

Given the distribution of the generative media, the process of embedding a message corresponds to source decoding, and extraction corresponds to source encoding. Let $\{x_1, x_2, \ldots, x_n\}$ be the alphabet of a generative cover in a certain order with probabilities $p(x_i)$. The cumulative probability of a symbol $x_i$ can be defined as

$F(x_i) = \sum_{j=1}^{i} p(x_j).$ (14)

With these notations, we now introduce the process of message embedding and extraction.

1) Message embedding: Here, adaptive arithmetic decoding (AAD) is selected as the source coding method. Given the encrypted message $m = m_1 m_2 \cdots m_L$, it can be interpreted as a fraction in the range $[0, 1)$ by prepending “0.” to it:

$f = 0.m_1 m_2 \cdots m_L = \sum_{i=1}^{L} m_i 2^{-i}.$ (15)

Following the adaptive arithmetic decoding algorithm, we start from the interval $[0, 1)$ and subdivide it into subintervals according to the probabilities of the symbols in the predefined order, and then append the symbol $\hat{x}$ corresponding to the subinterval in which the dyadic fraction $f$ lies to the stego $\mathbf{s}$:

$\mathbf{s} = \mathbf{s} \,\|\, \hat{x},$ (16)

where $\|$ represents appending the subsequent symbol to the previous vector. Regularly, the probabilities of the symbols are updated. Then calculate the new subinterval according to the updated probabilities by

$l_k = l_{k-1} + (h_{k-1} - l_{k-1}) \cdot F(\hat{x} - 1),$ (17)
$h_k = l_{k-1} + (h_{k-1} - l_{k-1}) \cdot F(\hat{x}),$ (18)

where $l_k$ and $h_k$ are the lower and upper bounds of the subinterval in the $k$th step, and $F(\hat{x} - 1)$ denotes the cumulative probability of the symbol preceding $\hat{x}$. Repeat the process until the fraction satisfies the constraint

$l_k \le f \quad \text{and} \quad f + 2^{-L} \le h_k.$ (19)

The constraint guarantees that the dyadic fraction $f$ is the unique fraction of length $L$ in the interval $[l_k, h_k)$, such that the message can be extracted correctly. The message length $L$ and the probabilities of the symbols are shared with the receiver.

To further clarify the scheme of message embedding using arithmetic decoding, in Algorithm 1 we provide a pseudo-code that describes the implementation of message embedding by adaptive arithmetic decoding.

0:  Input: the random message m, the probability distribution p and the cumulative F.
0:  Output: the stego sequence s.
1:  convert the random message bits into a fraction f
2:  h = 1
3:  l = 0
4:  k = 0
5:  while f is not the unique length-L fraction in [l, h) (Eq. (19)) do
6:     k = k + 1
7:     subdivide the interval [l, h) into subintervals of length proportional to the probabilities of the symbols in the predefined order. The probabilities can be updated if needed.
8:     take the symbol x̂ corresponding to the subinterval in which f lies
9:     update [l, h) to the subinterval of x̂, as in Eqs. (17)–(18)
10:     s = s ∥ x̂
11:  end while
12:  return s
Algorithm 1 Message embedding using AAD
0:  Input: the stego sequence s, the probability distribution p, the message length L and the cumulative F.
0:  Output: the message m.
1:  h = 1
2:  l = 0
3:  k = 0
4:  while k < length(s) do
5:     subdivide the interval [l, h) into subintervals of length proportional to the probabilities of the symbols (in the predefined order)
6:     k = k + 1
7:     update [l, h) to the subinterval of s_k, as in Eqs. (20)–(21)
8:  end while
9:  find the fraction f = 0.m_1 m_2 … m_L satisfying Eq. (19), where m_i is the ith message bit
10:  return m
Algorithm 2 Message extraction using AAE

2) Message extraction: Correspondingly, the message extraction refers to adaptive arithmetic encoding (AAE). At the receiver-end, the interval starts from $[0, 1)$ and is subdivided into subintervals of length proportional to the probabilities of the symbols. Update the subinterval according to the received stego symbol $s_k$ as follows:

$l_k = l_{k-1} + (h_{k-1} - l_{k-1}) \cdot F(s_k - 1),$ (20)
$h_k = l_{k-1} + (h_{k-1} - l_{k-1}) \cdot F(s_k).$ (21)

Repeat the process until the number of steps reaches the length of $\mathbf{s}$. Finally, find the fraction $f = 0.m_1 m_2 \cdots m_L$ satisfying Eq. (19), where $m_i$ is the $i$th message bit and $L$ is the length of the message. Analogously, the pseudo code of message extraction is presented in Algorithm 2.
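To make the embedding/extraction pair concrete, here is a self-contained Python sketch using exact rational arithmetic. The symbol distribution is a fixed toy stand-in for the per-step model distribution, and the stopping rule narrows the interval into the dyadic cell of the message fraction, which is one concrete way to realize the uniqueness requirement behind Eq. (19). This is an illustrative sketch under those assumptions, not the authors' implementation:

```python
from fractions import Fraction
from math import floor

# Toy stand-in for the per-step model distribution (static here; the real
# scheme would use the generative model's prediction at every step).
PROBS = [("a", Fraction(1, 2)), ("b", Fraction(1, 4)), ("c", Fraction(1, 4))]

def subinterval(low, high, point):
    """Return (symbol, lo, hi): the slice of [low, high), proportional to
    PROBS, that contains `point` (the updates of Eqs. (17)-(18))."""
    r, cum = high - low, Fraction(0)
    for sym, p in PROBS:
        lo, hi = low + r * cum, low + r * (cum + p)
        if lo <= point < hi:
            return sym, lo, hi
        cum += p
    raise AssertionError("point outside [low, high)")

def embed(bits):
    """Algorithm 1 (AAD): decode message bits into a symbol sequence."""
    L = len(bits)
    f = sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits))
    point = f + Fraction(1, 2 ** (L + 1))  # midpoint of the message's cell
    low, high = Fraction(0), Fraction(1)
    stego = []
    # Shrink until the interval fits strictly inside [f, f + 2^-L)
    while high - low >= Fraction(1, 2 ** (L + 1)):
        sym, low, high = subinterval(low, high, point)
        stego.append(sym)
    return stego

def extract(stego, L):
    """Algorithm 2 (AAE): re-encode the stego sequence, then read off the
    first L bits of the final interval's lower bound."""
    low, high = Fraction(0), Fraction(1)
    for sym in stego:
        r, cum = high - low, Fraction(0)
        for s, p in PROBS:
            if s == sym:
                low, high = low + r * cum, low + r * (cum + p)
                break
            cum += p
    k = floor(low * 2 ** L)
    return [(k >> (L - 1 - i)) & 1 for i in range(L)]
```

With this stand-in distribution, `extract(embed(m), len(m))` recovers `m` exactly, because the final interval lies strictly inside the dyadic cell $[f, f + 2^{-L})$ of the message fraction.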

III-F Proof of asymptotically perfect security

As mentioned before, message embedding can be seen as the process of decompression, and message extraction as the process of compression. If the compressor is perfect, then the proposed steganography can be proved to be a perfectly secure scheme. Here, we present the proof of the asymptotically perfect security of the proposed scheme.

The arithmetic code is prefix free, and by taking the binary representation of the tag (a point inside the final interval) and truncating it to $l(x^n)$ bits[21], we obtain a uniquely decodable code. When encoding the entire sequence $x^n = (x_1, \ldots, x_n)$, the number of bits required to represent the tag with enough accuracy such that the codes for different sequences $x^n$ are distinct is

$l(x^n) = \left\lceil \log \frac{1}{p(x^n)} \right\rceil + 1.$ (22)

Remember that $l(x^n)$ is the number of bits required to encode the entire sequence $x^n$. Therefore, the average length of an arithmetic code for a sequence of length $n$ is given by

$\bar{l} = \sum_{x^n} p(x^n)\, l(x^n).$ (23)

Given that the average length is always greater than the entropy, the bounds on $\bar{l}$ are

$H(X^n) \le \bar{l} < H(X^n) + 2.$ (24)

The length per symbol, or rate of the arithmetic code, is $R = \bar{l}/n$. Therefore, the bounds on $R$ are

$\frac{H(X^n)}{n} \le R < \frac{H(X^n)}{n} + \frac{2}{n}.$ (25)

We also know that, for an i.i.d. source, the entropy of the sequence is nothing but the length of the sequence times the average entropy of each symbol[22]:

$H(X^n) = n H(X).$ (26)

Therefore,

$H(X) \le R < H(X) + \frac{2}{n}.$ (27)

In our framework, $p$ is the real distribution of the samples generated by the process of message embedding using AAD, and $q$ is the target distribution which we desire to approximate. According to [20, Theorem 5.4.3], using the wrong distribution $q$ for encoding when the true distribution is $p$ incurs a penalty of $D(p \| q)$ in the expected description length: the increase in expected description length due to the approximate distribution $q$ rather than the true distribution $p$ is the relative entropy $D(p \| q)$. Since the total redundancy of the arithmetic code over the whole sequence is at most 2 bits, directly extending Eq. (27), $D(p \| q)$ has the upper bound

$D(p \| q) \le \frac{2}{n},$ (28)

and if $n \to \infty$, then

$D(p \| q) \to 0.$ (29)

By increasing the length of the sequence, the relative entropy between $p$ and $q$ tends to 0, meaning that the proposed steganographic scheme can asymptotically achieve perfect security with sufficiently many elements using arithmetic coding.

IV An example on generative steganography

In this section, we present a steganographic scheme on a text-to-speech system based on WaveNet under the proposed framework, since WaveNet exposes the explicit probability distribution of each sample. Moreover, audio has the natural advantage that its semantic meaning is easily recognized by a speech-to-text system, which brings great convenience and greatly reduces the communication cost. In other words, there is no need to send the semantic meaning of the audio every time a message is sent.

IV-A Text-to-Speech using WaveNet

Fig. 4: Diagram of text-to-speech using WaveNet. Spectrogram prediction network (SPN) generates the mel-spectrogram and WaveNet synthesizes time-domain waveforms from those spectrograms.

Fig. 4 shows a text-to-speech diagram, where the text is first transformed into mel-scale spectrograms using a spectrogram prediction network (SPN), followed by a modified WaveNet acting as a vocoder to synthesize time-domain waveforms from those spectrograms[23]. Under this diagram, some generative models achieve a mean opinion score (MOS) comparable to professionally recorded speech. Here, the SPN is regarded as a fixed tool, and we focus on the WaveNet vocoder, where the steganographic process engages.

WaveNet is an audio generative model operating directly on the raw audio waveform, based on a sequential generation architecture, which can synthesize speech samples with natural segmental quality. The joint probability of a waveform $\mathbf{x} = \{x_1, \ldots, x_T\}$ is factorized as a product of conditional probabilities as follows:

$p(\mathbf{x}) = \prod_{t=1}^{T} p(x_t \mid x_1, \ldots, x_{t-1}).$ (30)

Each audio sample $x_t$ is therefore conditioned on the samples at all previous timesteps.

For a text-to-speech system, conditional WaveNet is the choice. Given an additional input $\mathbf{h}$, such as text, WaveNet can model the conditional distribution of the audio given this input. Eq. (30) now becomes

$p(\mathbf{x} \mid \mathbf{h}) = \prod_{t=1}^{T} p(x_t \mid x_1, \ldots, x_{t-1}, \mathbf{h}).$ (31)

The generation of each audio sample can be divided into two steps: first, the neural network predicts the distribution $p(x_t \mid x_1, \ldots, x_{t-1}, \mathbf{h})$ of the current sample given all previous samples; second, the sampling algorithm randomly selects a value according to this distribution.
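The two-step generation loop can be sketched as follows. This is a schematic sketch: `predict_next_distribution` is a hypothetical stand-in for the trained vocoder, returning a toy distribution instead of a network output:

```python
import random

def predict_next_distribution(history, conditioning):
    # Hypothetical stand-in: a real WaveNet would run a neural network on
    # `history` and the conditioning input h (e.g., a mel-spectrogram).
    return [0.1, 0.4, 0.4, 0.1]  # toy distribution over 4 sample values

def generate(n_samples, conditioning, rng):
    """Autoregressive generation: predict p(x_t | x_1..x_{t-1}, h), then
    randomly sample the next value from it, as in Eq. (31)."""
    history = []
    for _ in range(n_samples):
        probs = predict_next_distribution(history, conditioning)
        value = rng.choices(range(len(probs)), weights=probs)[0]
        history.append(value)
    return history

waveform = generate(16, conditioning=None, rng=random.Random(0))
```

Replacing the `rng.choices` line with a source-decoding step over the same distribution is exactly where the proposed steganographic embedding hooks into the generation loop.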

IV-B Perfectly secure steganography using WaveNet

Fig. 5: The diagram of generative steganography using WaveNet and Arithmetic Coding. Given input text, the SPN will transfer it to the mel-spectrogram. For ordinary audio waveform generation, feed the WaveNet the mel-spectrogram and then the network will produce an audio. As for steganography, the arithmetic decoding is integrated with the WaveNet for message embedding (AAD-WaveNet). The generated stego audio can be recognized into the same text by human or speech-to-text tool like Deepspeech, and the mel-spectrogram can be obtained as well. The arithmetic encoding is integrated with the WaveNet for message extraction (AAE-WaveNet).

Following the framework designed in Section III, we sketch a perfectly secure steganography scheme based on the text-to-speech system including WaveNet. The whole diagram of the concealment system is shown in Fig. 5. As preparation, the sender should pretrain the text-to-speech model, including the SPN and the WaveNet vocoder, and then send the model to the receiver; he can also use public models and just tell the receiver where to download them. A speech-to-text tool is also needed to reduce the communication cost, and both sides should use the same tool so that the sender can ensure the receiver obtains the same text. The message embedding process is integrated with the waveform generation in the WaveNet vocoder.

At the sender-end, given the input text, the mel-spectrogram is predicted by the SPN. For ordinary waveform generation, each sample is randomly chosen according to the distribution yielded by the WaveNet. In detail, the probability distribution of the first sample is produced by the WaveNet from the mel-spectrogram and a seed, and the value of the first sample is randomly chosen from it. Then the probability distribution of the second sample is predicted by the network given the first, and a similar process is repeated until all samples are generated.

When it comes to steganography, we need to decompress the encrypted message into the stego samples using Arithmetic Coding. First of all, the message bits are transformed into a fraction $f$ according to Eq. (15), and the initial interval $[0, 1)$ is divided into subintervals whose lengths are proportional to the probability distribution of the first sample (the same distribution as in ordinary generation) in the predefined order. The predefined order here is the possible values of an audio sample in ascending order, such as -128 to 127 for 8-bit audio. Afterwards, the symbol corresponding to the subinterval in which $f$ lies is chosen as the first stego sample. Then the probability distribution is updated by conditioning on the chosen sample. Analogously, the process is repeated until all samples are generated. Note that the message length should be shorter than the entropy of the generated sequence. The whole process of message embedding using Algorithm 1 is named AAD-WaveNet, as labeled in Fig. 5.

After the generation of the stego audio is accomplished, it is sent to the receiver. Utilizing a speech-to-text tool such as Deepspeech[24], the audio is recognized into text. Since the receiver owns the same model and message length $L$, the probability distribution of every step will be identical to that generated at the sender-end. As a result, the message can be extracted correctly using Algorithm 2. The described process of message extraction using arithmetic encoding is named AAE-WaveNet.


Fig. 6: The text is “This is really awesome”. (a) is the mel-spectrogram of the text generated by Tacotron-2; (b) and (c) are normal audios obtained through random sampling, and (d) is the stego audio generated by the proposed steganographic algorithm. It can be seen that the differences among the three audios are large, meaning that steganalyzers cannot distinguish them intuitively.

V Experiments

In this section, experimental results and analysis are presented to demonstrate the feasibility and effectiveness of the proposed schemes.

V-a Setups

V-A1 Cover Source

In this paper, we randomly collect 1,000 short text sentences and transfer them into mel-spectrograms using the SPN in Tacotron-2[23] (the architecture of the spectrogram prediction network is available at https://github.com/Rayhane-mamah/Tacotron-2). Then the WaveNet vocoder is used for audio waveform generation (the architecture of the WaveNet vocoder is available at https://github.com/r9y9/wavenet_vocoder). The WaveNet vocoder is trained on the CMU ARCTIC dataset[25] for 100,000 steps. All the audio clips are stored in the uncompressed WAV format. The audio length ranges from 0.5 s to 3 s, and the sample rate is 16 kHz.

As for embedding messages, the process of random sampling is replaced with adaptive arithmetic decoding (AAD-WaveNet). The original WaveNet vocoder and the AAD-WaveNet steganographic scheme are then used to generate cover audio and stego audio, respectively. Deepspeech (available at https://github.com/mozilla/DeepSpeech) is selected for transferring audio to text.

V-A2 Steganalysis Features

Steganalysis is implemented to verify the security of generative audio steganography. There are several mainstream universal feature sets for audio steganalysis, such as 2D-Markov[26], D-MC[27], Time-Markov and Mel-Frequency[28]. Here, D-MC and the combined version of Time-Markov and Mel-Frequency (abbreviated as CTM) are selected as the steganalysis features for their state-of-the-art performance.

V-A3 Classifier

The detectors are trained as binary classifiers implemented using the FLD ensemble with default settings[29]. A separate classifier is trained for each embedding algorithm and payload. The ensemble by default minimizes the total classification error probability under equal priors:

$P_E = \min_{P_{FA}} \frac{1}{2}\left(P_{FA} + P_{MD}\right),$ (32)

where $P_{FA}$ and $P_{MD}$ are the false-alarm probability and the missed-detection probability, respectively. The ultimate security is quantified by the average error rate $\overline{P}_E$ averaged over ten 500/500 database splits, and a larger $\overline{P}_E$ means stronger security.
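The error metric of Eq. (32) can be computed from detector scores by sweeping a decision threshold. This is a minimal sketch; the score lists in the usage example are illustrative, not experimental data:

```python
def p_e(cover_scores, stego_scores):
    """Minimize (P_FA + P_MD) / 2 over all decision thresholds, where a
    score >= threshold is classified as stego."""
    best = 1.0
    candidates = sorted(set(cover_scores + stego_scores))
    for t in candidates + [max(candidates) + 1]:
        p_fa = sum(s >= t for s in cover_scores) / len(cover_scores)
        p_md = sum(s < t for s in stego_scores) / len(stego_scores)
        best = min(best, (p_fa + p_md) / 2)
    return best
```

A perfectly secure scheme drives the error rate toward 0.5 (random guessing), while a value near 0 means the detector separates cover and stego completely.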

V-B Visualization

Fig. 6 presents the process of text → mel-spectrogram → audio wave. The text “This is really awesome” is converted into a mel-spectrogram, and then the audio wave is generated using the WaveNet vocoder. Fig. 6(b) and (c) are normal audios obtained through random sampling, and (d) is the stego audio generated by the proposed steganographic algorithm. It can be seen that the differences between normal audio and stego audio are large, which is distinct from steganography by modification. However, the differences between two normal audios, such as Fig. 6(b) and (c), are large as well, meaning that steganalyzers cannot distinguish normal audio and stego audio intuitively.

Fig. 7: Histograms of normal audio and stego audios; the red curve is the preset probability distribution. Panels: (a) short cover, (b) stego 0.1 bps, (c) stego 0.4 bps, (d) long cover, (e) stego 0.1 bps, (f) stego 0.4 bps. The first row shows short carriers (1,000 samples) and the second row long carriers (10,000 samples). It can be observed that the long carrier fits the preset distribution better than the short carrier, and that the distributions of the cover, the small-payload stego, and the large-payload stego are very close and not easily distinguishable.

V-C Encoding performance

In order to evaluate the embedding algorithm with respect to distribution discrepancy, we run it on a cover source obeying a Gaussian distribution. Gaussian functions are often used to represent the probability density function of a normally distributed random variable with expected value $\mu$ and variance $\sigma^2$:

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$.   (33)

Here, the cover values range from 0 to 255, and the parameters $\mu$ and $\sigma$ of $f(x)$ are set to fixed values. The cover is randomly sampled according to this probability distribution, and the stegos with different payloads are generated by AAD. We embed random messages at 0.1 bps (bits per sample) and 0.4 bps, both below the information entropy bound (4.414 bps).
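The discretized cover source can be set up as follows. Since the exact parameter values are not restated here, μ = 128 and σ = 5 are assumptions for illustration; the 4.414 bps bound quoted above corresponds to the authors' own parameter choice:

```python
import numpy as np

def discrete_gaussian(mu=128.0, sigma=5.0, n=256):
    """Gaussian pdf evaluated on {0, ..., n-1} and renormalized.
    mu and sigma here are ASSUMED values, not the paper's."""
    x = np.arange(n)
    p = np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
    return p / p.sum()

def entropy_bits(p):
    """Shannon entropy in bits per sample: the payload upper bound."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

probs = discrete_gaussian()
rng = np.random.default_rng(0)
cover = rng.choice(len(probs), size=10_000, p=probs)  # a "long carrier"
```

Any embedding rate (0.1 or 0.4 bps in the experiment) must stay below `entropy_bits(probs)`, the analogue of the 4.414 bps bound.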

The histograms of cover and stego are shown in Fig. 7, where the red solid line is the curve of the preset Gaussian distribution. The first row of Fig. 7 corresponds to the short carrier (1,000 samples) and the second row to the long carrier (10,000 samples). It can be observed that the long carrier fits the preset distribution better than the short carrier, and that the distributions of the cover, the small-payload stego, and the large-payload stego are very close and not easily distinguishable.

We have also calculated the distribution distance of the generated samples. Because the KL divergence produces unstable calculations (division by zero), the Bhattacharyya distance [30] is adopted. The Bhattacharyya distance between two classes under the normal distribution can be calculated as follows [31]:

$D_B(p,q) = \frac{1}{4}\ln\left(\frac{1}{4}\left(\frac{\sigma_p^2}{\sigma_q^2} + \frac{\sigma_q^2}{\sigma_p^2} + 2\right)\right) + \frac{1}{4}\frac{(\mu_p-\mu_q)^2}{\sigma_p^2+\sigma_q^2}$,   (34)

where $D_B(p,q)$ is the Bhattacharyya distance between the $p$ and $q$ distributions (or classes), $\sigma_p^2$ is the variance of the $p$-th distribution, and $\mu_p$ is the mean of the $p$-th distribution. We calculate the distance between the preset Gaussian distribution (taken as the $p$-th distribution) and the generated distribution (taken as the $q$-th distribution). The results are shown in Table I. The distances of the normal sample and the stego samples show no consistent ordering, meaning that they are indistinguishable. As the sequence length increases, the distribution distance becomes smaller, which shows that the samples generated by the proposed steganographic scheme asymptotically approach the target distribution; in other words, the proposed scheme can asymptotically achieve perfect security given sufficiently many cover elements. This asymptotically perfect security will be further verified by strong steganalytic methods in the next subsection.
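This closed form for two univariate normals (the content of Eq. (34)) translates directly to code; the function below is only a transcription of that formula:

```python
import math

def bhattacharyya_normal(mu_p, var_p, mu_q, var_q):
    """Bhattacharyya distance between the univariate normal
    distributions N(mu_p, var_p) and N(mu_q, var_q)."""
    scale_term = 0.25 * math.log(0.25 * (var_p / var_q + var_q / var_p + 2.0))
    mean_term = 0.25 * (mu_p - mu_q) ** 2 / (var_p + var_q)
    return scale_term + mean_term
```

The distance is zero for identical distributions, symmetric in its two arguments, and grows with any mean or variance mismatch, which is why it serves here as a stable substitute for the KL divergence.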

              Cover   Stego (0.1 bps)   Stego (0.4 bps)
Short sample  5.85    3.27              5.07
Long sample   3.15    7.72              5.87
TABLE I: The Bhattacharyya distance between the preset Gaussian distribution and the generated distribution.

V-D Security performance

State-of-the-art steganalytic methods are utilized to verify the security of the proposed scheme. Since steganalysis performance varies with the cover source, we also implement other steganographic methods to show the effectiveness of the selected steganalysis features. The LSB matching [32] and AAC-based [33] algorithms are chosen; the former is conventional and the latter is content-adaptive. In LSB matching, if the message bit does not match the LSB of the cover element, one is randomly either added to or subtracted from the value of that element; otherwise, no modification is made. The AAC-based algorithm is simulated at its payload-distortion bound, with the distortion defined as the reciprocal of the difference between the original audio and the audio reconstructed through compression and decompression by Advanced Audio Coding.
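The LSB matching baseline just described can be simulated in a few lines; clamping at 0 and 255 (an assumed 8-bit sample range) keeps the ±1 change valid at the boundaries:

```python
import numpy as np

def lsb_match(cover, message_bits, rng):
    """LSB matching: when a message bit disagrees with an element's LSB,
    randomly add or subtract 1; otherwise leave the element unchanged."""
    stego = cover.copy()
    for i, bit in enumerate(message_bits):
        if (stego[i] & 1) != bit:
            step = rng.choice((-1, 1))
            # Clamp to the assumed 8-bit range [0, 255].
            if stego[i] == 0:
                step = 1
            elif stego[i] == 255:
                step = -1
            stego[i] += step
    return stego
```

After embedding, every element's LSB equals its message bit and no element moves by more than ±1, which is exactly the change pattern the steganalysis features are tuned to detect.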


Fig. 8: The average detection error rate as a function of payload in bits per sample (bps) for steganographic algorithm payloads ranging from 0.1-0.5 bps against CTM.

Fig. 9: The average detection error rate as a function of payload in bits per sample (bps) for steganographic algorithm payloads ranging from 0.1-0.5 bps against D-MC.

Fig. 8 and Fig. 9 show the average detection error rate $\bar{P}_E$ as a function of payload in bps, for payloads ranging from 0.1 to 0.5 bps, against CTM and D-MC respectively. It can be observed that the $\bar{P}_E$ of the AAC-based algorithm decreases with increasing payload and falls to nearly 0%, and that of LSB matching is always nearly 0%, showing that the steganalysis is effective on the generated audio. By contrast, the $\bar{P}_E$ of the proposed scheme is nearly 50%, which means the proposed scheme is nearly perfectly secure; in other words, the strong steganalyzer judges the stego essentially by random guessing. The experimental results thus verify the security performance proved theoretically above.

VI Conclusions

Based on the duality between source coding and generating a discrete distribution from fair coins, together with the explicit probability distribution yielded by the generative model, perfectly secure steganography on generative media is proposed. Instead of randomly sampling from the probability distribution as ordinary generative models do, we combine source decoding into the generation process, which implements sampling according to the probability distribution while simultaneously embedding the encrypted message.

Adaptive Arithmetic Coding is selected as the source coding method in the proposed framework, and it is proved that the framework can asymptotically achieve perfect security. Taking a text-to-speech system as an instance, message embedding and extraction with the proposed scheme are illustrated in detail. Distribution distance and steganalysis are used to assess the encoding performance of AAD and the security of the resulting stego audio. The results show that the proposed steganographic method is asymptotically perfectly secure.

In our future work, we will explore other effective source encoding schemes and try to transfer them to generative steganographic encoding. Furthermore, other generative media, such as text and video, will be utilized under the proposed framework.

Acknowledgment

The authors would like to thank Prof. Weiqi Luo of Sun Yat-sen University for providing the source code of the audio steganalysis features, and Ryuichi Yamamoto for his valuable advice.

References

  • [1] C. E. Shannon, “Communication theory of secrecy systems,” Bell system technical journal, vol. 28, no. 4, pp. 656–715, 1949.
  • [2] C. Cachin, “An information-theoretic model for steganography,” in International Workshop on Information Hiding.   Springer, 1998, pp. 306–318.
  • [3] P. Sallee, “Model-based steganography,” in International workshop on digital watermarking.   Springer, 2003, pp. 154–167.
  • [4] J. Fridrich and J. Kodovskỳ, “Multivariate gaussian model for designing additive distortion for steganography,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.   IEEE, 2013, pp. 2949–2953.
  • [5] V. Sedighi, J. Fridrich, and R. Cogranne, “Content-adaptive pentary steganography using the multivariate generalized gaussian cover model,” in Media Watermarking, Security, and Forensics 2015, vol. 9409.   International Society for Optics and Photonics, 2015, p. 94090H.
  • [6] V. Sedighi, R. Cogranne, and J. Fridrich, “Content-adaptive steganography by minimizing statistical detectability,” IEEE Transactions on Information Forensics and Security, vol. 11, no. 2, pp. 221–234, 2016.
  • [7] T. Filler, J. Judas, and J. Fridrich, “Minimizing additive distortion in steganography using syndrome-trellis codes,” IEEE Transactions on Information Forensics and Security, vol. 6, no. 3, pp. 920–935, 2011.
  • [8] B. Li, M. Wang, J. Huang, and X. Li, “A new cost function for spatial image steganography,” in 2014 IEEE International Conference on Image Processing (ICIP).   IEEE, 2014, pp. 4206–4210.
  • [9] K. Chen, H. Zhou, W. Zhou, W. Zhang, and N. Yu, “Defining cost functions for adaptive jpeg steganography at the microscale,” IEEE Transactions on Information Forensics and Security, pp. 1–1, 2018.
  • [10] Y. Yao, W. Zhang, N. Yu, and X. Zhao, “Defining embedding distortion for motion vector-based video steganography,” Multimedia tools and Applications, vol. 74, no. 24, pp. 11 163–11 186, 2015.
  • [11] N. J. Hopper, J. Langford, and L. Von Ahn, “Provably secure steganography,” in Annual International Cryptology Conference.   Springer, 2002, pp. 77–92.
  • [12] A. Lysyanskaya and M. Meyerovich, “Provably secure steganography with imperfect sampling,” in International Workshop on Public Key Cryptography.   Springer, 2006, pp. 123–139.
  • [13] N. Hopper, L. von Ahn, and J. Langford, “Provably secure steganography,” IEEE Transactions on Computers, vol. 58, no. 5, pp. 662–676, 2009.
  • [14] G. J. Simmons, “The prisoners’ problem and the subliminal channel,” in Advances in Cryptology.   Springer, 1984, pp. 51–67.
  • [15] H. Larochelle and I. Murray, “The neural autoregressive distribution estimator,” in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011, pp. 29–37.
  • [16] M. Germain, K. Gregor, I. Murray, and H. Larochelle, “Made: Masked autoencoder for distribution estimation,” in International Conference on Machine Learning, 2015, pp. 881–889.
  • [17] A. v. d. Oord, N. Kalchbrenner, and K. Kavukcuoglu, “Pixel recurrent neural networks,” arXiv preprint arXiv:1601.06759, 2016.
  • [18] D. P. Kingma and P. Dhariwal, “Glow: Generative flow with invertible 1x1 convolutions,” arXiv preprint arXiv:1807.03039, 2018.
  • [19] I. Goodfellow, “Nips 2016 tutorial: Generative adversarial networks,” arXiv preprint arXiv:1701.00160, 2016.
  • [20] T. M. Cover and J. A. Thomas, Elements of information theory.   John Wiley & Sons, 2012.
  • [21] K. Sayood, Introduction to data compression.   Morgan Kaufmann, 2017.
  • [22] A. Said, “Introduction to arithmetic coding-theory and practice,” Hewlett Packard Laboratories Report, pp. 1057–7149, 2004.
  • [23] J. Shen, R. Pang, R. J. Weiss, M. Schuster, N. Jaitly, Z. Yang, Z. Chen, Y. Zhang, Y. Wang, R. Skerry-Ryan et al., “Natural tts synthesis by conditioning wavenet on mel spectrogram predictions,” arXiv preprint arXiv:1712.05884, 2017.
  • [24] A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates et al., “Deep speech: Scaling up end-to-end speech recognition,” arXiv preprint arXiv:1412.5567, 2014.
  • [25] J. Kominek and A. W. Black, “The cmu arctic speech databases,” in Fifth ISCA workshop on speech synthesis, 2004.
  • [26] Q. Liu, A. H. Sung, and M. Qiao, “Temporal derivative-based spectrum and mel-cepstrum audio steganalysis,” IEEE Transactions on Information Forensics and Security, vol. 4, no. 3, pp. 359–368, 2009.
  • [27] Q. Liu, A. H. Sung, and M. Qiao, “Derivative-based audio steganalysis,” ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 7, no. 3, p. 18, 2011.
  • [28] W. Luo, H. Li, Q. Yan, R. Yang, and J. Huang, “Improved audio steganalytic feature and its applications in audio forensics,” ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 14, no. 2, p. 43, 2018.
  • [29] J. Kodovskỳ, J. J. Fridrich, and V. Holub, “Ensemble classifiers for steganalysis of digital media.” IEEE Trans. Information Forensics and Security, vol. 7, no. 2, pp. 432–444, 2012.
  • [30] A. Bhattacharyya, “On a measure of divergence between two statistical populations defined by their probability distributions,” Bull. Calcutta Math. Soc., vol. 35, pp. 99–109, 1943.
  • [31] G. B. Coleman and H. C. Andrews, “Image segmentation by clustering,” Proceedings of the IEEE, vol. 67, no. 5, pp. 773–785, 1979.
  • [32] J. Mielikainen, “Lsb matching revisited,” IEEE signal processing letters, vol. 13, no. 5, pp. 285–287, 2006.
  • [33] W. Luo, Y. Zhang, and H. Li, “Adaptive audio steganography based on advanced audio coding and syndrome-trellis coding,” in International Workshop on Digital Watermarking.   Springer, 2017, pp. 177–186.