Steganography is a research field with a long history. It mainly studies how to embed important information into common information carriers and hide its existence, thereby protecting its security. Steganography has a very wide range of applications. In addition to supporting military intelligence, it can protect users' privacy during everyday communication and provide copyright protection for digital media [23, 29, 38]. Because of this broad applicability, steganography has attracted wide attention from researchers in recent years.
The task of information hiding can be modeled as a non-cooperative game under the “Prisoners’ Problem” [26, 21], which can be briefly described as follows. Alice and Bob need to exchange secret information, but all of their communication is reviewed by Eve. Once Eve detects that a transmitted carrier contains secret information, the entire communication task fails. In principle, any information carrier that is not compressed to the Shannon limit has some information redundancy. Therefore all such carriers can be used to embed additional information, including digital media such as images, audio, text and video. Among them, text has a relatively high information coding density (that is, less redundancy), which makes it harder to embed additional information in text. However, text is one of the most commonly used communication carriers in daily life. Effective ways to embed hidden information in text have therefore attracted the interest of many researchers [30, 3, 20, 7, 18, 32, 36, 10, 6, 39]. Early text steganography techniques mainly made minor modifications to the text, such as adjusting word spacing or substituting synonyms for specific words and phrases. These techniques were worthwhile attempts, but they usually achieve only a very low embedding rate, which makes them hard to use in practice. Later researchers therefore began to use automatic text generation models to embed information [18, 32, 36, 10, 6, 39].
The challenges for carrier-generation steganography are clear. Since the carrier is not given in advance, the first challenge is: how can we generate a steganographic sample that is semantically complete and sufficiently natural? Previous researchers have done a great deal of work on this challenge [18, 32, 36, 10, 6, 39]. Steganography first relied on specific syntactic structures, later on Markov models [20, 25, 35], and most recently on neural networks [32, 36, 10, 6, 39]. Over this progression, researchers have gradually become able to generate high-quality steganographic sentences. Current solutions are mainly built on the following technical framework. Alice first uses a well-designed model (such as a recurrent neural network) to learn a statistical language model from a large number of natural texts. Then, during generation, Alice encodes the conditional probability distribution of each word (using Huffman coding or arithmetic coding), so as to embed specific information into the generated natural sentences. Such methods appear to solve the first challenge above, that is, generating sufficiently natural steganographic texts. However, we find that they still have some defects that may bring potential security risks.
Firstly, this kind of framework embeds information by encoding the conditional probability space of each word in the generated text. Therefore, as the embedding rate increases, the model becomes more and more likely to choose words with lower conditional probability, generating low-quality or even ungrammatical steganographic sentences. Can we break away from this framework and select words with high conditional probability at any embedding rate, so that the quality of the generated steganographic text always remains high? Secondly, previous text-generation steganographic methods usually cannot control the semantic expression of the generated steganographic sentences, which may bring serious security risks. Z. Yang et al. recently proposed a new security framework for covert communication systems. Within this framework, they find a further risk when Alice does not control the semantics of the generated carriers and keeps generating steganographic carriers with random semantics. Even if every sentence is of sufficiently high quality, the covert communication between Alice and Bob may still be exposed to other security risks.
Solving these two challenges of current text steganographic methods is the motivation of this paper. We abandon the current text steganography framework of “language model + conditional probability coding” and instead use a Knowledge Graph (KG) to guide the generation of steganographic sentences. On the one hand, we hide the secret information by encoding paths in the knowledge graph rather than the conditional probability of each generated word, which is quite different from previous text-generation steganographic methods. On the other hand, we can use the knowledge graph to guide the semantic expression of the generated steganographic text, and thus achieve semantically controllable steganographic text generation to a certain extent. The scheme proposed in this paper is probably not the ideal solution to these two challenges. Still, we hope it will encourage follow-up researchers to design more diverse generative text steganography schemes and to achieve semantically controllable generative text steganography as far as possible.
II Related Works
Linguistic steganography based on automatic sentence generation has attracted researchers’ attention for a long time because of its wide application prospects. In the early days, researchers could only generate sentences without semantic information or grammatical rules. Later, some researchers introduced syntactic rules to constrain the generated texts, but the steganographic sentences generated by these methods were simple and could easily be recognized. After that, researchers began to combine natural language processing techniques to generate steganographic sentences [20, 7, 18, 25, 35].
At present, most automatic steganographic text generation models follow this framework. A well-designed model learns a statistical language model from a large number of normal sentences. Secret information is then hidden by encoding the conditional probability distribution of each word during text generation [10, 35, 32, 36, 39, 6]. Within this framework, early works mainly used Markov models to approximate the language model and compute the conditional probability distribution of each word [25, 35]. However, due to the limitations of the Markov model itself, the quality of the generated text is still not good enough, which makes it easy to recognize. In recent years, with the development of natural language processing, more and more steganographic text generation models based on neural networks have emerged [10, 32, 6, 39, 36]. T. Fang et al. first partition the dictionary and assign each word a fixed code, and then use a recurrent neural network (RNN) to learn the statistical language model of natural text. Finally, during automatic text generation, a different word is selected as output at each step according to the information that needs to be hidden. Z. Yang et al. also use an RNN to learn the statistical language model of a large number of normal sentences. They then use a full binary tree (fixed-length coding, FLC) and a Huffman tree (variable-length coding, VLC) to encode the conditional probability distribution of each word. The word corresponding to the information to be hidden is output at each step, which embeds the hidden information during sentence generation. After that, Dai et al. and Ziegler et al. further improved the statistical language model and the coding of the conditional probability distribution. Their methods bring the conditional probability distribution of each word in the generated steganographic sentences closer to that of the underlying language model.
These neural-network-based automatic text generation methods fit the statistical language model of normal sentences better than Markov models, so the quality of the generated steganographic texts has improved significantly [10, 32, 36, 39, 6]. However, this may still not be enough. Z. Yang et al. recently showed experimentally that these existing models cannot control the semantics of the generated steganographic sentences. As a result, they still bring potential security risks even when the quality of the generated text is good enough. Developing techniques that automatically generate steganographic text with controllable semantics is therefore a challenge that needs to be solved.
Automatically generating text with specific semantics has long been an important research topic in natural language processing. It is directly related to many valuable research topics, such as automatic image captioning, machine translation and automatic dialogue generation. Most of these models follow a unified technical framework, the encoder-decoder framework. A specific encoder encodes the semantic information to be expressed (such as an image in the image captioning task) into a semantic vector and sends it to the decoder. The decoder then generates natural text carrying this semantics. However, to ensure the universality of a steganography algorithm, we usually assume that the embedded secret information is arbitrary (that is, without specific semantics). Under this premise, further requiring the generated steganographic sentence to carry specific semantics is extremely challenging.
Existing text-generation steganography frameworks mainly hide secret information by encoding the conditional probability of each word during generation, which makes it hard to control the semantics of the generated steganographic sentence. Therefore, in this paper, we hide information on the encoder side without altering the decoder, so as to control the semantic expression of the generated steganographic sentences to a certain extent.
III The Proposed Method
III-A Notations and Problem Statement
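In this paper, we adopt the following notation conventions. Random variables are denoted by capital letters (e.g., $X$), and their individual values by the corresponding lowercase letters (e.g., $x$). The domains over which random variables are defined are denoted by script letters (e.g., $\mathcal{X}$), and the number of possible values in $\mathcal{X}$ is denoted by $|\mathcal{X}|$. A graph contains vertices, which represent concepts, and edges, which represent the relations between the vertices they connect. We define a graph $G$ with $N$ vertices as follows:
$$ G = (V, E), \qquad V = \{v_1, v_2, \ldots, v_N\}, \qquad E = \{e_{ij} = (v_i, v_j)\} \subseteq V \times V, \tag{1} $$
where $V$ and $E$ denote the vertex set and the edge set, respectively. We define the set of all edges starting at $v_i$ as $E_{\mathrm{out}}(v_i)$, and the set of all edges ending at $v_i$ as $E_{\mathrm{in}}(v_i)$, that is:
$$ E_{\mathrm{out}}(v_i) = \{ e_{ij} \mid e_{ij} \in E \}, \qquad E_{\mathrm{in}}(v_i) = \{ e_{ji} \mid e_{ji} \in E \}. \tag{2} $$
We use $|E_{\mathrm{out}}(v_i)|$ and $|E_{\mathrm{in}}(v_i)|$ to denote the number of edges with node $v_i$ as the starting point and the ending point, respectively. We denote the set of paths from node $v_i$ to node $v_j$ by $\mathcal{P}(v_i, v_j)$, and the number of such paths by $|\mathcal{P}(v_i, v_j)|$.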
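To make these definitions concrete, the Python sketch below works on a small toy graph. It builds the set of edges starting at each node (`e_out`), the set of edges ending at each node (`e_in`), and the set of paths between two nodes. The node names, relations and the helper names `edge_sets` and `paths` are hypothetical illustrations, not part of the paper. The recursive path enumeration assumes the graph is acyclic.

```python
from collections import defaultdict

# Hypothetical toy graph: each edge is a (start, relation, end) triple.
EDGES = [
    ("car", "has", "engine"),
    ("car", "has", "seat"),
    ("car", "has", "price"),
    ("engine", "has", "fuel consumption"),
    ("engine", "has", "power"),
    ("fuel consumption", "is", "high"),
    ("fuel consumption", "is", "low"),
    ("price", "is", "high"),
    ("seat", "is", "comfortable"),
]

def edge_sets(edges):
    """Return (e_out, e_in): node -> edges starting / ending at that node."""
    e_out, e_in = defaultdict(list), defaultdict(list)
    for start, rel, end in edges:
        e_out[start].append((start, rel, end))
        e_in[end].append((start, rel, end))
    return e_out, e_in

def paths(e_out, src, dst):
    """All directed paths from src to dst (the graph is assumed to be acyclic)."""
    if src == dst:
        return [[src]]
    found = []
    for _, _, nxt in e_out.get(src, []):
        for tail in paths(e_out, nxt, dst):
            found.append([src] + tail)
    return found

e_out, e_in = edge_sets(EDGES)
print(len(e_out["car"]), len(e_in["high"]))  # 3 2
print(paths(e_out, "car", "high"))
```

Here the toy graph has two paths from "car" to "high": one through "engine" and "fuel consumption", and one through "price".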
The proposed Graph-Stega model also adopts the encoder-decoder framework. We introduce a knowledge graph on the encoder side and encode the paths between nodes in the graph. According to the secret information to be embedded, we extract the corresponding path and construct a subgraph. A graph embedding network then extracts the subgraph's semantics, and the result is sent to the decoder to generate the steganographic text. The overall framework is shown in Figure 1.
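Suppose there are a secret message set $\mathcal{M}$, a secret key set $\mathcal{K}$ and a graph space $\mathcal{G}$. What we need to realize is a mapping from the secret message space to the graph space and then to the text space $\mathcal{T}$, that is:
$$ \mathcal{M} \times \mathcal{K} \rightarrow \mathcal{G} \rightarrow \mathcal{T}. \tag{3} $$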
III-B Path Coding in Knowledge Graph
The core idea of the proposed method is that a given knowledge graph usually contains multiple connected paths between two nodes (e.g., the path set $\mathcal{P}(v_i, v_j)$ between nodes $v_i$ and $v_j$), and each path constitutes a subgraph. We can therefore encode these subgraphs so that different subgraphs (paths) represent different secret information. According to the secret information to be embedded, we then extract the corresponding subgraph from the graph space. Next, we use a specific algorithm to extract the semantic information contained in the subgraph and embed it into a semantic vector $\mathbf{h}_G$. Finally, a decoder generates the corresponding steganographic text from this semantic vector $\mathbf{h}_G$.
For any node, such as $v_i$, there are $|E_{\mathrm{out}}(v_i)|$ edges starting from it, and each edge represents a possible semantic trend. We can therefore convert the encoding of subgraphs into the encoding of the outgoing edge set of each node. In this paper, we use a Huffman tree to encode the outgoing edge set of each node, with each edge weighted by its frequency in the corresponding dataset. This lets us encode the semantic trend taken at each node. The next question is how to choose the starting and ending nodes of a path. They cannot be chosen arbitrarily; otherwise, it becomes harder for the decoder to generate natural text.
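The following Python sketch illustrates this per-node Huffman coding and shows how a bit string selects a path. The toy graph `OUT`, its edge frequencies, and the function names `huffman_codes` and `embed` are hypothetical. Zero-padding the secret bits when they run out in the middle of a codeword is an assumption of this sketch; the text does not say how the final bits are handled.

```python
import heapq
from itertools import count

def huffman_codes(weights):
    """Huffman prefix code for one node's outgoing edges (successor -> frequency)."""
    if len(weights) == 1:
        # A node with a single outgoing edge carries no secret bits.
        return {next(iter(weights)): ""}
    tiebreak = count()  # unique second key so the heap never compares dicts
    heap = [(w, next(tiebreak), {key: ""}) for key, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {k: "0" + c for k, c in left.items()}
        merged.update({k: "1" + c for k, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]

# Hypothetical toy graph: node -> {successor: edge frequency in the dataset}.
OUT = {
    "car": {"engine": 5, "seat": 2, "price": 1},
    "engine": {"fuel consumption": 3, "power": 3},
    "fuel consumption": {"high": 4, "low": 2},
    "price": {"high": 1, "low": 1},
    "power": {"strong": 1},
    "seat": {"comfortable": 1},
}

def embed(bits, start, out_edges=OUT):
    """Walk from `start`, following at each node the edge whose codeword prefixes the bits.

    Returns the chosen path and the number of secret bits it consumed. Bits that
    run out mid-codeword are zero-padded (an assumption made for this sketch).
    """
    path, node, pos = [start], start, 0
    while node in out_edges:
        codes = huffman_codes(out_edges[node])
        longest = max(len(c) for c in codes.values())
        padded = bits[pos:] + "0" * longest
        # Huffman codes are complete and prefix-free, so exactly one codeword matches.
        nxt = next(n for n, c in codes.items() if padded.startswith(c))
        pos += len(codes[nxt])
        path.append(nxt)
        node = nxt
    return path, min(pos, len(bits))

print(embed("1010", "car"))  # (['car', 'engine', 'fuel consumption', 'high'], 3)
```

Frequent edges receive short codewords, so common semantic trends consume fewer bits. A node with one outgoing edge consumes none.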
Here, we use the fact that a knowledge graph generally has a hierarchical structure. Some nodes represent an objective entity, such as a car. Other nodes represent first-level attributes of an entity, such as its engine, and still others represent second-level attributes, such as fuel consumption. We therefore divide the nodes in the knowledge graph into different sets according to their semantic level. Assume that the entire knowledge graph contains $K$ semantic levels. We use $\mathcal{S}_k$ to denote the set of nodes at the $k$-th semantic level, and $|\mathcal{S}_k|$ to denote the number of nodes at that level. Then, the nodes from $\mathcal{S}_1$ to $\mathcal{S}_K$ constitute a semantic Markov chain, that is:
$$ \mathcal{S}_1 \rightarrow \mathcal{S}_2 \rightarrow \cdots \rightarrow \mathcal{S}_K. \tag{4} $$
Therefore, during path encoding we require that, for every edge, the ending node lies at a strictly lower (finer-grained) semantic level than the starting node. Under this restriction, each path corresponds to relatively clear and complete semantic information. For example, the path “automobile → engine → fuel consumption” has a basically clear semantic scope, yet the corresponding natural text still has some freedom. It can be “fuel consumption is a normal phenomenon with internal combustion engines in cars”, or just “I don’t like this car very much because the fuel consumption of the engine is too high”.
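The level restriction can be checked mechanically, as in the Python sketch below. The level numbers and the function name `is_admissible` are hypothetical. Every admissible edge strictly increases the level number, so an admissible path can never revisit a node. This guarantees that path enumeration and embedding walks terminate.

```python
# Hypothetical semantic levels: a larger number means a lower (finer-grained)
# level, e.g. 1 = entity, 2 = first-level attribute, 3 = second-level
# attribute, 4 = sentiment.
LEVEL = {"automobile": 1, "engine": 2, "price": 2, "fuel consumption": 3, "high": 4}

def is_admissible(path, level=LEVEL):
    """True if every edge of the path moves to a strictly lower semantic level."""
    return all(level[a] < level[b] for a, b in zip(path, path[1:]))

print(is_admissible(["automobile", "engine", "fuel consumption"]))  # True
print(is_admissible(["engine", "automobile"]))                      # False
```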
If Alice wants to control the semantics of the generated steganographic sentence, she can fix specific nodes (words) in the graph and encode only the remaining nodes and paths. For example, to generate a steganographic sentence that describes the engine, she can fix the “engine” node. To generate a sentence with positive sentiment, she can fix the “good” node. In this way, Alice can control the semantics of the generated steganographic sentences to a certain extent at the cost of some embedding rate (reducing the freedom of some nodes).
III-C Subgraph Embedding
After extracting a subgraph from the knowledge graph according to the secret information, we need to extract its semantics and convert it into a semantic vector. This vector is then sent to the decoder to generate natural text. In this paper, we mainly follow the model proposed in  for semantic extraction of subgraphs.
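Firstly, we express each edge in the graph, such as $e_{ij}$, as a triple $(v_i, r_{ij}, v_j)$. Here $v_i$ is the starting node, $v_j$ is the ending node, and $r_{ij}$ is the relation between them. Secondly, we map the word represented by each node and each edge into a word vector, e.g., $\mathbf{x}_{v_i}$ for $v_i$ and $\mathbf{x}_{r_{ij}}$ for $r_{ij}$. Then, we use a recurrent neural network with gated units to update the vectorized semantic representations of the nodes and edges according to the information flow in the subgraph. The update for node $v_j$ at step $t$ is:
$$
\begin{aligned}
\mathbf{i}_j^t &= \sigma\big(W_i \mathbf{x}_j + U_i \mathbf{m}_j^{t-1} + \mathbf{b}_i\big), \\
\mathbf{f}_j^t &= \sigma\big(W_f \mathbf{x}_j + U_f \mathbf{m}_j^{t-1} + \mathbf{b}_f\big), \\
\mathbf{o}_j^t &= \sigma\big(W_o \mathbf{x}_j + U_o \mathbf{m}_j^{t-1} + \mathbf{b}_o\big), \\
\mathbf{u}_j^t &= \tanh\big(W_u \mathbf{x}_j + U_u \mathbf{m}_j^{t-1} + \mathbf{b}_u\big), \\
\mathbf{c}_j^t &= \mathbf{f}_j^t \odot \mathbf{c}_j^{t-1} + \mathbf{i}_j^t \odot \mathbf{u}_j^t, \\
\mathbf{h}_j^t &= \mathbf{o}_j^t \odot \tanh\big(\mathbf{c}_j^t\big),
\end{aligned} \tag{5}
$$
where:
- $\mathbf{x}_j$ collects the word vectors of $v_j$ and of the triples connected to it;
- $\mathbf{m}_j^{t-1}$ aggregates the hidden states of the neighboring nodes from step $t-1$;
- $\mathbf{h}_j^t$ and $\mathbf{c}_j^t$ are the hidden and cell states of $v_j$;
- $\mathbf{i}_j^t$, $\mathbf{f}_j^t$ and $\mathbf{o}_j^t$ indicate the input gate, the forget gate and the output gate at the $t$-th step, respectively;
- $W_*$ and $U_*$ are the weights in them and $\mathbf{b}_*$ are the biases.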
Each node and edge may contribute differently to the semantic representation of the whole graph. We therefore use the attention mechanism to fuse the weighted node and edge vectors $\mathbf{h}_k^T$ obtained in the last iteration $T$ into a complete semantic representation of the input subgraph, that is:
$$ \alpha_k = \frac{\exp\big(\mathrm{FC}(\mathbf{h}_k^T)\big)}{\sum_{k'} \exp\big(\mathrm{FC}(\mathbf{h}_{k'}^T)\big)}, \qquad \mathbf{h}_G = \sum_k \alpha_k \mathbf{h}_k^T. $$
Here, $\mathrm{FC}(\cdot)$ represents a fully connected neural network layer. Finally, the semantic information of the whole graph is contained in the semantic vector $\mathbf{h}_G$.
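The attention fusion step is illustrated below with a dependency-free Python sketch. The text specifies the scorer only as "a fully connected layer", so a single linear unit with made-up weights `W_FC` stands in for it. The toy vectors and the helper names `attention_pool` and `fc` are hypothetical.

```python
import math

def attention_pool(vectors, score):
    """Softmax the scores of the element vectors; return (weights, weighted sum)."""
    scores = [score(v) for v in vectors]
    top = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(vectors[0])
    h_g = [sum(a * v[d] for a, v in zip(alphas, vectors)) for d in range(dim)]
    return alphas, h_g

W_FC = [0.5, -0.25]  # hypothetical weights of a one-unit fully connected scorer

def fc(v):
    return sum(w * x for w, x in zip(W_FC, v))

states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy last-iteration node/edge vectors
alphas, h_g = attention_pool(states, fc)
print(alphas, h_g)
```

With equal scores the pooled vector reduces to the plain mean. With unequal scores, elements the scorer rates higher dominate $\mathbf{h}_G$.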
III-D Sentence Generation
For a text set with a dictionary $D$, each sentence $S$ of length $L$ is sampled from the space $D^L$. However, most of these combinations do not contain complete semantic information. To obtain a semantically complete word sequence, the most common approach is based on a statistical language model. A statistical language model (LM) learns the conditional probability distribution of each word in normal sentences by training on a large set of normal sentences, that is:
$$ p(S) = p(w_1, w_2, \ldots, w_L) = p(w_1) \prod_{t=2}^{L} p(w_t \mid w_1, w_2, \ldots, w_{t-1}), $$
where $S$ denotes the whole sentence and $w_t$ denotes its $t$-th word. The task of the decoder is to find, among the $|D|^L$ possible combinations, a suitable word sequence with complete semantics and correct syntax according to the semantic vector $\mathbf{h}_G$. In this work, we use a recurrent neural network with LSTM units as the decoder. Its mathematical description is very similar to formula (5), and for simplicity we denote the transfer function of the LSTM units by $f_{\mathrm{LSTM}}(\cdot)$. An RNN learns the statistical language model from a large number of normal texts. It computes the conditional probability distribution of the next word from the previously generated words, and thus generates sentences that conform to this language model.
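For example, suppose the model has already generated $t$ words $w_1, \ldots, w_t$ and is given the semantic vector $\mathbf{h}_G$. It then computes the probability distribution of the $(t+1)$-th word:
$$ \mathbf{h}_t = f_{\mathrm{LSTM}}(\mathbf{x}_{w_t}, \mathbf{h}_{t-1}, \mathbf{h}_G), \qquad p(w_{t+1} \mid w_1, \ldots, w_t, \mathbf{h}_G) = \mathrm{softmax}(W_p \mathbf{h}_t + \mathbf{b}_p), $$
where $f_{\mathrm{LSTM}}$ is the transfer function of the LSTM units, $\mathbf{x}_{w_t}$ is the word vector of $w_t$, and $W_p$ and $\mathbf{b}_p$ are the parameters of the output layer.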
Previous generative text steganography models mainly embed the secret information by encoding this conditional probability distribution of each word [32, 36, 39, 6]. As mentioned before, their common problem is that the model gradually selects words with lower conditional probability as the embedding rate increases, which reduces the quality of the generated text. The proposed model performs steganography in the knowledge graph on the encoder side. We therefore do not need to alter the conditional probability of each word in the decoder and can always choose the word with the highest probability as the current output, which ensures the quality of the generated text.
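Because no bits are embedded in the decoder, decoding can be plain greedy search. The Python sketch below illustrates this with a hypothetical bigram table (`BIGRAM`, with invented probabilities) standing in for the LSTM decoder conditioned on the semantic vector. It also accumulates the sentence log-probability with the chain rule. The secret message never enters `greedy_decode`; in the proposed scheme it only determines which subgraph, and hence which semantic vector, the decoder receives.

```python
import math

# Invented bigram table standing in for the decoder's next-word distribution;
# "<s>" and "</s>" mark the sentence boundaries.
BIGRAM = {
    "<s>": {"the": 0.6, "engine": 0.4},
    "the": {"engine": 0.6, "car": 0.4},
    "engine": {"is": 0.7, "sounds": 0.3},
    "is": {"strong": 0.4, "loud": 0.6},
    "strong": {"</s>": 1.0},
    "loud": {"</s>": 1.0},
    "car": {"</s>": 1.0},
    "sounds": {"</s>": 1.0},
}

def greedy_decode(cond=BIGRAM, max_len=20):
    """Pick the most probable next word at every step; no secret bits are involved.

    Returns the words and log p(S), accumulated with the chain rule.
    """
    prev, words, log_p = "<s>", [], 0.0
    for _ in range(max_len):
        dist = cond[prev]
        word = max(dist, key=dist.get)
        log_p += math.log(dist[word])
        if word == "</s>":
            break
        words.append(word)
        prev = word
    return words, log_p

words, lp = greedy_decode()
print(words, math.exp(lp))  # ['the', 'engine', 'is', 'loud'], about 0.6*0.6*0.7*0.6
```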
To ensure that Bob can decode successfully, the generated text must contain the nodes of the subgraph. To solve this problem, we follow [13, 12] and introduce the copy mechanism into the text generation process. This mechanism computes the final vocabulary distribution from two parts:
$$ P(w) = p_{\mathrm{gen}}\, P_{\mathrm{vocab}}(w) + (1 - p_{\mathrm{gen}}) \sum_{k:\, v_k = w} \alpha_{t,k}, $$
where:
- $p_{\mathrm{gen}} \in [0, 1]$ is a switch controlling whether a word is generated from the vocabulary or copied directly from the input graph;
- $P_{\mathrm{vocab}}(w)$ is the decoder's vocabulary distribution;
- $\alpha_{t,k}$ is the attention weight on input node $v_k$ at step $t$.

By introducing this mechanism, we find that most generated texts contain the words of the nodes in the input subgraph. In fact, in our covert communication task, even if a generated text does not contain the required words, Alice can simply regenerate it until it does. In our experiments, this happened with probability below 5%.
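The mixture can be written in a few lines of Python. The sketch assumes the standard pointer-generator form shown above. The function name `copy_mixture`, the switch value 0.6, and all probabilities are hypothetical.

```python
def copy_mixture(p_gen, p_vocab, attention, node_words):
    """P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on input nodes labelled w).

    Words missing from the vocabulary (e.g. multi-word node labels) can still be copied.
    """
    final = {w: p_gen * p for w, p in p_vocab.items()}
    for a, w in zip(attention, node_words):
        final[w] = final.get(w, 0.0) + (1.0 - p_gen) * a
    return final

p_vocab = {"the": 0.5, "car": 0.3, "is": 0.2}   # invented decoder distribution
nodes = ["car", "fuel consumption", "high"]      # labels of the input subgraph
final = copy_mixture(0.6, p_vocab, [0.2, 0.5, 0.3], nodes)
print(round(final["car"], 2), round(final["fuel consumption"], 2))  # 0.26 0.2
```

If both inputs are proper distributions, the result still sums to one. This is why copying can raise the probability of node words without renormalization.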
When Bob receives a sentence from Alice, the sentence corresponds to a unique path in the knowledge graph. Bob identifies the node words it contains, recovers the path, and reads off the code of each edge along it. This ensures that Bob can accurately extract the hidden information.
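Bob's side can be sketched in Python under two simplifying assumptions that the text does not spell out. First, Bob knows the starting (entity) node. Second, a node counts as present if its label occurs as a substring of the sentence; a real system would need proper tokenization. `CODES` is a hypothetical codebook that both parties would derive from the shared graph and edge frequencies. `recover_path` and `extract_bits` are illustrative names.

```python
# Hypothetical shared codebook: node -> {successor: Huffman codeword}.
CODES = {
    "car": {"engine": "1", "seat": "01", "price": "00"},
    "engine": {"fuel consumption": "0", "power": "1"},
    "fuel consumption": {"high": "1", "low": "0"},
    "price": {"high": "0", "low": "1"},
    "power": {"strong": ""},
    "seat": {"comfortable": ""},
}

def recover_path(sentence, start, codes=CODES):
    """Follow the graph from `start`, at each node taking the successor found in the sentence."""
    path, node = [start], start
    while node in codes:
        hits = [n for n in codes[node] if n in sentence]  # naive substring matching
        if len(hits) != 1:
            raise ValueError(f"ambiguous or missing successor of {node!r}")
        node = hits[0]
        path.append(node)
    return path

def extract_bits(path, codes=CODES):
    """Concatenate the codewords of the edges along the recovered path."""
    return "".join(codes[a][b] for a, b in zip(path, path[1:]))

path = recover_path("the fuel consumption of this car engine is too high", "car")
print(path, extract_bits(path))  # ['car', 'engine', 'fuel consumption', 'high'] 101
```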
IV Model Analysis and Experiments
IV-A Dataset and Model Training
Before using the proposed method for covert communication, Alice and Bob must agree in advance on a publicly available knowledge graph. In this paper, we use the automobile review dataset and the corresponding knowledge graph constructed in  to verify and test the proposed steganography algorithm. This knowledge graph is stored as triples, that is, its nodes have three semantic levels: entity → attribute → sentiment. The whole knowledge graph contains 36,373 triples, corresponding to more than 100,000 natural sentences.
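We train the proposed model by maximum likelihood with a regularization term on the attention weights, minimizing a loss function over the training set. The loss is the negative log-probability of the ground-truth words plus the attention penalty, that is:
$$ \mathcal{L} = -\sum_{t=1}^{L} \log p(y_t \mid y_1, \ldots, y_{t-1}, \mathbf{h}_G) + \lambda\, \Omega(\boldsymbol{\alpha}), $$
where $y_t$ is the $t$-th ground-truth word, $\mathbf{h}_G$ is the semantic vector of the input subgraph, and $\Omega(\boldsymbol{\alpha})$ is the penalty on the attention weights $\boldsymbol{\alpha}$. The factor $\lambda$ balances the cross-entropy loss against this penalty. We use stochastic gradient descent with momentum 0.9 to train the parameters of our network.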
IV-B Hidden Capacity and Semantic Correlation
Existing steganography methods based on conditional probability coding mainly control the embedding rate by adjusting the size of the candidate pool for each word. A larger candidate pool gives a higher embedding rate, but it also makes the model more likely to select words with lower conditional probability. Such methods therefore face a trade-off between the information embedding rate and the quality of the generated text. The proposed model embeds the secret information by encoding paths in the knowledge graph. This turns the trade-off into one between the embedding rate and semantic controllability, so the embedding rate does not affect text quality.
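The trade-off can be seen in a one-step Python sketch. `flc_choice` is a simplified fixed-length variant of conditional-probability coding written for illustration; it does not reproduce any specific cited method. The next-word distribution is invented. With a pool of $2^k$ candidates, $k$ secret bits can force the $2^k$-th most probable word. The greedy choice does not depend on the secret bits at all.

```python
def greedy_choice(dist):
    """Decoding step with no bits embedded in the decoder: always the most probable word."""
    return max(dist, key=dist.get)

def flc_choice(dist, bits):
    """Simplified fixed-length conditional-probability coding: the k secret bits
    index a candidate pool made of the 2**k most probable words."""
    k = len(bits)
    pool = sorted(dist, key=dist.get, reverse=True)[: 2 ** k]
    if len(pool) < 2 ** k:
        raise ValueError("vocabulary too small for this many bits per word")
    return pool[int(bits, 2)] if k else pool[0]

# Invented next-word distribution for one decoding step.
dist = {"good": 0.5, "great": 0.2, "fine": 0.15, "bad": 0.1, "weird": 0.05}
print(greedy_choice(dist))     # good, whatever the secret bits are
print(flc_choice(dist, "11"))  # bad: two bits force the 4th most probable word
```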
The embedding rate is the number of actually embedded bits divided by the number of bits the entire generated text occupies in the computer. For the proposed method, the information embedding rate can be expressed as follows:
$$ ER = \frac{1}{N}\sum_{i=1}^{N} \frac{\sum_{k=1}^{K_i - 1} \log_2 |E_{\mathrm{out}}(v_{i,k})|}{8 \sum_{j=1}^{L_i} m_{i,j}} \approx \frac{1}{8 N \bar{L} \bar{l}} \sum_{i=1}^{N} \sum_{k=1}^{K_i - 1} \log_2 |E_{\mathrm{out}}(v_{i,k})|, $$
where $N$ is the number of generated sentences and $L_i$ is the length of the $i$-th sentence. The denominator is the number of bits occupied by the $i$-th sentence in the computer. Since each English letter occupies one byte, i.e., 8 bits, the $i$-th sentence occupies $8 \sum_{j=1}^{L_i} m_{i,j}$ bits, where $m_{i,j}$ is the number of letters in the $j$-th word of the $i$-th sentence. $\bar{L}$ and $\bar{l}$ are the average sentence length of the generated text and the average number of letters per word. $K_i$ is the length (number of nodes) of the semantic chain (formula (4)) corresponding to the $i$-th generated text. $|E_{\mathrm{out}}(v_{i,k})|$ is the number of edges starting from its $k$-th node $v_{i,k}$. Alice can fix some nodes of the semantic chain to control the semantics of the generated steganographic text. Every choice fixed in this way carries no secret bits, so part of the final embedding rate is lost.
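The embedding-rate computation can be sketched in Python. As an assumption of this sketch, each free choice at a node with $d$ outgoing edges is credited with $\log_2 d$ bits, the idealized capacity. The Huffman code actually assigns frequency-dependent codeword lengths, so realized rates will differ somewhat. The example sentence and the out-degrees 8 and 4 are invented. The second call shows how fixing one node lowers the rate.

```python
import math

def embedding_rate(sentences):
    """Average per-sentence ratio of hidden bits to the bits the text occupies.

    `sentences` is a list of (words, out_degrees) pairs. out_degrees lists the
    number of outgoing edges of every chain node whose outgoing choice was made
    freely, i.e. not fixed by Alice.
    """
    rates = []
    for words, out_degrees in sentences:
        hidden_bits = sum(math.log2(d) for d in out_degrees)  # idealised bits per choice
        text_bits = 8 * sum(len(w) for w in words)             # one byte per letter
        rates.append(hidden_bits / text_bits)
    return sum(rates) / len(rates)

words = ["the", "engine", "is", "strong"]  # 17 letters -> 136 bits
free = embedding_rate([(words, [8, 4])])   # both choices carry bits: 5/136
fixed = embedding_rate([(words, [4])])     # Alice fixes the attribute node: 2/136
print(free, fixed)
```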
In the experiments, we adjusted the embedding rate by fixing one node, two nodes, or no node in each triple, generating steganographic sentences at different embedding rates. We then tested the semantic relevance between the generated steganographic sentences and the input subgraph. For each input subgraph, we take its corresponding text in the dataset as the reference and compare our generated sentences against it. From this comparison we compute several standard metrics used in machine translation and related generation tasks: BLEU, METEOR, CIDEr and ROUGE-L (higher is better). The semantic relevance between the generated steganographic sentences and the input subgraph under different embedding rates is shown in Table 1.
The results show that the generated steganographic text follows the input semantic information to a certain extent, and its semantic relevance changes little across embedding rates.
IV-C Quality Evaluation
Furthermore, we want to know whether the steganographic sentences generated by the proposed method are of reliable quality, and whether this quality declines significantly as the embedding rate increases. In natural language processing, perplexity ($\mathrm{ppl}$) is a standard metric for sentence quality [10, 32]. It is the exponential of the negative average per-word log-probability on the test texts:
$$ \mathrm{ppl} = 2^{-\frac{1}{n}\log_2 p(s)} = 2^{-\frac{1}{n}\sum_{j=1}^{n}\log_2 p(w_j \mid w_1, \ldots, w_{j-1})}, $$
where $s$ is the generated sentence, $n$ is the number of words in $s$, and $w_j$ is its $j$-th word. Usually, a smaller $\mathrm{ppl}$ means the sentences fit the language model better, which indicates better quality of the generated sentences. We tested the mean $\mathrm{ppl}$ of the steganographic sentences generated at different embedding rates and of the training sentences in the dataset. The results are shown in Table 2.
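A minimal Python version of this definition is sketched below. It takes the decoder's conditional probability of each word as input. The function names and example probabilities are ours. A uniform choice among $k$ words at every position gives a perplexity of exactly $k$, which is a convenient sanity check.

```python
import math

def perplexity(word_probs):
    """ppl of one sentence, given p(w_j | w_1..w_{j-1}) for each of its n words."""
    n = len(word_probs)
    return 2 ** (-sum(math.log2(p) for p in word_probs) / n)

def mean_perplexity(sentences):
    """Mean of the per-sentence perplexities over a set of sentences."""
    return sum(perplexity(s) for s in sentences) / len(sentences)

print(perplexity([0.5, 0.5]))   # 2.0
print(perplexity([0.25] * 4))   # 4.0
```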
From the results in Table 2, we can see that the $\mathrm{ppl}$ of the generated steganographic sentences is close to that of the training sentences and does not change significantly as the embedding rate increases. This indicates an advantage of our method over current generative text steganography methods, whose text quality decreases as the embedding rate grows.
In addition, we tested how well the generated steganographic sentences resist detection, using the text steganalysis algorithm proposed in  to distinguish them from normal sentences. When the , the detection results are shown in Table 3. The results show that our model has a certain ability to resist detection.
In this paper, we proposed a new text-generation-based steganographic method. The proposed method abandons the current text steganography framework of “language model + conditional probability coding” and uses a Knowledge Graph (KG) to guide the generation of steganographic sentences. The experimental results show that the proposed model is effective and has characteristics that current text generation steganography algorithms lack. The quality of the generated text does not degrade as the embedding rate increases, and the semantics of the generated text can be controlled to a certain extent. We hope that this paper will serve as a reference for researchers designing and implementing better text steganography.
- [1] (2014) Neural machine translation by jointly learning to align and translate. Computer Science.
- [2] (2003) A neural probabilistic language model. Journal of Machine Learning Research 3 (6), pp. 1137–1155.
- [3] (1997) Hiding the hidden: a software system for concealing ciphertext as innocuous text. In International Conference on Information and Communications Security, pp. 335–345.
- [4] Knowledge-enhanced neural networks for sentiment analysis of Chinese reviews. Neurocomputing 368, pp. 51–58.
- [5] (1998) Electronic document data hiding technique using inter-character space. In Circuits and Systems, 1998. IEEE APCCAS 1998. The 1998 IEEE Asia-Pacific Conference on, pp. 419–422.
- [6] (2019) Towards near-imperceptible steganographic text. arXiv preprint arXiv:1907.06679.
- [7] (2010) Text steganography system using Markov chain source model and DES algorithm. JSW 5 (7), pp. 785–792.
- [8] (2019) Enabling covert body area network using electro-quasistatic human body communication. Scientific Reports 9 (1), pp. 4160.
- [9] (2014) Meteor universal: language specific translation evaluation for any target language. In Proceedings of the Ninth Workshop on Statistical Machine Translation, pp. 376–380.
- [10] (2017) Generating steganographic text with LSTMs. arXiv preprint arXiv:1705.10742.
- [11] (2009) Steganography in digital media: principles, algorithms, and applications. Cambridge University Press.
- [12] (2016) Incorporating copying mechanism in sequence-to-sequence learning. arXiv preprint arXiv:1603.06393.
- [13] (2016) Pointing the unknown words. arXiv preprint arXiv:1603.08148.
- [14] (1997) Long short-term memory. Neural Computation 9 (8), pp. 1735–1780.
- [15] (2011) Steganography in inactive frames of VoIP streams encoded by source codec. IEEE Transactions on Information Forensics and Security 6 (2), pp. 296–306.
- [16] (2016) Deep reinforcement learning for dialogue generation. pp. 1192–1202.
- [17] (2006) An information-theoretic approach to automatic evaluation of summaries. In Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pp. 463–470.
- [18] (2016) Text steganography based on ci-poetry generation using Markov chain model. KSII Transactions on Internet and Information Systems 10 (9), pp. 4568–4584.
- [19] (2011) Quantitative analysis of culture using millions of digitized books. Science 331 (6014), pp. 176–182.
- [20] (2014) An approach for text steganography based on Markov chains. arXiv preprint arXiv:1409.0915.
- [21] (2003) Information-theoretic analysis of information hiding. IEEE Transactions on Information Theory 49 (3), pp. 563–593.
- [22] (2002) BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 311–318.
- [23] (2000) Digital music safeguard may need retuning. Science 290 (5493), pp. 917–919.
- [24] (2012) Data hiding in MPEG video files using multivariate regression and flexible macroblock ordering. IEEE Transactions on Information Forensics and Security 7 (2), pp. 455–464.
- [25] (2016) A text steganography method based on Markov chains. Automatic Control and Computer Sciences 50 (8), pp. 802–808.
- [26] (1984) The prisoners’ problem and the subliminal channel. Advances in Cryptology Proc Crypto, pp. 51–67.
- [27] (2018) A graph-to-sequence model for AMR-to-text generation. arXiv preprint arXiv:1805.02473.
- [28] (2015) CIDEr: consensus-based image description evaluation. In , pp. 4566–4575.
- [29] (2001) Music industry strikes sour note for academics. Science 292 (5518), pp. 826–827.
- [30] (1992) Mimic functions. Cryptologia 16 (3), pp. 193–214.
- [31] (2014) Linguistic steganalysis using the features derived from synonym frequency. Multimedia Tools and Applications 71 (3), pp. 1893–1911.
- [32] (2018) RNN-Stega: linguistic steganography based on recurrent neural networks. IEEE Transactions on Information Forensics and Security.
- [33] (2019) Behavioral security in covert communication systems. arXiv preprint arXiv:1910.09759.
- [34] (2019) A fast and efficient text steganalysis method. IEEE Signal Processing Letters 26 (4), pp. 627–631.
- [35] (2018) Automatically generate steganographic text based on Markov model and Huffman coding. arXiv preprint arXiv:1811.04720.
- [36] (2018) RITS: real-time interactive text steganography based on automatic dialogue model. In International Conference on Cloud Computing and Security, pp. 253–264.
- [37] (2017) Image captioning with object detection and localization. In International Conference on Image and Graphics, pp. 109–118.
- [38] (1996) Watermarking by numbers. Nature 384 (6609), pp. 514.
- [39] (2019) Neural linguistic steganography. arXiv preprint arXiv:1909.01496.