Ensemble Reversible Data Hiding

01/15/2018 ∙ by Hanzhou Wu, et al. ∙ Southwest Jiaotong University

Conventional reversible data hiding (RDH) algorithms often consider the host as a whole to embed a payload. In order to achieve satisfactory rate-distortion performance, the secret bits are embedded into a noise-like component of the host such as prediction errors. From the rate-distortion view, this may not be optimal since all data embedding units use identical parameters. This motivates us to present a segmented data embedding strategy for RDH in this paper, in which the raw host is partitioned into multiple sub-hosts such that each one can freely optimize and use its own embedding parameters. Moreover, it enables us to apply different RDH algorithms within different sub-hosts, which we define as an ensemble. Notice that the ensemble defined here is different from that in machine learning. Accordingly, the conventional operation corresponds to a special case of our work. Since it is a general strategy, we combine some state-of-the-art algorithms to construct a new system using the proposed embedding strategy and evaluate its rate-distortion performance. Experimental results show that the ensemble RDH system outperforms the original versions, which demonstrates its superiority and applicability.




I Introduction

Reversible data hiding (RDH) [1], also called reversible watermarking, enables us to hide a payload in a host by slightly altering the host without introducing noticeable artifacts. At the receiver side, both the original host content and the embedded information can be fully reconstructed. This is quite desirable in applications that require no degradation of the original content, such as remote sensing and the military. RDH is fragile, meaning that when the marked host has been manipulated, one will find that it is not authentic and the original host may not be fully retrievable.

A number of RDH algorithms [2]-[12] have been reported, most of which use the prediction errors (PEs) of the cover elements to achieve RDH by histogram shifting (HS) [2] or its variants [10]. The prediction procedure produces a prediction error histogram (PEH), and the HS operation allows us to reversibly embed a secret payload into the PEH. Conventional RDH algorithms often consider the host as a whole such that all PEH bins are processed with identical parameters, which may not be optimal from the rate-distortion optimization view. This observation has been exploited in Li et al.’s work [6], which uses multiple histograms modification (MHM). As mentioned in their work, the data embedding capacity in a single layer is limited. Thus, the distortion may be high when the payload increases and a higher-layer embedding is required.

There are two reasonable explanations. First, their work uses a chessboard prediction pattern [10]. It allows us to embed half of the payload into the dot set. After that, the cross set can be used for embedding the other half. However, after embedding in the dot set, the cross set will use the altered pixels for prediction, which may reduce the prediction accuracy and thus degrade the rate-distortion performance. Second, the two embedding channels use the same prediction and data embedding procedure. Actually, different predictors or data embedding algorithms perform differently on different subhosts. Data embedding in a smooth region generally yields better performance. It is therefore desirable to apply to a smooth subhost those pixel prediction or embedding algorithms that perform best on smooth content. For a complex subhost, other algorithms may be preferred. For example, in an image, different image blocks may have different local characteristics, which allows us to embed data into them separately using different RDH algorithms. (Footnote 1: Two RDH algorithms are considered different as long as their core steps differ, e.g., the predictor is different while the other steps are the same.)
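The chessboard argument above can be illustrated with a minimal sketch. It assumes the common rhombus-style predictor (the floored mean of the four direct neighbors) and uses edge replication at the image border purely for convenience; the function names `chessboard_masks` and `rhombus_predict` are hypothetical and are not taken from [6] or [10].

```python
import numpy as np

def chessboard_masks(shape):
    """Return boolean masks of the dot set and the cross set (assumed parity rule)."""
    rows, cols = np.indices(shape)
    dot = (rows + cols) % 2 == 0
    return dot, ~dot

def rhombus_predict(image):
    """Predict each pixel as the floored mean of its four direct neighbors.

    Border pixels reuse their own value through edge padding, which is a
    simplification made only for this sketch.
    """
    img = np.asarray(image, dtype=np.float64)
    padded = np.pad(img, 1, mode="edge")
    up = padded[:-2, 1:-1]
    down = padded[2:, 1:-1]
    left = padded[1:-1, :-2]
    right = padded[1:-1, 2:]
    return np.floor((up + down + left + right) / 4).astype(np.int64)
```

For interior pixels, the neighbors of a dot pixel are all cross pixels, so modifying the dot set leaves the dot-set predictions untouched, whereas the cross-set predictions are then computed from altered pixels.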

This has motivated us to present an ensemble data embedding strategy for RDH in this paper, in which the raw host is divided into multiple subhosts so that different RDH algorithms can be applied to different subhosts, each with its own optimized parameters. Experiments on a public image dataset demonstrate its superiority and applicability.

The rest of this paper is organized as follows. In Section II, we present the ensemble data embedding strategy, followed by a detailed scheme in Section III. Experiments on a public dataset are provided and analyzed in Section IV. Finally, we conclude this paper in Section V.

II Ensemble Reversible Embedding

Let X and O respectively denote the cover image to be embedded and its original version, where , e.g., . X = O if X was never embedded. We expect to embed a message m into X to generate a marked image Y such that the distortion between Y and O, denoted by , is as low as possible. For compactness, we will sometimes say “pixel ” representing a pixel located at position , whose value is . And, X represents a pixel-set containing all the pixels belonging to X.

We collect subhosts of X, i.e., , , …, , where . And, for some , there may exist . Notice that one can also require the subhosts to be disjoint in practice. In RDH, we would like to use the noise-like component to carry a payload. For example, in prediction-based RDH methods, a predictor is required to predict the cover elements to be embedded, by which the PEs can be obtained. We use to denote the noise-like vector of . For compactness, we consider since one can always keep the non-embedded elements unchanged. We are to modify , to hide secret data. To better generalize it, we consider the embedding unit as a vector, rather than a single element (though this assumption will not be mentioned again). Namely, we divide each into disjoint vector-units, each of which is sized as . Thus, we have , and rewrite , where . Notice that may be further partitioned into sub-components for embedding.

We expect to find suitable RDH algorithms and embedding parameters to hide to separately so that the resulting overall distortion can be as low as possible.

Let and be a set of candidate data embedding algorithms and a set including all possible parameter-sets. We expect to find such and that all can be carried by and the overall distortion is as low as possible. Notice that, will be orderly embedded into with and . Namely, will be embedded into (corresponding to ) using and for . For some , if , an element belonging to both sets may be modified twice. Thus, the data extraction procedure may be performed in an inverse manner.


be the marked host by orderly embedding …, into …, using …, and …, . Here, for all , we have and . There may exist some such that or .

We use to denote all the elements that were used to carry …, , namely, the elements in are belonging to at least one of …, . Let denote the overall distortion between and O. Therefore, our optimization task can be written as:


We here limit ourselves to an additive distortion defined as:


where exposes the cost of changing to . The additive assumption is reasonable since the interaction between two pixels that are far away from each other can be roughly ignored. And, the interaction between adjacent pixels can be captured by well designing . A commonly used additive distortion measure is mean squared error (MSE), i.e.,

For compactness, we here use to represent the distortion between and O. Accordingly,


Since we embed into only when have been previously embedded into respectively, we have


where corresponds to a subset of since there may be for some . We rewrite:


can be determined before we compute . and rely on the data embedding algorithms and parameters. corresponds to a minimized , i.e., .

We write:






where are empirical factors. Actually, for , reflects the interaction between and . From an implementation view, the determination of can be avoided. That is, in practice, one can determine the overall distortion excluding previously and then focus on optimizing the embedding algorithm and parameters for since the subhosts are embedded in order. (Footnote 2: This is also due to the “additive” distortion and the inclusion-exclusion principle.) Thereafter, the distortion introduced by can be easily determined.

Eqs. (6-8) show the state-transition equations to determine . It essentially breaks the distortion optimization down into a sequence of subproblems. During the distortion optimization process, one can construct and , which are used for data embedding simultaneously. It is seen that most conventional algorithms use , which is a special case of our model. The time complexity to enumerate all for Eq. (1) is . By applying Eqs. (6-8), it is significantly reduced to . By using a small number of candidate RDH algorithms and heuristically optimizing the data embedding parameters, the enumeration complexity could further decline significantly.

Fig. 1: The RDH algorithm distribution maps of the proposed ensemble algorithm for the -layer embedding (i.e., the multiple embedding strategy was applied) for each test image: (a-d) Airplane, (e-h) Lena, (i-l) Baboon, (m-p) Lake, (q-t) Boat and (u-x) Peppers. Here, we use , and the border area, i.e., , is ignored for better presentation. The red, green, blue and white areas represent Wu et al.’s method, Li et al.’s method, Hsu et al.’s method and Ni et al.’s method, respectively. Moreover, the black blocks indicate that they cannot carry additional data, i.e., no RDH algorithm was applied. The input cover image of a higher-layer embedding is a rotated version (by 90 degrees) of the marked image obtained from the previous-layer embedding. This means that the maps shown here have been rotated, e.g., (a) is rotated by 360 degrees (relative to the original image, not to that in the previous layer), (b) by 450 degrees, and (c) and (d) by 540 and 630 degrees, respectively. Additionally, since we fix here, the corresponding rate-distortion performance may not be optimal for the above maps.

III Detailed Ensemble RDH Scheme

We present a detailed scheme in this section. However, one is always free to design a new ensemble scheme. We divide X into two subsets, denoted by and , i.e.,


where is a system parameter and a pixel position is indexed from (1, 1) to . will be used to store auxiliary data such as the secret key. will carry m and other side information such as the location map. We segment into image blocks from top to bottom and left to right, denoted by , , …, . Here, , and for all , . One may consider that, for all , we have . Thus, we have for all . For simplicity, we assume that for all , where is a positive integer. Therefore, we have .
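The block segmentation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the region reserved for auxiliary data is assumed to have been removed already, the blocks are assumed to be disjoint and equally sized, and they are enumerated row by row. The name `split_blocks` and the grid parameter `n` are hypothetical.

```python
import numpy as np

def split_blocks(region, n):
    """Split a 2-D array into an n x n grid of equally sized, disjoint blocks.

    Blocks are returned in row-major order (an assumed scanning order).
    """
    region = np.asarray(region)
    height, width = region.shape
    if height % n or width % n:
        raise ValueError("region size must be divisible by n in this sketch")
    bh, bw = height // n, width // n
    return [
        region[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
        for r in range(n)
        for c in range(n)
    ]
```

Because the blocks are NumPy views, modifying a block in place also modifies the underlying region, which is convenient when each block is embedded separately.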

We expect to find suitable RDH algorithms and parameters to embed into . Some auxiliary data for data extraction and image recovery are embedded into . Since HS is one of the most popular operations in RDH, we use HS-based RDH algorithms for our experiments. In detail, four state-of-the-art algorithms, i.e., Ni et al. [2], Li et al. [6], Wu et al. [9] and Hsu et al. [4], are used to construct an ensemble RDH system. There are three reasons why we select these four algorithms. First, they are HS-based algorithms providing superior rate-distortion performance, and are relatively easy to simulate. Second, they allow us to use two pairs of peak-zero points at a time to embed the secret data, which makes it convenient to build a new RDH system from them and optimize the embedding parameters. Third, a core contribution of these works is the pixel prediction procedure. Clearly, the prediction value of a cover pixel in [2] can be regarded as fixed at zero all the time. In Li et al.’s work, the mean value of the neighbors is used to estimate the current pixel. In Wu et al.’s work, a second-order prediction procedure is applied. In Hsu et al.’s work, side corrections are considered. These different predictors enable us to better choose a suitable RDH algorithm for a subhost during the optimization process, which could benefit the overall rate-distortion performance. However, a data hider is always free to choose the candidate RDH algorithms. We cannot guarantee that our choice is optimal. One may take into account more candidate algorithms, at a higher computational cost.

III-A Preprocessing

We use a key to produce a permutation of , denoted by , to control the data embedding order. This means that we will first embed into . Then, we embed into , and so on. will be self-embedded into some pixels in by LSB replacement, where the original LSBs of the specified pixels will be considered as a part of m. We have to self-embed and in advance as well. The process is similar to . By default, we can set . The size of the side information for storing , and will have a negligible impact on the pure embedding payload. Since is only slightly modified, its impact on the overall distortion can be roughly ignored during optimization.
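A key-controlled embedding order can be sketched as below. The paper does not specify how the permutation is generated, so seeding Python's standard pseudo-random generator with the key is purely an assumption for illustration; it is not a cryptographically secure choice. The name `embedding_order` is hypothetical.

```python
import random

def embedding_order(num_blocks, key):
    """Return a key-dependent permutation of block indices 0..num_blocks-1.

    Assumption: the key seeds a standard PRNG; a real system would likely
    use a cryptographically secure construction instead.
    """
    order = list(range(num_blocks))
    random.Random(key).shuffle(order)
    return order
```

The encoder and the decoder obtain the same order from the same key, so the decoder can visit the blocks in the reverse of the embedding order.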

III-B Data Embedding

Assuming that have been previously carried by , we are to embed . We apply the four RDH algorithms mentioned above to . The RDH algorithm that results in the lowest overall distortion will be selected as the final algorithm for . Note that, for each RDH algorithm, during data embedding optimization, the side information involves three aspects, i.e., the location map , the data-embedding parameters and the secret key . is constructed to avoid underflow and overflow problems, and will be embedded into together with . Meanwhile, and should be embedded into by LSB replacement. The LSBs of the specified pixels of will be embedded into as well.

In addition to and , we need to embed the index of the selected RDH algorithm into in a similar way. As mentioned above, since (which may be losslessly compressed in advance) should be embedded into , it may limit the pure embedding capacity of . Consequently, it is possible that the four algorithms are all non-embeddable, i.e., none of them can carry . In this case, we will not embed any secret bits into , including the side information. It also means that . Thus, we need another bit to tell a decoder whether the present subhost is embedded or not. The bit will be embedded into as well. In case the LSBs of are not enough to carry the auxiliary data, one can use the second-LSB-plane, for which the original bits should be recorded and embedded into as well. There is another suitable way to deal with the above problem. Namely, we replace a part of the LSBs of with the auxiliary data for the present subhost. The modified LSBs of will be recorded and embedded into the next subhost. (Footnote 3: The pure data embedding rate should exclude those bits for recovering the original LSBs of .) After processing , we continue to process until m and the necessary auxiliary data are all completely embedded.
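The per-subhost selection step can be sketched as follows. This is a simplified illustration, not the authors' implementation: each candidate is modeled as a callable that returns the marked block or `None` when it cannot carry the bits, the distortion is measured as the squared error against the input block only, and the three `toy_*` candidates are placeholders that are not real RDH algorithms. All names are hypothetical.

```python
import numpy as np

def squared_error(original, marked):
    """Sum of squared differences between two blocks."""
    diff = np.asarray(marked, dtype=np.int64) - np.asarray(original, dtype=np.int64)
    return int(np.sum(diff * diff))

def select_algorithm(block, bits, candidates):
    """Try every candidate and keep the one with the lowest distortion.

    Returns the embedded flag, the index of the chosen candidate and the
    marked block; the flag and index model the auxiliary data a decoder needs.
    """
    best = None
    for index, embed in enumerate(candidates):
        marked = embed(block, bits)
        if marked is None:  # this candidate is non-embeddable for the block
            continue
        cost = squared_error(block, marked)
        if best is None or cost < best[1]:
            best = (index, cost, marked)
    if best is None:  # no candidate can carry the bits: leave the block as-is
        return {"embedded": False, "index": None, "marked": block}
    return {"embedded": True, "index": best[0], "marked": best[2]}

# Placeholder candidates used only to exercise the selection logic.
def toy_shift_all(block, bits):
    return np.asarray(block, dtype=np.int64) + 1

def toy_touch_one(block, bits):
    out = np.array(block, dtype=np.int64)
    out.flat[0] += 1
    return out

def toy_refuse(block, bits):
    return None
```

With these placeholders, `select_algorithm` prefers `toy_touch_one` because it changes a single pixel, and it reports `embedded: False` when only `toy_refuse` is available.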

III-C Data Extraction and Image Recovery

The data receiver first extracts and identifies the data embedding order used at the encoder side. Notice that the LSBs for storing at the encoder side must not be overwritten. Then, the receiver extracts m in an inverse manner. Namely, the receiver will first extract from the marked , then extract from the marked , and so on.

Fig. 2: The rate-distortion performance comparison between the state-of-the-art RDH algorithms introduced by Ni et al. [2], Hsu et al. [4], Li et al. [6], Wu et al. [9], and the proposed ensemble algorithm for the six standard test images. We here vary from 2 to 8 for rate-distortion optimization during each-layer embedding (). The multiple embedding strategy was utilized to carry more data.

Accordingly, m can be finally reconstructed. Meanwhile, the side information such as the location maps and the original LSBs of can be correctly recovered as well. This allows the original image to be perfectly reconstructed.

Note that one may use the multiple embedding strategy to embed as many message bits as possible, i.e., an original image may be embedded multiple times at the encoder side. In this case, the data extraction and image recovery process can be performed in a similar way, layer by layer in reverse order. The multiple embedding strategy therefore preserves reversibility.
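Why recovery has to proceed in the inverse order can be seen from a toy sketch. The two steps below are not RDH algorithms; they are simple invertible operations on overlapping index ranges, chosen only to show that undoing them in the wrong order fails. All names are hypothetical.

```python
def embed_in_order(host, steps):
    """Apply the forward operation of every (forward, inverse) pair in order."""
    for forward, _ in steps:
        host = forward(host)
    return host

def recover_in_reverse(marked, steps):
    """Undo the steps from the last one applied back to the first."""
    for _, inverse in reversed(steps):
        marked = inverse(marked)
    return marked

def add_one_first_three(values):
    return [v + 1 if i < 3 else v for i, v in enumerate(values)]

def sub_one_first_three(values):
    return [v - 1 if i < 3 else v for i, v in enumerate(values)]

def flip_from_index_two(values):
    # Self-inverse operation on indices 2 and above; overlaps the first step at index 2.
    return [255 - v if i >= 2 else v for i, v in enumerate(values)]

STEPS = [
    (add_one_first_three, sub_one_first_three),
    (flip_from_index_two, flip_from_index_two),
]
```

The element at index 2 is modified by both steps, so it is restored correctly only when the second step is undone before the first.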

IV Performance Evaluation and Analysis

In this section, we present experiments on a public image dataset for performance evaluation and analysis. As mentioned above, four state-of-the-art RDH algorithms introduced by Ni et al. [2], Li et al. [6], Wu et al. [9] and Hsu et al. [4] are used to construct a new ensemble RDH system. For each candidate algorithm, we describe the important configuration used in our simulation below.

For Ni et al.’s algorithm, we generate the histogram directly from the cover image. Two pairs of peak-zero histogram bins are selected for data embedding with the HS operation. For the two zero-bins, we record their positions and construct a location map that is losslessly compressed by arithmetic coding and embedded into the cover image. The original location map is a binary map of the same size as the image, where “1”s mark the pixels of the zero-bins and “0”s mark the others. Notice that the number of “1”s may be zero. In addition, we require that the absolute difference between a peak-bin and the corresponding zero-bin be no less than 2, which avoids ambiguity during extraction.
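The bin selection and location map described above can be sketched for a single peak-zero pair (the simulation uses two pairs). The tie-breaking rule, which prefers the least populated bin and then the one closest to the peak, is an assumption made for this sketch, as is the function name `pick_peak_zero`; an 8-bit grayscale image is assumed.

```python
import numpy as np

def pick_peak_zero(image):
    """Pick one peak bin and one zero (or minimum) bin at distance >= 2.

    Returns the peak value, the zero-bin value and a binary location map
    marking the pixels that already occupy the chosen zero-bin.
    """
    pixels = np.asarray(image).astype(np.int64)
    hist = np.bincount(pixels.ravel(), minlength=256)
    peak = int(np.argmax(hist))
    # Enforce the constraint |peak - zero| >= 2 stated in the text.
    candidates = [v for v in range(256) if abs(v - peak) >= 2]
    zero = min(candidates, key=lambda v: (hist[v], abs(v - peak)))
    location_map = (pixels == zero).astype(np.uint8)
    return peak, zero, location_map
```

When the chosen bin is truly empty, the location map contains no “1”s and compresses to almost nothing.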

For the algorithms introduced by Li et al., Wu et al. and Hsu et al., the processes are similar. First of all, the boundary pixels are adjusted into the reliable range and recorded to constitute a location map compressed by arithmetic coding. Then, the pixels to be embedded are predicted according to the corresponding predictor. Thereafter, the secret data and auxiliary data are embedded by the corresponding operation. Since the candidate algorithms use prediction errors and HS, we embed the secret data by using two pairs of peak-zero bins, which is the same as in the original algorithms. (Footnote 4: One has to record the occurrences of zero-bins in the PEH for reversibility even though the number of occurrences of zero-bins is often zero.)
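Histogram shifting on prediction errors with two peaks can be sketched as follows. This is a generic textbook-style illustration rather than the exact procedure of any cited algorithm: the peaks default to -1 and 0, the outer bins are shifted by one to make room, unused capacity is padded with zero bits, and overflow handling in the pixel domain is omitted. The names `hs_embed` and `hs_extract` are hypothetical.

```python
import numpy as np

def hs_embed(errors, bits, left=-1, right=0):
    """Embed bits into errors equal to the two peak values left < right."""
    errors = np.asarray(errors, dtype=np.int64)
    carriers = (errors == left) | (errors == right)
    capacity = int(carriers.sum())
    if len(bits) > capacity:
        raise ValueError("payload exceeds capacity")
    padded = np.zeros(capacity, dtype=np.int64)
    padded[:len(bits)] = bits
    marked = errors.copy()
    marked[errors < left] -= 1   # shift the left tail outward
    marked[errors > right] += 1  # shift the right tail outward
    direction = np.where(errors[carriers] == right, 1, -1)
    marked[carriers] += direction * padded  # a bit of 1 moves a peak value outward
    return marked

def hs_extract(marked, left=-1, right=0):
    """Return the carried bits (including padding) and the restored errors."""
    marked = np.asarray(marked, dtype=np.int64)
    carriers = np.isin(marked, [left - 1, left, right, right + 1])
    bits = np.isin(marked[carriers], [left - 1, right + 1]).astype(np.int64)
    restored = marked.copy()
    restored[marked < left] += 1
    restored[marked > right] -= 1
    return bits, restored
```

Every error changes by at most one, which is why the distortion of HS-based embedding stays low when the histogram is sharply peaked.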

In our simulation, we consider as a square number, i.e., = , and set by default. For a given image, we change from 2 to 8 with a step of 1 for optimization. The best rate-distortion performance is taken as the result since the data sender always has the freedom to choose better parameters. The multi-layer embedding strategy is applied to all candidate algorithms and the proposed ensemble system. Notice that, when using the multi-layer embedding strategy for the proposed ensemble system, the optimized value of may be different for each embedding layer. As mentioned previously, there may exist some that cannot carry additional data. To deal with this problem, after a single-layer embedding, the entire marked image is rotated by 90 degrees such that the new pixel-blocks may be different (if we divide from left to right and from top to bottom). Notice that, in practice, there is no need to actually rotate the image; it suffices to change the way the blocks are divided. Furthermore, in Li et al.’s work, a PEH is divided into 16 sub-PEHs by default. In the new ensemble system, for all possible , when using their work, we use only one sub-PEH, which reduces the size of the side information.

We take six standard test images (downloaded from http://sipi.usc.edu/database/database.php?volume=misc), namely Airplane, Lena, Baboon, Lake, Boat and Peppers, with a size of , ranging from smooth to complex, for rate-distortion performance evaluation. The peak signal-to-noise ratio (PSNR, dB) is used to evaluate the marked image quality. We focus on the pure data embedding rate, which does not include side information such as the location map. Fig. 1 shows the algorithm distribution maps for the -layer embedding for each test image by the proposed ensemble system. It can be seen that different image blocks (corresponding to different subhosts) have different characteristics and therefore use different candidate algorithms, which demonstrates the applicability of the segmented and ensemble strategy.
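The two evaluation quantities mentioned above can be computed as in the following sketch, which assumes 8-bit grayscale images (peak value 255) and measures the pure embedding rate in bits per pixel. The function names are hypothetical.

```python
import numpy as np

def psnr(original, marked, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    o = np.asarray(original, dtype=np.float64)
    m = np.asarray(marked, dtype=np.float64)
    mse = np.mean((o - m) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))

def pure_embedding_rate(pure_bits, image_shape):
    """Pure payload in bits per pixel, excluding side information."""
    height, width = image_shape
    return pure_bits / (height * width)
```

For example, changing every pixel of an 8-bit image by exactly one gray level gives an MSE of 1 and a PSNR of about 48.13 dB.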

Fig. 2 shows the rate-distortion performance comparison between the candidate algorithms and the new ensemble system. It is observed that the new system can not only provide a relatively higher pure data embedding capacity, but also introduce a relatively lower distortion. This indicates that the proposed work has the potential to significantly improve the rate-distortion performance. It is also observed that, for the Peppers image, the PSNRs of our ensemble system are slightly lower than those of Hsu et al.’s method at relatively low embedding rates. This suggests that the chosen candidate algorithms may not fully benefit the ensemble system, and that more candidate RDH algorithms could be considered. It also implies that the image-block division method used here may not fully exploit the statistical characteristics of the covers, which leads us to design more efficient ensemble systems in the future.

V Conclusion and Discussion

A number of RDH algorithms have been reported in the past twenty years. They have moved the field ahead rapidly. Most of them consider the cover as a whole for data embedding. Actually, different subhosts of the cover may have different characteristics, which implies that we may be able to apply different data embedding algorithms to different subhosts so that better performance can be achieved. This paper presents a novel segmented and ensemble embedding strategy for RDH that meets this requirement. We also present a detailed ensemble RDH scheme. Experimental results show that the proposed work can improve the rate-distortion performance over the candidate algorithms on most of the test images.

In addition, the ensemble perspective may be applicable to other subfields of information hiding such as steganography [13]. In this sense, one may redefine the “ensemble” term. And, a core research topic may be to formulate the optimization problem and find the optimal solution.


This work was partly supported by NSFC (No. 61502496, U1536120, U1636201, U1736119, and 61772529) and the National Key Research and Development Program of China (No. 2016YFB1001003). It was also partly supported by the Key Lab of Information Network Security and the Ministry of Public Security of China.


  • [1] J. Tian, “Reversible data embedding using a difference expansion,” IEEE Trans. Circuits Syst. Video Technol., 13(8):890-896, Aug. 2003.
  • [2] Z. Ni, Y. Q. Shi, N. Ansari and W. Su, “Reversible data hiding,” IEEE Trans. Circuits Syst. Video Technol., 16(3): 354-362, Mar. 2006.
  • [3] B. Ou, X. Li, Y. Zhao, R. Ni and Y. Shi, “Pairwise prediction-error expansion for efficient reversible data hiding,” IEEE Trans. Image Process., 22(12): 5010-5021, Dec. 2013.
  • [4] F. Hsu, M. Wu and S. Wang, “Reversible data hiding using side-match prediction on steganographic images,” Multimed. Tools Appl., 67(3): 571-591, Dec. 2013.
  • [5] H. Wu, H. Wang, Y. Hu and L. Zhou, “Efficient reversible data hiding based on prefix matching and directed LSB embedding,” In: Proc. Int. Workshop Digital-forensics and Watermarking, pp. 455-469, Oct. 2014.
  • [6] X. Li, W. Zhang, X. Gui and B. Yang, “Efficient reversible data hiding based on multiple histogram modification,” IEEE Trans. Inf. Forensics Security, 10(9): 2016-2027, Sept. 2015.
  • [7] H. Wu, H. Wang and Y. Shi, “Dynamic content selection-and-prediction framework applied to reversible data hiding,” In: IEEE Int. Workshop Inf. Forensics Security, pp. 1-6, Dec. 2016.
  • [8] B. Ma and Y. Shi, “A reversible data hiding scheme based on code division multiplexing,” IEEE Trans. Inf. Forensics Security, 11(9): 1914-1927, Sept. 2016.
  • [9] H. Wu, H. Wang and Y. Shi, “PPE-based reversible data hiding,” In: ACM Workshop Inf. Hiding Multimed. Security, pp. 187-188, Jun. 2016.
  • [10] V. Sachnev, H. Joong Kim, J. Nam, S. Suresh and Y. Shi, “Reversible watermarking algorithm using sorting and prediction,” IEEE Trans. Circuits Syst. Video Technol., 19(7): 989-999, Jul. 2009.
  • [11] H. Wu, W. Wang, J. Dong, Y. Chen and H. Wang, “Reversible embedding to covers full of boundaries,” arXiv:1801.04752, Jan. 2018.
  • [12] H. Wu, Y. Shi, H. Wang and L. Zhou, “Separable reversible data hiding for encrypted palette images with color partitioning and flipping verification,” IEEE Trans. Circuits Syst. Video Technol., 27(8): 1620-1631, Aug. 2017.
  • [13] H. Wu, H. Wang, H. Zhao and X. Yu, “Multi-layer assignment steganography using graph-theoretic approach,” Multimed. Tools Appl., 74(18): 8171-8196, Sept. 2015.