Reversible data embedding, also called reversible data hiding, is a fragile technique suited to sensitive applications that tolerate no distortion of the cover. It hides a message, such as authentication data, within a cover by slightly altering it. At the decoder side, one can extract the hidden data from the marked content, and the original content can be perfectly reconstructed.
Reversible data embedding can be modeled as a rate-distortion optimization problem: the goal is to embed as many message bits as possible while keeping the introduced distortion low. A number of algorithms have been proposed in recent years. Early algorithms use lossless compression to vacate room for data embedding. More efficient approaches were then introduced to increase the embedding capacity or reduce the distortion, such as difference expansion, histogram shifting and other methods [3, 4]. Nowadays, advanced algorithms hide secret data in the prediction-errors (PEs) [5, 6, 7, 8, 9] of the cover, since PEs provide superior rate-distortion performance.
To avoid the underflow and overflow problems during data embedding, boundary pixels should be adjusted into the reliable range and recorded as side information, which is embedded into the cover image together with the secret data. Existing algorithms often assume that the cover image is natural, so that the side information is small and has little impact on the pure embedding capacity. However, even natural images may contain many boundary pixels, which implies that the side information may significantly reduce the pure embedding capacity.
We use the image database BOSSBase for illustration, and regard pixels with a value of 0/255 as boundary pixels. Fig. 1 shows the number of boundary pixels in each image. It is observed that many images contain a large number of boundary pixels, implying that the corresponding side information may require many bits. In reversible data embedding, a commonly used way to construct the side information is to first assign one bit to each pixel indicating whether it is a boundary pixel or not. The resulting binary matrix, also called the location map, is then losslessly compressed. This works well when the number of boundary pixels is small, but may perform poorly on images full of boundary pixels, especially when the boundary pixels are widely scattered. Fig. 2 shows an example: regardless of the lossless compression algorithm used, such a location map would compress poorly.
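As a concrete illustration of this construction, the following sketch builds a binary location map for the 0/255 boundary definition above and measures its compressed size. It is only a sketch: `zlib` stands in for the arithmetic coder used in the paper, and the function names are our own.

```python
import zlib

import numpy as np


def location_map(img):
    """Binary location map: 1 marks a boundary pixel (value 0 or 255)."""
    return ((img == 0) | (img == 255)).astype(np.uint8)


def compressed_map_bits(img):
    """Size in bits of the losslessly compressed location map.

    zlib is used here only as a stand-in for the arithmetic coder in
    the paper; the trend (sparse maps compress well, dense scattered
    maps do not) holds for any general-purpose lossless coder.
    """
    packed = np.packbits(location_map(img))
    return 8 * len(zlib.compress(packed.tobytes(), 9))


rng = np.random.default_rng(0)
# An image with no boundary pixels vs. one whose boundary pixels are
# scattered across the whole image.
smooth = rng.integers(60, 200, size=(256, 256)).astype(np.uint8)
noisy = rng.integers(0, 256, size=(256, 256)).astype(np.uint8)
```

The all-zero map of `smooth` compresses to a handful of bytes, while the scattered map of `noisy` needs far more bits, which is exactly the situation described above.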
We sort all images in BOSSBase according to the number of boundary pixels, select the 200 images with the largest counts, and losslessly compress their location maps with arithmetic coding. Fig. 3 shows the size of the compressed location map for each image. It is observed that the compressed location maps are all very large, indicating that many existing algorithms can carry only a very low pure payload, or even no extra bits at all, on natural images full of boundary pixels. This motivates us to propose an efficient algorithm to address this important problem.
The rest of this paper is organized as follows. Section II introduces the proposed reversible data embedding framework for images with many boundary pixels. Section III reports experiments demonstrating the performance. Finally, Section IV concludes this paper.
II Proposed Framework
The proposed work involves three steps. First, all pixels are preprocessed by prediction, so that the number of boundary pixels is significantly reduced, resulting in a small compressed location map. Then, any suitable reversible embedding operation can be applied to the preprocessed image to carry a payload. Finally, data extraction and image recovery are performed by inverting these steps.
II-A Prediction-based Preprocessing
Let denote the original image, sized , with pixel range . For compactness, we sometimes regard as the set of all pixels and say “pixel ” to mean the pixel at position whose value is . A pixel is called a boundary pixel if
where is a predetermined parameter that depends on the data embedding operation. It is always assumed that .
We preprocess to generate a new image and a location map . First, we divide into two subsets, i.e.,
Then, we use to predict . In detail, for each , we determine its prediction value by:
where returns the nearest integer. Eq. (1) is easily modified in case a pixel position falls outside the image.
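Since Eq. (1) itself is not reproduced above, the following sketch assumes one common instantiation of such a predictor: the rounded average of the four horizontal and vertical neighbors, with border replication to handle positions outside the image. The function name and border policy are our assumptions, not the paper's.

```python
import numpy as np


def predict_cross(img):
    """Rounded four-neighbor average predictor (an assumed form of Eq. (1)).

    Border pixels are handled by replicating the image edge, one simple
    way to modify the predictor when a neighbor falls outside the image.
    """
    p = np.pad(img.astype(np.int32), 1, mode='edge')
    # Sum of up, down, left and right neighbors for every pixel.
    s = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
    return np.rint(s / 4.0).astype(np.int32)
```

Under a checkerboard split, the four neighbors of any pixel in one subset all lie in the other subset, so evaluating this predictor on the second subset uses only values from the first.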
We use a threshold to generate an image , i.e.,
The principle behind Eq. (2) is that, if the prediction value of a pixel is close to the boundary value, its original value should be close to the boundary value as well. Fig. 4 shows two examples using the predictor in Eq. (1). For both images, there are strong correlations between the original values and the prediction values. Thus, we can adjust a raw value into the reliable range according to its prediction value. For each , we continue to compute its prediction value in by:
We use a threshold to generate another image from by:
The pixel values of must lie in the range . We adjust the pixels in into the range to generate the final image . In detail, for all possible , we compute as follows:
will not contain boundary pixels. We record the pixel positions where . To this end, we construct a -ary location map , i.e.,
where corresponds to the pixel .
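Because Eqs. (2)-(6) are not reproduced here, the following is only a schematic sketch of the underlying shift-and-record idea, not the paper's exact preprocessing: boundary pixels are moved into the reliable range, and a ternary (3-ary) location map records each move so it can be undone exactly. The parameter name `g`, the shift amounts, and the map encoding are all our assumptions.

```python
import numpy as np


def shift_into_range(img, g=1):
    """Schematic preprocessing sketch (NOT the paper's exact Eqs. (2)-(6)).

    Pixels below g are raised by g, pixels above 255 - g are lowered by
    g, and a ternary location map records which shift (if any) was
    applied so that recovery is exact.
    """
    out = img.astype(np.int32).copy()
    lmap = np.zeros_like(out)          # 0: untouched, 1: raised, 2: lowered
    low, high = out < g, out > 255 - g
    out[low] += g
    lmap[low] = 1
    out[high] -= g
    lmap[high] = 2
    return out.astype(np.uint8), lmap


def undo_shift(out, lmap, g=1):
    """Exact inverse of shift_into_range."""
    img = out.astype(np.int32).copy()
    img[lmap == 1] -= g
    img[lmap == 2] += g
    return img.astype(np.uint8)
```

After the shift no pixel sits on the boundary, so the shifted image can host any reversible embedding operation, and the map makes the whole step invertible.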
II-B Reversible Data Embedding
We embed the required payload and the losslessly compressed into , rather than . There is no need to construct a new location map, since contains no boundary pixels. Consequently, many existing state-of-the-art reversible data embedding algorithms can be applied, as the embedding operation here is open to design. We therefore do not focus on a specific reversible embedding operation.
II-C Data Extraction and Image Recovery
Suppose we have embedded , and the data embedding parameters into , resulting in a marked image . Notice that has been compressed in advance. The receiver needs to extract the embedded data and reconstruct from . It is straightforward to reconstruct , and from ; our goal is then to reconstruct . With Eqs. (5) and (6), it is straightforward to reconstruct from and by:
Thereafter, we reconstruct from . First, we initialize . Then, we predict all corresponding to by Eq. (3). According to , is finally identified by:
Similarly, we initialize . We predict all by Eq. (1). With , can be reconstructed as:
Therefore, and can be perfectly reconstructed. Notice that , and are parameters that should be embedded into in advance.
III Performance Evaluation and Analysis
The core contribution of our work is an efficient lossless preprocessing technique that significantly reduces the size of the location map for images full of boundary pixels. To verify the performance, we choose the 200 images mentioned in Fig. 3 for experiments. For an original image , we set and use arithmetic coding to losslessly compress the corresponding location map. For the corresponding , we define as a boundary pixel if . We first use and to compare the compression performance.
As shown in Fig. 5 (a), the number of boundary pixels is significantly reduced. In Fig. 5, “before preprocessing” corresponds to and the other curve corresponds to . As shown in Fig. 5 (b), the size of the location map is also significantly reduced, meaning that a sufficient pure payload can be carried. We define two ratios as follows:
We compute the mean values of and over the 200 images; they are and , respectively. This shows that our method can significantly reduce the side information, which is quite helpful for the subsequent data embedding operation.
In fact, different and result in different performance. Fig. 6 shows the location maps obtained for the image in Fig. 2 under different and . It is observed that results in a smaller number of boundary pixels. The reason is that, when predicting the pixels corresponding to , the contexts are themselves prediction values, which degrades the prediction accuracy. To further evaluate the impact of different and , we perform experiments on the 200 images. Tables I and II show the mean values of and , which verifies this observation. We therefore suggest using .
With a preprocessed image , we need to embed together with the compressed and other parameters. We focus on the data embedding capacity (bits per pixel, bpp):
where represents the size of the maximum embeddable payload and denotes the total number of pixels.
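Since the capacity equation itself is not reproduced above, the following sketch implements one common definition of the pure embedding rate, in which the compressed side information is subtracted from the maximum embeddable payload; the variable names are our assumptions.

```python
def pure_embedding_rate(max_payload_bits, side_info_bits, height, width):
    """Pure capacity in bits per pixel (bpp): the payload left over once
    the compressed location map and embedding parameters (the side
    information) have been carried by the image itself.

    A sketch of one common definition, not the paper's exact formula.
    """
    pure = max(max_payload_bits - side_info_bits, 0)
    return pure / (height * width)
```

For instance, 30000 embeddable bits with 4096 bits of side information in a 512x512 image gives roughly 0.099 bpp; when the side information exceeds the embeddable payload, the pure rate is zero and the image cannot carry extra bits.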
One can apply any efficient reversible embedding algorithm. We use the methods presented in ,  and  for experiments to evaluate the rate-distortion performance. PSNR, computed between and , is used as the distortion measure. We select the 40 images with the largest numbers of boundary pixels for experiments; their indexes in BOSSBase are given in Table III. We compare and the corresponding distortion for both the original image and the preprocessed image. We vary and from 1 to 16 in steps of 1 for optimization, and select the setting that yields the maximum , since a data hider is always free to choose and . During data embedding, the boundary pixels of the original image are recorded by a location map losslessly compressed with arithmetic coding. For a fair comparison, the location maps of the preprocessed images are compressed with arithmetic coding as well. Thereafter, both the original image and the preprocessed image are embedded.
Experimental results show that a large fraction of the images cannot be processed directly by the existing algorithms, i.e., for the original images, since the compressed location maps are too large to allow extra bits to be embedded. As shown in Table IV, we count the number of embeddable images (i.e., ). In Table IV, “before” corresponds to the original image, and “after” corresponds to the preprocessed image. It can be observed that the proposed work significantly improves the ability of the existing works to carry additional data. We compare and PSNR for the embeddable images; Tables V and VI show the results. It can be seen that the proposed work significantly increases the capacity while providing high image quality. We further compute the mean and PSNR over the 40 test images for the three data embedding algorithms equipped with our preprocessing technique. Table VII provides the results, implying that the proposed work substantially improves the rate-distortion performance of many existing algorithms on images containing many boundary pixels.
IV Conclusion and Discussion
In practice, it is quite easy to acquire images full of boundary pixels, such as medical images, remote sensing images and natural scenes, e.g., white clouds or dark night. Existing works often focus on images with few boundary pixels and provide superior rate-distortion performance on them, but they may not work well for images full of boundary pixels. In this paper, we present an efficient lossless preprocessing algorithm for reversible data embedding in images that contain many boundary pixels. The reversible embedding operation in the proposed framework is open to design. Experimental results show that our work significantly reduces the size of the side information, which substantially benefits reversible data embedding performance. The proposed work thus has good potential for reversible data embedding. Future work is to design data embedding algorithms that better exploit the statistical characteristics of the preprocessed image.
-  J. Tian, “Reversible data embedding using a difference expansion,” IEEE Trans. Circuits Syst. Video Technol., 13(8): 890-896, Aug. 2003.
-  Z. Ni, Y. Q. Shi, N. Ansari and W. Su, “Reversible data hiding,” IEEE Trans. Circuits Syst. Video Technol., 16(3): 354-362, Mar. 2006.
-  B. Ma and Y. Shi, “A reversible data hiding scheme based on code division multiplexing,” IEEE Trans. Inf. Forensics Security, 11(9): 1914-1927, Sept. 2016.
-  H. Wu, H. Wang, Y. Hu and L. Zhou, “Efficient reversible data hiding based on prefix matching and directed LSB embedding,” In: Proc. Int. Workshop Digital-forensics and Watermarking, pp. 455-469, Oct. 2014.
-  F. Hsu, M. Wu and S. Wang, “Reversible data hiding using side-match prediction on steganographic images,” Multimed. Tools Appl., 67(3): 571-591, Dec. 2013.
-  B. Ou, X. Li, Y. Zhao, R. Ni and Y. Shi, “Pairwise prediction-error expansion for efficient reversible data hiding,” IEEE Trans. Image Process., 22(12): 5010-5021, Dec. 2013.
-  X. Li, W. Zhang, X. Gui and B. Yang, “Efficient reversible data hiding based on multiple histogram modification,” IEEE Trans. Inf. Forensics Security, 10(9): 2016-2027, Sept. 2015.
-  H. Wu, H. Wang and Y. Shi, “Dynamic content selection-and-prediction framework applied to reversible data hiding,” In: IEEE Int. Workshop Inf. Forensics Security, pp. 1-6, Dec. 2016.
-  H. Wu, H. Wang and Y. Shi, “PPE-based reversible data hiding,” In: ACM Workshop Inf. Hiding Multimed. Security, pp. 187-188, Jun. 2016.
-  P. Bas, T. Filler and T. Pevny, “Break our steganographic system - the ins and outs of organizing BOSS,” In: Int. Workshop Inf. Hiding (IH’11), vol. 6958, pp. 59-70, May 2011.