StegColNet: Steganalysis based on an ensemble colorspace approach

02/06/2020 ∙ by Shreyank N Gowda, et al. ∙ Tsinghua University

Image steganography refers to the process of hiding information inside images. Steganalysis is the process of detecting a steganographic image. We introduce a steganalysis approach that uses an ensemble color space model to obtain a weighted concatenated feature activation map. The concatenated map helps to obtain certain features explicit to each color space. We use a levy-flight grey wolf optimization strategy to reduce the number of features selected in the map. We then use these features to classify the image into one of two classes: whether the given image has secret information stored or not. Extensive experiments have been done on a large scale dataset extracted from the Bossbase dataset. Also, we show that the model can be transferred to different datasets and perform extensive experiments on a mixture of datasets. Our results show that the proposed approach outperforms the recent state of the art deep learning steganalytical approaches by 2.32 percent on average for 0.2 bits per channel (bpc) and 1.87 percent on average for 0.4 bpc.




I Introduction

Steganography is a means of covert communication in which secret information is embedded into some form of digital media, such as an image, video or text file [3]. In multimedia security, steganography is a critical research topic [4]. Images are commonly used as the embedding medium because minute changes in an image are imperceptible to the human eye [4]. The capacity of a steganographic algorithm is the amount of data that can be embedded in an image before there is a noticeable visual change in the image [5]. Steganalysis is the process of detecting whether a given image has information hidden in it [27]. Steganalysis can therefore be framed as a binary classification problem. To detect whether an image is embedded with information, we propose the use of an ensemble color space model. Recently, an ensemble colorspace model [1] was shown to obtain excellent results on large-scale image classification datasets such as ImageNet [2]. Based on [1], we propose a novel steganalysis approach.

We use a colorspace approach to determine whether an image is hiding information. We use ColorNet [1] and take the final activation map from each colorspace. We use weighted averaging to obtain a single feature map from the individual feature maps generated by each colorspace. It was seen in [1] that each color space has features specific to itself, which helps us detect minute changes in the image. We then use a levy-flight grey wolf optimization method (a meta-heuristic approach) to select a smaller subset of features. Using these features, we classify the given image into one of two classes: containing concealed information or not.
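The weighted averaging step can be illustrated with a minimal sketch. It assumes each colorspace yields a flat feature vector of the same length; the colorspace names, the example values and the weights are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def weighted_average_maps(maps: Dict[str, List[float]],
                          weights: Dict[str, float]) -> List[float]:
    """Fuse per-colorspace feature vectors into one vector by weighted averaging."""
    names = list(maps)
    length = len(maps[names[0]])
    if any(len(maps[name]) != length for name in names):
        raise ValueError("all feature maps must have the same length")
    total = sum(weights[name] for name in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Each fused feature is the weight-normalized sum over colorspaces.
    return [
        sum(weights[name] * maps[name][i] for name in names) / total
        for i in range(length)
    ]


# Hypothetical activations for three colorspaces (illustrative values only).
maps = {
    "rgb": [0.2, 0.8, 0.4],
    "hsv": [0.6, 0.0, 0.4],
    "lab": [0.4, 0.4, 0.4],
}
weights = {"rgb": 2.0, "hsv": 1.0, "lab": 1.0}  # assumed weights
fused = weighted_average_maps(maps, weights)
```

Normalizing by the sum of the weights keeps the fused features on the same scale as the inputs, whatever weights are chosen.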

II Related Work

II-A Steganography

Steganography algorithms can be classified broadly into four categories: 1) cover image size-based algorithms, 2) embedding domain-based algorithms, 3) nature of retrieval-based algorithms and 4) adaptive steganographic algorithms. In the case of 2-D images, the information is embedded onto the 2-D plane of the cover image. This embedding can be done over transform domain coefficients (such as discrete cosine transforms, Fourier transforms, etc.) or in the spatial domain (an example is LSB). The 3-D approaches essentially follow the same general procedure. However, the procedure is repeated on multiple planes (for instance, an RGB color image has 3 planes in which information can be embedded). Image steganography on 3-D images can be performed in the geometrical domain [5], the representation domain [6] or the topological domain [7]. Transform-based steganographic algorithms include the discrete Fourier transform (DFT) [9], discrete cosine transform (DCT), discrete wavelet transform [10] and complex wavelet transform [11], among others. Here, the frequency coefficients obtained after applying the transform are used to hide secret bits. Besides improving security, these algorithms are robust to image compression, cropping, scaling, etc. Of late, machine learning approaches have been proposed, such as support vector machines (SVM) [12], genetic algorithm approaches [13] and neural network-based steganography [14]. Though these are black-box approaches, they have shown good results.
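As a concrete instance of spatial-domain embedding, the following minimal sketch shows least significant bit (LSB) replacement on a flat list of 8-bit pixel values. It is a generic textbook illustration, not one of the adaptive algorithms evaluated later; the function names and sample values are illustrative.

```python
from typing import List


def lsb_embed(pixels: List[int], bits: List[int]) -> List[int]:
    """Replace the least significant bit of the first len(bits) pixels."""
    if len(bits) > len(pixels):
        raise ValueError("message is longer than the cover capacity")
    if any(bit not in (0, 1) for bit in bits):
        raise ValueError("message must consist of 0/1 bits")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the message bit.
        stego[i] = (stego[i] & ~1) | bit
    return stego


def lsb_extract(pixels: List[int], n_bits: int) -> List[int]:
    """Read back the least significant bit of the first n_bits pixels."""
    return [pixel & 1 for pixel in pixels[:n_bits]]


cover = [154, 201, 33, 78, 255, 0]  # illustrative 8-bit pixel values
message = [1, 0, 0, 1]
stego = lsb_embed(cover, message)
recovered = lsb_extract(stego, len(message))
```

Each pixel changes by at most one intensity level, which is why such embedding is visually imperceptible and why detection has to rely on learned or statistical features.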

II-B Steganalysis

Steganalysis attempts either to identify a stego image (an image in which information is hidden) or to extract the secret information. Our method deals with the former. We treat the problem at hand as a classification problem in which each image either contains hidden information or does not. There are two basic approaches to steganalysis: signature steganalysis and statistical steganalysis. Signature steganalysis searches for patterns, or signatures, characteristic of various steganographic algorithms. Statistical steganalysis uses statistical measures computed from the image to determine whether information is hidden.

Signature steganalysis is further classified into specific embedding [16] and universal blind steganalysis [15]. Specific embedding approaches are impractical because they require knowing which steganography approach was used to embed the information. Hence, universal blind steganalysis [8,17] is preferred. These approaches extract high-dimensional features, which suffer from the curse of dimensionality, so the feature set must be reduced. Commonly used feature selection methods include wrappers and filters. Filters are less complex, but they perform poorly. Wrapper methods evaluate feature subsets using predictive models [18], but they are complex and time-consuming.

To overcome this, meta-heuristic approaches have been deployed. These approaches solve optimization problems by imitating natural phenomena [19-20]. It was seen that Grey Wolf Optimization (GWO) performed better than other meta-heuristic approaches for solving non-linear problems in a multi-dimensional space [19]. However, it has a slow convergence rate and at times gets trapped in local optima. It has been seen that GWO can be improved by modifying its parameter A to obtain a quicker convergence rate, better convergence precision and higher agility in global searching.

III Proposed Approach

III-A Overall architecture and effect of using color spaces

We consider steganalysis as a two-class classification problem. The overall architecture is shown in figure 1. The experimental analysis, along with details of the training set, is explained in the next section. Recently, the effect of color spaces on image classification has been explored [1]. It was seen that individual color spaces carry classification features specific to themselves. This led us to consider whether such features could reveal secret information embedded in an image. ColorNet [1], being an ensemble model that extracts features specific to each colorspace, was an excellent choice for determining whether an image has information hidden in it. The output of ColorNet is a high-dimensional vector, which makes execution computationally intensive. To reduce the number of features, we use an optimization approach for feature selection.

Fig. 1: Two phases involved in the overall architecture of the model: training the model using colornet and detecting stego-image using feature map aggregation

III-B Optimization process for feature selection

III-B1 Feature selection using LF-Grey Wolf optimization

In GWO, the head of the pack is the $\alpha$ wolf. The next levels of the hierarchy are $\beta$ and $\delta$, followed finally by $\omega$. GWO models the social hierarchy and mathematically illustrates the hunting procedure as an optimization problem. If $X_p(t)$ and $X(t)$ represent the positions of the prey and a wolf at iteration $t$, we can mathematically model the encircling process [19] with two coefficients A and C as shown in (1). A and C are calculated by (2).

$$D = \left| C \cdot X_p(t) - X(t) \right|, \qquad X(t+1) = X_p(t) - A \cdot D \tag{1}$$

$$A = 2a \cdot r_1 - a, \qquad C = 2 \cdot r_2 \tag{2}$$

Here, $r_1$ and $r_2$ are random vectors in [0,1], and a is a parameter that decreases linearly from 2 to 0 over the iterations and also helps to control the step size D of a grey wolf. The end of the hunting process is implemented by decreasing the value of A, which in turn depends on a. Once a reaches zero, the wolves have stopped moving. The linear decrease in A favors exploitation of the search space with minimal exploration. Hence, the search can become trapped in a local optimum.
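The encircling update can be sketched in a few lines. The sketch assumes the standard GWO formulas A = 2a·r1 − a, C = 2·r2, D = |C·X_p − X| and X(t+1) = X_p − A·D, applied per dimension; the function names are illustrative, and the full algorithm (which averages the moves toward the three leading wolves) is omitted.

```python
import random
from typing import List, Sequence


def linear_a(iteration: int, max_iterations: int) -> float:
    """Parameter a, decreasing linearly from 2 to 0 over the iterations."""
    return 2.0 * (1.0 - iteration / max_iterations)


def encircle(prey: Sequence[float], wolf: Sequence[float],
             a: float, rng: random.Random) -> List[float]:
    """One encircling step of a wolf around the prey (assumed standard GWO form)."""
    new_position = []
    for x_p, x in zip(prey, wolf):
        r1, r2 = rng.random(), rng.random()  # random numbers in [0, 1)
        A = 2.0 * a * r1 - a                 # coefficient A, in [-a, a]
        C = 2.0 * r2                         # coefficient C, in [0, 2)
        D = abs(C * x_p - x)                 # step size toward the prey
        new_position.append(x_p - A * D)
    return new_position


rng = random.Random(0)
prey = [1.0, -2.0]
wolf = [5.0, 5.0]
early = encircle(prey, wolf, linear_a(0, 100), rng)    # a = 2: wide moves
final = encircle(prey, wolf, linear_a(100, 100), rng)  # a = 0: lands on the prey
```

With a = 0 the coefficient A vanishes and the wolf lands exactly on the prey position, which is the sense in which the wolves stop moving at the end of the hunt.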

The size of the aggregated feature map increases the complexity of the algorithm and the overall execution time. To deal with this, we propose the use of levy flight-based grey wolf optimization (LF-GWO) for feature selection, based on the Levy probability function in (3).

$$L(s, \gamma, \mu) = \sqrt{\frac{\gamma}{2\pi}} \, \exp\left[ -\frac{\gamma}{2(s - \mu)} \right] \frac{1}{(s - \mu)^{3/2}} \tag{3}$$

Here, $\mu$ represents the position parameter, $\gamma$ represents the scale parameter and $s$ represents the collection of samples in the distribution. The above equation holds for $0 < \mu < s < \infty$, and $L$ is 0 otherwise.


The parameter A is modified by the Levy flight function as $A = L(S) \cdot r_1$, where S is the position of the wolf and $r_1$ is a random vector. As a result, A decreases non-linearly.
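A minimal sketch of this modification follows. It assumes the standard Levy density with position parameter mu and scale parameter gamma, and it evaluates the density on a scalar sample s. How the wolf position is mapped to that sample is not specified here, so the scalar treatment and the function names are assumptions.

```python
import math
import random


def levy_pdf(s: float, mu: float = 0.0, gamma: float = 1.0) -> float:
    """Standard Levy density; zero for samples at or below the position mu."""
    if s <= mu:
        return 0.0
    shifted = s - mu
    return (math.sqrt(gamma / (2.0 * math.pi))
            * math.exp(-gamma / (2.0 * shifted))
            / shifted ** 1.5)


def levy_coefficient(s: float, rng: random.Random,
                     mu: float = 0.0, gamma: float = 1.0) -> float:
    """Coefficient A = L(s) * r1, with r1 drawn uniformly from [0, 1)."""
    return levy_pdf(s, mu, gamma) * rng.random()


rng = random.Random(0)
density_at_one = levy_pdf(1.0)         # mu = 0, gamma = 1
coefficient = levy_coefficient(1.0, rng)
```

Because the density is heavy-tailed, A no longer shrinks on a fixed linear schedule, which is the property the text relies on to escape local optima.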

III-B2 Choice of optimization function

LF-GWO was selected based on the statistical results obtained in [21]. It was seen that, on 15 defined benchmark functions, LF-GWO outperforms existing optimization approaches in terms of mean fitness values, as assessed with the Wilcoxon rank-sum test.

Figure 2 compares LF-GWO with Grey Wolf Optimization (GWO), the gravitational search algorithm (GSA), particle swarm optimization (PSO) and fast evolutionary programming (FEP), using a box plot and a graph showing how quickly the best fitness value converges with respect to the number of iterations. The box plot represents the benchmark function defined in equation 4 and the convergence map that of the function defined in equation 5.

Fig. 2: (a) Maps the convergence of the best fitness value with respect to number of iterations (b) Shows the box plot for the final best solution. Taken from [21]. Both graphs are representative of one benchmark function.

IV Experimental Analysis

IV-A Datasets and training

The most commonly used steganalysis datasets are Bossbase [22] and BOWS2 [23]. Each contains 10000 grayscale images. However, the proposed approach depends on color, so we use a dataset of color images. Starting with the 10000 images of the Bossbase [22] dataset, we generate a dataset by following the process in [24]. We downsampled the full-resolution images to a size of 512x512. We then followed the process in [25], so that training and testing were conducted in a similar environment. In [25], two datasets were created by using two demosaicing algorithms, Patterned Pixel Grouping (PPG) and Adaptive Homogeneity-Directed (AHD), and named BOSS-PPG-LAN and BOSS-AHD-LAN correspondingly. Further, by removing the down-sampling method, we can obtain two more datasets: BOSS-PPG-CRP and BOSS-AHD-CRP. By pairing a demosaicing algorithm with bilinear or bicubic kernels, we obtain four more datasets: BOSS-PPG-BIL, BOSS-AHD-BIL, BOSS-PPG-BIC and BOSS-AHD-BIC.
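The naming scheme amounts to pairing each demosaicing algorithm with each resampling variant. A short sketch, using only the labels that appear in the text, enumerates the resulting dataset names; the helper name is illustrative.

```python
from itertools import product
from typing import List

DEMOSAICING = ["PPG", "AHD"]               # demosaicing algorithms
RESAMPLING = ["LAN", "CRP", "BIL", "BIC"]  # resampling labels used in the text


def dataset_names() -> List[str]:
    """Enumerate BOSS-<demosaicing>-<resampling> names, grouped by resampling."""
    return [
        f"BOSS-{demosaic}-{resample}"
        for resample, demosaic in product(RESAMPLING, DEMOSAICING)
    ]


names = dataset_names()
```

The enumeration yields eight datasets in total: two per resampling variant.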

We train our model using mini-batch stochastic gradient descent with the following parameters: learning rate: 0.0001, weight decay: 0.0005, step size: 5000, momentum: 0.75, gamma: 0.75, batch size: 32, maximum iterations: 40 × 10^4. The trained model was tested every 5000 iterations, and the accuracy was taken at 40 × 10^4 iterations. Four state-of-the-art color steganography algorithms, HILL, SUNIWARD, CMD-C-SUNIWARD and CMD-C-HILL, were used as attack targets for the experimental analysis. The embedding payload was set to 0.2 bpc (bits per channel/band pixel) and 0.4 bpc. In order to select the most challenging scenarios and to follow similar conditions for result comparison, we followed the process used in WISERNet [25].
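The step size and gamma values suggest a stepwise learning rate decay. The sketch below assumes the common "step" policy, in which the rate is multiplied by gamma once every step-size iterations; the exact schedule is not stated, so this policy and the function name are assumptions.

```python
BASE_LR = 1e-4              # learning rate
GAMMA = 0.75                # decay factor
STEP_SIZE = 5000            # iterations between decays
MAX_ITERATIONS = 40 * 10**4
TEST_INTERVAL = 5000        # the model is tested every 5000 iterations


def step_learning_rate(iteration: int, base_lr: float = BASE_LR,
                       gamma: float = GAMMA, step_size: int = STEP_SIZE) -> float:
    """Learning rate under an assumed step policy: base_lr * gamma^(iteration // step_size)."""
    return base_lr * gamma ** (iteration // step_size)


# Number of test evaluations performed over the full training run.
num_evaluations = MAX_ITERATIONS // TEST_INTERVAL
lr_start = step_learning_rate(0)
lr_after_first_step = step_learning_rate(5000)
```

Under this assumption the learning rate is reduced at the same iterations at which the model is tested.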

IV-B Results comparison

To compare our results, we considered three deep learning color steganalyzers that are widely regarded as state of the art: WISERNet [25], Deep Hierarchical Representations (DHR) [26] and Deep-CNN (D-CNN) [27]. Experiments were conducted on the same datasets and using similar resources for a fair comparison. Popular steganography methods such as SUNIWARD [28], MiPOD [29] and HILL [30] adopt the additive embedding distortion minimizing framework [31]. Recently, CMD-C was proposed [32] by adapting the CMD approach to color images. We denote the CMD-C method using SUNIWARD and HILL as CMD-C-SUNIWARD and CMD-C-HILL respectively. Although DHR [26] and D-CNN [27] can be executed with channel-wise convolution, normal convolution and input concatenation as seen in [25], we show results only for normal convolution, as WISERNet [25] outperforms DHR and D-CNN in all cases. We also compare results with Pixel Vector Cost (PVC) [33] and channel gradient correlation (CGC) [34].

The batch size and number of iterations were the same for all comparisons. The other parameters were set as described in the original papers. Each experiment used 75 percent of the images (7500) for training and the remaining 2500 for testing. All experiments were performed 10 times and the average testing accuracy is reported. Table 1 compares the results of our approach with WISERNet (W-Net) [25], DHR [26], D-CNN [27], CGC [34] and PVC [33] on BOSS-PPG-LAN (B-P-L), BOSS-PPG-BIC (B-P-Bc), BOSS-PPG-BIL (B-P-Bl), BOSS-AHD-BIC (B-A-Bc) and BOSS-AHD-BIL (B-A-Bl) with 0.2 bpc, and table 2 does the same with 0.4 bpc. As can be seen, the proposed method outperforms the other state-of-the-art methods in all but one case, and the increase in detection accuracy is significant on the datasets generated with patterned pixel grouping.

Dataset   DHR      D-CNN    W-Net    CGC      PVC      Proposed
B-P-L     0.6474   0.6562   0.7139   0.7231   0.7120   0.7741
B-P-Bc    0.6589   0.7124   0.7318   0.7278   0.7657   0.7912
B-P-Bl    0.7611   0.7487   0.8033   0.8120   0.8068   0.8316
B-A-Bc    0.6614   0.6627   0.7369   0.7168   0.7211   0.7368
B-A-Bl    0.7622   0.7647   0.8022   0.7981   0.7764   0.8044
TABLE I: Comparison of results for CMD-C-HILL stego images with 0.2 bpc. D-CNN is executed with 30 fixed SRM kernels. The best results are represented in bold font.

Dataset   DHR      D-CNN    W-Net    CGC      PVC      Proposed
B-P-L     0.7568   0.7941   0.8361   0.8268   0.8148   0.8724
B-P-Bc    0.7732   0.8068   0.8435   0.8314   0.8514   0.8814
B-P-Bl    0.87211  0.9045   0.9169   0.9165   0.9056   0.9381
B-A-Bc    0.7728   0.8141   0.8448   0.8412   0.8378   0.8468
B-A-Bl    0.8738   0.9067   0.9144   0.9044   0.9022   0.9088
TABLE II: Comparison of results for CMD-C-HILL stego images with 0.4 bpc. D-CNN is executed with 30 fixed SRM kernels. The best results are represented in bold font.
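The best method in each row of Table I can be recomputed directly from the reported accuracies. The sketch below transcribes the 0.2 bpc values from Table I; the variable and function names are illustrative.

```python
from typing import Dict, List

METHODS = ["DHR", "D-CNN", "W-Net", "CGC", "PVC", "Proposed"]

# Detection accuracies transcribed from Table I (CMD-C-HILL, 0.2 bpc).
TABLE_I: Dict[str, List[float]] = {
    "B-P-L":  [0.6474, 0.6562, 0.7139, 0.7231, 0.7120, 0.7741],
    "B-P-Bc": [0.6589, 0.7124, 0.7318, 0.7278, 0.7657, 0.7912],
    "B-P-Bl": [0.7611, 0.7487, 0.8033, 0.8120, 0.8068, 0.8316],
    "B-A-Bc": [0.6614, 0.6627, 0.7369, 0.7168, 0.7211, 0.7368],
    "B-A-Bl": [0.7622, 0.7647, 0.8022, 0.7981, 0.7764, 0.8044],
}


def best_method(row: List[float]) -> str:
    """Name of the method with the highest accuracy in a table row."""
    return METHODS[row.index(max(row))]


winners = {dataset: best_method(row) for dataset, row in TABLE_I.items()}
proposed_wins = sum(1 for method in winners.values() if method == "Proposed")
```

On these values the proposed method is best on four of the five datasets; on B-A-Bc, W-Net is ahead by 0.0001.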

Further experimental analysis is done by mixing datasets as shown in [27]. Table 3 shows how the datasets were mixed. For simplicity, we label the mixed datasets with roman numerals in the comparison of steganalyzers in tables 4 and 5. BPL, BPBc, BPBl, BABc, BABl and BAL are further abbreviations of BOSS-PPG-LAN, BOSS-PPG-BIC, BOSS-PPG-BIL, BOSS-AHD-BIC, BOSS-AHD-BIL and BOSS-AHD-LAN.

Set-I - - -
Set-II - - -
Set-III - - - -
TABLE III: Representation of mixture of datasets. ✓implies dataset has been selected and - implies otherwise.

Similarly to tables 1 and 2, table 4 compares results on the above-mentioned mixture of datasets with 0.2 bpc, and table 5 compares the results with 0.4 bpc. As can be seen, the proposed method outperforms recent state-of-the-art approaches by a significant margin.

Dataset   DHR      D-CNN    W-Net    CGC      PVC      Proposed
Set-I     0.7237   0.7259   0.7675   0.7712   0.7734   0.8029
Set-II    0.7214   0.7217   0.7714   0.7710   0.7684   0.8026
Set-III   0.6722   0.6865   0.7284   0.7412   0.7388   0.7648
Set-IV    0.7164   0.7182   0.7671   0.7782   0.7684   0.8048
TABLE IV: Comparison of results for CMD-C-HILL stego images with 0.2 bpc on mixture of datasets. D-CNN is executed with 30 fixed SRM kernels. The best results are represented in bold font.

Dataset   DHR      D-CNN    W-Net    CGC      PVC      Proposed
Set-I     0.8241   0.8289   0.8594   0.8788   0.8641   0.9041
Set-II    0.8231   0.8417   0.8806   0.8762   0.8661   0.9021
Set-III   0.7812   0.7892   0.8316   0.8411   0.8421   0.8598
Set-IV    0.8161   0.8214   0.8893   0.8796   0.8812   0.9013
TABLE V: Comparison of results for CMD-C-HILL stego images with 0.4 bpc on mixture of datasets. D-CNN is executed with 30 fixed SRM kernels. The best results are represented in bold font.

V Conclusion

With recent developments in color-based steganography algorithms, a powerful steganalyzer is needed. It was recently shown that an ensemble model of colorspaces has a significant impact on classification results. We propose StegColNet as a powerful color image steganalyzer. We employ an ensemble colorspace strategy to determine whether an image is hiding information. We use ColorNet and take the final activation map from each colorspace. We use weighted averaging to obtain a single feature map from the feature maps generated by each colorspace. We then use a levy-flight grey wolf optimization method to select a smaller subset of features. Using these features, we classify the given image into one of two classes: containing concealed information or not.



  • [1] Gowda, S.N. and Yuan, C., 2018, December. ColorNet: Investigating the importance of color spaces for image classification. In Asian Conference on Computer Vision (pp. 581-596). Springer, Cham.
  • [2] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K. and Fei-Fei, L., 2009, June. Imagenet: A large-scale hierarchical image database. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 248-255). IEEE.
  • [3] Kahn, D., 1996, May. The history of steganography. In International Workshop on Information Hiding (pp. 1-5). Springer, Berlin, Heidelberg.
  • [4] Cheddad, A., Condell, J., Curran, K. and Mc Kevitt, P., 2010. Digital image steganography: Survey and analysis of current methods. Signal processing, 90(3), pp.727-752.
  • [5] Li, N., Hu, J., Sun, R., Wang, S. and Luo, Z., 2017. A high-capacity 3D steganography algorithm with adjustable distortion. IEEE Access, 5, pp.24457-24466.
  • [6] Tsai, Y.Y., 2014. An adaptive steganographic algorithm for 3D polygonal models using vertex decimation. Multimedia Tools and Applications, 69(3), pp.859-876.
  • [7] Cheng, Y.M. and Wang, C.M., 2006. A high-capacity steganographic approach for 3D polygonal meshes. The Visual Computer, 22(9-11), pp.845-855.
  • [8] Chakraborty, S., Jalal, A.S. and Bhatnagar, C., 2017. LSB based non blind predictive edge adaptive image steganography. Multimedia Tools and Applications, 76(6), pp.7973-7987.
  • [9] Jayaram, P., Ranganatha, H.R. and Anupama, H.S., 2011. Information hiding using audio steganography–a survey. The International Journal of Multimedia and Its Applications (IJMA) Vol, 3, pp.86-96.
  • [10] Kumar, V. and Kumar, D., 2018. A modified DWT-based image steganography technique. Multimedia Tools and Applications, 77(11), pp.13279-13308.
  • [11] Narasimmalou, T. and Joseph, R.A., 2012, March. Discrete wavelet transform based steganography for transmitting images. In IEEE-International Conference On Advances In Engineering, Science And Management (ICAESM-2012) (pp. 370-375). IEEE.
  • [12] Gowda, S.N., 2016, September. Innovative enhancement of the Caesar cipher algorithm for cryptography. In 2016 2nd International Conference on Advances in Computing, Communication and Automation (ICACCA)(Fall) (pp. 1-4). IEEE.
  • [13] Chang, C.C., Yu, Y.H. and Hu, Y.C., 2008, December. Hiding secret data into an ambtc-compressed image using genetic algorithm. In Second International Conference on Future Generation Communication and Networking Symposia (Vol. 3, pp. 154-157). IEEE.
  • [14] Gowda, S.N., 2016, October. Using Blowfish encryption to enhance security feature of an image. In 2016 6th International Conference on Information Communication and Management (ICICM) (pp. 126-129). IEEE.
  • [15] Luo, X.Y., Wang, D.S., Wang, P. and Liu, F.L., 2008. A review on blind detection for image steganography. Signal Processing, 88(9), pp.2138-2157.
  • [16] Fridrich, J. and Goljan, M., 2004, June. On estimation of secret message length in LSB steganography in spatial domain. In Security, steganography, and watermarking of multimedia contents VI (Vol. 5306, pp. 23-35). International Society for Optics and Photonics.
  • [17] Kodovsky, J., Fridrich, J. and Holub, V., 2012. Ensemble classifiers for steganalysis of digital media. IEEE Transactions on Information Forensics and Security, 7(2), pp.432-444.
  • [18] Deng, H. and Runger, G., 2012, June. Feature selection via regularized trees. In The 2012 International Joint Conference on Neural Networks (IJCNN) (pp. 1-8). IEEE.
  • [19] Chhikara, R.R., Sharma, P. and Singh, L., 2018. An improved dynamic discrete firefly algorithm for blind image steganalysis. International Journal of Machine Learning and Cybernetics, 9(5), pp.821-835.
  • [20] Yao, X., Liu, Y. and Lin, G., 1999. Evolutionary programming made faster. IEEE Transactions on Evolutionary computation, 3(2), pp.82-102.
  • [21] Pathak, Y., Arya, K.V. and Tiwari, S., 2019. Feature selection for image steganalysis using levy flight-based grey wolf optimization. Multimedia Tools and Applications, 78(2), pp.1473-1494.
  • [22] Bas, P., Filler, T. and Pevný, T., 2011, May. "Break our steganographic system": the ins and outs of organizing BOSS. In International workshop on information hiding (pp. 59-70). Springer, Berlin, Heidelberg.
  • [23] Piva, A. and Barni, M., 2007, February. The first BOWS contest: break our watermarking system. In Security, Steganography, and Watermarking of Multimedia Contents IX(Vol. 6505, p. 650516). International Society for Optics and Photonics.
  • [24] Goljan, M., Fridrich, J. and Cogranne, R., 2014, December. Rich model for steganalysis of color images. In 2014 IEEE International Workshop on Information Forensics and Security (WIFS) (pp. 185-190). IEEE.
  • [25] Zeng, J., Tan, S., Liu, G., Li, B. and Huang, J., 2018. Wisernet: Wider separate-then-reunion network for steganalysis of color images. arXiv preprint arXiv:1803.04805.
  • [26] Ye, J., Ni, J. and Yi, Y., 2017. Deep learning hierarchical representations for image steganalysis. IEEE Transactions on Information Forensics and Security, 12(11), pp.2545-2557.
  • [27] Xu, G., 2017, June. Deep convolutional neural network to detect J-UNIWARD. In Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security (pp. 67-73). ACM.
  • [28] Holub, V., Fridrich, J. and Denemark, T., 2014. Universal distortion function for steganography in an arbitrary domain. EURASIP Journal on Information Security, 2014(1), p.1.
  • [29] Sedighi, V., Cogranne, R. and Fridrich, J., 2016. Content-adaptive steganography by minimizing statistical detectability. IEEE Transactions on Information Forensics and Security, 11(2), pp.221-234.
  • [30] Li, B., Wang, M., Huang, J. and Li, X., 2014, October. A new cost function for spatial image steganography. In IEEE International Conference on Image Processing (ICIP) (pp. 4206-4210). IEEE.
  • [31] Fridrich, J. and Filler, T., 2007, February. Practical methods for minimizing embedding impact in steganography. In Security, Steganography, and Watermarking of Multimedia Contents IX(Vol. 6505, p. 650502). International Society for Optics and Photonics.
  • [32] Tang, W., Li, B., Luo, W. and Huang, J., 2016. Clustering steganographic modification directions for color components. IEEE Signal Processing Letters, 23(2), pp.197-201.
  • [33] Qin, X., Li, B., Tan, S. and Zeng, J., 2019. A novel steganography for spatial color images based on pixel vector cost. IEEE Access, 7, pp.8834-8846.
  • [34] Kang, Y., Liu, F., Yang, C., Xiang, L., Luo, X. and Wang, P., 2019. Color image steganalysis based on channel gradient correlation. International Journal of Distributed Sensor Networks, 15(5), p.1550147719852031.