As a simple yet efficient image processing operation, contrast enhancement (CE) is typically used by malicious attackers to eliminate inconsistent brightness when generating visually imperceptible tampered images. CE detection algorithms therefore play an important role in assessing the authenticity and integrity of digital images. Although some schemes have been proposed to detect contrast-enhanced images, their performance is limited in the cases of pre-JPEG compression and anti-forensic attacks. It is therefore critical to develop robust and effective CE forensics algorithms.
With the efforts of researchers over the past decade, a number of schemes [1-9] have been proposed to identify contrast-enhanced images in uncompressed format. Stamm et al. [1,2,3] found that contrast enhancement introduces peaks and gaps into the image's gray-level histogram, which lead to specific high values in the high-frequency components. Lin et al. [6,7] revealed that contrast enhancement disturbs the inter-channel correlation left by color image interpolation, and measured this correlation to distinguish original from enhanced images. Furthermore, in order to recover the image processing history, algorithms [10-13] for estimating the parameters of contrast-enhanced images have been developed.
Despite the good performance of the above algorithms, their robustness is unsatisfactory in some cases, such as CE of JPEG images (pre-JPEG compression) and anti-forensic attacks [14-19]. The reason is that the fingerprint left by the CE operation is destroyed in these cases. Based on this observation, some researchers have attempted to propose more robust CE forensic algorithms, which fall into two major branches: overcoming pre-JPEG compression and defending against anti-forensic attacks. Unfortunately, such methods cannot handle both pre-JPEG compression and anti-forensic attacks well, and to date there is no satisfactory solution.
In this paper, we propose two robust CE detection algorithms based on convolutional neural networks (CNNs) that resist not only pre-JPEG compression but also anti-CE attacks. First, a discriminability analysis of CE forensics in the pixel and histogram domains is presented. Then, inspired by the excellent performance of deep learning-based techniques in various fields, we explore two types of CNN architectures for CE forensics: pixel-domain CNNs (P-CNN) and histogram-domain CNNs (H-CNN). For P-CNN in particular, a high-pass filter is used to reduce the effect of image content and, in cooperation with batch normalization, to keep the data distribution balanced. Additionally, the width of the architecture is chosen experimentally to learn a better feature representation for CE forensics. As a lower-dimensional yet effective feature, the 256-dimensional histogram is fed into a CNN to construct H-CNN. Experimental results show that the proposed methods outperform state-of-the-art schemes in the uncompressed case and achieve comparable performance in the cases of pre-JPEG compression, anti-forensic attacks, and CE level variation.
2 Proposed Robust Algorithms for Detecting Contrast-Enhanced Images
Existing algorithms are not robust against pre-JPEG compression, anti-forensic attacks, and CE level variation. In this paper, two deep learning-based algorithms, P-CNN and H-CNN, are proposed within a data-driven framework to detect contrast-enhanced images by automatically learning effective features from a database. Their architectures are shown in Fig. 1.
2.1 Pixel-Domain Convolutional Neural Networks
As a common form of contrast enhancement, gamma correction can be found in many image-editing tools. In this paper, we mainly focus on the detection of gamma correction, which is typically defined as

$$Y = \mathrm{round}\left(255\left(\frac{X}{255}\right)^{\gamma}\right), \tag{1}$$

where $X$ denotes an input and $Y$ represents the mapped value, $X, Y \in [0, 255]$. In order to simplify the discussion, the mapped value is normalized:

$$y = x^{\gamma}, \tag{2}$$

where $x, y \in [0, 1]$. As is well known, gamma correction leads to nonlinear changes in the pixel domain and introduces peak/gap bins into the histogram domain [1-4]. A number of handcrafted features have been designed based on these phenomena.
In the pixel domain, the difference between the original and enhanced images is measured by its absolute value:

$$D(x, \gamma) = \left| x^{\gamma} - x \right|. \tag{3}$$

It can be seen from (3) that the discriminability in the pixel domain depends on the pixel value (image content), $x$, and the parameter of gamma correction, $\gamma$. To describe this discriminability, the maximum of the difference, denoted by $D_{\max}$, is considered. $D_{\max}$ is obtained when the partial derivative of $D$ with respect to $x$ is equal to $0$, that is, at $x = \gamma^{1/(1-\gamma)}$:

$$D_{\max}(\gamma) = \left| \gamma^{\gamma/(1-\gamma)} - \gamma^{1/(1-\gamma)} \right|.$$
The curve of the maximum difference as a function of the gamma parameter is shown in Fig. 2. For ease of understanding, four groups of parameters are chosen in the following discussion: . It is easy to find that () () () ().
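The dependence of the maximum pixel-domain difference on the gamma parameter can be checked numerically. The sketch below assumes the normalized mapping x**gamma on [0, 1] and the absolute difference |x**gamma - x|; the function names and any gamma values passed to them are illustrative assumptions, not the parameters used in the experiments.

```python
def max_pixel_difference(gamma):
    """Closed-form maximum of |x**gamma - x| over x in [0, 1].

    Assumes gamma > 0 and gamma != 1 (gamma == 1 leaves the image unchanged).
    """
    if gamma <= 0 or gamma == 1.0:
        raise ValueError("gamma must be positive and different from 1")
    # d/dx (x**gamma - x) = gamma * x**(gamma - 1) - 1 = 0
    # gives the stationary point x* = gamma ** (1 / (1 - gamma)).
    x_star = gamma ** (1.0 / (1.0 - gamma))
    return abs(x_star ** gamma - x_star)


def max_pixel_difference_grid(gamma, steps=20000):
    """Brute-force estimate on a uniform grid, used only as a sanity check."""
    return max(abs((i / steps) ** gamma - i / steps) for i in range(steps + 1))
```

Calling `max_pixel_difference` for several gamma values shows that the maximum difference shrinks as gamma approaches 1, which is the regime where pixel-domain detection is expected to be hardest.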
Fortunately, although the discriminability in the pixel domain varies, the difference in the pixel domain can be learned by a deep learning-based method. Motivated by this, P-CNN is proposed to detect enhanced images. The design of P-CNN is as follows.
First, a high-pass filter is added at the front end of the architecture to eliminate the interference of image content. Another advantage of the high-pass filter is that it may accelerate training in cooperation with batch normalization, because the histogram of a high-pass filtered image approximately follows a generalized Gaussian distribution, which is similar to the distribution produced by batch normalization. In particular, we find experimentally that the first-order difference filter along the horizontal direction performs better:

$$F_{0} = I * K,$$

where $K$ is the horizontal first-order difference kernel, $I$ is the input image, $F_{0}$ is the output of the first layer, and '*' represents the convolution operator.
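The effect of the high-pass front end can be illustrated with a minimal, dependency-free sketch. It assumes a two-tap horizontal first-order difference with "valid" boundaries and the sign convention right neighbor minus current pixel; the function name, the sign, and the boundary handling are assumptions, since the text does not specify them.

```python
def horizontal_first_order_difference(image):
    """Horizontal first-order difference of a 2-D image given as a list of rows.

    out[r][c] = image[r][c + 1] - image[r][c]  (assumed sign convention),
    so the output is one column narrower than the input.
    """
    if not image or len(image[0]) < 2:
        raise ValueError("image needs at least one row and two columns")
    width = len(image[0])
    if any(len(row) != width for row in image):
        raise ValueError("all rows must have the same length")
    return [[row[c + 1] - row[c] for c in range(width - 1)] for row in image]
```

On a flat region the output is zero everywhere, and on a smooth ramp it is a small constant, which shows how the filter suppresses image content and concentrates the residual values around zero.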
Next, the high-pass filtering layer is followed by four conventional convolutional layers. Each layer consists of four operations: convolution, batch normalization, ReLU, and average pooling. The numbers of feature maps of the four layers are 64, 16, 32, and 128, respectively. The convolution kernels are 3x3 with stride 1, and the pooling windows are 5x5 with stride 2. Two points should be noted: 1) we find experimentally that the number of feature maps in the first convolutional layer is important for CE detection, and performance is better when it is set to 64; in other words, low-level features are more helpful; 2) instead of average pooling, a spatial pyramid pooling layer is used in the last convolutional layer to fuse multi-scale features. The $n$-th convolutional layer is calculated as

$$F_{n} = f_{\mathrm{pool}}\left( f_{\mathrm{ReLU}}\left( f_{\mathrm{BN}}\left( F_{n-1} * W_{n} + B_{n} \right) \right) \right),$$

where $F_{n-1}$ is the input of the $n$-th layer, $W_{n}$ and $B_{n}$ are its weights and biases, and $f_{\mathrm{BN}}$, $f_{\mathrm{ReLU}}$, and $f_{\mathrm{pool}}$ represent batch normalization, ReLU, and pooling (average pooling, or spatial pyramid pooling in the last layer), respectively. For spatial pyramid pooling, three scales are chosen, leading to a 2688-dimensional output.
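The 2688-dimensional output is consistent with 128 feature maps pooled at pyramid levels of 1x1, 2x2, and 4x4 bins, since 128 x (1 + 4 + 16) = 2688. The sketch below illustrates this with a dependency-free spatial pyramid pooling routine. The three levels, the use of max pooling inside each bin, and the function name are assumptions made for illustration; the text only states that three scales are used.

```python
def spatial_pyramid_pool(feature_maps, levels=(1, 2, 4)):
    """Pool each 2-D feature map into n x n bins per level and concatenate.

    feature_maps: list of 2-D maps (lists of rows), all non-empty.
    levels: assumed pyramid levels; each level n contributes n * n values per map.
    Max pooling inside each bin is an assumption of this sketch.
    """
    pooled = []
    for fmap in feature_maps:
        h, w = len(fmap), len(fmap[0])
        for n in levels:
            for i in range(n):
                # Integer floor/ceil bin edges so that bins cover the whole map.
                r0, r1 = (i * h) // n, ((i + 1) * h + n - 1) // n
                for j in range(n):
                    c0, c1 = (j * w) // n, ((j + 1) * w + n - 1) // n
                    pooled.append(max(fmap[r][c]
                                      for r in range(r0, r1)
                                      for c in range(c0, c1)))
    return pooled
```

Because the number of bins is fixed per level, the output length depends only on the number of feature maps and not on their spatial size, which is what allows the following fully connected layer to have a fixed input dimension.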
Finally, the fully connected layer and softmax are followed by a multinomial logistic loss. The loss function is defined as

$$L = -\sum_{k=1}^{N} \delta(t = k) \log p_{k},$$

where $N$ is the number of classes, $t$ denotes the true label, $\delta(\cdot)$ equals 1 when its argument holds and 0 otherwise, and $p_{k}$ is the softmax output for class $k$. In our experimental setup, mini-batch stochastic gradient descent is applied with a batch size of 120. The learning rate is initialized to 0.001 and scheduled to decrease by 10% every 10000 iterations. The maximum number of iterations is 100000. The momentum and weight_decay are fixed to 0.9 and 0.0005, respectively.
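The loss and the learning-rate schedule described above can be sketched as follows. The function names are hypothetical, and the schedule reads "decrease 10% for every 10000 iterations" as multiplying the rate by 0.9 at each step boundary; that reading is an assumption, not a confirmed solver setting.

```python
import math


def multinomial_logistic_loss(logits, label):
    """Softmax followed by the negative log-likelihood of the true class.

    Uses the log-sum-exp form for numerical stability:
    loss = log(sum_k exp(z_k - m)) - (z_label - m), with m = max(z).
    """
    m = max(logits)
    log_sum = math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - (logits[label] - m)


def step_learning_rate(iteration, base_lr=0.001, drop=0.10, step=10000):
    """Step schedule: reduce the rate by `drop` (assumed 10%) every `step` iterations."""
    return base_lr * (1.0 - drop) ** (iteration // step)
```

With two classes and equal logits the loss equals log 2, the value expected from an untrained binary classifier.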
2.2 Histogram-Domain Convolutional Neural Networks
According to the report , handcrafted features based on the histogram are also vulnerable: the peak and gap features are easily destroyed by pre-JPEG compression and anti-forensic attacks. In order to detect CE in JPEG-compressed images, Cao et al. used only the number of gap bins as the feature. However, its performance is unstable across gamma parameters, and it does not work under anti-forensic attacks, which could be caused by the instability of gap bins.
The reason why gamma correction causes gap bins is that a narrow range of values is mapped to a wider one. For example, the values in the range will be changed to the range of . Therefore, the probability of gap bins (zero bins) should be proportional to the ratio of the wide range of values to the corresponding narrow range,
It can be found that , which means that the number of gap bins varies with the CE parameter. The statistical distribution of gap bins for the original and enhanced images with is shown in Fig. 3. It can be seen that the numbers of gap bins for are larger than and the overlapping parts with the original images for are less than , which is consistent with the result of our theoretical analysis.
Despite the instability of peak/gap bins, we believe that effective features can be learned automatically from the histogram domain by a data-driven algorithm. Instead of designing features by hand, a histogram-domain convolutional neural network (H-CNN) is constructed to achieve end-to-end detection by learning features directly from the histogram. In addition, as an input with low and fixed dimension, the histogram is well suited to convolutional neural networks. The architecture of H-CNN is shown in Fig. 1(b). Its input is the histogram of the image, namely a vector of size 1x256. The input layer is followed by two convolutional layers and three fully connected layers, whose numbers of feature maps (or output units) are 64, 64, 512, 1024, and 2, respectively. Lastly, a softmax layer followed by a multinomial logistic loss is added to classify original and enhanced images. The parameters of the convolutional layers and the hyper-parameters are the same as those of P-CNN.
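The way gamma correction creates zero bins, and the form of the 256-bin input that H-CNN consumes, can be illustrated with a short sketch. It assumes 8-bit gray levels and the rounded mapping 255 * (x / 255) ** gamma; the function names are hypothetical, and counting every empty bin as a gap bin is a simplification of the gap-bin definitions used by handcrafted detectors.

```python
def gamma_correct_level(level, gamma):
    """Map one 8-bit gray level through gamma correction with rounding (assumed form)."""
    mapped = int(round(255.0 * (level / 255.0) ** gamma))
    return min(255, max(0, mapped))


def histogram_256(pixels):
    """256-bin gray-level histogram of an iterable of 8-bit pixel values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    return hist


def count_zero_bins(hist):
    """Number of empty bins; a simplified stand-in for the gap-bin count."""
    return sum(1 for h in hist if h == 0)
```

For an image that uses every gray level once, the original histogram has no empty bin, whereas the gamma-corrected one does: levels in the stretched part of the range are spread over more output bins than there are input levels, leaving gaps between them.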
3 Experimental Results
To verify the validity of the proposed methods, four groups of experiments are conducted: ORG vs. P-CE, JPEG-ORG vs. JPEG-CE, ORG vs. Anti-CE, and JPEG-ORG vs. JPEG-CE-Anti-CE, where ORG denotes original images in uncompressed format, JPEG-ORG denotes original images in JPEG format, P-CE and JPEG-CE denote the enhanced versions of ORG and JPEG-ORG, respectively, and Anti-CE and JPEG-CE-Anti-CE denote enhanced images after anti-forensic attacks on P-CE and JPEG-CE, respectively. BOSSBase with 10000 images is chosen to construct the dataset. First, the images are centrally cropped into 128x128 pixel patches as ORG. Then, JPEG compression with is carried out on ORG to build JPEG-ORG. Next, gamma correction with is applied to ORG and JPEG-ORG to constitute P-CE and JPEG-CE. Finally, Anti-CE is produced by anti-forensic attacks [12,14] on P-CE. The reasons for our choice of patch size are that 1) detection on lower-resolution images is much harder than on higher-resolution images; 2) 128x128 is a suitable size for tamper localization based on CE forensics; 3) our hardware configuration is limited. For each experiment, the sizes of the training, validation, and testing data are 8000, 2000, and 10000, respectively. The experiments on the proposed schemes are conducted on one GPU (NVIDIA TITAN X) with an open-source deep learning framework, Caffe.
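The first and third dataset-construction steps (central cropping and gamma correction) can be sketched without external libraries as below. Images are represented as lists of rows of 8-bit values, the function names are hypothetical, and the gamma value is left as a parameter because this sketch does not assume the values used in the experiments. JPEG compression and the anti-forensic attacks are omitted because they require an image codec and the attack implementations.

```python
def center_crop(image, size=128):
    """Return the central size x size patch of a 2-D image (list of rows)."""
    h, w = len(image), len(image[0])
    if h < size or w < size:
        raise ValueError("image is smaller than the requested patch")
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def gamma_enhance(image, gamma):
    """Apply rounded 8-bit gamma correction through a 256-entry lookup table."""
    lut = [min(255, max(0, int(round(255.0 * (v / 255.0) ** gamma))))
           for v in range(256)]
    return [[lut[p] for p in row] for row in image]


def make_pair(image, gamma, size=128):
    """Build one (original patch, enhanced patch) pair in the ORG / P-CE sense."""
    org = center_crop(image, size)
    return org, gamma_enhance(org, gamma)
```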
3.1 Contrast Enhancement Detection For Contrast-Enhanced Images
The results for contrast-enhanced images in uncompressed format are shown in Table 1, where P-CNN denotes the pixel-domain convolutional neural network and H-CNN the histogram-domain convolutional neural network. As seen from Table 1, for Cao's method, the detection accuracy for is much higher than that for . The reason is that the gap feature is unstable across CE parameters, which is consistent with our analysis in Section 2.
Inspired by transfer learning techniques , we further improve the performance of P-CNN by fine-tuning the model for from the model for . P-CNN-FT achieves better performance than De Rosa's and Cao's methods, and H-CNN performs much better than the state-of-the-art schemes. It should be noted that the performance of H-CNN is better and more stable than that of the others. These results demonstrate that histogram-domain features are effective for CE detection.
3.2 Robustness Against Pre-JPEG Compressed and Anti-Forensic Attacked Contrast-Enhanced Images
The performance of different methods on pre-JPEG compressed images with and on anti-forensically attacked images is shown in Tables 2, 3, 4, and 5. It can be seen from Table 2 that P-CNN and H-CNN have much higher detection accuracy than De Rosa's and Cao's methods and comparable performance to Li's method. Interestingly, the performance of P-CNN improves significantly compared with P-CE detection. This may be because JPEG compression weakens the high-frequency signal components, so that the difference between original and enhanced images is highlighted after JPEG compression.
Under anti-forensic attacks, Cao's method does not work and the performance of H-CNN degrades, especially when the anti-forensic method  is applied, because the anti-forensic attacks conceal the peak/gap features in the histogram domain. In addition, histogram-based anti-forensic attacks may have little or no effect on the pixel domain. Therefore, P-CNN has the best performance in this case. When pre-JPEG compression and anti-forensic attacks are combined, as shown in Table 5, the proposed methods achieve performance comparable to Li's scheme.
In conclusion, De Rosa's method is not robust to pre-JPEG compression and anti-forensic attacks, and Cao's method is vulnerable to anti-forensic attacks. Furthermore, these prior algorithms are unstable across gamma levels. Although Li's method is better than previous works in the cases of pre-JPEG compression and anti-forensic attacks, its performance is unsatisfactory when no other operation is applied. Compared with the above schemes, the proposed P-CNN and H-CNN achieve good robustness against pre-JPEG compression, anti-forensic attacks, and CE level variation, and H-CNN achieves much better performance when no other operation is applied.
3.3 Effect of the Scale of Training Data
It is well known that the scale of the data has an important effect on the performance of deep learning-based methods. In this part, we conduct experiments to evaluate the effect of the scale of training data on the performance of H-CNN and P-CNN. The images from BOSSBase are first cropped into non-overlapping 128x128 pixel patches. Then these patches are enhanced with . We randomly choose 80000 image pairs as test data and 5000, 20000, 40000, and 80000 image pairs as training data. Four H-CNN models and four P-CNN models are trained on these four training sets, and the same test data is used in all experiments. The results are shown in Fig. 4. It can be seen that the scale of training data has only a slight effect on H-CNN, which has few parameters, whereas the opposite holds for P-CNN. Therefore, a larger scale of training data is beneficial to P-CNN, and its performance can be improved by increasing the amount of training data.
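The non-overlapping cropping used to enlarge the dataset can be sketched as follows. The function name is hypothetical, and dropping incomplete border patches is an assumption; for an image whose sides are multiples of 128 nothing is dropped.

```python
def tile_non_overlapping(image, size=128):
    """Split a 2-D image (list of rows) into non-overlapping size x size patches.

    Patches are returned in row-major order; rows and columns that do not
    fill a complete patch are discarded (assumed behavior).
    """
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h - size + 1, size):
        for left in range(0, w - size + 1, size):
            patches.append([row[left:left + size]
                            for row in image[top:top + size]])
    return patches
```

For example, a 512x512 image yields 16 patches of 128x128 pixels.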
Existing schemes for contrast enhancement forensics have unsatisfactory performance, especially in the cases of pre-JPEG compression and anti-forensic attacks. To deal with these problems, two robust CE forensics algorithms based on deep learning (H-CNN and P-CNN) are proposed in this paper. These methods achieve end-to-end classification in the pixel and histogram domains. Experimental results show that the proposed H-CNN attains better performance than state-of-the-art methods when no other operation is applied, and that the proposed methods are robust against pre-JPEG compression, anti-forensic attacks, and CE level variation.
- [1] M. C. Stamm and K. J. R. Liu, “Blind forensics of contrast enhancement in digital images,” in Proc. IEEE Int. Conf. on Image Processing, October 2008.
- [2] M. Stamm and K. Liu, “Forensic detection of image manipulation using statistical intrinsic fingerprints,” IEEE Transactions on Information Forensics and Security, vol. 5, no. 3, pp. 492–506, Sept 2010.
- [3] M. C. Stamm and K. J. R. Liu, “Forensic estimation and reconstruction of contrast enhancement mapping,” in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, March 2010, pp. 1698–1701.
- [4] G. Cao, Y. Zhao, R. Ni, and X. Li, “Contrast enhancement based forensics in digital images,” IEEE Transactions on Information Forensics and Security, vol. 9, pp. 515–525, March 2014.
- [5] H. Li, W. Luo, X. Qiu, et al., “Identification of various image operations using residual-based features,” IEEE Transactions on Circuits and Systems for Video Technology, 2016.
- [6] X. Lin, C. T. Li, and Y. Hu, “Exposing image forgery through the detection of contrast enhancement,” in Proc. IEEE Int. Conf. on Image Processing (ICIP), 2013, pp. 4467–4471.
- [7] X. Lin, X. Wei, and C. T. Li, “Two improved forensic methods of detecting contrast enhancement in digital images,” in Media Watermarking, Security, and Forensics 2014, International Society for Optics and Photonics, 2014.
- [8] L. Wen, H. Qi, and S. Lyu, “Contrast enhancement estimation for digital image forensics,” arXiv preprint arXiv:1706.03875, 2017.
- [9] A. De Rosa, M. Fontani, M. Massai, A. Piva, and M. Barni, “Second-order statistics analysis to cope with contrast enhancement counter-forensics,” IEEE Signal Processing Letters, vol. 22, pp. 1132–1136, August 2015.
- [10] H. Farid, “Blind inverse gamma correction,” IEEE Transactions on Image Processing, vol. 10, no. 10, pp. 1428–1433, Oct 2001.
- [11] A. C. Popescu and H. Farid, “Statistical tools for digital forensics,” 6th Intl. Work. on Info. Hiding & LNCS, vol. 3200, pp. 128–147, May 2004.
- [12] G. Cao, Y. Zhao, and R. Ni, “Forensic estimation of gamma correction in digital images,” in Proc. IEEE Int. Conf. on Image Processing, Sept 2010, pp. 2097–2100.
- [13] P. Wang, F. Liu, C. Yang, et al., “Parameter estimation of image gamma transformation based on zero-value histogram bin locations,” Signal Processing: Image Communication, 2018.
- [14] M. Barni, M. Fontani, and B. Tondi, “A universal technique to hide traces of histogram-based image manipulations,” in Proc. of the ACM Workshop on Multimedia and Security, 2012, pp. 97–104.
- [15] G. Cao, Y. Zhao, R. Ni, and H. Tian, “Anti-forensics of contrast enhancement in digital images,” in Proc. of the ACM Workshop on Multimedia and Security, 2010, pp. 25–34.
- [16] C.-W. Kwok, O. C. Au, and S.-H. Chui, “Alternative anti-forensics method for contrast enhancement,” in Proc. of the Int. Conf. on Digital Forensics and Watermarking, 2012, pp. 398–410.
- [17] P. Comesana-Alfaro and F. Perez-Gonzalez, “Optimal counterforensics for histogram-based forensics,” in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, May 2013, pp. 3048–3052.
- [18] G. Cao, Y. Zhao, R. Ni, H. Tian, and L. Yu, “Attacking contrast enhancement forensics in digital images,” Science China Information Sciences, vol. 57, no. 5, pp. 1–13, 2014.
- [19] H. Ravi, A. V. Subramanyam, and S. Emmanuel, “ACE–An effective anti-forensic contrast enhancement technique,” IEEE Signal Processing Letters, vol. 23, no. 2, pp. 212–216, 2016.
- [20] K. He, X. Zhang, S. Ren, et al., “Spatial pyramid pooling in deep convolutional networks for visual recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904–1916, 2015.
- [21] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proc. Int. Conf. on Machine Learning, 2015, pp. 448–456.
- [22] http://agents.fel.cvut.cz/stegodata/
- [23] An open source framework of deep learning: http://caffe.berkeleyvision.org/
- [24] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.