Dual-Domain Fusion Convolutional Neural Network for Contrast Enhancement Forensics

10/17/2019, by Pengpeng Yang, et al.

Contrast enhancement (CE) forensics techniques have always been of great interest to the image forensics community, as they can be an effective tool for recovering image history and identifying tampered images. Although several CE forensic algorithms have been proposed, their accuracy and robustness against some kinds of processing are still unsatisfactory. To address this deficiency, in this paper we propose a new framework based on a dual-domain fusion convolutional neural network that fuses features of the pixel and histogram domains for CE forensics. Specifically, we first present a pixel-domain convolutional neural network (P-CNN) to automatically capture the patterns of contrast-enhanced images in the pixel domain. Then, we present a histogram-domain convolutional neural network (H-CNN) to extract features in the histogram domain. The feature representations of the pixel and histogram domains are fused and fed into two fully connected layers for the classification of contrast-enhanced images. Experimental results show that the proposed method achieves better performance and is robust against pre-JPEG compression and anti-forensic attacks. In addition, a strategy for improving the performance of CNN-based forensics is explored, which could provide guidance for the design of CNN-based forensics tools.


1 Introduction

Being a simple yet efficient image processing operation, CE is typically used by malicious image attackers to eliminate inconsistent brightness when generating visually imperceptible tampered images. CE detection algorithms play an important role in decision analysis for authenticity and integrity of digital images. Although some schemes have been proposed to detect contrast-enhanced images, the performance of such techniques is limited in the cases of pre-JPEG compression and anti-forensic attacks. Therefore, it is critical to develop robust and effective CE forensics algorithms.

Thanks to the efforts of researchers in the past decade, a number of schemes Stamm and Liu (2008, 2010a, 2010b); Cao et al. (2014a); Li et al. (2016); Lin et al. (2013, 2014); Wen et al. (2018); De Rosa et al. (2015) have been proposed to discriminate contrast-enhanced images in uncompressed format. Stamm et al. Stamm and Liu (2008, 2010a, 2010b) found that contrast enhancement introduces peaks and gaps into an image's gray-level histogram, which lead to specific high values in its high-frequency components. Lin et al. Lin et al. (2013, 2014) revealed that contrast enhancement disturbs the inter-channel correlation left by color image interpolation, and they measured this correlation to distinguish enhanced images from original images. Furthermore, in order to recover the image processing history, many algorithms for estimating the parameters of contrast-enhanced images have been developed Farid (2001); Popescu and Farid (2004); Cao et al. (2010b); Wang et al. (2018).

Despite the good performance obtained by the abovementioned algorithms, their robustness can be unsatisfactory in some cases, such as CE of JPEG images (pre-JPEG compression) and the presence of anti-forensic attacks Barni et al. (2012); Cao et al. (2010a); Kwok et al. (2011); Comesana-Alfaro and Pérez-González (2013); Cao et al. (2014b); Ravi et al. (2015). The reason is that the fingerprint left by the CE operation is altered by such processing. Based on this observation, some researchers proposed more robust CE forensic algorithms, which can be divided into two major branches: overcoming pre-JPEG compression Cao et al. (2014a) and defending against anti-forensic attacks De Rosa et al. (2015). Unfortunately, neither of these methods is capable of addressing both pre-JPEG compression and anti-forensic attacks. To date there are no satisfactory solutions to these problems.

With the rapid development of deep learning techniques, and especially convolutional neural networks (CNNs), some researchers have recently attempted to use them for digital image forensics. A number of preliminary works exploring CNNs in a single domain (such as the pixel domain Barni et al. (2018), the histogram domain Zhang et al. (2018), and the gray-level co-occurrence matrix (GLCM) Sun et al. (2018); Shan et al. (2019)) have been proposed for CE forensics. According to the report Sun et al. (2018), deep learning-based CE forensic schemes have achieved better performance than traditional ones. The schemes mentioned above attempt to deal with the CE forensics task by feeding single-domain information to CNNs. However, each domain has its own advantages and disadvantages. For example, according to our experiments, a CNN working in the pixel domain is robust to post-processing but struggles to reach satisfactory accuracy. In addition, it is well known that the histogram domain is effective for the CE forensics task but fails to resist anti-forensic attacks. These observations give us a strong incentive to explore a deep learning-based fusion algorithm across multiple domains that is robust against pre-JPEG compression and anti-forensic attacks.

In this paper, we propose a novel framework based on a dual-domain fusion convolutional neural network for CE forensics. Specifically, a pixel-domain CNN (P-CNN) is designed to extract the patterns of contrast-enhanced images in the pixel domain. For the P-CNN, a high-pass filter is used to reduce the effect of image content and to keep the data distribution balanced in cooperation with batch normalization Ioffe and Szegedy (2015). In addition, a histogram-domain CNN (H-CNN) is constructed by feeding a 256-dimensional histogram into a convolutional neural network. The features obtained from the P-CNN and H-CNN are fused and fed into a classifier with two fully connected layers. Experimental results show that our proposed method outperforms state-of-the-art schemes in the case of uncompressed images and obtains comparable performance in the cases of pre-JPEG compression, anti-forensic attacks, and CE level variation.

The main contributions of this paper are:

1) we present a dual-domain fusion framework for CE forensics;

2) we propose and evaluate two kinds of simple yet effective convolutional neural networks based on pixel and histogram domains;

3) we explore design principles of CNNs for CE forensics, specifically adding preprocessing, increasing the complexity of the architecture, and selecting a training strategy that includes fine-tuning and data augmentation.

The rest of this paper is organized as follows. Section 2 describes related works in the field of CE forensics. In Section 3, we formulate the problem and in Section 4 we present the proposed dual-domain fusion CNN framework. In Section 5, experimental results are reported. Conclusion is given in Section 6.

2 Related Works

CE forensics, as a popular topic in the image forensics community, has been studied for a long time. Early research works attempted to extract features from the histogram domain. Stamm et al. Stamm and Liu (2008, 2010a, 2010b) observed that the histogram of a contrast-enhanced image presents peak/gap artifacts, whereas that of an un-enhanced image does not, as shown in Fig 1. Based on this observation, they proposed a histogram-based scheme in which a high-frequency energy metric is calculated and thresholded. However, this method fails to detect CE in images that were previously JPEG compressed at middle/low quality, in which peak/gap artifacts also exist Cao et al. (2014a). Cao et al. Cao et al. (2014a) studied this issue and found that there is a notable difference between the peak/gap artifacts caused by contrast enhancement and those caused by JPEG compression: gap bins with zero height always appear in contrast-enhanced images. But this phenomenon does not occur in the case of anti-forensic attacks. As can be seen in Fig 1, the histogram of an enhanced image subjected to an anti-forensic attack conforms to a smooth envelope, similar to that of an un-enhanced image.

Figure 1: Histograms of an uncompressed image, a contrast-enhanced image, a contrast-enhanced image in the case of an anti-forensic attack, and a JPEG image with quality factor 70, respectively.
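To make the thresholding idea concrete, below is a minimal sketch (not the implementation of Stamm et al.) that measures the high-frequency energy of the gray-level histogram, which the peak/gap artifacts of CE tend to raise; the cutoff index, the normalization, and the name high_freq_energy are illustrative assumptions.

```python
import numpy as np

def high_freq_energy(img_u8, cutoff=32):
    """Stamm-style cue: fraction of histogram spectrum energy above a cutoff frequency.
    Peak/gap artifacts introduced by CE raise this value; cutoff/normalization are
    illustrative assumptions, not the values used in the original papers."""
    hist = np.bincount(img_u8.ravel(), minlength=256).astype(np.float64)
    spectrum = np.abs(np.fft.fft(hist))
    return spectrum[cutoff:128].sum() / (spectrum[1:128].sum() + 1e-12)

# An image would be flagged as contrast-enhanced if this metric exceeds a threshold.
```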

Instead of exploring features in the histogram domain, De Rosa et al. De Rosa et al. (2015) studied the possibility of using second-order statistics to detect contrast-enhanced images even in the case of anti-forensic attacks. Specifically, the co-occurrence matrix of the gray-level image was explored. According to the report De Rosa et al. (2015), several empty rows and columns appear in the GLCM of contrast-enhanced images, as shown in Fig 2, even after the application of the anti-forensic attack of Barni et al. (2012). Based on this observation, the authors extracted such features from the standard deviation of each column of the GLCM. However, its performance is still not satisfactory, especially against the other, more powerful anti-forensic attack Cao et al. (2010b).
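For reference, the following sketch computes a second-order statistic in the spirit of De Rosa et al.: a gray-level co-occurrence matrix followed by the standard deviation of each of its columns. The offset/angle choices and the use of scikit-image's graycomatrix (spelled greycomatrix in older versions) are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np
from skimage.feature import graycomatrix

def glcm_column_std(img_u8):
    """Compute a 256x256 GLCM (distance 1, horizontal offset assumed) and return the
    standard deviation of each column; empty rows/columns induced by CE leave a
    characteristic pattern in these values."""
    glcm = graycomatrix(img_u8, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    m = glcm[:, :, 0, 0]      # 256 x 256 co-occurrence matrix
    return m.std(axis=0)      # one feature per column
```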

The algorithms described above are based on handcrafted low-level features, which makes it difficult to deal with the above problems simultaneously. With the development of data-driven techniques, some researchers have recently started to study deep feature representations for CE forensics, and existing methods Sun et al. (2018); Barni et al. (2018); Zhang et al. (2018); Shan et al. (2019) focus on a single domain. Barni et al. Barni et al. (2018) present a CNN containing a total of 9 convolutional layers in the pixel domain, similar to the typical CNNs used in the field of computer vision. Zhang et al. Zhang et al. (2018) explore the information in the histogram domain and feed a 256-dimensional histogram into a VGG-based multi-path network. Sun et al. Sun et al. (2018) propose to calculate the gray-level co-occurrence matrix (GLCM) and feed it to a CNN with 3 convolutional layers. Although these approaches based on deep single-domain features have obtained performance gains for CE forensics, they ignore multi-domain information, which could be useful when some features in a single domain are destroyed.

To overcome these limitations of existing works, we propose a new deep learning-based framework to extract and fuse feature representations in the pixel and histogram domains for CE forensics.

Figure 2: GLCM of an uncompressed image, a contrast-enhanced image, a contrast-enhanced image in the case of an anti-forensic attack, and a JPEG image with quality factor 70, respectively.

3 Problem Formulation

As a common way of performing contrast enhancement, gamma correction can be found in many image-editing tools. In addition, according to the report Barni et al. (2018), images enhanced by gamma correction are harder to detect than images enhanced in other ways. Therefore, in this paper we mainly focus on the detection of gamma correction, which is typically defined as

$Y = \mathrm{round}\big(255 \cdot (X/255)^{\gamma}\big), \qquad (1)$

where $X$ denotes an input pixel value, $Y$ represents the re-mapped value, and $\gamma$ is the correction parameter. The problem addressed in this paper is how to classify a given image as contrast-enhanced or non-enhanced. In particular, the robustness of the proposed method against pre-JPEG compression and anti-forensic attacks is evaluated.
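For concreteness, a minimal sketch of the gamma-correction mapping in Eq. (1) on 8-bit data is given below; the function name gamma_correct and the rounding convention are assumptions of this sketch.

```python
import numpy as np

def gamma_correct(img_u8: np.ndarray, gamma: float) -> np.ndarray:
    """Gamma correction as in Eq. (1): Y = round(255 * (X / 255)**gamma) on 8-bit data."""
    x = img_u8.astype(np.float64) / 255.0
    return np.round(255.0 * np.power(x, gamma)).astype(np.uint8)

# Example: gamma < 1 brightens mid-tones and introduces the peak/gap histogram
# artifacts discussed in Section 2.
patch = np.random.randint(0, 256, size=(128, 128), dtype=np.uint8)
enhanced = gamma_correct(patch, gamma=0.8)
```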

4 Proposed Method

In this section, we first give an overview of the proposed dual-domain fusion convolutional neural network framework, and then introduce its major components in detail.

4.1 Framework Overview

The proposed dual-domain fusion convolutional neural network is shown in Fig 3. It extracts features from the pixel and histogram domains with the P-CNN and H-CNN, respectively, and then fuses them before feeding them into a classifier with two fully connected layers. Our end-to-end system predicts whether an image is contrast-enhanced or non-enhanced.

Figure 3: The proposed dual-domain fusion convolutional neural network.

4.2 Pixel-Domain Convolutional Neural Network

Convolutional neural networks (CNNs) in the pixel domain have recently been applied to image forensics and developed for specific forensic tasks. A common modification Yang et al. (2016, 2017) of CNNs in the forensics community is to add a preprocessing layer that weakens the effect of image content and improves the signal-to-noise ratio. Inspired by this observation, we experimentally study preprocessing and find an effective configuration for CE forensics (Section 5.3.1). Due to hardware limitations, we design a simple four-layer CNN to balance performance and computational complexity. The architecture of the proposed pixel-domain convolutional neural network is shown in Fig 4.

Figure 4: The architecture of proposed pixel-domain convolutional neural networks.

Firstly, a high-pass filter is added at the front end of the architecture to eliminate the interference of image content. Another advantage of the high-pass filter is that it accelerates training in cooperation with batch normalization Ioffe and Szegedy (2015): the histogram of high-pass filtered images approximately follows a generalized Gaussian distribution, which matches the normalization performed by batch normalization. In particular, we experimentally find that the first-order difference along the horizontal direction performs best:

$I_{hp} = I * k, \quad k = [1, -1], \qquad (2)$

where $I$ is the input image, $I_{hp}$ is the output of the first layer, $k$ is the first-order horizontal difference kernel, and '*' represents the convolution operator.
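A minimal sketch of this preprocessing step is shown below; the kernel sign/orientation and the symmetric border handling are assumptions, and SciPy is used only for convenience.

```python
import numpy as np
from scipy.signal import convolve2d

def highpass_first_order(img_u8: np.ndarray) -> np.ndarray:
    """First-order difference along the horizontal direction (Eq. 2)."""
    kernel = np.array([[1.0, -1.0]])
    return convolve2d(img_u8.astype(np.float64), kernel, mode="same", boundary="symm")
```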

Next, the high-pass filtering layer is followed by four conventional convolutional layers. Each layer applies four operations: convolution, batch normalization, ReLU, and average pooling. The numbers of feature maps for the four layers are 64, 16, 32, and 128, respectively. The kernel size is 3x3 with stride 1 for convolution and 5x5 with stride 2 for pooling. It should be pointed out that: 1) we experimentally find that the number of feature maps in the first convolutional layer is important for CE detection, and performance is best with 64 feature maps; in other words, low-level features are more helpful; 2) instead of average pooling, a spatial pyramid pooling (SPP) layer He et al. (2015) is used after the last convolutional layer to fuse multi-scale features. The $l$-th convolutional layer is computed as

$F^{l} = P\big(R\big(B\big(F^{l-1} * W^{l}\big)\big)\big), \qquad (3)$

where $B(\cdot)$, $R(\cdot)$, $P(\cdot)$, and $S(\cdot)$ represent batch normalization, ReLU, average pooling, and spatial pyramid pooling, respectively ($S$ replaces $P$ in the last layer), $W^{l}$ denotes the filters of the $l$-th layer, and $F^{l-1}$ and $F^{l}$ are its input and output feature maps. For spatial pyramid pooling, three scales are chosen, leading to a 2688-dimensional output.
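The following PyTorch sketch illustrates one plausible realization of the P-CNN described above (fixed high-pass filter, four convolutional blocks with 64/16/32/128 feature maps, SPP over three scales giving 2688 features). It is not the authors' Caffe implementation; padding choices, SPP scales of 1/2/4, and class names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPP(nn.Module):
    """Spatial pyramid pooling over three scales (assumed 1x1, 2x2, 4x4),
    giving 128 * (1 + 4 + 16) = 2688 features as stated in the text."""
    def __init__(self, levels=(1, 2, 4)):
        super().__init__()
        self.levels = levels

    def forward(self, x):
        pooled = [F.adaptive_avg_pool2d(x, s).flatten(1) for s in self.levels]
        return torch.cat(pooled, dim=1)

class PCNN(nn.Module):
    """Sketch of the pixel-domain CNN: fixed high-pass filter (Eq. 2) followed by
    four conv blocks with 64, 16, 32, 128 feature maps, SPP, and one FC layer."""
    def __init__(self, num_classes=2):
        super().__init__()
        # Fixed first-order horizontal difference kernel; not trained.
        self.register_buffer("hp_kernel", torch.tensor([[[[1.0, -1.0]]]]))
        channels = [1, 64, 16, 32, 128]
        layers = []
        for cin, cout in zip(channels[:-1], channels[1:]):
            layers += [nn.Conv2d(cin, cout, 3, stride=1, padding=1),
                       nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
                       nn.AvgPool2d(5, stride=2, padding=2)]
        self.features = nn.Sequential(*layers[:-1])  # SPP replaces the last avg pooling
        self.spp = SPP()
        self.fc = nn.Linear(128 * 21, num_classes)   # softmax/loss applied during training

    def forward(self, x):
        x = F.conv2d(x, self.hp_kernel)   # high-pass preprocessing, Eq. (2)
        return self.fc(self.spp(self.features(x)))
```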

In the end, a fully connected layer and a softmax layer are followed by a multinomial logistic loss. The loss function is defined as

$L = -\sum_{k=1}^{K} y_{k} \log \hat{y}_{k}, \qquad (4)$

where $K$ is the number of classes, $y_{k}$ denotes the true label, and $\hat{y}_{k}$ is the softmax output for class $k$. In our experimental setup, mini-batch stochastic gradient descent is applied with a batch size of 120. The learning rate is initialized to 0.001 and scheduled to decrease by 10% every 10000 iterations. The maximum number of iterations is 100000. The momentum and weight decay are fixed to 0.9 and 0.0005, respectively.
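A sketch of the corresponding training setup is given below, with the hyper-parameter values taken from the text; the interpretation of "decrease by 10%" as multiplying the learning rate by 0.9 (rather than by 0.1) is an assumption, as is the helper naming.

```python
import torch
import torch.nn as nn

def build_training(model: nn.Module):
    """Optimization settings from the text: SGD, lr 0.001, momentum 0.9,
    weight decay 0.0005, lr multiplied by 0.9 every 10000 iterations (assumed reading)."""
    criterion = nn.CrossEntropyLoss()  # softmax + multinomial logistic loss, Eq. (4)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                                momentum=0.9, weight_decay=0.0005)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10000, gamma=0.9)
    return criterion, optimizer, scheduler

def train_step(model, criterion, optimizer, scheduler, patches, labels):
    """One mini-batch step (batch size 120, up to 100000 iterations in the paper)."""
    optimizer.zero_grad()
    loss = criterion(model(patches), labels)
    loss.backward()
    optimizer.step()
    scheduler.step()
    return loss.item()
```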

4.3 Histogram-Domain Convolutional Neural Network

Figure 5: The architecture of proposed histogram-domain convolutional neural networks.

As is well known, gamma correction leads to non-linear changes in the pixel domain and introduces peak/gap bins into the histogram domain Stamm and Liu (2008, 2010a, 2010b); Cao et al. (2014a). A number of handcrafted features have been designed based on these phenomena. Instead of designing features by hand, we construct a histogram-domain convolutional neural network (H-CNN) to achieve end-to-end detection by learning features directly from the histogram domain. In addition, being an input with a low and fixed dimension, the histogram is well suited to convolutional neural networks. The architecture of the H-CNN is shown in Fig 5. Its input is the histogram of the image, namely a 1x256 vector. This input layer is followed by two convolutional layers and two fully connected layers, whose numbers of feature maps are 64, 64, 512, and 1024, respectively. Lastly, a softmax layer followed by a multinomial logistic loss is added to classify original and enhanced images. The parameters of the convolutional layers and the hyper-parameters are the same as for the P-CNN.
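The sketch below shows one plausible realization of the H-CNN: a 256-bin histogram as a 1D input, two 1D convolutional layers with 64 feature maps each, and fully connected layers of 512 and 1024 units. Kernel sizes and pooling are assumed to mirror the P-CNN settings, and the final 2-way layer and other details are assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

def histogram_256(img_u8):
    """256-bin gray-level histogram used as H-CNN input, shaped (batch, channel, 256)."""
    hist = torch.bincount(torch.as_tensor(img_u8, dtype=torch.long).flatten(),
                          minlength=256).float()
    return hist.view(1, 1, 256)

class HCNN(nn.Module):
    """Sketch of the histogram-domain CNN: two 1D conv layers (64 maps each)
    followed by 512- and 1024-unit fully connected layers and a 2-way output."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 64, 3, padding=1), nn.BatchNorm1d(64), nn.ReLU(inplace=True),
            nn.AvgPool1d(5, stride=2, padding=2),
            nn.Conv1d(64, 64, 3, padding=1), nn.BatchNorm1d(64), nn.ReLU(inplace=True),
            nn.AvgPool1d(5, stride=2, padding=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 64, 512), nn.ReLU(inplace=True),
            nn.Linear(512, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```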

4.4 Dual-domain Fusion Convolutional Neural Network

As discussed in Sections 1 and 2, the performance of CE detection systems designed in a single domain is still not satisfactory. Fortunately, fusion strategies Mangai et al. (2010) provide a good way to obtain higher performance and have been adopted in the digital image forensics community Yang et al. (2017); Fontani et al. (2013). In this work, we assume that the features extracted by the P-CNN and H-CNN are complementary for CE forensics; we therefore propose a simple yet effective feature fusion framework for deep learning-based CE forensics that integrates multiple domains, and construct the dual-domain fusion CNN (DM-CNN), as shown in Fig 3. Firstly, high-pass filtered images and histograms are extracted from the input images. Then the filtered images are fed into the P-CNN with four 2D convolutional layers and the histogram is fed into the H-CNN with two 1D convolutional layers. Note that, for the purpose of fusion, the P-CNN and H-CNN are slightly modified. The P-CNN of the DM-CNN is composed of the convolutional layers extracted from the P-CNN. Besides, in order to ensure that the outputs of the P-CNN and H-CNN have the same dimension, one scale of spatial pyramid pooling is chosen in the P-CNN and the number of feature maps in the second convolutional layer of the H-CNN is set to 128. The features output by the P-CNN and H-CNN are concatenated and then fed into a classification unit, which consists of two fully connected layers and one softmax layer followed by a multinomial logistic loss. It is worth noting that, due to the limitations of our hardware configuration, only two domains are fused in our system; ensembling features from other domains could be useful as well.
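A sketch of the fusion stage is given below. It treats the convolutional parts of the P-CNN and H-CNN as feature extractors whose outputs are pooled to equal-length vectors, concatenated, and classified by two fully connected layers; the specific pooling used here to obtain matching 128-dimensional vectors is an assumption standing in for the single-scale SPP and widened H-CNN layer described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DMCNN(nn.Module):
    """Sketch of the dual-domain fusion network (Fig. 3). The pixel branch is the
    conv part of the P-CNN (fed with high-pass filtered patches); the histogram
    branch is the 1D conv part of the H-CNN, assumed widened to 128 maps so that
    both branches yield same-sized vectors before concatenation."""
    def __init__(self, pixel_branch, hist_branch, pixel_dim=128, hist_dim=128,
                 num_classes=2):
        super().__init__()
        self.pixel_branch = pixel_branch
        self.hist_branch = hist_branch
        self.classifier = nn.Sequential(
            nn.Linear(pixel_dim + hist_dim, 512), nn.ReLU(inplace=True),
            nn.Linear(512, num_classes),
        )

    def forward(self, filtered_patches, histograms):
        fp = F.adaptive_avg_pool2d(self.pixel_branch(filtered_patches), 1).flatten(1)
        fh = F.adaptive_avg_pool1d(self.hist_branch(histograms), 1).flatten(1)
        return self.classifier(torch.cat([fp, fh], dim=1))
```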

5 Experimental Results

In order to verify the validity of the proposed methods, we compare them with four other methods. De Rosa De Rosa et al. (2015), Cao Cao et al. (2014a), and Sun Sun et al. (2018) were proposed for CE forensics; the former two are traditional schemes and the last one is based on deep learning. Li Li et al. (2016) was proposed to identify various image operations using high-dimensional residual-based features. Four groups of experiments are conducted: ORG vs P-CE, JPEG-ORG vs JPEG-CE, ORG vs Anti-CE, and JPEG-ORG vs JPEG-CE-Anti-CE, where ORG denotes original images in uncompressed format, JPEG-ORG represents original images in JPEG format, P-CE and JPEG-CE denote enhanced versions of ORG and JPEG-ORG, respectively, and Anti-CE and JPEG-CE-Anti-CE represent enhanced images with anti-forensic attacks applied to P-CE and JPEG-CE, respectively. The BOSSBase dataset [12] with 10000 images is chosen to construct the dataset. Firstly, the images are centrally cropped into 128x128 pixel patches to form ORG. Then, JPEG compression with quality factors of 50 and 70 is carried out on ORG to build JPEG-ORG. Next, gamma correction with several gamma levels is applied to ORG and JPEG-ORG to constitute P-CE and JPEG-CE. In the end, Anti-CE is produced by applying the anti-forensic attacks Cao et al. (2010b); Barni et al. (2012) to P-CE and JPEG-CE. The reasons for our choice of patch size are that 1) detection in lower-resolution images is much harder than in higher-resolution images; 2) 128x128 is a suitable size for tamper localization based on CE forensics; 3) our hardware configuration is limited. For each experiment, the training, validation, and testing data comprise 8000, 2000, and 10000 images, respectively. The experiments on the proposed schemes are conducted on one GPU (NVIDIA TITAN X) with an open-source deep learning framework, Caffe [13].
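The following sketch illustrates how the ORG and JPEG-ORG patches could be generated (central cropping and JPEG re-encoding with Pillow); the gamma-corrected versions can be produced with the gamma_correct sketch from Section 3, and the anti-forensic attacks Cao et al. (2010b); Barni et al. (2012) are not sketched here. Function names are illustrative.

```python
import io
from PIL import Image

def center_crop_patch(path, size=128):
    """Centrally crop a BOSSBase image to a size x size grayscale patch (ORG)."""
    img = Image.open(path).convert("L")
    w, h = img.size
    left, top = (w - size) // 2, (h - size) // 2
    return img.crop((left, top, left + size, top + size))

def jpeg_version(img, qf):
    """Re-encode a patch as JPEG with the given quality factor to build JPEG-ORG."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=qf)
    buf.seek(0)
    return Image.open(buf)

# P-CE / JPEG-CE can then be produced by applying gamma correction (see the
# gamma_correct sketch in Section 3) to these patches; the anti-forensic
# attacks used for Anti-CE are not sketched here.
```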

5.1 Contrast Enhancement Detection: ORG vs P-CE

The results for contrast-enhanced images in uncompressed format are shown in Table 1. P-CNN is the pixel-domain convolutional neural network, H-CNN is the histogram-domain convolutional neural network, and DM-CNN denotes the dual-domain fusion CNN. As seen from Table 1, for Cao's method, the detection accuracy varies greatly across gamma levels. The reason is that the gap feature is unstable across CE parameters, which is consistent with our analysis in Section 3. In addition, H-CNN has better performance than the above four schemes. These results demonstrate that histogram-domain features are effective for CE detection. Besides, the proposed fusion framework, DM-CNN, obtains the best average detection accuracy. It should be mentioned that although the deep learning-based method proposed by Sun obtains slightly lower detection accuracy than DM-CNN, it has a much higher computational cost due to the extraction of the GLCM in preprocessing.

Method γ1 γ2 γ3 γ4 AVE
De Rosa De Rosa et al. (2015) 94.02% 84.85% 78.37% 74.12% 82.84%
Cao Cao et al. (2014a) 93.89% 93.90% 80.26% 81.40% 87.36%
Li Li et al. (2016) 93.63% 89.48% 90.76% 93.44% 91.83%
Sun Sun et al. (2018) 99.35% 99.21% 98.45% 98.80% 98.95%
P-CNN 94.70% 89.00% 78.00% 86.00% 86.93%
H-CNN 99.48% 99.45% 99.40% 99.07% 99.35%
DM-CNN 99.80% 99.72% 99.36% 99.41% 99.57%
Table 1: CE detection accuracy for contrast-enhanced images in the case of ORG vs P-CE. γ1-γ4 denote the four tested gamma levels. AVE is the average accuracy. Best results are marked in bold.

5.2 Robustness Against Pre-JPEG Compressed and Anti-Forensic Attacked Contrast-Enhanced Images

The performance of the different methods for pre-JPEG compressed images and anti-forensic attacked images is shown in Tables 2, 3, and 4. It can be seen from Table 2 that P-CNN, H-CNN, and DM-CNN have much higher detection accuracy than De Rosa's and Cao's methods and comparable performance to the algorithms proposed by Li and Sun. Besides, there is an interesting phenomenon: the performance of P-CNN improves significantly compared to P-CE detection. The reason may be that JPEG compression weakens the high-frequency signal components, so the difference between original and enhanced images after JPEG compression is highlighted.

Method γ1 γ2 γ3 γ4 AVE
QF = 50:
De Rosa De Rosa et al. (2015) 81.50% 79.69% 75.16% 72.70% 77.26%
Cao Cao et al. (2014a) 93.96% 93.75% 80.36% 81.57% 87.41%
Li Li et al. (2016) 99.11% 98.59% 97.75% 98.43% 98.47%
Sun Sun et al. (2018) 99.73% 99.62% 99.40% 99.75% 99.63%
P-CNN 98.20% 98.25% 96.70% 97.30% 97.61%
H-CNN 99.90% 99.80% 99.50% 99.78% 99.75%
DM-CNN 99.97% 99.90% 99.86% 99.96% 99.92%
QF = 70:
De Rosa De Rosa et al. (2015) 83.99% 82.27% 77.47% 72.95% 80.67%
Cao Cao et al. (2014a) 94.06% 93.77% 80.55% 81.56% 87.49%
Li Li et al. (2016) 98.54% 97.42% 96.22% 97.79% 97.49%
Sun Sun et al. (2018) 99.32% 99.12% 99.14% 98.89% 99.12%
P-CNN 98.60% 97.00% 95.70% 96.50% 96.95%
H-CNN 98.86% 99.03% 98.27% 97.68% 98.46%
DM-CNN 99.68% 99.51% 99.06% 99.40% 99.41%
Table 2: CE detection accuracy for pre-JPEG compressed images with different QFs. γ1-γ4 denote the four tested gamma levels. AVE is the average accuracy. Best results are marked in bold.

For anti-forensic attacks, Cao's method does not work, and there is a degradation in the performance of H-CNN, especially when the anti-forensic method of Cao et al. (2010b) is applied, because the anti-forensic attacks conceal the peak/gap features in the histogram domain. In addition, the histogram-based anti-forensic attacks have only a slight effect on the pixel domain. Therefore, the P-CNN has better performance than the H-CNN in this case. When the fusion framework is used to merge the pixel and histogram domains, DM-CNN obtains the best detection accuracy. When pre-compression and anti-forensic attacks are combined, as shown in Table 4, the proposed CNN achieves performance comparable to Li's and Sun's schemes.

In conclusion, De Rosa's method is not robust to pre-JPEG compression and anti-forensic attacks, and Cao's method is vulnerable to anti-forensic attacks. Furthermore, these prior algorithms are unstable across different gamma levels. Although Li's method based on high-dimensional features is better than previous works in the cases of pre-JPEG compression and anti-forensic attacks, its performance is unsatisfactory when no other operation is applied. The deep learning-based method proposed by Sun obtains slightly lower detection accuracy than the proposed DM-CNN, but it has a much higher computational cost due to the extraction of the GLCM in preprocessing. Compared with the above schemes, the proposed DM-CNN achieves good robustness against pre-JPEG compression, anti-forensic attacks, and CE level variation, and obtains the best average detection accuracy in all cases studied.

Method γ1 γ2 γ3 γ4 AVE
Anti-forensic attack Cao et al. (2010b):
De Rosa De Rosa et al. (2015) 61.67% 58.83% 55.32% 59.33% 58.79%
Cao Cao et al. (2014a) — — — — —
Li Li et al. (2016) 96.30% 95.54% 95.72% 96.55% 96.03%
Sun Sun et al. (2018) 95.53% 89.94% 90.55% 92.42% 92.11%
P-CNN 97.90% 96.00% 96.50% 96.55% 96.74%
H-CNN 88.77% 73.65% 74.85% 78.42% 78.92%
DM-CNN 97.85% 95.97% 96.68% 97.18% 96.92%
Anti-forensic attack Barni et al. (2012):
De Rosa De Rosa et al. (2015) 69.85% 66.03% 62.29% 64.42% 65.65%
Cao Cao et al. (2014a) — — — — —
Li Li et al. (2016) 99.57% 99.38% 99.33% 99.51% 99.48%
Sun Sun et al. (2018) 99.48% 99.07% 99.08% 99.19% 99.21%
P-CNN 98.60% 98.50% 97.80% 98.00% 98.21%
H-CNN 98.82% 97.59% 97.57% 97.09% 97.77%
DM-CNN 99.72% 99.78% 99.70% 99.59% 99.70%
Table 3: CE detection accuracy in the case of anti-forensic attacks. '—' denotes that the method does not work in this case. γ1-γ4 denote the four tested gamma levels. AVE is the average accuracy. Best results are marked in bold.
Method γ1 γ2 γ3 γ4 AVE
QF = 50:
De Rosa De Rosa et al. (2015) 70.26% 67.85% 65.38% 66.52% 67.50%
Cao Cao et al. (2014a) — — — — —
Li Li et al. (2016) 99.90% 99.90% 99.90% 99.90% 99.90%
Sun Sun et al. (2018) 99.75% 99.63% 99.68% 99.57% 99.66%
P-CNN 99.90% 99.90% 99.90% 99.90% 99.90%
H-CNN 99.45% 99.40% 99.20% 99.20% 99.31%
DM-CNN 99.93% 99.96% 99.97% 99.94% 99.95%
QF = 70:
De Rosa De Rosa et al. (2015) 68.68% 65.61% 62.24% 63.93% 65.12%
Cao Cao et al. (2014a) — — — — —
Li Li et al. (2016) 99.90% 99.90% 99.90% 99.90% 99.90%
Sun Sun et al. (2018) 99.32% 99.34% 98.60% 99.03% 99.07%
P-CNN 99.80% 99.75% 99.55% 99.80% 99.73%
H-CNN 97.35% 98.35% 97.80% 98.15% 97.91%
DM-CNN 99.92% 99.94% 99.95% 99.90% 99.93%
Table 4: CE detection accuracy for JPEG compressed images with different QFs and anti-forensic attack Cao et al. (2010b). '—' denotes that the method does not work in this case. γ1-γ4 denote the four tested gamma levels. AVE is the average accuracy. Best results are marked in bold.

5.3 Exploration on the Strategy to Improve Performance of CNN-based CE Forensics

Although numerous deep learning-based schemes have been proposed for digital image forensics, to the best of our knowledge no work has so far focused on strategies for improving the performance of a single CNN for CE forensics. Such guidance, however, is important for newcomers designing new CNN architectures in the image forensics community. In order to fill this gap, we make a preliminary exploration in this work. Specifically, we consider three aspects: adding preprocessing, increasing the complexity of the architecture, and selecting a training strategy, which includes fine-tuning and data augmentation.

5.3.1 Preprocessing

Through the protracted and unremitting efforts of researchers, deep learning techniques developed for computer vision (CV) tasks have succeeded in image forensics. Differing from CV tasks, classification in image forensics has little relation to the image content. Therefore, preprocessing has evolved into a universal way to improve the signal-to-noise ratio (SNR), and high-pass filtering has become one of the most popular means in the preprocessing stage. In this part, using the P-CNN as an example, we evaluate six kinds of high-pass filters, H1, V1, H2, V2, LAP, and HP, that are widely applied in image forensics, and compare them with the case without preprocessing. The definitions of these filters are shown in Table 5, and the performance in the above cases is presented in Fig 6, where NON means the case without preprocessing. It can be seen that CE forensic performance suffers when no preprocessing is used. In addition, the first-order difference along the horizontal direction has the best performance. At the same time, the HP and LAP filters, proposed for other forensic tasks, obtain worse performance, which indicates that it is necessary to design different high-pass filters for different image forensics tasks.

Table 5: The filters evaluated in this work.
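Since Table 5 is not reproduced here, the snippet below lists plausible standard definitions for the evaluated kernels; only H1 follows directly from Eq. (2), the remaining kernels are assumptions, and HP is omitted because its coefficients are not given in the text.

```python
import numpy as np

# Plausible definitions for the evaluated kernels; H1 follows Eq. (2), the others
# are standard first/second-order difference and Laplacian kernels and are
# assumptions, since Table 5 is not reproduced here.
FILTERS = {
    "H1": np.array([[1.0, -1.0]]),               # first-order horizontal difference
    "V1": np.array([[1.0], [-1.0]]),             # first-order vertical difference
    "H2": np.array([[1.0, -2.0, 1.0]]),          # second-order horizontal difference
    "V2": np.array([[1.0], [-2.0], [1.0]]),      # second-order vertical difference
    "LAP": np.array([[0.0, 1.0, 0.0],
                     [1.0, -4.0, 1.0],
                     [0.0, 1.0, 0.0]]),          # Laplacian
}
```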

5.3.2 Powerful Convolutional Neural Networks

Thanks to the development of deep learning in CV, more powerful CNNs (ResNet, XceptionNet, SENet) have sprung up at an increasing rate in recent years. However, because of limitations in the forensics community, such as insufficient training data and hardware constraints, it would be difficult to evaluate all of them. In order to verify the effectiveness of powerful CNNs in CE forensics, we replace the traditional convolutional layers of the P-CNN with residual blocks as proposed in ResNet-18. The result is shown in Fig 6. Compared with the H1 case, the detection accuracy of Res_H1 increases by 0.65%. From the above discussion, we conclude that for CE forensics, more powerful CNNs can enhance performance, but preprocessing plays a more important role.
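For illustration, a basic ResNet-18-style residual block that could replace a plain convolutional layer of the P-CNN is sketched below; the exact placement, channel counts, and downsampling used in the Res_H1 variant are assumptions.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Basic residual block in the style of ResNet-18: two 3x3 convolutions with
    batch normalization and a (possibly projected) identity shortcut."""
    def __init__(self, cin, cout, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(cin, cout, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
            nn.Conv2d(cout, cout, 3, padding=1, bias=False),
            nn.BatchNorm2d(cout),
        )
        self.skip = (nn.Identity() if cin == cout and stride == 1
                     else nn.Sequential(nn.Conv2d(cin, cout, 1, stride=stride, bias=False),
                                        nn.BatchNorm2d(cout)))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.skip(x))
```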

Figure 6: Performance of the P-CNN with/without preprocessing and with a more powerful network. NON means the P-CNN without preprocessing. The others represent the P-CNN with the LAP, V2, H2, V1, and H1 filters in the preprocessing, respectively. Res_H1 denotes the P-CNN with the H1 filter and residual blocks.

5.3.3 Training Strategy

It is well known that the scale of the training data has an important effect on the performance of deep learning-based methods, and transfer learning Pan and Yang (2009) provides an effective strategy for training CNN models. In this part, we conduct experiments to evaluate the effect of the scale of the training data and of the transfer learning strategy on the performance of the CNNs. For the former, the images from BOSSBase are first cropped into non-overlapping 128x128 pixel patches. Then these patches are enhanced by gamma correction. We randomly choose 80000 image pairs as test data and 5000, 20000, 40000, and 80000 image pairs as training sets. Four groups of H-CNN and P-CNN models are trained using the above four training sets, and the test data is the same for all experiments. The results are shown in Figure 7. It can be seen that the scale of the training data has only a slight effect on the H-CNN, which has few parameters, while the opposite holds for the P-CNN. Therefore, a larger training set benefits the P-CNN, which has more parameters, and its performance can be improved by enlarging the training data. For the latter, we compare the performance of the P-CNN with and without transfer learning; the P-CNN with transfer learning is obtained by fine-tuning the model for one gamma level from the model trained for another. As shown in Fig 8, P-CNN-FT achieves better performance than P-CNN.

Figure 7: Effect of the scale of training data.
Figure 8: Performance of the P-CNN and the P-CNN with fine-tune (P-CNN-FT).
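A minimal sketch of the fine-tuning (P-CNN-FT) strategy is shown below: the P-CNN for one gamma level is initialized from weights trained on another level and then trained further; the checkpoint path and the reduced learning rate are illustrative assumptions.

```python
import torch

def finetune_pcnn(model, source_weights_path, lr=0.0001):
    """Initialize the P-CNN from weights trained on another gamma level, then
    continue training with SGD; the reduced learning rate is an assumption."""
    model.load_state_dict(torch.load(source_weights_path))
    optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                                momentum=0.9, weight_decay=0.0005)
    return model, optimizer
```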

6 Conclusion

Existing schemes for contrast enhancement forensics have unsatisfactory performance, especially in the cases of pre-JPEG compression and anti-forensic attacks. To deal with these problems, in this paper a new deep learning-based framework, the dual-domain fusion convolutional neural network (DM-CNN), is proposed. This method achieves end-to-end classification based on the pixel and histogram domains and obtains great performance. Experimental results show that the proposed DM-CNN achieves better performance than state-of-the-art schemes and is robust against pre-JPEG compression, anti-forensic attacks, and CE level variation. Besides, we explored strategies to improve the performance of CNN-based CE forensics, which could provide guidance for the design of CNN-based forensics tools.

In spite of the good performance of existing schemes, it is still hard to detect CE images in the case of post-JPEG compression with lower quality factors. New algorithms should be designed to deal with this problem. In addition, the security of CNNs has drawn a lot of attention; therefore, improving the security of CNNs is worth studying in the future.

7 Acknowledgements

This work was supported in part by the National Key Research and Development of China (No. 2016YFB0800404), the National NSF of China (Nos. 61672090, 61532005, 61332012, 61401408) and the Fundamental Research Funds for the Central Universities (Nos. 2018JBZ001, 2017YJS054). Pengpeng Yang would like to acknowledge the China Scholarship Council, State Scholarship Fund, which supports his joint Ph.D. program.

References

  • M. Barni, A. Costanzo, E. Nowroozi, and B. Tondi (2018) CNN-based detection of generic contrast adjustment with jpeg post-processing. In 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 3803–3807. Cited by: §1, §2, §3.
  • M. Barni, M. Fontani, and B. Tondi (2012) A universal technique to hide traces of histogram-based image manipulations. In Proceedings of the ACM Workshop on Multimedia and Security, pp. 97–104. Cited by: §1, §2, Table 3, §5.
  • G. Cao, Y. Zhao, R. Ni, and X. Li (2014a) Contrast enhancement-based forensics in digital images. IEEE transactions on information forensics and security 9 (3), pp. 515–525. Cited by: §1, §1, §2, §4.3, Table 1, Table 2, Table 3, Table 4, §5.
  • G. Cao, Y. Zhao, R. Ni, H. Tian, and L. Yu (2014b) Attacking contrast enhancement forensics in digital images. Science China Information Sciences 57 (5), pp. 1–13. Cited by: §1.
  • G. Cao, Y. Zhao, R. Ni, and H. Tian (2010a) Anti-forensics of contrast enhancement in digital images. In Proceedings of the 12th ACM Workshop on Multimedia and Security, pp. 25–34. Cited by: §1.
  • G. Cao, Y. Zhao, and R. Ni (2010b) Forensic estimation of gamma correction in digital images. In 2010 IEEE International Conference on Image Processing, pp. 2097–2100. Cited by: §1, §2, §5.2, Table 3, Table 4, §5.
  • P. Comesana-Alfaro and F. Pérez-González (2013) Optimal counterforensics for histogram-based forensics. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3048–3052. Cited by: §1.
  • A. De Rosa, M. Fontani, M. Massai, A. Piva, and M. Barni (2015) Second-order statistics analysis to cope with contrast enhancement counter-forensics. IEEE Signal Processing Letters 22 (8), pp. 1132–1136. Cited by: §1, §1, §2, Table 1, Table 2, Table 3, Table 4, §5.
  • H. Farid (2001) Blind inverse gamma correction. IEEE Transactions on Image Processing 10 (10), pp. 1428–1433. Cited by: §1.
  • M. Fontani, T. Bianchi, A. De Rosa, A. Piva, and M. Barni (2013) A framework for decision fusion in image forensics based on dempster–shafer theory of evidence. IEEE Transactions on Information Forensics and Security 8 (4), pp. 593–607. Cited by: §4.4.
  • K. He, X. Zhang, S. Ren, and J. Sun (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE transactions on pattern analysis and machine intelligence 37 (9), pp. 1904–1916. Cited by: §4.2.
  • [12] BOSSBase dataset, http://agents.fel.cvut.cz/stegodata/. Cited by: §5.
  • [13] Caffe, http://caffe.berkeleyvision.org. Cited by: §5.
  • S. Ioffe and C. Szegedy (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167. Cited by: §1, §4.2.
  • C. Kwok, O. C. Au, and S. Chui (2011) Alternative anti-forensics method for contrast enhancement. In International Workshop on Digital Watermarking, pp. 398–410. Cited by: §1.
  • H. Li, W. Luo, X. Qiu, and J. Huang (2016) Identification of various image operations using residual-based features. IEEE Transactions on Circuits and Systems for Video Technology 28 (1), pp. 31–45. Cited by: §1, Table 1, Table 2, Table 3, Table 4, §5.
  • X. Lin, C. Li, and Y. Hu (2013) Exposing image forgery through the detection of contrast enhancement. In 2013 IEEE international conference on image processing, pp. 4467–4471. Cited by: §1.
  • X. Lin, X. Wei, and C. Li (2014) Two improved forensic methods of detecting contrast enhancement in digital images. In Media Watermarking, Security, and Forensics 2014, Vol. 9028, pp. 90280X. Cited by: §1.
  • U. G. Mangai, S. Samanta, S. Das, and P. R. Chowdhury (2010) A survey of decision fusion and feature fusion strategies for pattern classification. IETE Technical review 27 (4), pp. 293–307. Cited by: §4.4.
  • S. J. Pan and Q. Yang (2009) A survey on transfer learning. IEEE Transactions on knowledge and data engineering 22 (10), pp. 1345–1359. Cited by: §5.3.3.
  • A. C. Popescu and H. Farid (2004) Statistical tools for digital forensics. In international workshop on information hiding, pp. 128–147. Cited by: §1.
  • H. Ravi, A. V. Subramanyam, and S. Emmanuel (2015) ACE–an effective anti-forensic contrast enhancement technique. IEEE Signal Processing Letters 23 (2), pp. 212–216. Cited by: §1.
  • W. Shan, Y. Yi, R. Huang, and Y. Xie (2019) Robust contrast enhancement forensics based on convolutional neural networks. Signal Processing: Image Communication 71, pp. 138–146. Cited by: §1, §2.
  • M. C. Stamm and K. R. Liu (2010a) Forensic detection of image manipulation using statistical intrinsic fingerprints. IEEE Transactions on Information Forensics and Security 5 (3), pp. 492–506. Cited by: §1, §2, §4.3.
  • M. C. Stamm and K. R. Liu (2010b) Forensic estimation and reconstruction of a contrast enhancement mapping. In 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1698–1701. Cited by: §1, §2, §4.3.
  • M. Stamm and K. R. Liu (2008) Blind forensics of contrast enhancement in digital images. In 2008 15th IEEE International Conference on Image Processing, pp. 3112–3115. Cited by: §1, §2, §4.3.
  • J. Sun, S. Kim, S. Lee, and S. Ko (2018) A novel contrast enhancement forensics based on convolutional neural networks. Signal Processing: Image Communication 63, pp. 149–160. Cited by: §1, §2, Table 1, Table 2, Table 3, Table 4, §5.
  • P. Wang, F. Liu, C. Yang, and X. Luo (2018) Parameter estimation of image gamma transformation based on zero-value histogram bin locations. Signal Processing: Image Communication 64, pp. 33–45. Cited by: §1.
  • L. Wen, H. Qi, and S. Lyu (2018) Contrast enhancement estimation for digital image forensics. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 14 (2), pp. 49. Cited by: §1.
  • P. Yang, R. Ni, Y. Zhao, and W. Zhao (2017) Source camera identification based on content-adaptive fusion residual networks. Pattern Recognition Letters. Cited by: §4.2, §4.4.
  • P. Yang, R. Ni, and Y. Zhao (2016) Recapture image forensics based on laplacian convolutional neural networks. In International Workshop on Digital Watermarking, pp. 119–128. Cited by: §4.2.
  • C. Zhang, D. Du, L. Ke, H. Qi, and S. Lyu (2018) Global contrast enhancement detection via deep multi-path network. In 2018 24th International Conference on Pattern Recognition (ICPR), pp. 2815–2820. Cited by: §1, §2.