Deep Multi-scale Discriminative Networks for Double JPEG Compression Forensics

04/04/2019 · by Cheng Deng et al. · The University of Sydney, Xidian University

As JPEG is the most widely used image format, the importance of tampering detection for JPEG images in blind forensics is self-evident. In this area, extracting effective statistical characteristics from a JPEG image for classification remains a challenge. Effective features are designed manually in traditional methods, suggesting that extensive labor-consuming research and derivation is required. In this paper, we propose a novel image tampering detection method based on deep multi-scale discriminative networks (MSD-Nets). The multi-scale module is designed to automatically extract multiple features from the discrete cosine transform (DCT) coefficient histograms of the JPEG image. This module can capture the characteristic information in different scale spaces. In addition, a discriminative module is also utilized to improve the detection effect of the networks in those difficult situations when the first compression quality factor (QF1) is higher than the second one (QF2). A special network in this module is designed to distinguish the small statistical difference between authentic and tampered regions in these cases. Finally, a probability map can be obtained and the specific tampered area is located based on the final classification results. Extensive experiments demonstrate the superiority of our proposed method in both quantitative and qualitative metrics when compared with state-of-the-art approaches.







1 Introduction

With the rapid development of image acquisition tools and the popularity of social media, digital images are now widely used and have become the major information carrier. Due to the variety of available image processing tools, people can easily modify an image in any way they want [Li et al. (2017a), Li et al. (2017b)]. As a result, current digital technology has begun to erode trust in visual imagery in many fields, such as journalism, military, justice, commerce, medical applications, and academic research [Farid (2009), Liu and Chen (2014), Korus and Huang (2016), Chen et al. (2017), Dong et al. (2015)]. Consequently, digital image forensics, which aims to identify the original source of an image or determine whether or not the content of an image has been modified, has become increasingly important.

Since JPEG is the image format used by most digital devices, research into JPEG-related forensics has attracted significant attention [Liu et al. (2011), Liu et al. (2012), Thing et al. (2013)]. JPEG compression identification on bitmaps and double compression detection on JPEG images are the two main research topics in JPEG forensics. The goal of JPEG compression identification on bitmaps is to detect the tampering traces of an image that has been previously JPEG-compressed and stored in a lossless format. Thai et al. [Thai et al. (2017)] proposed an accurate method for estimating quantization steps from a lossless-format image that has experienced JPEG compression. Yang et al. [Yang et al. (2015)] proposed a novel statistic named the factor histogram for estimating the JPEG compression history of bitmaps. Li et al. [Li et al. (2015a)] provided a novel quantization noise-based solution to reveal the traces of JPEG compression. In this paper, we focus on double compression forensics on JPEG images. Many forensics techniques are inapplicable to JPEG images because compression can weaken certain traces of image tampering. However, recompression often occurs when a JPEG image is tampered with and re-saved in JPEG format [Yang et al. (2014)]. These processes leave specific traces of double compression; consequently, many related methods aim to detect double JPEG compression from histograms of the Discrete Cosine Transform (DCT) coefficients [Nguyen and Katzenbeisser (2013)]. Several works analyze or model the effect of JPEG compression. Yang et al. [Yang et al. (2016)] presented a theoretical analysis of the variation of local variance caused by JPEG compression. Li et al. [Li et al. (2015b)] presented a statistical analysis of JPEG noises, including the quantization noise and the rounding noise during a JPEG compression cycle.

Figure 1: Overview of our approach. The deep multi-scale discriminative networks contain a multi-scale module to extract features in different scale spaces and a discriminative module to judge whether another special network should be chosen for detection. Based on the final classification results, it is possible to determine whether the input image block is tampered or not.

The existing techniques for JPEG compression forensics can be classified into two categories: traditional methods and deep learning methods. Many traditional algorithms for double JPEG compression have yielded relatively accurate detection results. Lukáš and Fridrich [Lukáš and Fridrich (2003)] estimated the primary quantization matrix and observed a periodicity in DCT coefficient histograms due to double compression. Popescu and Farid [Popescu and Farid (2004)] put forward some statistical correlations caused by digital tampering and analyzed the Double Quantization (DQ) effect. However, these methods cannot locate the specific area that has been tampered with. Lin et al. [Lin et al. (2009)] first proposed a fine-grained tampered image detection method that can locate the tampered region by investigating the DQ effect on DCT coefficients. Bianchi et al. [Bianchi et al. (2011)] designed a probability map to distinguish between tampered and original regions. Bianchi and Piva [Bianchi and Piva (2012)] proposed a forensic algorithm that locates forged areas in JPEG images by computing a likelihood map representing the probability of each small DCT block being compressed once or twice. By modeling the DCT coefficient histograms as a mixture, the probability of blocks being tampered was obtained to locate tampered regions [Wang et al. (2014), Yu et al. (2016)]. Liu [Liu (2017)] utilized ensemble learning with high-dimensional features from the spatial domain and the DCT transform domain to address the challenging detection problems when QF1 > QF2. Another type of traditional method is based on Benford's law. Fu et al. [Fu et al. (2007)] utilized a Benford's law-based statistical model to distinguish between tampered and authentic images, as the DCT coefficients of singly compressed images obey the law but those of doubly compressed images do not. Li et al. [Li et al. (2008)] utilized mode-based first digit features to detect doubly compressed images and identify the primary quality factor of JPEG compression.

Almost all traditional methods require artificially designed features for detection. However, designing such features is sometimes difficult and requires a large amount of theoretical research and experimentation. Another drawback is that traditional methods may fail when QF1 > QF2, due to the small difference between the statistical features of a tampered image and an authentic image in this case. Recently, deep neural networks (DNNs), which have achieved great success in image processing and computer vision [Tang et al. (2015), Yan and Shao (2016), Fu et al. (2016), Hu et al. (2017)], have been applied to the image forensics field. Baroffio et al. [Baroffio et al. (2016)] proposed a deep learning method to solve the problem of camera source identification. Chen et al. [Chen et al. (2015)] utilized a convolutional neural network (CNN) to detect median filtering operations in images. Bayar and Stamm [Bayar and Stamm (2016)] utilized a new convolutional layer to detect universal image manipulation and proposed a network structure based on actual forensic evidence. In double JPEG compression forensics, Wang and Zhang [Wang and Zhang (2016)] explored eight different CNNs to solve the tampering localization problem; in this study, the eight networks were trained on singly compressed images and doubly compressed images with different QF1. After an estimation of QF1, one corresponding network is selected to detect the tampered regions. However, this method extracts features directly without taking full account of some characteristics of the double JPEG compression process, limiting the improvement of the detection effect. Moreover, this method does not propose a solution for the tougher cases where QF1 > QF2, which is also a general difficulty for most existing methods. Amerini et al. [Amerini et al. (2017)] explored the use of a spatial domain CNN and its combination with the histograms of DCT coefficients for image forgery detection, indicating that further research is in progress. Barni et al. [Barni et al. (2017)] utilized CNNs in the pixel, noise, and DCT domains to perform the detection task respectively. This method can obtain comprehensive information from a JPEG image but is quite complicated.

In this paper, we propose a novel method of deep multi-scale discriminative networks (MSD-Nets) for double JPEG compression forensics to detect tampered regions automatically. This generalized method does not require an estimation of QF1, and the networks can detect JPEG images with any QF1. The multi-scale module consists of a combination of three single-scale networks connected in parallel [Zhang et al. (2016), Korus and Huang (2017)]. This module is designed to extract features from the histograms of DCT coefficients after preprocessing, and is better able to describe the effective features in different scale spaces than its single-scale counterparts. The outputs of these networks are fused with different weights. A discriminative module then follows to judge whether another specially designed network should be chosen for the detection task. This module improves the ability of our networks to distinguish the small statistical difference between a tampered region and an authentic region when QF1 > QF2, which most existing methods cannot do. After detecting all the blocks of a JPEG image, a specific probability map of the detection result is obtained and a precise localization of the tampered area is achieved. Fig. 1 visualizes the whole framework of our proposed method. The experimental results on the Synthetic and Florence datasets demonstrate that the proposed method outperforms state-of-the-art traditional methods and a representative deep method.

The rest of this paper is organized as follows. In Section II, the features to be extracted are introduced, after which the MSD-Nets for double JPEG compression forensics are presented in Section III. The experimental results and analysis are included in Section IV. Finally, conclusions are drawn in Section V.

2 Features in Double JPEG Compression

In this section, the image tampering model and the DQ effect are first described. From these statistical characteristics, useful features to be employed in solving the binary classification problem of double JPEG compression forensics are extracted. Finally, the difference between DCT coefficient histograms in single and double compression situations is evaluated, showing the effectiveness of the features utilized in our networks.

2.1 Image Tampering Model

In the JPEG compression process, an input image is first divided into 8×8 blocks, after which the discrete cosine transform is applied to every block. After the quantization process, a rounding function is applied to the DCT coefficients. These quantized DCT coefficients are later encoded via entropy encoding. The main reason for compression information loss lies in the quantization process [Thai et al. (2016)], whose quantization table is determined by a particular compression quality factor, an integer from 0 to 100. A higher quality factor represents a lower loss of image information, and vice versa.
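As a minimal sketch of the quantization step described above (the function names and example values are illustrative, not from the paper):

```python
def quantize(coeff, step):
    """Compression side: divide the DCT coefficient by the quantization step and round."""
    return round(coeff / step)

def dequantize(qcoeff, step):
    """Decompression side: multiply the stored integer back by the step."""
    return qcoeff * step

# A DCT coefficient of 37 with step 10 is stored as 4 and restored as 40;
# the difference of 3 is the irrecoverable quantization loss.
original = 37
restored = dequantize(quantize(original, 10), 10)
```

A larger quality factor yields smaller quantization steps, so the rounding error, and hence the information loss, shrinks.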

Figure 2: Example of image tampering model: (a) source image with a selected region , (b) original image , (c) tampered image .

Image tampering is often accompanied by double JPEG compression. Traditionally, the process of image splicing includes three steps, as shown in Fig. 2. These steps are as follows:

1) Choosing a compressed JPEG image , the quality factor of which is , and decompressing it.

2) Replacing a region of with a selected region from another image .

3) Saving the new tampered image in JPEG format with a quality factor , where represents the authentic region of .

In this model, is considered to be doubly compressed. If is in non-JPEG format, is undoubtedly singly compressed. However, if is in JPEG format, is regarded as a region that does not follow the law of double compression, as the DCT grids of have a very low probability of matching . Accordingly, is regarded as an image with a singly compressed region and a doubly compressed region .

2.2 Double Quantization Effect

In the JPEG compression process, the main reason for information loss is quantization, which can leave traces from the histograms of the DCT coefficient. Here, the DQ effect causes periodic peaks and valleys in histograms after the process of double JPEG compression.

Figure 3: Examples of different histograms in the DCT coefficients position (0,1): (a), (b), (c) single-compressed image when , , and ; (d), (e), (f) double-compressed image when (d) , , (e) , , and (f) , .

Fig. 3 illustrates the difference between double-compressed and single-compressed DCT coefficient histograms. From Fig. 3, we can see that the histograms of single-compressed images are generally in accordance with a generalized Gaussian distribution, whereas the histograms of double-compressed images have periodic peaks and valleys due to the DQ effect. When QF1 > QF2, these statistical traces of double compression are easily concealed. Thus, tampering detection becomes a tough problem in this situation.
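The DQ effect can be reproduced numerically. The following sketch (the coefficient distribution, quantization steps, and bin range are illustrative choices, not the paper's settings) quantizes synthetic AC coefficients once with step 5, and once with step 8 followed by step 5, then histograms the resulting bins:

```python
import numpy as np

rng = np.random.default_rng(0)
# Smooth, roughly Laplacian-distributed AC coefficients before any quantization.
coeffs = rng.laplace(scale=20, size=200_000)

def requantize(c, q):
    """One JPEG compression cycle for a coefficient: quantize with step q, then dequantize."""
    return np.round(c / q) * q

def hist31(c, q2):
    """31-bin histogram of the quantized bins over the symmetric interval [-15, 15]."""
    bins = np.round(c / q2).astype(int)
    bins = bins[np.abs(bins) <= 15]
    return np.bincount(bins + 15, minlength=31)

h_single = hist31(requantize(coeffs, 5), 5)                  # single compression, step 5
h_double = hist31(requantize(requantize(coeffs, 8), 5), 5)   # step 8 first, then step 5
# h_single is smooth and unimodal, while h_double has periodic empty bins
# (e.g. bin value 1, which no multiple of 8 requantizes to) flanked by peaks.
```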

To solve the problem of double JPEG compression forensics by classifying the singly and doubly compressed parts of a JPEG image, the DCT coefficient histograms are utilized as the input features.

2.3 Tampering Detection Features

In order to obtain features that can be directly input into the networks, some pre-processing operations are needed. The DCT coefficients are first extracted from the header file of the JPEG image. The DCT coefficients contain a direct current (DC) coefficient and multiple alternating current (AC) coefficients. Taking an 8×8 DCT block as an example, the DC coefficient is the first number of the DCT coefficient matrix, and the AC coefficients are the other 63 numbers. In this paper, only the AC coefficients, whose distribution differs from that of the DC coefficient, are selected, which removes the impact of the DC coefficient.

Because of the variable sizes of the histograms, and in order to control the computational cost without losing significant information, a symmetric interval containing the peak of the histogram is selected as the features. Fig. 4 shows a more detailed illustration. First, the second to the tenth coefficients, which are arranged in zigzag order, are chosen to organize the primary features. Then, the values at the positions within this symmetric interval are utilized to construct the final features. If F represents the feature set of a block from a JPEG image, and h_i represents the histogram of the DCT coefficients at the i-th frequency in the zigzag order of the block, we have the following vector:

F = [h_2, h_3, ..., h_10],

where each h_i contains the 31 bins of the symmetric interval. Therefore, a feature vector of length 279 (9 × 31) is obtained from each JPEG image block. The parameter selection will be explained in the last part of Section IV.
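A sketch of this feature construction (the array layout and helper name are assumptions; the zigzag positions, the nine AC coefficients, and the 31-bin symmetric interval follow the text):

```python
import numpy as np

# First ten positions of the 8x8 zigzag scan; position 0 is the DC coefficient.
ZIGZAG10 = [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1),
            (0, 2), (0, 3), (1, 2), (2, 1), (3, 0)]

def block_features(dct_blocks, half_range=15):
    """Concatenate 31-bin histograms of AC coefficients 2..10 (zigzag order).

    dct_blocks: array of shape (n, 8, 8) holding the quantized DCT
    coefficients of the n 8x8 blocks in one image block.
    Returns a vector of length 9 * 31 = 279.
    """
    feats = []
    for (r, c) in ZIGZAG10[1:]:                      # skip the DC coefficient
        vals = dct_blocks[:, r, c].astype(int)
        vals = vals[np.abs(vals) <= half_range]      # symmetric interval around the peak
        hist = np.bincount(vals + half_range, minlength=2 * half_range + 1)
        feats.append(hist)
    return np.concatenate(feats)
```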

3 Tampering Detection via MSD-Nets

In this section, our proposed model is described in detail. During preprocessing, the histograms of DCT coefficients from a block of a JPEG image are first extracted. After that, three networks trained on data at different scales automatically extract different statistical features from the same histograms. Then, a discriminative module is employed to judge whether another special network should be chosen for detection; this network is designed to improve classification accuracy in the tougher cases when QF1 > QF2 by distinguishing the small statistical difference between a tampered region and an authentic region. Finally, a proposed localization scheme is utilized to obtain the probability map and simultaneously output the two final classification results.

Figure 4: The illustration of building histograms of DCT coefficients: (1) select the second to the tenth coefficients, which are arranged in zigzag order; (2) extract the histograms with a length of 31 from each coefficient to construct the final features. Note that only three blocks and three histograms are shown for succinctness. Different colors are used to distinguish different blocks.

3.1 Network Architecture

In order to solve the problem of double JPEG compression forensics, DNN is utilized to automatically extract the features from the DCT coefficient histograms and classify them. Our method does not need to estimate the first compression quality factor of the JPEG image, but instead utilizes a scheme of overall training. The image blocks, which have quality factors ranging from 50 to 95, are utilized for training to update the network parameters. In order to improve the effect of classification, four different DNNs are designed with similar structures but different parameters for training and testing.

The first part of MSD-Nets can be regarded as a three-channel structure for extracting different features. This structure is inspired by the findings of a large number of experiments: namely, that the fusion of multi-scale networks trained on image blocks at different scales can increase the valuable information in various scale spaces for double JPEG compression tampering detection. This way of enriching the input features introduces more effective information for classification, so that the extracted characteristics are no longer limited to a single feature. This facilitates the extraction of more diversified and detailed information, so that a better classification result can be obtained.

The pre-trained multiple DNNs can be utilized to extract the multi-scale features of DCT coefficient histograms automatically, and the features are aggregated through a process of weighted fusion. A three-scale model is designed with three different networks, each trained on blocks of a different size. Each block dataset consists of tampered and authentic blocks in equal quantity. The network trained on the smallest blocks extracts features in small-scale space, while the network trained on the largest blocks extracts features in large-scale space, ensuring the richness of the features in various scale spaces. The specific model selection will be explained in Section IV.

A variety of fusion methods have been tried, such as combining the three kinds of features in the fully connected layer or using another special DNN to select the weights automatically. Ultimately, fixed weights were found to yield better classification results. Therefore, the result-oriented fixed weights are utilized in our multi-scale feature DNN. Let O_1, O_2, and O_3 represent the output values after the softmax layers of the three networks, and let w_1, w_2, and w_3 represent the corresponding weights of the three block scales. The fusion process can be defined as:

R = w_1 O_1 + w_2 O_2 + w_3 O_3,

where R represents the result after the fusion process.
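A minimal sketch of the weighted fusion of the three softmax outputs; the weight values here are placeholders, since the fixed weights are chosen empirically in the paper:

```python
import numpy as np

def fuse(o1, o2, o3, w=(0.2, 0.3, 0.5)):
    """Weighted fusion of three softmax outputs (each a length-2 probability pair)."""
    r = w[0] * np.asarray(o1) + w[1] * np.asarray(o2) + w[2] * np.asarray(o3)
    return r / r.sum()   # renormalize so the fused pair is again a distribution

# Example: three networks with differing confidence in the "tampered" class.
fused = fuse([0.9, 0.1], [0.6, 0.4], [0.7, 0.3])
```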

Table 3.1: The architecture of a single-channel network structure.

| Layer   | Size      | Kernel | Stride | Feature Map | Property | Initialization (Weight, Bias) |
|---------|-----------|--------|--------|-------------|----------|-------------------------------|
| Input   | 279×1     | -      | -      | -           | -        | -                             |
| Conv1   | 277×1×100 | 3×1    | 1      | 100         | -        | Xavier, Constant              |
| Pool1   | 138×1×100 | 3×1    | 2      | -           | Max      | -                             |
| Conv2   | 136×1×100 | 3×1    | 1      | 100         | -        | Xavier, Constant              |
| Pool2   | 67×1×100  | 3×1    | 2      | -           | Max      | -                             |
| Full1   | 1000×1    | -      | -      | 1000        | -        | Xavier, Constant              |
| ReLUs   | 1000×1    | -      | -      | -           | -        | -                             |
| Full2   | 1000×1    | -      | -      | 1000        | -        | Xavier, Constant              |
| ReLUs   | 1000×1    | -      | -      | -           | -        | -                             |
| Full3   | 1000×1    | -      | -      | -           | -        | Xavier, Constant              |
| Softmax | 2×1       | -      | -      | -           | -        | -                             |
| Output  | 2×1       | -      | -      | -           | -        | -                             |

Figure 5: Scores of the proposed MSD-Nets structure and the simple network structure on (a) the Synthetic dataset, (b) the Florence dataset.

In Fig. 5, the score is utilized to compare the experimental results of our MSD-Nets structure with those of a simple network on a 100,000-block dataset. It is obvious that this specially designed structure for double JPEG compression forensics outperforms a simple network structure.

In addition, a divide-and-conquer strategy is implemented in our proposed method. Due to the statistical characteristics of the DQ effect, most existing schemes perform poorly when distinguishing the small statistical difference between a tampered region and an authentic region when QF1 > QF2. Hence, some specific data (tampered image blocks in the case of QF1 > QF2, and image blocks without tampering) are utilized to train a special DNN in our algorithm framework. This training pattern ensures the effectiveness of the network when QF1 > QF2. The detailed discriminative module is described in Algorithm 1.

The architecture of a single-channel network structure is shown in Table 3.1. The network contains two alternating convolutional layers, two pooling layers, and three fully connected layers. A softmax layer is utilized at the end of the structure to obtain the classification probability of each class.

1) Convolutional Layer: A convolutional layer applies convolution and non-linearity operations to the input data, reduces the number of free parameters, and simultaneously improves generalization [Lin et al. (2016)]. A 3×1 kernel is selected and the number of feature maps (that is, the number of kernels) is set to 100. The stride is set to 1. Hence, each feature map becomes a vector with a size of 277×1 and the output of the first convolutional layer becomes 277×1×100. Similarly, the output of the second convolutional layer becomes 136×1×100. The concrete convolutional operation is represented as:

x_j^l = f( Σ_i x_i^(l−1) * k_ij^l + b_j^l ),

where * represents convolution, x_j^l is the j-th feature map of layer l, k_ij^l represents the trainable weight that connects the i-th feature map of layer l−1 with the j-th feature map of layer l, and b_j^l represents the bias parameter of the j-th feature map of layer l.
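The size progression in Table 3.1 (279 → 277 → 138 → 136 → 67) can be checked with the standard sliding-window formula, assuming no padding:

```python
def out_len(n, k, stride):
    """Output length of a 1-D sliding window (convolution or pooling) without padding."""
    return (n - k) // stride + 1

sizes = [279]
for k, s in [(3, 1), (3, 2), (3, 1), (3, 2)]:   # Conv1, Pool1, Conv2, Pool2
    sizes.append(out_len(sizes[-1], k, s))
# sizes == [279, 277, 138, 136, 67], matching the architecture table
```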

2) Pooling Layer: While the extracted features could be utilized for classification directly after the convolutional layer, this may give rise to computational challenges and be prone to over-fitting. Hence, a pooling layer, which combines the outputs of neuron clusters into a single neuron, is utilized [Ciresan et al. (2011), Scherer et al. (2010)], and max pooling is selected to extract the maximum value from each cluster of neurons in the prior layer. The size of the pooling kernel is 3×1 and the stride is 2. Thus, only the maximum value within each local area is passed to the next layer.

3) Fully Connected Layer: The fully connected layer connects each neuron in one layer to every neuron in the next layer. The weights and biases of the network can be adaptively renewed in the fully connected layers through the error back-propagation procedure. Therefore, the final classification result is fed back to automatically guide the feature extraction process, through which the learning mechanism is set up [LeCun et al. (1998)]. In our network, the first two fully connected layers have 1000 outputs and the last one has 2 outputs.

4) ReLUs Nonlinearity: ReLUs is the abbreviation of Rectified Linear Units. Following each of the first two fully connected layers, ReLUs are utilized because of their ability to facilitate fast convergence in large models trained on large datasets [Krizhevsky et al. (2012)]. This layer applies the non-saturating activation function:

f(x) = max(0, x),

where x represents the input patch of the features.

Double JPEG compression tampering detection can be regarded as a binary classification problem: doubly compressed region (authentic region) versus singly compressed region (tampered region). Hence, after a softmax layer, the classification probabilities of the two classes can be obtained. The parameter selection will be explained in Section IV.

Meanwhile, we compare our method with the SVM classifier mentioned in [Li et al. (2008)] by inputting the histograms into the SVM classifier directly; we find that the SVM classifier performs poorly on these histograms. The main reason is that traditional machine learning techniques usually cannot process raw data. When the histograms are utilized for classification directly, without handcrafted feature extraction, these techniques can hardly work. Hence, the actual benefit of deep networks is that they achieve representation learning automatically and easily capture important features for forensics.

1:: DCT coefficient histograms of input image blocks; : the first value of the result after fusion; : the second value of the result after fusion; : output value of the special DNN; : Threshold;
2:Result of whether the block is tampered or not;
3:if  then
4:     Compute the output of the special DNN with the input of ; Set as the final result;
5:     if  then
6:         The input block was manipulated.
7:     else
8:         The input block is authentic.
9:     end if
10:else
11:     Set as the final result;
12:     if  then
13:         The input block was manipulated.
14:     else
15:         The input block is authentic.
16:     end if
17:end if
Algorithm 1 Threshold-Based Discriminative Module Path Selection Algorithm
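The path selection can be sketched as a small function; the switching condition (an ambiguous fused result) and the 0.5 decision boundary are assumptions, since the symbols in the listing above are not fully specified:

```python
def classify_block(o1, o2, special_dnn, hist, threshold=0.1):
    """Threshold-based path selection between the fused result and the special DNN.

    o1, o2: the two values of the fused softmax result (tampered / authentic).
    special_dnn: a callable returning the tampering probability for the hard
    QF1 > QF2 cases; `threshold` plays the role of the threshold T.
    """
    if abs(o1 - o2) < threshold:          # fused result is ambiguous: hard case
        s = special_dnn(hist)             # fall back to the specially trained DNN
        return "tampered" if s > 0.5 else "authentic"
    return "tampered" if o1 > 0.5 else "authentic"
```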

3.2 Tampered Region Localization

In order to locate the tampered area more precisely, an input JPEG image I is first divided into many overlapping B×B blocks (B is set to 64 according to testing). We then compute the DCT coefficient histograms, with a length of 279, from each block and input them to the MSD-Nets. The final probability pair (p_0, p_1) after the softmax layer is then obtained. Finally, the value p_1, which represents the probability that the block is singly compressed (i.e. tampered), is assigned to the small block at its center. Each overlapping block B_k from image I has a small block c_k at its center, so the number of overlapping blocks from image I is the same as the number of small central blocks. Therefore, the probability map P can be computed as:

P(c_k) = p_1(B_k).

After assigning all of the blocks, a tampering detection probability map is obtained. Finally, the pixel values of the blocks on the edge of I are padded as 0 and a full-size tampering detection result map is obtained. The whiter areas represent a higher probability that the corresponding block is tampered. Naturally, the binary classification result map can be easily obtained by applying a threshold.
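The map assembly can be sketched as follows; the stride of the overlapping windows and the data layout are assumptions (the text fixes only the 64×64 block size and the zero padding at the borders):

```python
import numpy as np

def probability_map(image_shape, block_probs, B=64, step=8):
    """Assemble the tampering probability map from per-block softmax outputs.

    block_probs maps the top-left corner (y, x) of each overlapping B x B
    window to p1, the probability that the window is singly compressed
    (tampered). Each p1 is written to the small step x step block at the
    window's center; uncovered border pixels keep the 0 padding.
    """
    pmap = np.zeros(image_shape)
    off = (B - step) // 2                       # offset of the central small block
    for (y, x), p1 in block_probs.items():
        pmap[y + off:y + off + step, x + off:x + off + step] = p1
    return pmap

# Example: one 64x64 window at the origin judged 80% likely to be tampered.
pmap = probability_map((128, 128), {(0, 0): 0.8})
```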

4 Experimental Results

In this section, the results of extensive experiments on both quantitative and qualitative metrics are provided and compared. We compare our method with two representative traditional methods [Bianchi and Piva (2012)] [Yu et al. (2016)] and a deep method [Wang and Zhang (2016)], denoted respectively as BP, YH, and WZ in plot legends. In addition, we compare our method with a traditional method, denoted LA [Liu (2017)]. For a fair comparison, we run BP, YH, and WZ using their public codes, and reimplement the algorithm of LA based on the original paper.

4.1 Dataset

UCID Dataset. The Uncompressed Color Image Database (UCID) [Gerald and Michal (2003)] is one of the most widespread lossless datasets in image forensics. The UCID dataset contains 1338 images in TIFF format with a resolution of 512×384.

Synthetic Dataset. This dataset is synthesized from the UCID dataset. 1200 images from the UCID dataset are randomly selected for the experiments (800 as training sets, 200 as validation sets, and 200 as testing sets). To create JPEG images for training, validation, and testing, each original TIFF image is first compressed with a given quality factor QF1. The left region, which should not be doubly compressed, is then replaced with the corresponding region of the original image. Finally, the image is compressed again with another quality factor QF2. Each image yields 100 possible combinations of (QF1, QF2).

Detection accuracy achieved on the Synthetic dataset (rows grouped by QF1; columns give QF2).

| QF1 | Method   | 50     | 55     | 60     | 65     | 70     | 75     | 80     | 85     | 90     | 95     |
|-----|----------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| 50  | Proposed | 0.5391 | 0.5251 | 0.5476 | 0.7378 | 0.9194 | 0.9557 | 0.9804 | 0.9808 | 0.9767 | 0.9758 |
| 50  | BP       | 0.4756 | 0.5023 | 0.6136 | 0.7021 | 0.8955 | 0.8512 | 0.8981 | 0.9375 | 0.9339 | 0.9222 |
| 50  | YH       | 0.5085 | 0.5111 | 0.5970 | 0.7080 | 0.8704 | 0.9225 | 0.8926 | 0.9020 | 0.9349 | 0.9170 |
| 50  | WZ       | 0.5167 | 0.5080 | 0.5082 | 0.5048 | 0.4980 | 0.6126 | 0.7793 | 0.8466 | 0.9122 | 0.9365 |
| 50  | LA       | 0.4986 | 0.5230 | 0.5653 | 0.7108 | 0.8223 | 0.8684 | 0.9217 | 0.9308 | 0.9455 | 0.9759 |
| 60  | Proposed | 0.5118 | 0.5040 | 0.5769 | 0.5150 | 0.7348 | 0.9109 | 0.9806 | 0.9769 | 0.9651 | 0.9880 |
| 60  | BP       | 0.5195 | 0.4782 | 0.4300 | 0.5332 | 0.6748 | 0.8330 | 0.8571 | 0.8949 | 0.9264 | 0.9206 |
| 60  | YH       | 0.5160 | 0.5026 | 0.4967 | 0.5531 | 0.7129 | 0.7259 | 0.9131 | 0.8910 | 0.9382 | 0.9144 |
| 60  | WZ       | 0.5094 | 0.5061 | 0.5054 | 0.5028 | 0.5065 | 0.6689 | 0.8104 | 0.8238 | 0.9138 | 0.9332 |
| 60  | LA       | 0.5323 | 0.5375 | 0.5000 | 0.5628 | 0.6463 | 0.7557 | 0.8233 | 0.8724 | 0.9205 | 0.9603 |
| 70  | Proposed | 0.4875 | 0.5328 | 0.6194 | 0.5204 | 0.5097 | 0.5902 | 0.9147 | 0.9535 | 0.9587 | 0.9746 |
| 70  | BP       | 0.5804 | 0.4980 | 0.4736 | 0.4893 | 0.4473 | 0.5029 | 0.6787 | 0.8125 | 0.8893 | 0.9163 |
| 70  | YH       | 0.5947 | 0.5332 | 0.4844 | 0.5000 | 0.4788 | 0.5137 | 0.7705 | 0.9098 | 0.9320 | 0.9242 |
| 70  | WZ       | 0.5171 | 0.5091 | 0.5189 | 0.5054 | 0.5024 | 0.5083 | 0.7341 | 0.8542 | 0.8905 | 0.9284 |
| 70  | LA       | 0.5810 | 0.6053 | 0.5799 | 0.5433 | 0.5010 | 0.5822 | 0.7410 | 0.8508 | 0.8995 | 0.9493 |
| 80  | Proposed | 0.5494 | 0.4947 | 0.6224 | 0.6463 | 0.7456 | 0.6822 | 0.5362 | 0.7614 | 0.9565 | 0.8941 |
| 80  | BP       | 0.5000 | 0.5000 | 0.4993 | 0.5407 | 0.5000 | 0.4945 | 0.4622 | 0.5358 | 0.7331 | 0.8887 |
| 80  | YH       | 0.5029 | 0.4993 | 0.5000 | 0.5101 | 0.5020 | 0.5007 | 0.4919 | 0.5306 | 0.8158 | 0.9261 |
| 80  | WZ       | 0.5011 | 0.5042 | 0.5061 | 0.5031 | 0.5017 | 0.5258 | 0.5232 | 0.6737 | 0.8535 | 0.8548 |
| 80  | LA       | 0.6162 | 0.6151 | 0.6509 | 0.6415 | 0.6353 | 0.6021 | 0.4994 | 0.6810 | 0.8220 | 0.8958 |
| 90  | Proposed | 0.4862 | 0.5754 | 0.5277 | 0.5647 | 0.5527 | 0.5150 | 0.5992 | 0.6936 | 0.5207 | 0.7344 |
| 90  | BP       | 0.4694 | 0.5156 | 0.4290 | 0.4632 | 0.4951 | 0.5400 | 0.5020 | 0.5000 | 0.4987 | 0.7829 |
| 90  | YH       | 0.5000 | 0.5000 | 0.5026 | 0.4935 | 0.5055 | 0.5205 | 0.4941 | 0.5000 | 0.5000 | 0.7738 |
| 90  | WZ       | 0.5088 | 0.4990 | 0.5013 | 0.5061 | 0.5104 | 0.4955 | 0.5526 | 0.5931 | 0.5153 | 0.7662 |
| 90  | LA       | 0.5936 | 0.5768 | 0.5895 | 0.5470 | 0.5419 | 0.6094 | 0.6711 | 0.5566 | 0.7012 | 0.7792 |

Detection accuracy achieved on the Florence dataset (rows grouped by QF1; columns give QF2).

| QF1 | Method   | 50     | 55     | 60     | 65     | 70     | 75     | 80     | 85     | 90     | 95     |
|-----|----------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| 50  | Proposed | 0.5328 | 0.5553 | 0.5701 | 0.7775 | 0.8635 | 0.8996 | 0.9174 | 0.9158 | 0.9298 | 0.9519 |
| 50  | BP       | 0.5023 | 0.5739 | 0.6587 | 0.7004 | 0.8532 | 0.9071 | 0.9318 | 0.9407 | 0.9553 | 0.9148 |
| 50  | YH       | 0.5204 | 0.3984 | 0.5215 | 0.6366 | 0.8373 | 0.9003 | 0.9310 | 0.9416 | 0.9563 | 0.9200 |
| 50  | WZ       | 0.5178 | 0.4997 | 0.4728 | 0.4507 | 0.4288 | 0.6422 | 0.7071 | 0.6587 | 0.7222 | 0.9618 |
| 50  | LA       | 0.5204 | 0.5517 | 0.6423 | 0.7648 | 0.8654 | 0.8815 | 0.8977 | 0.9090 | 0.9129 | 0.9225 |
| 60  | Proposed | 0.5502 | 0.5705 | 0.5791 | 0.6008 | 0.7981 | 0.9010 | 0.9106 | 0.9129 | 0.9361 | 0.9463 |
| 60  | BP       | 0.5747 | 0.5401 | 0.4992 | 0.5975 | 0.7027 | 0.7661 | 0.8933 | 0.9111 | 0.9175 | 0.8733 |
| 60  | YH       | 0.6018 | 0.6021 | 0.5137 | 0.5355 | 0.6295 | 0.7345 | 0.8871 | 0.9111 | 0.9218 | 0.8851 |
| 60  | WZ       | 0.5445 | 0.5338 | 0.5135 | 0.4901 | 0.4599 | 0.6936 | 0.7514 | 0.6938 | 0.7909 | 0.9819 |
| 60  | LA       | 0.5446 | 0.5240 | 0.5104 | 0.5794 | 0.6910 | 0.8271 | 0.8448 | 0.8771 | 0.8865 | 0.8975 |
| 70  | Proposed | 0.5586 | 0.5981 | 0.7525 | 0.6176 | 0.5863 | 0.7130 | 0.9113 | 0.9239 | 0.9229 | 0.9526 |
| 70  | BP       | 0.5911 | 0.6063 | 0.5938 | 0.5336 | 0.5031 | 0.5912 | 0.8052 | 0.8815 | 0.9156 | 0.8691 |
| 70  | YH       | 0.5996 | 0.6080 | 0.6262 | 0.5883 | 0.4843 | 0.5841 | 0.7771 | 0.8744 | 0.9175 | 0.8812 |
| 70  | WZ       | 0.5969 | 0.5860 | 0.5631 | 0.5404 | 0.5186 | 0.4966 | 0.9442 | 0.7601 | 0.8220 | 0.9824 |
| 70  | LA       | 0.6802 | 0.6492 | 0.5818 | 0.5292 | 0.5179 | 0.5817 | 0.7608 | 0.8198 | 0.8587 | 0.8756 |
| 80  | Proposed | 0.5375 | 0.5272 | 0.7905 | 0.7521 | 0.8448 | 0.8236 | 0.5569 | 0.8996 | 0.9253 | 0.9456 |
| 80  | BP       | 0.5265 | 0.5467 | 0.5278 | 0.5505 | 0.5782 | 0.5457 | 0.4996 | 0.6449 | 0.8871 | 0.8572 |
| 80  | YH       | 0.5075 | 0.5716 | 0.5508 | 0.5488 | 0.5936 | 0.5856 | 0.4878 | 0.5809 | 0.8814 | 0.8690 |
| 80  | WZ       | 0.4816 | 0.4720 | 0.6495 | 0.5841 | 0.5635 | 0.5576 | 0.5251 | 0.8560 | 0.9240 | 0.9666 |
| 80  | LA       | 0.5813 | 0.5881 | 0.7302 | 0.7237 | 0.6652 | 0.5767 | 0.5229 | 0.6777 | 0.7952 | 0.8288 |
| 90  | Proposed | 0.5507 | 0.5531 | 0.6910 | 0.6634 | 0.6937 | 0.6183 | 0.7692 | 0.8587 | 0.5482 | 0.9169 |
| 90  | BP       | 0.5083 | 0.5053 | 0.5057 | 0.5034 | 0.5169 | 0.5112 | 0.5297 | 0.5557 | 0.5002 | 0.6473 |
| 90  | YH       | 0.5113 | 0.5138 | 0.4926 | 0.4455 | 0.5466 | 0.6271 | 0.5104 | 0.6017 | 0.5708 | 0.7509 |
| 90  | WZ       | 0.5029 | 0.5190 | 0.5561 | 0.5209 | 0.4959 | 0.5255 | 0.6884 | 0.7406 | 0.5021 | 0.9721 |
| 90  | LA       | 0.5188 | 0.5375 | 0.5398 | 0.5162 | 0.5475 | 0.5594 | 0.6973 | 0.6606 | 0.5142 | 0.7021 |

Florence Dataset. The Image Dataset for Localization of Double JPEG Compression (Florence dataset) is a public dataset containing 100 full-resolution raw color images from three different digital cameras: Nikon D90, Canon EOS 450D, and Canon EOS 5D. In this dataset, the raw images are converted to TIFF format and compressed at several quality factors. Only a region in the central position of each image is utilized. The compressed dataset is chosen for our experiments (64 images for training, 16 for validation, and 20 for testing). This dataset is also utilized in the experiments of [Bianchi and Piva (2012)].

Finally, low-resolution synthesized JPEG images from the Synthetic dataset and high-resolution synthesized JPEG images from the Florence dataset are obtained. The left half of each JPEG image is singly compressed, which provides a convenient reference for comparison and makes it easy to draw balanced positive and negative samples from the same image in the following steps.

4.2 Quantitative Experiments

After generating a set of specific compressed images from the Synthetic dataset, each image is first cropped into blocks, yielding 48 blocks per image on the Synthetic dataset. Hence, a positive set and a negative set with the same number of elements are obtained on the Synthetic dataset. The per-image block count differs on the Florence dataset, whose block datasets are built in the same way. The dataset utilized in our discriminative module is rather special, as it consists of two parts: singly compressed blocks, and doubly compressed blocks for which QF1 is higher than QF2.
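For illustration, the cropping step can be sketched as follows. This is a minimal numpy sketch; the 64x64 block size and the 512x384 image size are assumptions chosen so that one image yields the 48 blocks mentioned above, and they may differ from the values actually used in the experiments.

```python
import numpy as np

def crop_blocks(image, block_size=64):
    """Split a 2-D image array into non-overlapping square blocks.

    block_size=64 is an assumed value for illustration; the excerpt
    does not state the block size actually used in the experiments.
    """
    h, w = image.shape[:2]
    blocks = []
    for y in range(0, h - block_size + 1, block_size):
        for x in range(0, w - block_size + 1, block_size):
            blocks.append(image[y:y + block_size, x:x + block_size])
    return blocks

# An assumed 512x384 image (a common low-resolution size) yields
# (384 // 64) * (512 // 64) = 6 * 8 = 48 blocks per image.
blocks = crop_blocks(np.zeros((384, 512)))
print(len(blocks))  # 48
```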

Part of the data is utilized for network training, with the remainder being used for validation. Three multi-scale feature DNNs and one special feature DNN are trained. The popular Caffe implementation [Jia et al. (2014)] is utilized for the training task. Because of the huge computational complexity of the networks, our experiments utilize an NVIDIA GTX TITAN X GPU to accelerate the process. The optimization method we use is stochastic gradient descent; the learning rate is 0.0005, the batch size is 200, and the momentum is set to 0.9. The number of epochs is set to 20 to ensure network convergence.
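The reported optimizer settings correspond to the standard momentum update used by Caffe's SGD solver; a toy numpy sketch of a single update step follows (illustrative only, not the actual training code):

```python
import numpy as np

LR = 0.0005     # learning rate reported in the text
MOMENTUM = 0.9  # momentum reported in the text

def sgd_momentum_step(w, grad, velocity, lr=LR, momentum=MOMENTUM):
    """One SGD-with-momentum step: v <- momentum*v - lr*grad; w <- w + v."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

w = np.array([1.0, -2.0])
v = np.zeros_like(w)
w, v = sgd_momentum_step(w, np.array([0.2, -0.4]), v)
# With zero initial velocity, the first step moves each weight by -lr * grad,
# so w is now approximately [0.9999, -1.9998].
```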

For testing, the remaining 200 images in the Synthetic dataset and the 20 images in the Florence dataset are utilized. After dividing each image into overlapping blocks with a stride of 8, these blocks are input into the MSD-Nets. In the weighted fusion step, a separate weight is assigned to the blocks of each of the three scales. Subsequently, the final result map is obtained.
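The sliding-window classification and weighted fusion can be sketched as follows. This is a simplified numpy sketch: the per-scale weights, block sizes, and classifiers used here are placeholders, since their actual values are not given in this excerpt.

```python
import numpy as np

def sliding_fusion(classifiers, block_sizes, weights, image, stride=8):
    """Fuse per-block tampering probabilities from several scales
    into a per-pixel probability map.

    classifiers: one function per scale, mapping a block to a
    probability in [0, 1]; weights are the per-scale fusion weights.
    """
    h, w = image.shape
    acc = np.zeros((h, w))
    norm = np.zeros((h, w))
    for clf, bs, wt in zip(classifiers, block_sizes, weights):
        for y in range(0, h - bs + 1, stride):
            for x in range(0, w - bs + 1, stride):
                p = clf(image[y:y + bs, x:x + bs])
                acc[y:y + bs, x:x + bs] += wt * p
                norm[y:y + bs, x:x + bs] += wt
    # Normalize by the total weight that touched each pixel.
    return np.divide(acc, norm, out=np.zeros_like(acc), where=norm > 0)

# Toy usage: a dummy "classifier" that flags bright blocks.
img = np.zeros((64, 64))
img[:, 32:] = 1.0
flag_bright = lambda block: float(block.mean() > 0.5)
result = sliding_fusion([flag_bright, flag_bright], [8, 16], [0.3, 0.7], img)
```

The normalization keeps the fused map in [0, 1] even near image borders, where fewer windows overlap each pixel.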

Figure 6: Results achieved on the Synthetic dataset by the proposed method and the comparison methods BP, YH, WZ, and LA. (a)-(b) Average values of the detection accuracy and F1, respectively.
Figure 7: Results achieved on the Florence dataset by the proposed method and the comparison methods BP, YH, WZ, and LA. (a)-(b) Average values of the detection accuracy and F1, respectively.

For quantitative experiments, the output probability map of the MSD-Nets is binarized to generate the final result map with two regions: a single-compressed region and a double-compressed region. Accordingly, a pixel with a value of 0 indicates double compression, while a pixel with a value of 1 indicates single compression.

Accordingly, the metrics accuracy (Acc) and F1 can be measured as:

Acc = (TP + TN) / (N_s + N_d),
Precision = TP / (TP + FP),  Recall = TP / (TP + FN),
F1 = 2 * Precision * Recall / (Precision + Recall),

where N_s is the total number of single-compressed blocks and N_d is the total number of double-compressed blocks. True positive (TP) and true negative (TN) stand for the numbers of blocks in double-compressed and single-compressed regions that are correctly identified. False positive (FP) and false negative (FN) denote the number of blocks erroneously detected as double-compressed and the number of blocks falsely detected as single-compressed, respectively. F1 is a comprehensive indicator of precision and recall.
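These metrics can be computed directly from the binarized result map and the ground-truth mask; a minimal sketch following the label convention above (0 = double-compressed, 1 = single-compressed):

```python
import numpy as np

def detection_metrics(pred, mask):
    """Return (accuracy, F1), treating double-compressed (label 0) as positive."""
    tp = int(np.sum((pred == 0) & (mask == 0)))  # doubles correctly identified
    tn = int(np.sum((pred == 1) & (mask == 1)))  # singles correctly identified
    fp = int(np.sum((pred == 0) & (mask == 1)))  # singles flagged as double
    fn = int(np.sum((pred == 1) & (mask == 0)))  # doubles missed
    acc = (tp + tn) / pred.size
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return acc, f1

# Toy example: 2 TP, 2 TN, 1 FP, 1 FN.
pred = np.array([0, 0, 1, 1, 0, 1])
mask = np.array([0, 1, 1, 1, 0, 0])
acc, f1 = detection_metrics(pred, mask)
print(acc, f1)  # accuracy 4/6, F1 2/3
```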

The detection accuracy achieved on the Synthetic dataset and the Florence dataset is shown in Table 1 and Table 2, respectively. In order to ensure the integrity of the experiment, the situation of QF1 = QF2 is retained.

Table 1 shows that, on the Synthetic dataset, the detection accuracies of our method are mostly several percentage points higher than those of BP, YH, and WZ when QF1 < QF2. When QF1 > QF2, the performance of our method is almost comparable to LA, which is specially designed for down-recompression discrimination, and in almost all such cases our detection accuracies exceed those of the other methods. Similarly, Table 2 shows that, on the Florence dataset, the detection accuracies of our method are higher than those of the other methods, even LA, in most cases, and the superiority of our method is especially prominent when QF1 > QF2. Generally, owing to the superior capability of the multi-scale module in feature extraction and the application of the discriminative module to the challenging cases where QF1 > QF2, our method attains significantly higher accuracy both when QF1 < QF2 and when QF1 > QF2.

Figure 8: Results achieved on the Synthetic dataset by the proposed method and the comparison methods BP, YH, WZ, and LA when QF1 > QF2. (a)-(b) Average values of the detection accuracy and F1, respectively.
Figure 9: Results achieved on the Florence dataset by the proposed method and the comparison methods BP, YH, WZ, and LA when QF1 > QF2. (a)-(b) Average values of the detection accuracy and F1, respectively.

To compare the different approaches more intuitively, we also plot the results. Because detection performance is related to both QF1 and QF2, we calculate the average values of the accuracy and F1 over all images for each quality-factor setting. Fig. 6 and Fig. 7 illustrate the detection results on the Synthetic and Florence datasets, respectively.

It is evident that our method outperforms the other methods in terms of accuracy and F1 both when QF1 < QF2 and when QF1 > QF2. In general, the detection results of every method degrade significantly when QF1 = QF2 or when QF1 > QF2. Moreover, the accuracy of each method remains low in the case where QF1 = QF2, while the accuracy of every method improves as QF2 grows well beyond QF1.

Fig. 8 and Fig. 9 illustrate the average detection results when QF1 > QF2. We find that our method attains significantly higher accuracy and F1 in this case, which greatly depends on the application of the discriminative module. Moreover, the multi-scale module also contributes to the overall improvement in detection.

These results show that our method has a notable superiority in double JPEG compression forensics. In addition, judging from the slightly different trends in the results on the two datasets, our approach performs more stably on the high-resolution dataset.

Table 3. Comparison of the proposed method and LA on the Synthetic dataset.

Method    Average detection time  Feature dimension  Target area
LA        570 ms/pic              94976              Global detection
Proposed  60 ms/pic               279                Accurate location

Table 3 shows a further comparison between our method and LA on the Synthetic dataset. Since our method uses a much smaller number of features (279-D) than LA (nearly 95,000-D), the average detection time of our method is 60 ms per picture, almost 10 times faster than LA. Additionally, our method can accurately locate the tampered area of a JPEG image, whereas LA is typically used only to determine the authenticity of the entire image.

4.3 Qualitative Experiments

To ensure the comprehensiveness of the experiment, we not only manually synthesize JPEG images using Adobe Photoshop, but also manipulate JPEG images automatically via Matlab. Fig. 10 shows the effectiveness and robustness of our method. As the last two columns show, almost no regions of image 1 are flagged as tampered, which agrees with our expectation. Although the refrigerator in the original image 2 is shrunken and inserted into another image, our method still yields a superior detection result.

Figure 10: A group of successful results of the proposed approach: (a) original image 1, (b) original image 2, (c) the tampered image 3 after shrinking and splicing, (d) the mask, (e) the classification probability map of image 1, (f) the detection result map of image 1, (g) the classification probability map of image 3, (h) the detection result map of image 3. To reflect the performance more intuitively, our results have not been subjected to any filtering.
Figure 11: Detection results on the Synthetic dataset: (a) original images, (b) tampered images (the two at the top are hard for humans to recognize because their semantic information remains plausible; the two at the bottom are easy to recognize because of their abnormal semantic information), (c) tampering masks, (d)-(f) detection results of BP, YH, and WZ, (g) probability maps of our method, (h) detection results of our method.
Figure 12: Detection results on the Florence dataset: (a) original images, (b) tampered images automatically synthesized by Matlab, where the central square area of each image is replaced by a block with the same content, (c) tampering masks, (d)-(f) detection results of BP, YH, and WZ, (g) probability maps of our method, (h) detection results of our method.

Additional results compared with the methods of BP, YH, and WZ on different datasets are shown in Fig. 11 and Fig. 12. In order to ensure the diversity of the experiment, images from different datasets are selected and tampered in different ways.

Table 4. The accuracy of detection using different parameters.

Number of convolutional layers   1 conv   2 conv   3 conv   4 conv
  Synthetic                      0.687    0.690    0.682    0.685
  Florence                       0.726    0.738    0.733    0.734
Kernel size                      –        –        –        –
  Synthetic                      0.690    0.682    0.681    0.679
  Florence                       0.738    0.731    0.727    0.726
Number of kernels                50       100      150      200
  Synthetic                      0.677    0.690    0.688    0.689
  Florence                       0.725    0.738    0.733    0.736
Feature dimensions               11       21       31       41
  Synthetic                      0.661    0.683    0.690    0.686
  Florence                       0.682    0.704    0.738    0.740
Network model                    Model1   Model2   Model3   Model4
  Synthetic                      0.677    0.685    0.679    0.690
  Florence                       0.723    0.734    0.728    0.738

Fig. 11 shows the detection results for artificially tampered images on the Synthetic dataset. It is evident that our method produces fewer misclassified points. At the same time, our method is rarely influenced by interfering content, such as the sky in the second row.

Fig. 12 shows the detection results for automatically synthesized images on the Florence dataset. The results show that our method is rarely affected by the content of the image and performs better in cases of both QF1 < QF2 and QF1 > QF2, although all of the methods perform worse when QF1 > QF2. When QF1 > QF2, the superior performance of our method derives from the discriminative module. The multi-scale features extracted by our networks also help to improve the classification and reduce the interference of invalid information.

4.4 Parameter Selection

In this section, a number of experiments are conducted to reveal the relationship between global accuracy and parameter selection. Different DNN model parameters and structures are tested in order to construct better networks and fix the network parameters. Table 4 compares the kernel size, the number of convolutional layers, the number of kernels, the feature dimension, and the model composed of multiple networks trained on data at different scales. Model1 to Model4 represent different network structures: 1) a single network trained on blocks of one scale; 2) two fused networks trained on blocks of two scales; 3) two fused networks trained on blocks of a different pair of scales; 4) three fused networks trained on blocks of all three scales. The parameters and structure we finally select are those achieving the best accuracy in Table 4.

5 Conclusion

This paper proposes a novel double JPEG compression forensics method based on deep multi-scale discriminative networks. The multi-scale features extracted by the multi-scale module derive more effective information from DCT coefficient histograms and achieve better performance in tampering detection. Guided by the statistical characteristics of the DQ effect, a discriminative module is also designed to capture the small difference between authentic and tampered images in the tougher cases where QF1 > QF2. Finally, the automatic localization of specific tampered regions is realized. Extensive experimental results confirm that our MSD-Nets outperform several state-of-the-art methods on two public datasets.

In the future, it will be necessary for us to design a pretreatment process for filtering quantization noise. In addition, further efforts will be made to consider adding both the image content information and semantic information to assist in double JPEG compression forensics.


  • Amerini et al. (2017) Irene Amerini, Tiberio Uricchio, Lamberto Ballan, and Roberto Caldelli. 2017. Localization of JPEG double compression through multi-domain convolutional neural networks. In Computer Vision and Pattern Recognition Workshops. 1865–1871.
  • Barni et al. (2017) M. Barni, L. Bondi, N. Bonettini, P. Bestagini, A. Costanzo, M. Maggini, B. Tondi, and S. Tubaro. 2017. Aligned and non-aligned double JPEG detection using convolutional neural networks. J. Vis. Comm. Image Represent. 49 (2017), 153–163.
  • Baroffio et al. (2016) Luca Baroffio, Luca Bondi, Paolo Bestagini, and Stefano Tubaro. 2016. Camera identification with deep convolutional networks. arXiv:1603.01068v1 (2016).
  • Bayar and Stamm (2016) Belhassen Bayar and Matthew C. Stamm. 2016. A deep learning approach to universal image manipulation detection using a new convolutional layer. In Proc. 4th ACM Workshop on Information Hiding and Multimedia Security. 5–10.
  • Bianchi and Piva (2012) Tiziano Bianchi and Alessandro Piva. 2012. Image forgery localization via block-grained analysis of JPEG artifacts. IEEE Trans. Inf. Forensics Security 7, 3 (2012), 1003–1017.
  • Bianchi et al. (2011) Tiziano Bianchi, Alessia De Rosa, and Alessandro Piva. 2011. Improved DCT coefficient analysis for forgery localization in JPEG images. In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing. 2444–2447.
  • Chen et al. (2017) Chenglong Chen, Jiangqun Ni, Zhaoyi Shen, and Yun Qing Shi. 2017. Blind forensics of successive geometric transformations in digital images using spectral method: theory and applications. IEEE Trans. Image Process. 26, 6 (2017), 2811–2824.
  • Chen et al. (2015) Jiansheng Chen, Xiangui Kang, Ye Liu, and Z. Jane Wang. 2015. Median filtering forensics based on convolutional neural networks. IEEE Signal Process. Lett. 22, 11 (2015), 1849–1853.
  • Ciresan et al. (2011) Dan C. Ciresan, Ueli Meier, Jonathan Masci, Luca M. Gambardella, and Jurgen Schmidhuber. 2011. Flexible, high performance convolutional neural networks for image classification. In Proc. Int. Joint Conf. on Artificial Intelligence, Vol. 22. 1237–1242.
  • Dong et al. (2015) Yongsheng Dong, Dacheng Tao, Xuelong Li, Jinwen Ma, and Jiexin Pu. 2015. Texture classification and retrieval using shearlets and linear regression. IEEE Trans. Cybern. 45, 3 (2015), 358–369.
  • Farid (2009) H Farid. 2009. A survey of image forgery detection. IEEE Signal Process. Mag. 26, 2 (2009), 16–25.
  • Fu et al. (2007) Dongdong Fu, Yun Q. Shi, and Wei Su. 2007. A generalized Benford’s law for JPEG coefficients and its applications in image forensics. Security, Steganography, and Watermarking of Multimedia Contents IX 6505 (2007), 65051L.
  • Fu et al. (2016) Xueyang Fu, Jiabin Huang, Xinghao Ding, Yinghao Liao, and John Paisley. 2016. Clearing the skies: a deep network architecture for single-image rain streaks removal. IEEE Trans. Image Process. PP, 99 (2016), 1–1.
  • Gerald and Michal (2003) Gerald Schaefer and Michal Stich. 2003. UCID: an uncompressed color image database. In Storage and Retrieval Methods and Applications for Multimedia 2004, Vol. 5307. 472–480.
  • Hu et al. (2017) Zhenhen Hu, Yonggang Wen, Luoqi Liu, Jianguo Jiang, Richang Hong, Meng Wang, and Shuicheng Yan. 2017. Visual classification of furniture styles. ACM Trans. Intell. Syst. Technol. 8, 5 (2017), 1–20.
  • Jia et al. (2014) Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: convolutional architecture for fast feature embedding. In Proc. 22nd ACM Int. Conf. on Multimedia. 675–678.
  • Korus and Huang (2016) P Korus and J. Huang. 2016. Multi-Scale fusion for improved localization of malicious tampering in digital images. IEEE Trans. Image Process. 25, 3 (2016), 1312–1326.
  • Korus and Huang (2017) Paweł Korus and Jiwu Huang. 2017. Multi-scale analysis strategies in PRNU-based tampering localization. IEEE Trans. Inf. Forensics Security 12, 4 (2017), 809–824.
  • Krizhevsky et al. (2012) Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097–1105.
  • LeCun et al. (1998) Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. 1998. Gradient-based learning applied to document recognition. Proc. IEEE 86, 11 (1998), 2278–2324.
  • Li et al. (2015a) Bin Li, Tian-Tsong Ng, Xiaolong Li, Shunquan Tan, and Jiwu Huang. 2015a. Revealing the trace of high-quality JPEG compression through quantization noise analysis. IEEE Trans. Inf. Forensics Security 10, 3 (2015), 558–573.
  • Li et al. (2015b) Bin Li, Tian-Tsong Ng, Xiaolong Li, Shunquan Tan, and Jiwu Huang. 2015b. Statistical model of JPEG noises and its application in quantization step estimation. IEEE Trans. Image Process. 24, 5 (2015), 1471–1484.
  • Li et al. (2008) Bin Li, Yun Q. Shi, and Jiwu Huang. 2008. Detecting doubly compressed JPEG images by using mode based first digit features. In IEEE 10th Workshop on Multimedia Signal Processing. 730–735.
  • Li et al. (2017a) Xuelong Li, Kang Liu, and Yongsheng Dong. 2017a. Superpixel-based foreground extraction with fast adaptive trimaps. IEEE Trans. Cybern. (2017).
  • Li et al. (2017b) Xuelong Li, Kang Liu, Yongsheng Dong, and Dacheng Tao. 2017b. Patch alignment manifold matting. IEEE Trans. Neural Netw. Learn. Syst. (2017).
  • Lin et al. (2016) Xiaodan Lin, Jingxian Liu, and Xiangui Kang. 2016. Audio recapture detection with convolutional neural networks. IEEE Trans. Multimed. 18, 8 (2016), 1480–1487.
  • Lin et al. (2009) Zhouchen Lin, Junfeng He, Xiaoou Tanga, and Chi-Keung Tang. 2009. Fast, automatic and fine-grained tampered JPEG image detection via DCT coefficient analysis. Pattern Recogn. 42, 11 (2009), 2492–2501.
  • Liu (2017) Qingzhong Liu. 2017. An approach to detecting JPEG down-recompression and seam carving forgery under recompression anti-forensics. Pattern Recogn. 65 (2017), 35–46.
  • Liu and Chen (2014) Qingzhong Liu and Zhongxue Chen. 2014. Improved approaches with calibrated neighboring joint density to steganalysis and seam-carved forgery detection in JPEG images. ACM Trans. Intell. Syst. Technol. 5, 4 (2014), 1–30.
  • Liu et al. (2011) Qingzhong Liu, Andrew H. Sung, and Mengyu Qiao. 2011. Neighboring joint density-based JPEG steganalysis. ACM Trans. Intell. Syst. Technol. 2, 2 (2011), 1–16.
  • Liu et al. (2012) Zhenli Liu, Xiaofeng Wang, and Jing Chen. 2012. Passive forensics method to detect tampering for double JPEG compression image. In IEEE Int. Symposium on Multimedia. 185–189.
  • Lukáš and Fridrich (2003) Jan Lukáš and Jessica Fridrich. 2003. Estimation of primary quantization matrix in double compressed JPEG images. In Proc. Digital Forensic Research Workshop. 5–8.
  • Nguyen and Katzenbeisser (2013) Hieu Cuong Nguyen and Stefan Katzenbeisser. 2013. Detecting resized double JPEG compressed images using support vector machine. In Proc. IFIP Int. Conf. on Communications and Multimedia Security. 113–122.
  • Popescu and Farid (2004) Alin C. Popescu and Hany Farid. 2004. Statistical tools for digital forensics. In Proc. Int. Conf. on Information Hiding. 128–147.
  • Scherer et al. (2010) Dominik Scherer, Andreas Müller, and Sven Behnke. 2010. Evaluation of pooling operations in convolutional architectures for object recognition. Artificial Neural Networks–ICANN 2010 (2010), 92–101.
  • Tang et al. (2015) Ao Tang, Ke Lu, Yufei Wang, Jie Huang, and Houqiang Li. 2015. A real-time hand posture recognition system using deep neural networks. ACM Trans. Intell. Syst. Technol. 6, 2 (2015), 1–23.
  • Thai et al. (2017) Thanh Hai Thai, Rémi Cogranne, Florent Retraint, and Thi Ngoc Canh Doan. 2017. JPEG quantization step estimation and its applications to digital image forensics. IEEE Trans. Inf. Forensics Security 12, 1 (2017), 123–133.
  • Thing et al. (2013) Vrizlynn L. L. Thing, Yu Chen, and Carmen Cheh. 2013. An improved double compression detection method for JPEG image forensics. In IEEE Int. Symposium on Multimedia. 290–297.
  • Wang and Zhang (2016) Qing Wang and Rong Zhang. 2016. Double JPEG compression forensics based on a convolutional neural network. EURASIP J. Information Security 2016, 1 (2016), 1–12.
  • Wang et al. (2014) Wei Wang, Jing Dong, and Tieniu Tan. 2014. Exploring DCT coefficient quantization effects for local tampering detection. IEEE Trans. Inf. Forensics Security 9, 10 (2014), 1653–1666.
  • Yan and Shao (2016) Ruomei Yan and Ling Shao. 2016. Blind image blur estimation via deep learning. IEEE Trans. Image Process. 25, 4 (2016), 1910–1921.
  • Yang et al. (2014) Jianquan Yang, Jin Xie, Guopu Zhu, Sam Kwong, and Yun-Qing Shi. 2014. An effective method for detecting double JPEG compression with the same quantization matrix. IEEE Trans. Inf. Forensics Security 9, 11 (2014), 1933–1942.
  • Yang et al. (2015) Jianquan Yang, Guopu Zhu, Jiwu Huang, and Xi Zhao. 2015. Estimating JPEG compression history of bitmaps based on factor histogram. Digit. Signal Process. 41 (2015), 90–97.
  • Yang et al. (2016) Jianquan Yang, Guopu Zhu, and Yun-Qing Shi. 2016. Analyzing the effect of JPEG compression on local variance of image intensity. IEEE Trans. Image Process. 25, 6 (2016), 2647–2656.
  • Yu et al. (2016) L. Yu, Q. Han, X. Niu, S. M. Yiu, J. Fang, and Y. Zhang. 2016. An improved parameter estimation scheme for image modification detection based on DCT coefficient analysis. Forensic Sci. Int. 259 (2016), 200–209.
  • Zhang et al. (2016) Tong Zhang, Wenming Zheng, Zhen Cui, Yuan Zong, Jingwei Yan, and Keyu Yan. 2016. A deep neural network-driven feature learning method for multi-view facial expression recognition. IEEE Trans. Multimed. 18, 12 (2016), 2528–2536.