On the use of Benford's law to detect GAN-generated images

04/16/2020 ∙ by Nicolò Bonettini, et al. ∙ Politecnico di Milano ∙ Università di Padova

The advent of Generative Adversarial Network (GAN) architectures has given anyone the ability to generate incredibly realistic synthetic imagery. The malicious diffusion of GAN-generated images may lead to serious social and political consequences (e.g., fake news spreading, opinion formation, etc.). It is therefore important to regulate the widespread distribution of synthetic imagery by developing solutions able to detect such images. In this paper, we study the possibility of using Benford’s law to discriminate GAN-generated images from natural photographs. Benford’s law describes the distribution of the most significant digit of quantized Discrete Cosine Transform (DCT) coefficients of natural images. Extending and generalizing this property, we show that it is possible to extract a compact feature vector from an image. This feature vector can be fed to an extremely simple classifier for GAN-generated image detection purposes.


I Introduction

With the advent of modern deep learning solutions such as GANs, a new series of image and video editing tools has been made available to everyone (e.g., Recycle-GAN [1], StyleGAN [12], etc.). These techniques allow one to synthesize realistic and visually pleasing artificial images without resorting to the complex computer-generated imagery (CGI) techniques required in the past. Unfortunately, this great step forward in technology came at a price. Indeed, GANs can be maliciously used by anyone to generate very realistic image forgeries and manipulate people’s opinions through fake news spreading [3]. To counter this threat, the forensic research community has started to develop a series of techniques to detect fake GAN-generated images [19, 20, 23].

All of the above-mentioned strategies are among the latest solutions for GAN image detection. However, the CGI detection problem has been extensively investigated in the past multimedia forensics literature [10, 6, 33]. It is worth noting that previous methods aimed at exposing specific CGI inconsistencies and artifacts from characteristic statistical traces or according to a pre-defined model. These strategies were suggested by the knowledge of the available CGI algorithms that could have been applied to generate the fake image. However, GAN-generated images cannot be related to a single well-defined model, since each scheme presents its own peculiarities depending on the implemented architecture and training process. Indeed, as shown in [20], each architecture may introduce different traces, thus making generalization a complex task. A detector that has been trained to detect images generated by a specific GAN architecture may not be suitable for a different GAN scheme.

For this reason, the approach analyzed in this paper focuses on identifying and analyzing statistical traces that make GAN-generated images differ from natural photographs. Previous work has shown that, on natural digital images, the probability distribution of specific variables usually follows a pre-defined behavior that proves to be completely altered whenever the image is modified. As an example, the distribution of the first significant digit of quantized DCT coefficients follows Benford’s law [30]. This property can be proved whenever the statistics of quantized DCT coefficients show an exponential decay, and it can be empirically verified on real images.

Exploiting this property, many forensic detectors have been successfully proposed in the literature (e.g., for detection of JPEG compression [29, 25], face morphing [18], synthetic imagery [7], etc.). Despite these premises, there is no current proof that GAN-generated pictures should be statistically compliant with Benford’s law [2].

In this work, we investigate whether Benford’s law can be used for the detection of GAN-generated images. The reported analysis is exploited to design a GAN image detector which proves to be extremely accurate with a limited computational effort. More precisely, we verify that Benford’s law is not followed by GAN images, and we propose a set of related features that highlight this misfit for an analyzed digital image. A simple supervised learning framework is then proposed to detect whether an image is natural or GAN-generated from the extracted features.

This solution is evaluated on an image corpus made available by the authors of [21], enriched with additional images obtained from more modern GANs. We make use of a large number of GAN-generated images obtained through different architectures trained on different tasks and datasets. Results show that there is a trade-off between the chosen size of the proposed feature vector and the achieved accuracy. It is possible to either use a compact feature vector to obtain results comparable with the state of the art, or a larger feature vector that allows improving upon the more recent solutions proposed in the literature. This flexibility makes the proposed solution particularly suitable also for low-power devices not equipped with an advanced GPU, which might still need to detect whether images are fake or not (e.g., smartphones, tablets, etc.). Additionally, we discuss resilience to JPEG compression in order to better define the working conditions of the proposed method.

II Background

Benford’s law, which is also known as the First Digit (FD) law or Significant Digit law, concerns the statistical frequencies of the most significant digits for the elements of a real-life numerical set. More precisely, the rule states that, given a set of measurements for some natural quantities (e.g., population of cities, stock prices, etc.), the statistics of their FD follow the distribution depicted in Fig. 1 and described by the equation

p(d) = \log_b\!\left(1 + \frac{1}{d}\right),   (1)

where $d$ is the FD in base $b$ (the generalized version of this law is presented in the next section). This has been empirically observed over a vast range of natural quantities [34], but it is also possible to prove it in closed form for many exponentially-decreasing probability distributions [39]. It has also been observed that this rule is not well fitted by FD statistics from altered data: whenever numbers are changed according to some selective strategies, FD frequencies deviate from their theoretical values [9]. As a consequence, this property has been used as supporting evidence for detecting falsified accounts, fake financial reports, and frauds [36].
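As a reference, the ideal PMF of (1) can be computed directly; the following is a minimal Python sketch (the choice of base 10 is simply the most common case):

```python
import numpy as np

def benford_pmf(base: int = 10) -> np.ndarray:
    """Ideal Benford PMF over first digits d = 1, ..., base-1 (Eq. 1)."""
    d = np.arange(1, base)
    return np.log(1.0 + 1.0 / d) / np.log(base)  # log_base(1 + 1/d)

print(benford_pmf(10))  # [0.301 0.176 0.125 ...]: digit 1 occurs ~30% of the time
```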

This property has been largely exploited in multimedia forensics to detect image tampering. In fact, natural image DCT coefficients can typically be modeled by a Laplacian-like distribution [35], which naturally follows Benford’s law; for this reason, the mentioned rule can be successfully used in image forensic applications [31].

A well-known application of Benford’s law in forensics is the study of JPEG compression traces [32]: the authors propose using the rule to verify whether an image has been JPEG compressed once or twice. Milani et al. [25] exploit FD features to detect multiple JPEG compressions, also showing robustness against rotation and scaling. Pasquini et al. [29] address the multiple JPEG compression detection problem by means of Benford-Fourier analysis. The same authors also investigate traces of previous hidden JPEG compression in uncompressed images [28].

This rule has also been successfully applied to other forensic problems. In [24], the authors show that it is possible to leverage the FD distribution to roughly estimate the amount of processing that has been applied to a given image. The authors of [27] apply Benford’s law to the image contrast enhancement detection problem. In [22], the authors make use of this law to deal with splicing forgery localization.

Fig. 1: Benford’s law FD PMF considering a given base $b$ for FD computation.

Another interesting application of Benford’s law in image forensics is the detection of computer graphics and computer-generated images. To this purpose, Del Acebo et al. [7] model light intensity in natural and synthetic images, concluding that the FD law is not followed by the latter. Makrushin et al. [18] show how to efficiently detect morphed faces using the fitting parameters of Benford’s logarithmic curve as features.

In any case, detecting synthetic images is nowadays a timely and crucial forensic need due to the achievements of GAN technology in generating highly realistic fake photographs. This possibility has recently been used to create false image and video content for deepfake political propaganda, revenge porn, and fake news creation. For these reasons, multimedia forensics researchers have been focusing in recent years on designing reliable strategies to detect synthetic images.

To this purpose, [19] proposes a method to detect image-to-image translation over social networks. Specifically, the authors compare different detectors fine-tuned for the binary classification task of GAN-generated against natural image detection. The same authors also show how a model-specific fingerprint can be retrieved from GAN-generated images in order to identify the specific network used for image generation [20]. In [21], the authors apply an incremental learning strategy to train a GAN-generated image detector that can be progressively updated in time as new images from different kinds of GANs are processed. In [16], the authors propose a method to detect GAN-generated images by analyzing the disparities in color components between real scene images and generated images. In [23], GAN images are detected by analyzing saturation artifacts in pixel distributions. Moreover, if videos are analyzed, methods also exploiting the temporal evolution of frames have been proposed [11, 22].

III Motivations

Natural images, like many other natural processes, can be roughly approximated as autoregressive signals [8]. This is the rationale behind several historical as well as more recently proposed image compression [8, 26] and generation [38, 37] methods. From these assumptions, an image can be modeled as a complex autoregressive signal with a generally low-pass characteristic.

GAN generators are usually composed of a concatenation of limited-support convolutional layers followed by non-linearities. Filter coefficients are optimized so that the GAN’s response to a given input belongs to the desired output class. However, in most GAN implementations, practical and complexity reasons have led to the adoption of filters with a limited size. Therefore, if no recursive operations are applied in the network architecture, the output of a GAN generator looks more like a signal filtered through a FIR filter than a complex autoregressive process.

The rationale behind the proposed method is that the information related to the filter ideally used to generate the data under analysis can be used to discriminate natural images (with autoregressive and complex spectra) from GAN-generated ones (generated through operations closer to FIR filtering). This can be done by analyzing the statistics of quantized DCT coefficients.

More precisely, let us assume that an input grayscale image is partitioned into distinct 8 × 8 blocks, which are then mapped into the 2D-DCT domain and further quantized. This processing chain is used by the JPEG coding standard and proves to be tailored to the spectral characteristics of images. Past works highlight that, in the frequency domain, the quantized DCT coefficient statistics of natural images closely follow Benford’s law [31].

Let us denote as $\tilde{c}_{k,i}$ the DCT coefficient at the $i$-th frequency in zig-zag mode obtained from the $k$-th block and quantized with step $\Delta$. It is possible to compute the corresponding FD with base $b$ as

d_{k,i} = \mathrm{FD}_b(\tilde{c}_{k,i}) = \left\lfloor \frac{|\tilde{c}_{k,i}|}{b^{\lfloor \log_b |\tilde{c}_{k,i}| \rfloor}} \right\rfloor.   (2)

As $d_{k,i}$ can only assume values in $\{1, 2, \ldots, b-1\}$ (i.e., all possible digits defined in base $b$ apart from zero), its PMF computed over the $K$ blocks is composed of $b-1$ elements. For the sake of notational compactness, let us momentarily drop the indexes $i$, $b$ and $\Delta$. We can formally define the PMF as

\hat{p}(d) = \frac{1}{K} \sum_{k=1}^{K} \mathbb{1}(d_k = d),   (3)

where

\mathbb{1}(d_k = d) = \begin{cases} 1, & \text{if } d_k = d \\ 0, & \text{otherwise.} \end{cases}   (4)
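As an illustration of (2)-(4), the following sketch computes the empirical FD PMF of a single quantized DCT frequency over the 8 × 8 blocks of a grayscale image. The default block position and quantization step are arbitrary placeholders, not the paper’s settings:

```python
import numpy as np
from scipy.fftpack import dct

def first_digit(x, base=10):
    """First significant digit of |x| in the given base (Eq. 2); x must be nonzero."""
    x = np.abs(x)
    exponent = np.floor(np.log(x) / np.log(base))
    return np.floor(x / base ** exponent).astype(int)

def fd_pmf(img, pos=(0, 1), step=16.0, base=10):
    """Empirical FD PMF (Eq. 3) of one quantized DCT coefficient over all 8x8 blocks.

    img  : 2-D grayscale array (float)
    pos  : (row, col) of the DCT frequency inside each block
    step : quantization step Delta for that frequency
    """
    h, w = (s - s % 8 for s in img.shape)
    blocks = img[:h, :w].reshape(h // 8, 8, w // 8, 8).transpose(0, 2, 1, 3)
    coeffs = dct(dct(blocks, axis=-1, norm='ortho'), axis=-2, norm='ortho')
    q = np.round(coeffs[..., pos[0], pos[1]] / step).ravel()
    q = q[q != 0]                        # the FD is undefined for zero coefficients
    counts = np.bincount(first_digit(q, base), minlength=base)[1:base]
    return counts / counts.sum()         # \hat{p}(d), d = 1, ..., b-1
```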

This PMF for a natural image closely follows the generalized Benford’s law equation

\hat{p}(d) \approx \beta \log_b\!\left(1 + \frac{1}{\gamma + d^{\delta}}\right),   (5)

where $\beta$ is a scale factor, $\gamma$ and $\delta$ parameterize the logarithmic curve, and $d$ is one possible value of the considered first digits in base $b$.

The fitness between the empirical PMF $\hat{p}(d)$ and the fitted model $p(d)$ can be measured by some divergence functions, such as the Jensen-Shannon divergence

D_{JS}(\hat{p}, p) = \frac{1}{2} D_{KL}(\hat{p} \,\|\, m) + \frac{1}{2} D_{KL}(p \,\|\, m), \quad m = \frac{\hat{p} + p}{2},   (6)

which is a symmetrized version of the well-known Kullback-Leibler divergence

D_{KL}(p \,\|\, q) = \sum_d p(d) \log \frac{p(d)}{q(d)}.   (7)

Since $D_{JS}$ proves to be unstable for biased PMFs, it is possible to use the symmetrized Rényi divergence

D_{R}(\hat{p}, p) = \frac{1}{\alpha - 1} \left[ \log A_\alpha(\hat{p}, p) + \log A_\alpha(p, \hat{p}) \right],   (8)

or the symmetrized Tsallis divergence

D_{T}(\hat{p}, p) = \frac{1}{\alpha - 1} \left[ A_\alpha(\hat{p}, p) + A_\alpha(p, \hat{p}) - 2 \right],   (9)

where

A_\alpha(p, q) = \sum_d p(d)^{\alpha} \, q(d)^{1-\alpha}.   (10)
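For reference, a minimal sketch of the three divergences as reconstructed in (6)-(10); the value α = 0.5 is an arbitrary placeholder (the paper keeps α constant, as noted later, but its value is not reported here), and a small ε guards against empty histogram bins:

```python
import numpy as np

EPS = 1e-12  # guard against log(0) on empty bins

def kl(p, q):                           # Kullback-Leibler divergence, Eq. (7)
    return np.sum(p * np.log((p + EPS) / (q + EPS)))

def js(p, q):                           # Jensen-Shannon divergence, Eq. (6)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def _a(p, q, alpha):                    # A_alpha(p, q), Eq. (10)
    return np.sum(p ** alpha * q ** (1.0 - alpha))

def renyi_sym(p, q, alpha=0.5):         # symmetrized Renyi divergence, Eq. (8)
    return (np.log(_a(p, q, alpha)) + np.log(_a(q, p, alpha))) / (alpha - 1.0)

def tsallis_sym(p, q, alpha=0.5):       # symmetrized Tsallis divergence, Eq. (9)
    return (_a(p, q, alpha) + _a(q, p, alpha) - 2.0) / (alpha - 1.0)
```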

It is possible to prove that, whenever an image is altered (e.g., it is compressed/quantized a second time, etc.), Benford’s law is not verified anymore. In fact, many modifications redistribute image coefficients among the bins of the quantizer; thus, the final PMF associated with quantized DCT coefficients presents some oscillating probability values that deviate from the ideal distribution. For these reasons, many past solutions measured the divergence between the empirically estimated PMF $\hat{p}(d)$ and its ideal fitted version in order to determine whether the image has been altered or not. In this paper we show that it is possible to adopt the same solution to detect GAN-generated pictures.

IV GAN-generated image detection

In this section we provide a formal definition of the GAN-generated image detection problem and report all the technical details of the detection method we propose.

IV-A Problem formulation

We define the GAN-generated image detection problem as a two-class classification problem. Given an image $I$, we want to understand whether it has been synthetically generated through a GAN, or it is a natural photograph.

Formally, to solve this problem we consider a pipeline composed of two blocks: a feature extractor and a supervised classifier. The feature extractor implements the function $E$, which turns the image $I$ into a more compact yet informative representation, i.e., the feature vector $\mathbf{f} = E(I)$. The classifier implements the function $C$ such that $C(\mathbf{f}) = 0$ if the image is a natural one, and $C(\mathbf{f}) = 1$ if the image comes from a GAN. With this framework in mind, we focus on designing the function $E$ based on Benford’s law, so that a simple classifier can be effectively used.

IV-B Detection method

The feature extraction process is depicted in Fig. 2. Given an image $I$, we divide it into non-overlapping blocks of 8 × 8 pixels. From each block, we compute its 2D-DCT representation. We then quantize it using a given quantization step $\Delta$ (chosen for each coefficient according to a JPEG quantization matrix).

Fig. 2: Feature extraction pipeline considering a single divergence, quantization step $\Delta$, base $b$ and DCT coefficient $i$. The extraction process is repeated for multiple parameters.

Given a base $b$, we compute the first digit $d_{k,i}$ of the $i$-th quantized 2D-DCT frequency sample from the $k$-th block according to (2). We then compute the PMF $\hat{p}(d)$ according to (3). Examples of $\hat{p}(d)$ for different bases for both natural and GAN-generated images are reported in Fig. 3. Finally, we fit the generalized Benford’s law expressed in (5) by solving a mean squared error minimization problem as

(\hat{\beta}, \hat{\gamma}, \hat{\delta}) = \arg\min_{\beta, \gamma, \delta} \sum_d \left( \hat{p}(d) - \beta \log_b\!\left(1 + \frac{1}{\gamma + d^{\delta}}\right) \right)^{2}.   (11)

Comparing the computed PMF $\hat{p}(d)$ and the Benford fit $p(d)$, we compute the Jensen-Shannon divergence $D_{JS}$, the Rényi divergence $D_R$, and the Tsallis divergence $D_T$ as reported in Section III. Notice that we removed the dependency of the Tsallis and Rényi divergences on $\alpha$, as we keep it constant in our experiments.
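One possible way to obtain the fit of (11) is nonlinear least squares, e.g., through SciPy’s curve_fit; the initialization and bounds below are assumptions made for this sketch, not the authors’ settings:

```python
import numpy as np
from scipy.optimize import curve_fit

def gen_benford(d, beta, gamma, delta, base=10):
    """Generalized Benford curve of Eq. (5)."""
    return beta * np.log(1.0 + 1.0 / (gamma + d ** delta)) / np.log(base)

def fit_benford(pmf, base=10):
    """Least-squares fit of Eq. (5) to an empirical FD PMF, as in Eq. (11)."""
    d = np.arange(1, base, dtype=float)
    popt, _ = curve_fit(
        lambda x, b_, g_, d_: gen_benford(x, b_, g_, d_, base),
        d, pmf,
        p0=(1.0, 0.0, 1.0),                           # beta, gamma, delta
        bounds=([0.0, -0.5, 0.1], [10.0, 10.0, 5.0]))  # assumed search ranges
    return gen_benford(d, *popt, base)                # fitted Benford PMF p(d)
```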

Finally, considering a set $\mathcal{B}$ of bases, a set $\mathcal{I}$ of DCT frequencies and a set $\mathcal{Q}$ of JPEG quality factors driving the quantization parameter $\Delta$, we obtain the final feature vector $\mathbf{f}$ by concatenating all divergences as

\mathbf{f} = \left[ D_{JS}^{(b, i, \Delta)},\, D_{R}^{(b, i, \Delta)},\, D_{T}^{(b, i, \Delta)} \right]_{b \in \mathcal{B},\; i \in \mathcal{I},\; \Delta \in \mathcal{Q}}.   (12)

Notice that the feature vector size depends on how many DCT coefficients, bases and quantization steps are used during the analysis. For instance, if we choose a single compression step, a single DCT frequency and a single base, the feature vector will be composed of the concatenation of just three divergences, thus having dimensionality 3. Conversely, if we use multiple bases, frequencies and compression steps, we end up with a bigger vector. In our experiments, we consider vectors with dimensionality ranging from 3 up to the size given by the full combination of parameters, as explained in Section V.
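Putting the pieces together, a sketch of the feature extractor $E$, reusing fd_pmf, fit_benford and the divergence functions defined above; the default parameter sets are placeholders for $\mathcal{B}$, $\mathcal{I}$ and $\mathcal{Q}$:

```python
import numpy as np

def benford_features(img, bases=(10,), freqs=((0, 1),), steps=(16.0,)):
    """Concatenate the three divergences for every (base, frequency, step) combination (Eq. 12)."""
    feats = []
    for b in bases:
        for pos in freqs:
            for s in steps:
                p_emp = fd_pmf(img, pos=pos, step=s, base=b)  # empirical PMF, Eq. (3)
                p_fit = fit_benford(p_emp, base=b)            # fitted model, Eq. (11)
                feats += [js(p_emp, p_fit),
                          renyi_sym(p_emp, p_fit),
                          tsallis_sym(p_emp, p_fit)]
    return np.asarray(feats)  # dimensionality = 3 * |B| * |I| * |Q|
```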

After feature computation, the vector $\mathbf{f}$ is fed to a supervised classifier. In order to study the effectiveness of Benford-based features, we do not adopt unnecessarily complicated classifiers. Specifically, we resort to a Random Forest classifier.

Fig. 3: Different PMFs $\hat{p}(d)$ for natural (blue) and GAN-generated images (orange) compared to the ideal Benford curve (dashed green) for different bases $b$. Blue and orange curves deviate differently from the green one.

V Results

In this section we describe the datasets and the experimental setup, and we report the results achieved with the proposed technique for GAN-generated image detection.

V-A Dataset

In order to build our dataset, we started from the publicly available GAN dataset released by Marra et al. [21]. Specifically, we considered a corpus composed of different sub-datasets of images obtained employing two different architectures: Cycle-GAN [40] and ProGAN [13]. The first architecture is designed for image-to-image translation purposes, i.e., mapping an image of a given class (e.g., pictures of horses) to an image of another one (e.g., pictures of zebras). The second architecture is a generator capable of creating natural-looking pictures of different scenes depending on the training data used (e.g., bedroom pictures, bridges, etc.). Each dataset comprises both natural images and their GAN-generated counterparts. All images are color images sharing the same resolution. The complete composition is reported in Table I, and some examples are reported in Fig. 4.

Fig. 4: Examples of original (top) and GAN-generated (bottom) images belonging to the dataset proposed in [21].
Architecture   Dataset
Cycle-GAN      orange2apple, photo2ukiyoe, winter2summer, zebra2horse, photo2cezanne, photo2vangogh, photo2monet, facades, cityscapes, sats
ProGAN         lsun_bedroom, lsun_bridge, lsun_churchoutdoor, lsun_kitchen, lsun_tower
TABLE I: Dataset composition.

V-B Setup

As shown in Section IV, each feature depends on a selected set of bases $\mathcal{B}$, DCT frequencies $\mathcal{I}$, and analysis JPEG quality factors (QFs) describing the set of quantization steps $\mathcal{Q}$. With regard to bases, we test all combinations of sets of bases containing from one to four elements, which leads to 15 possible combinations of bases. Concerning the selected DCT frequencies, we choose a limited number of sets, including the first 9 frequencies in zig-zag order after the DC coefficient, similarly to other detectors available in the literature [25]. Specifically, we only consider 9 sets obtained by progressively adding one frequency at a time to the previous set (i.e., {1}, {1, 2}, {1, 2, 3}, …, {1, …, 9}). As for the quantization step values, the final feature set was created by concatenating the arrays of divergences obtained using JPEG QFs from a fixed set. Considering all combinations of bases, frequencies and JPEG quantization steps, we obtain a large number of different setups.

Fig. 5: Accuracy obtained with different feature vectors generated by changing the considered sets $\mathcal{B}$, $\mathcal{I}$ and $\mathcal{Q}$. Each vector has a different length and provides a different accuracy result.

For each feature vector described in Section IV (i.e., each setup), we trained a different Random Forest classifier performing Leave-One-Group-Out cross-validation over the various datasets, as explained in [19]. The choice of Random Forest was suggested by its low complexity requirements, generalization capabilities, and resilience to small training datasets. Namely, given a dataset out of the complete set of datasets, we trained our model over the remaining datasets and tested over the left-out one. Results are always shown on the left-out dataset, and we report the maximum accuracy value among all the different setups. To provide a practical example, let us consider the situation in which we test on the orange2apple dataset. This consists of original images (i.e., apples and oranges) and GAN images (apples turned into oranges and vice versa). The classifier was trained on all the other images (excluding those in orange2apple) in order to avoid biasing the results with overfitting. We adopted the Random Forest implementation provided with the open-source Scikit-learn Python library. After a grid search over several candidates, we fixed the number of Decision Trees, with bootstrap sampling enabled. We selected the Gini index as splitting policy, leaving all the other parameters at their default values.
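For illustration, the evaluation protocol can be sketched with Scikit-learn’s LeaveOneGroupOut; here X, y and groups are assumed to be precomputed NumPy arrays (feature matrix, labels, and the sub-dataset id of each image), and n_estimators is a placeholder since the value selected by the grid search is not reported here:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut

# X: (n_images, n_features) Benford features; y: 0 = natural, 1 = GAN;
# groups: the sub-dataset each image belongs to (e.g., 'orange2apple').
logo = LeaveOneGroupOut()
for train_idx, test_idx in logo.split(X, y, groups):
    clf = RandomForestClassifier(n_estimators=100,   # placeholder value
                                 criterion='gini', bootstrap=True)
    clf.fit(X[train_idx], y[train_idx])
    acc = clf.score(X[test_idx], y[test_idx])
    print(f'left-out group {groups[test_idx][0]}: accuracy = {acc:.3f}')
```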

V-C Experiments

In this section we report all experimental results achieved to evaluate the proposed technique. Moreover, we present a comparison against baseline solutions. Finally, we provide some additional insights in terms of resilience to JPEG compression.

To select the baselines, we focused on the work proposed by Marra et al. [19] since, to the best of our knowledge, it is the only work performing an extensive GAN detection test over a large dataset of images. Specifically, we selected two baselines: a completely data-driven one based on deep learning, and a solution based on hand-crafted features commonly used in the forensics literature.

Similarly to the solution in [19], we compared our approach with the Xception CNN as the first baseline method. According to the results in [19], this network seems to provide the best results over most of the considered datasets. Starting from the pretrained model, we fine-tuned it on our dataset, following the same Leave-One-Group-Out strategy we adopted for the Random Forest training. We used the bulk of the training data for the actual training and the remaining part for validation, testing on the left-out dataset. We resorted to the Adam optimization algorithm, training until reaching convergence on a validation plateau. We adopted the Keras implementation of Xception, performing the training in several hours on a workstation equipped with an NVIDIA Titan V GPU, an Intel Xeon E5-2687W and 256 GB of RAM.
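A hedged sketch of this baseline in Keras follows; the learning rate, input pipeline and stopping criterion are assumptions rather than the exact training recipe:

```python
import tensorflow as tf

# Binary fine-tuning of an ImageNet-pretrained Xception (natural vs. GAN).
base = tf.keras.applications.Xception(weights='imagenet',
                                      include_top=False, pooling='avg')
out = tf.keras.layers.Dense(1, activation='sigmoid')(base.output)
model = tf.keras.Model(base.input, out)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # assumed rate
              loss='binary_crossentropy', metrics=['accuracy'])
# Train until the validation loss plateaus (train_ds / val_ds assumed given):
# model.fit(train_ds, validation_data=val_ds, epochs=100,
#           callbacks=[tf.keras.callbacks.EarlyStopping(patience=5,
#                                                       restore_best_weights=True)])
```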

The second baseline method applies a linear Support Vector Machine (SVM) to a set of handcrafted steganalysis rich features (as suggested in [19]). These features have been successfully used in image forgery detection tasks as well [5]. The model has been trained following the same train/test strategy used for Xception, using the Scikit-learn implementation of the SVM.
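A corresponding sketch, assuming the rich features of [5] have already been extracted into arrays (the variable names are hypothetical):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# X_rich_train / X_rich_test: precomputed steganalysis rich features.
svm = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10000))
svm.fit(X_rich_train, y_train)
print('accuracy:', svm.score(X_rich_test, y_test))
```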

Feature length and parameters. In the design of the proposed solution we considered different combinations of features obtained by varying the parameters in the sets $\mathcal{B}$, $\mathcal{I}$ and $\mathcal{Q}$, thus changing the feature vectors’ lengths. As a matter of fact, it is necessary to evaluate how the vector length impacts the classifier performance. Fig. 5 shows the average test accuracy obtained on all datasets considering all possible feature vectors. It is possible to notice that even the smallest feature vectors of just 3 elements already enable a high accuracy, and a moderate number of features is sufficient to push the accuracy even higher.

Fig. 6: Accuracy varying different parameters: (a) a single fixed JPEG compression; (b) all JPEG compressions; (c) a single fixed base; (d) all bases; (e) a single fixed DCT coefficient; (f) all DCT coefficients. When we fix a single parameter, we average all results obtained by fixing that parameter to each possible value it can span.

In order to gain better insight into the effect of using different bases, DCT coefficients and quantization steps, we performed an analysis by keeping some parameters fixed and changing the others. Fig. 6(a) and (b) show the results with a single fixed quantization step value and considering all the values, respectively. In both scenarios it is possible to notice that the greatest improvement is obtained when more than a single DCT coefficient is used. Moreover, the more the coefficients, the better the results. Fig. 6(c) and (d) report the results with features from a single fixed FD base and from all the considered bases, respectively. It seems that using more than one base only marginally improves the results; as a matter of fact, both figures are very similar. Finally, Fig. 6(e) and (f) display the accuracy values obtained from the features of a single DCT frequency and of the whole set of frequencies, respectively. From these figures it is possible to note that the more quantization steps are considered, the higher the accuracy for any other parameter set. This is not particularly surprising, as Benford’s law is naturally linked to the JPEG quantization used.

Comparison against baseline. In order to compare against the selected baseline solutions, we fine-tuned the Xception network and trained a linear SVM on the steganalysis features for each dataset according to the same procedure used for our Random Forest, as suggested in [19]. Table II reports the breakdown of test accuracy scores for all datasets. The highest average accuracy among the considered methods is obtained by the proposed method. It is also interesting to notice that the proposed solution is considerably better than the baseline CNN on winter2summer, sats and lsun_bedroom, which seem to be particularly tough for the latter. These results highlight that, in order to properly train a very deep network like Xception, a much larger dataset is probably needed. However, this might be difficult to obtain in a reduced amount of time in a forensic scenario. On the contrary, the proposed feature vector is very compact, thus the Random Forest does not suffer from a smaller training set. The baseline handcrafted method performs reasonably well, but its average accuracy is almost 9% lower than that of the proposed method.

Dataset Proposed Xception Steganalysis
orange2apple
photo2ukiyoe
winter2summer
zebra2horse
photo2cezanne
photo2vangogh
photo2monet
facades
cityscapes
sats
lsun_bedroom
lsun_bridge
lsun_churchoutdoor
lsun_kitchen
lsun_tower
avg
TABLE II: Accuracy results compared to the baseline solutions for each dataset. Average accuracy (avg) is also reported. Best result per dataset in bold.

Resilience to JPEG compression. When images are shared online, JPEG compression is almost always applied in order to reduce network and storage requirements. Therefore, we measured the performance of the proposed method whenever a further JPEG compression is applied with different coding parameter configurations.

In a first scenario, GAN-generated and real images were randomly JPEG compressed considering quality factors distributed over a given range. The detector originally trained on non-compressed images was then tested on this newly compressed dataset. In this situation, the proposed solution approaches random-guess accuracy. However, this is not completely unexpected. As a matter of fact, Benford’s law is strictly tailored to JPEG compression. Therefore, scrambling JPEG coefficient statistics through recompression has a high impact on Benford-based features.

We therefore considered a second scenario, which is more realistic as shown in [19]. If we know that images might be JPEG recompressed, we can also train our system on JPEG compressed images. We therefore re-trained our method and the baseline on compressed images, and tested them on compressed images. In this situation, results improve as expected. As a matter of fact, the proposed solution’s accuracy decreases with respect to the uncompressed case, but still remains reasonably high. In particular, results depend on the specific datasets and GAN architecture. Indeed, all results related to ProGAN (i.e., the last five datasets) show an almost optimal accuracy. Conversely, on Cycle-GAN images, only a couple of datasets exhibit comparably high accuracy. In this situation, if the computational cost is feasible for the adopted architecture, the baseline network might be preferable.

We then tested a third scenario, assuming that the analyst knows the quality factor adopted by the final JPEG compression stage (since it can be read from the bit stream). It is possible to train a different Random Forest classifier or Xception network for each quality factor. We therefore generated three versions of the dataset by recompressing it with three different quality factors. For each quality factor, we trained the proposed method and the Xception baseline using the aforementioned Leave-One-Group-Out strategy. We did not consider steganalysis features anymore, as the authors of [19] already showed that they greatly suffer from JPEG compression. Results are reported in Table III. It is possible to notice that for high quality factors, the proposed Benford-based method outperforms the baseline, while the Xception network shows better results for lower quality factors.
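As a practical note, the recompressed versions of the dataset can be generated in memory, e.g., with Pillow; the helper below is a hypothetical utility sketch, not part of the original pipeline:

```python
from io import BytesIO
from PIL import Image

def jpeg_recompress(img: Image.Image, qf: int) -> Image.Image:
    """Re-encode an image in memory at the given JPEG quality factor."""
    buf = BytesIO()
    img.convert('RGB').save(buf, format='JPEG', quality=qf)
    buf.seek(0)
    return Image.open(buf)
```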

In the final testing scenario, we assume that the analyst wants to train a different classifier for each JPEG quality factor and for each kind of image content. As an example, if the analyst is interested in detecting fake oranges with a given quality factor, he/she might train only on the orange2apple dataset rather than the others. In this situation (i.e., known quality factor and kind of GAN training dataset), both the proposed method and the Xception baseline achieve an almost perfect result for each quality factor.

QF Dataset Proposed Xception
orange2apple
photo2ukiyoe
cityscapes
lsun_tower
orange2apple
photo2ukiyoe
cityscapes
lsun_tower
orange2apple
photo2ukiyoe
cityscapes
lsun_tower
TABLE III: Accuracy obtained using different JPEG quality factors.

V-D Analysis on faces

All results shown so far were obtained without considering GANs that generate face images. This is due to two main reasons. First, GANs trained to generate face images have lately produced particularly realistic results. This makes face images harder to detect as GAN-generated compared to other kinds of imagery. Indeed, shadows and lighting very often respect the laws of physics, thus making Benford’s law almost verified [7]. Second, face-generating GANs are often trained on common pristine face datasets [17], which makes the Leave-One-Group-Out testing strategy applied so far impracticable.

In light of these considerations, we decided to create a specific dataset composed of all the face datasets in the original corpus, generated by ProGAN [13], StarGAN [4] and GlowGAN [15], plus some additional images generated by the more recently proposed StyleGAN2 [14]. As pristine faces, we always consider images from the Celeb-A dataset [17].

ProGAN is trained to generate realistic faces similar to those from the Celeb-A dataset [17]. StarGAN and GlowGAN are trained to obtain faces with different characteristics (e.g., hair colors, smiles, etc.). Finally, StyleGAN2 produces images at different qualities depending on its configuration parameter (0.5 or 1), as suggested by the authors [14], starting from the Flickr-Faces-HQ dataset [13]. Some random images from those generated by StyleGAN2 are shown in Fig. 7. For each dataset, we train a Random Forest classifier considering 70% of the images as training set and 30% as test set.

Table IV shows the results achieved on each test set. It is possible to notice that StarGAN and GlowGAN seem to be easier to detect. On the contrary, ProGAN and StyleGAN2 look more challenging. These promising preliminary results motivate future work with more extensive face image datasets, also comparing against other baselines and in the presence of editing operations.

Fig. 7: Examples of faces generated with StyleGAN2 [14].
Dataset Proposed
progan_celeba
stargan_black_hair
stargan_blond_hair
stargan_brown_hair
stargan_male
stargan_smiling
glow_black_hair
glow_blond_hair
glow_brown_hair
glow_male
glow_smiling
stylegan2-0.5
stylegan2-1
avg
TABLE IV: Accuracy results for each face test dataset. Average accuracy (avg) is also reported. The highest accuracies are reported in bold.

VI Conclusions

In this paper we presented a study on the use of the well-known Benford’s law for the task of GAN-generated image detection. We proposed a strategy to extract Benford-related features from an image relying on different divergence definitions. We also showed how to combine these features in order to better exploit different bases as well as DCT frequencies. Using these features, we performed a series of experiments based on a simple Random Forest classifier in order to study the amount of information captured by the features, rather than focusing on specializing a complex classifier.

Results show that GAN-generated images often fail to respect Benford’s law and can thus be discriminated from natural pictures. However, some kinds of CNN architectures seem to produce images that are harder to detect than others. This motivates future studies on this topic. Moreover, it suggests the possibility of embedding Benford’s law into GAN loss functions in order to obtain even more realistic images.

References

  • [1] A. Bansal, S. Ma, D. Ramanan, and Y. Sheikh (2018) Recycle-GAN: unsupervised video retargeting. In European Conference on Computer Vision (ECCV). Cited by: §I.
  • [2] F. Benford (1938) The law of anomalous numbers. Proceedings of the American philosophical society. Cited by: §I.
  • [3] M. Brundage et al. (2018) The malicious use of artificial intelligence: forecasting, prevention, and mitigation. arXiv:1802.07228. Cited by: §I.
  • [4] Y. Choi, M. Choi, M. Kim, J. Ha, S. Kim, and J. Choo (2018) StarGAN: unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Cited by: §V-D.
  • [5] D. Cozzolino, D. Gragnaniello, and L. Verdoliva (2014-10) Image forgery detection through residual-based local descriptors and block-matching. In 2014 IEEE International Conference on Image Processing (ICIP), Vol. , pp. 5297–5301. External Links: Document, ISSN 2381-8549 Cited by: §V-C.
  • [6] D. Dang-Nguyen, G. Boato, and F. G. B. De Natale (2015) 3D-model-based video analysis for computer generated faces identification. IEEE Transactions on Information Forensics and Security (TIFS) 10, pp. 1752–1763. Cited by: §I.
  • [7] E. del Acebo and M. Sbert (2005) Benford’s law for natural and synthetic images.. In Eurographics Workshop on Computational Aesthetics in Graphics, Visualization and Imaging, External Links: Document Cited by: §I, §II, §V-D.
  • [8] E. J. Delp, R. L. Kashyap, and O. R. Mitcheli (1979) Image data compression using autoregressive time series models. Pattern Recognition 11, pp. 313–323. Cited by: §III.
  • [9] A. Diekmann (2007-04) Not the first digit! using Benford’s law to detect fraudulent scientific data. Journal of Applied Statistics 34, pp. 321–329. External Links: Document Cited by: §II.
  • [10] H. Farid and M. J. Bravo (2012) Perceptual discrimination of computer generated and photographic faces. Digital Investigation 8, pp. 226–235. Cited by: §I.
  • [11] D. Güera and E. J. Delp (2019) Deepfake video detection using recurrent neural networks. In IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS). External Links: Document, ISBN 9781538692943 Cited by: §II.
  • [12] T. Karras, S. Laine, and T. Aila (2019) A style-based generator architecture for generative adversarial networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Cited by: §I.
  • [13] T. Karras, T. Aila, S. Laine, and J. Lehtinen (2018) Progressive growing of GANs for improved quality, stability, and variation. In International Conference on Learning Representations, Cited by: §V-A, §V-D, §V-D.
  • [14] T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila (2019) Analyzing and improving the image quality of StyleGAN. CoRR abs/1912.04958. Cited by: Fig. 7, §V-D, §V-D.
  • [15] D. P. Kingma and P. Dhariwal (2018) Glow: generative flow with invertible 1x1 convolutions. In NeurIPS, Cited by: §V-D.
  • [16] H. Li, B. Li, S. Tan, and J. Huang (2018) Detection of deep network generated images using disparities in color components. arXiv preprint arXiv:1808.07276. Cited by: §II.
  • [17] Z. Liu, P. Luo, X. Wang, and X. Tang (2015-12) Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), Cited by: §V-D, §V-D, §V-D.
  • [18] A. Makrushin, C. Kraetzer, T. Neubert, and J. Dittmann (2018) Generalized Benford’s law for blind detection of morphed face images. In ACM Workshop on Information Hiding and Multimedia Security (IH&MMSec), External Links: Document, ISBN 978-1-4503-5625-1 Cited by: §I, §II.
  • [19] F. Marra, D. Gragnaniello, D. Cozzolino, and L. Verdoliva (2018) Detection of GAN-Generated Fake Images over Social Networks. IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR). External Links: Document, ISBN 9781538618578 Cited by: §I, §II, §V-B, §V-C, §V-C, §V-C, §V-C, §V-C, §V-C.
  • [20] F. Marra, D. Gragnaniello, L. Verdoliva, and G. Poggi (2019) Do GANs Leave Artificial Fingerprints?. IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR). External Links: Document, ISBN 9781728111988 Cited by: §I, §I, §II.
  • [21] F. Marra, C. Saltori, G. Boato, and L. Verdoliva (2019) Incremental learning for the detection and classification of GAN-generated images. arXiv:1910.01568v2. External Links: 1910.01568v2 Cited by: §I, §II, Fig. 4, §V-A.
  • [22] F. Matern, C. Riess, and M. Stamminger (2019) Exploiting visual artifacts to expose deepfakes and face manipulations. In IEEE Winter Applications of Computer Vision Workshops (WACVW), External Links: Document Cited by: §II, §II.
  • [23] S. McCloskey and M. Albright (2019) Detecting GAN-generated imagery using saturation cues. In IEEE International Conference on Image Processing (ICIP), External Links: Document Cited by: §I, §II.
  • [24] S. Milani, M. Fontana, P. Bestagini, and S. Tubaro (2016) Phylogenetic analysis of near-duplicate images using processing age metrics. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), External Links: Document Cited by: §II.
  • [25] S. Milani, M. Tagliasacchi, and S. Tubaro (2014) Discriminating multiple JPEG compressions using first digit features. APSIPA Transactions on Signal and Information Processing 3, pp. 1–11. Cited by: §I, §II, §V-B.
  • [26] D. Minnen, J. Ballé, and G. D. Toderici (2018) Joint autoregressive and hierarchical priors for learned image compression. In Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), pp. 10771–10780. Cited by: §III.
  • [27] S. S. Moin and S. Islam (2017) Benford’s law for detecting contrast enhancement. In International Conference on Image Information Processing (ICIIP), External Links: Document Cited by: §II.
  • [28] C. Pasquini, G. Boato, and F. Pérez-González (2017-12) Statistical detection of JPEG traces in digital images in uncompressed formats. IEEE Transactions on Information Forensics and Security (TIFS) 12 (12), pp. 2890–2905. External Links: ISSN 1556-6013, Document Cited by: §II.
  • [29] C. Pasquini, G. Boato, and F. Pérez-González (2014) Multiple JPEG compression detection by means of Benford-Fourier coefficients. In IEEE International Workshop on Information Forensics and Security (WIFS), Cited by: §I, §II.
  • [30] F. Pérez-González, G. L. Heileman, and C. T. Abdallah (2007) Benford’s law in image processing. In IEEE International Conference on Image Processing (ICIP), External Links: Document Cited by: §I.
  • [31] F. Pérez-González, T. T. Quach, C. Abdallah, G. L. Heileman, and S. J. Miller (2015) Application of benford’s law to images. In Benford’s Law: Theory and Applications, S. Miller (Ed.), External Links: ISBN 9781400866595 Cited by: §II, §III.
  • [32] T. Pevny and J. Fridrich (2008-06) Detection of double-compression in jpeg images for applications in steganography. IEEE Transactions on Information Forensics and Security 3 (2), pp. 247–258. External Links: Document, ISSN Cited by: §II.
  • [33] N. Rahmouni, V. Nozick, J. Yamagishi, and I. Echizen (2017) Distinguishing computer graphics from natural images using convolution neural networks. In IEEE Workshop on Information Forensics and Security (WIFS). Cited by: §I.
  • [34] R. A. Raimi (1976) The first digit problem. The American Mathematical Monthly 83 (7), pp. 521–538. External Links: Document Cited by: §II.
  • [35] S. Smoot and L. Rowe (1996) Study of DCT coefficient distributions. In SPIE Symposium on Electronic Imaging (EI), Cited by: §II.
  • [36] K. Todter (2009-08) Benford’s law as an indicator of fraud in economics. German Economic Review 10, pp. 339–351. External Links: Document Cited by: §II.
  • [37] A. van den Oord, N. Kalchbrenner, L. Espeholt, k. kavukcuoglu, O. Vinyals, and A. Graves (2016) Conditional image generation with pixelcnn decoders. In Advances in Neural Information Processing Systems 29, D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett (Eds.), pp. 4790–4798. Cited by: §III.
  • [38] A. van den Oord, N. Kalchbrenner, and K. Kavukcuoglu (2016) Pixel recurrent neural networks. CoRR abs/1601.06759. External Links: 1601.06759, Link Cited by: §III.
  • [39] J. Wang, B. Cha, S. Cho, and C.-C. J. Kuo (2009-06) Understanding Benford’s law and its vulnerability in image forensics. In 2009 IEEE International Conference on Multimedia and Expo, pp. 1568–1571. External Links: Document Cited by: §II.
  • [40] J. Zhu, T. Park, P. Isola, and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In IEEE International Conference on Computer Vision (ICCV), Cited by: §V-A.