Photo and video editing includes the insertion or removal of parts of the image, often performed by internal or external copy-move operations. The Poisson editing technique [1, 2] allows for seamless insertions and is now routinely used for special effects in movies, in software like Photoshop, and in popular mobile phone applications. Most editing operations are driven by aesthetic goals. Yet their use can easily become malicious and help forge false evidence or fake news, or alter results in scientific publications [3].
It is therefore of primary importance to provide the public and professionals with reliable scientific tools that detect traces of any intentional alteration of a photograph. Several different techniques are relevant here: image splicing (internal or external) can be detected through its local alterations of the compression encoding and of the JPEG blocks [4, 5, 6], through its inconsistent demosaicking traces [7, 8, 9], or directly [10, 11, 12]. Methods tracking other features, such as inconsistencies in noise, lighting or chromatic aberration, are listed in the broad review [13].
Our paper focuses on a specific type of image splicing called “copy-move”. As its name indicates, it consists of copying a region of the image and pasting it somewhere else. Rotation, scaling, contrast changes and other manipulations are sometimes applied to the copied piece before pasting it. The method can be used to replicate objects, but also to hide an object under a texture borrowed elsewhere in the image. Copy-move detection methods can be divided into two main categories: block-based and keypoint-based. Block-based approaches try to match regions block by block. To match the blocks more easily and efficiently, it is common to represent each block in a compact form by dimensionality reduction, e.g. with PCA, DCT [5] or DWT [14]. The compact representation may also make the detection invariant to rotations, using Zernike moments [15, 16], or to similarities, using the Fourier-Mellin transform [17, 18]. These methods generally manage to detect the forged regions, but are computationally demanding. Instead of directly trying to match blocks, keypoint-based methods compute sets of keypoints and then match these keypoints. Many of these methods are based on SIFT [19, 20] or SURF [21]. The descriptors associated with the keypoints are invariant to rotation, scaling and even moderate affine distortions. Yet, precisely because they are so robust, these methods may cause false detections when similar objects are present in an image. As argued in [22], most methods therefore suffer from a false positive problem caused by the occurrence of “natural” self-similarity. This is the problem that we attempt to tackle here.
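To make the block-based idea concrete, the following is a minimal sketch of a generic block-based pipeline: every overlapping block is reduced by PCA (computed here with an SVD), the reduced features are sorted lexicographically so that similar blocks become neighbours, and neighbouring blocks that are far apart in the image are reported. It is not the method of any specific cited paper; the function name `block_pca_matches` and the parameters `b`, `n_comp`, `min_offset` and `tol` are hypothetical choices for illustration.

```python
import numpy as np


def block_pca_matches(img, b=8, n_comp=4, min_offset=16, tol=1e-6):
    """Hypothetical block-based copy-move baseline (illustrative only)."""
    h, w = img.shape
    # Collect every overlapping b x b block and its top-left position.
    blocks, pos = [], []
    for y in range(h - b + 1):
        for x in range(w - b + 1):
            blocks.append(img[y:y + b, x:x + b].ravel())
            pos.append((y, x))
    X = np.asarray(blocks, dtype=float)
    pos = np.asarray(pos)
    # PCA via SVD of the centered block matrix: keep n_comp coefficients per block.
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    feats = Xc @ vt[:n_comp].T
    # Lexicographic sort (first component as primary key) so that similar
    # feature vectors end up next to each other.
    order = np.lexsort(feats.T[::-1])
    matches = []
    for a, c in zip(order[:-1], order[1:]):
        close = np.all(np.abs(feats[a] - feats[c]) < tol)
        far = np.abs(pos[a] - pos[c]).max() >= min_offset
        if close and far:
            pa = tuple(int(v) for v in pos[a])
            pc = tuple(int(v) for v in pos[c])
            matches.append((pa, pc))
    return matches
```

The number of blocks grows with the number of pixels, which illustrates why block-based methods are computationally demanding compared with sparse keypoints.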
2 Copy-move matching with SIFT-like matching
As in the SIFT algorithm [23], we start by computing a set of sparse keypoints. These keypoints are usually located in textured regions. A descriptor is then associated with each of these keypoints. Finally, the descriptors are matched against each other to produce the detections. These three steps are summarized in the next paragraphs.
The keypoints correspond to the extrema of the normalized Laplacian scale-space. In practice, they are computed using differences of Gaussians, and the positions of the extrema are found at each scale. Each keypoint is assigned a scale and a principal orientation. A more detailed analysis can be found in [24].
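The detection step can be sketched as a search for extrema of a difference-of-Gaussians stack. This is a simplified, single-octave illustration only: it omits subpixel refinement, edge-response rejection and orientation assignment, and the function name `dog_keypoints` and its parameters (`sigma0`, `n_scales`, `k`, `thresh`) are hypothetical. A full description of SIFT is given in [24].

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter


def dog_keypoints(img, sigma0=1.6, n_scales=5, k=2 ** 0.5, thresh=0.03):
    """Hypothetical single-octave DoG extrema detector (no refinement, no orientation)."""
    img = np.asarray(img, dtype=float)
    # Gaussian scale-space at sigmas sigma0 * k**s.
    sigmas = [sigma0 * k ** s for s in range(n_scales)]
    gauss = np.stack([gaussian_filter(img, s) for s in sigmas])
    # Differences of consecutive Gaussians approximate the normalized
    # Laplacian (up to a factor k - 1).
    dog = gauss[1:] - gauss[:-1]
    # Keep points that are the max or min of their 3x3x3 (scale, y, x) neighbourhood.
    is_max = dog == maximum_filter(dog, size=3)
    is_min = dog == minimum_filter(dog, size=3)
    strong = np.abs(dog) > thresh  # flat regions never pass this test
    ext = (is_max | is_min) & strong
    ext[0] = ext[-1] = False  # an extremum needs a scale above and below
    s, y, x = np.nonzero(ext)
    return [(int(yy), int(xx), sigmas[ss]) for ss, yy, xx in zip(s, y, x)]
```

The contrast threshold makes explicit why no keypoint is found in flat areas, a limitation discussed in the experiments.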
This first, classic SIFT step gives a list of keypoints. From each of these keypoints (consisting of a spatial position, a scale and an orientation), a square patch of size $N \times N$ can be sampled. The gradients in both directions are then computed on these patches, yielding an $N \times N$ gradient patch $G$ with vector values.
Contrary to SIFT, we keep these matrices for the matching step. Indeed, representing the gradient patch by histograms of gradients (HOGs) would be too robust and would lead to the detection of natural repetitions. Hence, following [25] and [26], we encode the keypoint by its gradient patch $G$. This makes the descriptor invariant to uniform illumination changes. In SIFT, the descriptors are computed on a grayscale version of the image for matching applications. In the forgery case, it is useful to exploit all the available information, so our gradient descriptors keep three channels, one per color. For simplicity, our matching step will be presented using a grayscale descriptor, but it extends immediately to color. Color descriptors will be used for the experiments in Section 3.
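The descriptor construction can be sketched as follows, under several assumptions: the patch is resampled with bilinear interpolation on a grid rotated by the keypoint orientation and scaled by the keypoint scale, gradients are simple forward differences, and the default patch side `n=16` is an arbitrary choice. The function name `gradient_descriptor` is hypothetical and this is not presented as the authors' implementation.

```python
import numpy as np
from scipy.ndimage import map_coordinates


def gradient_descriptor(img, y, x, scale, theta, n=16):
    """Hypothetical sketch: N x N gradient patch around a keypoint, for every color channel."""
    img = np.atleast_3d(np.asarray(img, dtype=float))  # (H, W, C); grayscale -> C = 1
    # Sampling grid centred on the keypoint, scaled by `scale` and rotated by theta.
    # n + 1 samples per side so that forward differences give exactly n x n gradients.
    t = (np.arange(n + 1) - n / 2) * scale
    u, v = np.meshgrid(t, t, indexing="ij")  # u along rows, v along columns
    c, s = np.cos(theta), np.sin(theta)
    rows = y + c * u - s * v
    cols = x + s * u + c * v
    chans = []
    for ch in range(img.shape[2]):
        # Bilinear resampling of the rotated, scaled patch.
        p = map_coordinates(img[..., ch], [rows, cols], order=1, mode="nearest")
        gx = p[:-1, 1:] - p[:-1, :-1]  # forward difference along columns (x)
        gy = p[1:, :-1] - p[:-1, :-1]  # forward difference along rows (y)
        chans.append(np.stack([gx, gy], axis=-1))  # (n, n, 2)
    return np.stack(chans, axis=-1)  # (n, n, 2, C)
```

Because only differences of pixel values are kept, adding a constant to the image leaves the descriptor unchanged, which is the invariance to uniform illumination changes mentioned above.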
Two naturally similar objects are rarely exactly identical. There are always differences in their illumination in a real scene, and physical differences that do not necessarily catch the eye, in addition to the acquisition noise. Our matching process takes advantage of these variations to discriminate between similar objects and digital copies. Consider two keypoints $k_a$ and $k_b$, located respectively on the original object and on the forged copy, so that they match with a regular matching method such as SIFT [23]. In that case, the descriptors $D_a$ and $D_b$ associated with these keypoints should be exactly the same, namely $D_a = D_b$. Of course, this perfect equality is not reached in practice. Several copy-move steps can introduce small differences, such as the interpolation due to a rotation or a zoom, or a post-processing step such as the addition of noise and/or a compression after the forgery. Nevertheless, we can enforce a very close match between every component of the descriptors by requiring

$$|D_a(i) - D_b(i)| \le \tau \quad \text{for all } i.$$

For the distance defined by

$$d(D_a, D_b) = \max_{i} |D_a(i) - D_b(i)|, \tag{1}$$

the suspicious match test is simply $d(D_a, D_b) \le \tau$.
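The maximum-based distance and the suspicious-match test translate directly into code. The sketch below (function names `max_distance` and `is_suspicious` are hypothetical) works on descriptors of any shape, grayscale or color, by comparing all components. Unlike a Euclidean distance, this criterion rejects a pair as soon as a single component deviates by more than $\tau$, even if all the others agree exactly.

```python
import numpy as np


def max_distance(da, db):
    """Largest absolute difference over all components of two descriptors."""
    diff = np.asarray(da, dtype=float) - np.asarray(db, dtype=float)
    return float(np.max(np.abs(diff)))


def is_suspicious(da, db, tau):
    """Suspicious-match test: every component of the two descriptors agrees within tau."""
    return max_distance(da, db) <= tau
```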
The key question is how to set the detection threshold $\tau$ so that the matching criterion rejects genuinely similar objects while still reliably detecting copy-move forgeries. This threshold can be computed rigorously using the a contrario theory [27], which is a probabilistic formalization of the non-accidentalness principle [28]. This principle has proven useful for detection tasks such as segment detection [29], vanishing point detection [30] and anomaly detection [31]. The a contrario theory provides a way to compute detection thresholds automatically while controlling the number of false alarms (NFA). It replaces the usual $p$-value by a quantity that takes into account the number of tests, and therefore controls the overall number of false alarms in a given detection task. The method only requires a simple a contrario stochastic model of the perturbation. We will consider for now that $D_a$ and $D_b$ are derived from the same patch but that one of them has been corrupted by Gaussian noise of variance $\sigma^2$, i.e. $D_a(i) - D_b(i) \sim \mathcal{N}(0, 2\sigma^2)$, since each descriptor component is a gradient $u(x+1) - u(x)$ whose noise terms at $x+1$ and $x$ are independent. Matching both descriptors requires that $|D_a(i) - D_b(i)| \le \tau$ for all $i$. Treating the components as independent, the probability of matching in this case is then

$$\mathbb{P}\big(d(D_a, D_b) \le \tau\big) = \prod_{i=1}^{M} \mathbb{P}\big(|D_a(i) - D_b(i)| \le \tau\big) = \mathbb{P}\Big(X \le \frac{\tau^2}{2\sigma^2}\Big)^{M}, \tag{2}$$

where $X$ follows a $\chi^2$ distribution with one degree of freedom and $M$ is the number of components of a descriptor. We can therefore control the number of false detections by choosing $\tau$ according to

$$\tau = \sigma \sqrt{2\, F^{-1}_{\chi^2_1}\big(\varepsilon^{1/M}\big)}, \tag{3}$$

where $\varepsilon$ is the number of false alarms per number of tests and $F^{-1}_{\chi^2_1}$ is the inverse of the cumulative distribution function of $X$. The main point of formula (3) is that it reduces the dependency of the method on many detection parameters to just one, namely $\sigma$. We can argue that this last one is not critical. Indeed, even though the dependency on $\sigma$ is strong, as long as the degradation is not too large there will be a scale at which $\sigma$ is small enough for the detection to work, since $\sigma$ is divided by two at each octave of the SIFT method. For example, this demanding threshold can still work for a noise level of 4, but then requires the tampered area to be four times larger to be detected. It might be objected that zooming out also makes naturally similar objects more similar. Yet our experiments indicate that this is not the case: their small but significant differences are present at all scales. To summarize, granting that we allow for one false detection on average over a set of images, the method is parameterless, as it adapts to the number of tests and to the patch size. It could of course be coupled with an automatic noise estimator to provide a good guess of $\sigma$. For the COVERAGE dataset presented in Section 3, $\tau$ is obtained from Equation (3) by fixing the perturbation noise variance $\sigma^2$, the size $M$ of the color descriptors, the number of false alarms, and the number of tests implied by the average number of keypoints per image. The advantage of using color descriptors here is either to increase the size of the descriptor (allowing a larger $\tau$ and thus more detections) or to reduce the spatial size of the patch for the same descriptor size (allowing smaller forgeries to be detected).
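Equation (3) can be evaluated with SciPy's $\chi^2$ quantile function. The sketch below is an illustration under stated assumptions: the function name `match_threshold` and its parameters are hypothetical, the number of tests is taken as $K(K-1)/2$ pairs per image (exhaustive pairwise comparison), `n_images` is an assumed way of spreading the false-alarm budget over a dataset, and the descriptor components are treated as independent.

```python
import math

from scipy.stats import chi2


def match_threshold(sigma, m, nfa, n_keypoints, n_images=1):
    """Hypothetical helper evaluating the threshold of Equation (3).

    sigma       : standard deviation of the a contrario perturbation noise
    m           : number of components of a descriptor
    nfa         : allowed number of false alarms over all tests
    n_keypoints : (average) number of keypoints per image
    n_images    : number of images the NFA budget is spread over (assumption)
    """
    n_tests = n_images * n_keypoints * (n_keypoints - 1) / 2
    eps = nfa / n_tests  # false alarms per test
    if not 0 < eps < 1:
        raise ValueError("nfa must be positive and smaller than the number of tests")
    # Under the noise model each component of D_a - D_b is N(0, 2 sigma^2), so
    # (D_a(i) - D_b(i))^2 / (2 sigma^2) follows chi2 with 1 degree of freedom and,
    # with independent components, P(d <= tau) = F(tau^2 / (2 sigma^2)) ** m.
    # Setting this probability to eps and solving for tau gives Equation (3).
    p = eps ** (1.0 / m)
    return sigma * math.sqrt(2.0 * chi2.ppf(p, df=1))
```

Since $\varepsilon < 1$, the quantity $\varepsilon^{1/M}$ increases with $M$, so a longer (e.g. color) descriptor yields a larger $\tau$ for the same false-alarm budget, in line with the remark on color descriptors; $\tau$ also scales linearly with $\sigma$.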
An interesting side effect is that this test is very fast to compute. Indeed, to detect forgeries each keypoint must be compared against all the others. Since we are comparing keypoints inside a single image, this gives $K(K-1)/2$ pairs to be tested, where $K$ is the number of descriptors (self-matches are of course discarded). For large images, computing the distance quickly becomes a bottleneck when the distance is costly. In our case it is not necessary to compute $d$ before doing the test: the test can be performed during the computation of $d$, which allows for early stopping as soon as one component exceeds $\tau$. Since most keypoints do not match, the number of operations per comparison is in practice much smaller than the size of the descriptor. This is verified experimentally in Section 3.
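The exhaustive pairwise test with early stopping can be sketched as follows. The function names `match_with_early_stop` and `find_suspicious_pairs` are hypothetical; the second one also reports the average number of components actually compared per pair, which is the quantity checked experimentally.

```python
import numpy as np


def match_with_early_stop(da, db, tau):
    """Return (matched, n_compared), stopping at the first component with |da[i] - db[i]| > tau."""
    for n, (a, b) in enumerate(zip(da, db), start=1):
        if abs(a - b) > tau:
            return False, n
    return True, len(da)


def find_suspicious_pairs(descs, tau):
    """Test the K(K-1)/2 unordered pairs of descriptors (self-matches are skipped).

    Returns the matching index pairs and the average number of components compared per pair.
    """
    rows = np.asarray(descs, dtype=float).reshape(len(descs), -1).tolist()
    k = len(rows)
    pairs, total = [], 0
    for i in range(k):
        for j in range(i + 1, k):
            ok, n = match_with_early_stop(rows[i], rows[j], tau)
            total += n
            if ok:
                pairs.append((i, j))
    n_tests = k * (k - 1) // 2
    return pairs, (total / n_tests if n_tests else 0.0)
```

With unrelated descriptors the first or second component usually already exceeds $\tau$, so the average cost per pair stays far below the descriptor length; only genuine matches are compared in full.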
Finally, we need to take into account all possible flips of the forged regions. Although the matching process described so far does not detect flipped copies, they can still be detected at the cost of a few more computations. Writing each entry of a gradient patch as $D(x, y) = \big(D^x(x,y), D^y(x,y)\big)$ for $1 \le x, y \le N$, the modified distance used to test flips is

$$d_f(D_a, D_b) = \max_{x,y} \big\| D_a(x,y) - \tilde{D}_b(x,y) \big\|_\infty, \tag{4}$$

where $\tilde{D}(x, y) = \big(-D^x(N+1-x, y),\; D^y(N+1-x, y)\big)$. Indeed, when a patch is flipped, the indices in one direction are reversed and the gradients in that direction change sign. Thanks to the rotation invariance, all flips are taken into account by testing the flip in a single direction (here the $x$ direction), since any mirror symmetry is a mirror along a fixed axis composed with a rotation. In the end, we test each pair of keypoints with both distances $d$ and $d_f$. Doing the computation twice is not a problem in practice, as each test is very efficient.
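A flip-aware comparison can be sketched by mirroring one gradient patch before applying the maximum distance. The sketch assumes descriptors stored as arrays of shape `(N, N, 2)` or `(N, N, 2, C)`, indexed as `[y, x]` with the x-gradient in slot 0; the function names `flip_x` and `flip_aware_distance` are hypothetical. The accompanying check uses centered differences (`np.gradient`), for which mirroring the image corresponds exactly to reversing the x index and negating the x-gradient.

```python
import numpy as np


def flip_x(desc):
    """Mirror a gradient patch of shape (N, N, 2[, C]) along x (columns):
    reverse the column index and negate the x-gradient; the y-gradient is unchanged."""
    f = np.asarray(desc, dtype=float)[:, ::-1].copy()
    f[:, :, 0] = -f[:, :, 0]
    return f


def flip_aware_distance(da, db):
    """Smaller of the plain max-abs distance and the distance to the mirrored descriptor."""
    da = np.asarray(da, dtype=float)
    db = np.asarray(db, dtype=float)
    d = np.max(np.abs(da - db))
    d_flip = np.max(np.abs(da - flip_x(db)))
    return float(min(d, d_flip))
```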
| Dataset | Method | True detections | False detections |
|---|---|---|---|
In this section, all images are shown in grayscale even though they are originally in color, which makes the matches easier to see. The descriptors, however, were color descriptors. We present results on three datasets: GRIP [15], Image Manipulation (IM) [32] and COVERAGE [22], the dataset that inspired this study since it focuses on distinguishing forgeries from similar but genuine objects. All images shown in this section come from these datasets.
We used larger descriptors for the IM dataset than for the other datasets, since the IM images are much larger. Indeed, the descriptor size must be chosen smaller than the expected size of the forged regions. In every case, the threshold was computed using Equation (3) from Section 2.3. We also verified that the number of component comparisons needed to compare two descriptors was much smaller than the size of a descriptor. For example, for the image shown in Figure 1, only a small fraction of the descriptor components had to be compared on average, for a descriptor size close to that of the SIFT descriptor. Thus the detection is very fast, even for large images with many keypoints.
Table 1 shows that, while the method focuses on being robust to similar objects and on reducing false detections as much as possible, it is competitive with previous keypoint-based methods. Moreover, the number of false alarms is well under control: very few false alarms were found on the three datasets. One of these false detections is shown in Figure 3. We also verified that, while robust to similar objects (and therefore very precise), the method remains robust to reasonable noise and compression.
Figure 2 shows several examples of successful detections. The method detects copies that underwent rotation, uniform illumination changes, scaling and compression. As can be seen, the more textured the forged region, the easier it is to detect: a textured region generates more keypoints, which increases its chances of producing a match.
Figure 3 shows several failure cases. Most failures come from the fact that the method cannot handle more severe distortions, such as a tilt or a non-uniform illumination change. The method also fails on flat regions, since no keypoints are computed there. As for the false detection, it does not contradict the a contrario model: with the threshold given in Section 2.3 we requested at most one false detection on average over the tested images, and we found one over the images tested from the COVERAGE dataset.
In this paper we have presented an unsupervised method to detect copy-move forgeries that is not only invariant to rotation, scaling and global change of illumination, but also robust to the presence of similar but genuinely different objects or regions. The method, being parameter-less and very fast, can be included in the necessary long series of tampering tests applied to a suspicious image.
The limits of the method are closely linked to its strength. Because it is robust to the presence of naturally similar objects, it is less reliable when the digital copies are strongly degraded. We nevertheless found that the method is robust enough to usual noise and compression levels. Moreover, since images taken with mobile phones are nowadays of very good quality, a heavily degraded image is suspicious anyway, regardless of any more sophisticated examination; thus only images of good enough quality need to be tested. The second limit is the use of sparse keypoints. These keypoints are only computed in sufficiently contrasted (non-flat) regions, which means that forgeries in flat regions might not be detected. Third, matched keypoints give anchor points and do not delimit forged regions precisely. A natural extension of the method would be to extract the forged regions from the anchor points while keeping good control over the number of false detections. Finally, coupling the method with a noise estimator could arguably make it even more discriminative.
-  P. Pérez, M. Gangnet, and A. Blake, “Poisson image editing,” ACM TOG, 2003.
-  J. M. Di Martino, G. Facciolo, and E. Meinhardt-Llopis, “Poisson Image Editing,” IPOL, vol. 6, pp. 300–325, 2016.
-  E. Bik, A. Casadevall, and F. Fang, “The prevalence of inappropriate image duplication in biomedical research publications,” MBio, vol. 7, no. 3, pp. e00809–16, 2016.
-  Z. Lin, J. He, X. Tang, and C.-K. Tang, “Fast, automatic and fine-grained tampered jpeg image detection via dct coefficient analysis,” Pattern Recognition, vol. 42, no. 11, pp. 2492–2501, 2009.
-  Y. Cao, T. Gao, L. Fan, and Q. Yang, “A robust detection algorithm for copy-move forgery in digital images,” Forensic science int., vol. 214, no. 1-3, pp. 33–43, 2012.
-  T. Nikoukhah, R. Grompone von Gioi, M. Colom, and J.-M. Morel, “Automatic jpeg grid detection with controlled false alarms, and its image forensic applications,” in IEEE MIPR. IEEE, 2018, pp. 378–383.
-  A. Popescu and H. Farid, “Exposing digital forgeries in color filter array interpolated images,” Trans. on Signal Processing, vol. 53, no. 10, pp. 3948–3959, 2005.
-  P. Ferrara, T. Bianchi, A. De Rosa, and A. Piva, “Image forgery localization via fine-grained analysis of cfa artifacts,” IEEE TIFS, vol. 7, no. 5, pp. 1566–1577, 2012.
-  Q. Bammey, R. Grompone von Gioi, and J.-M. Morel, “Automatic detection of demosaicing image artifacts and its use in tampering detection,” in IEEE MIPR. IEEE, 2018, pp. 424–429.
-  T.-T. Ng, S.-F. Chang, and Q. Sun, “Blind detection of photomontage using higher order statistics,” in ISCAS. IEEE, 2004, vol. 5.
-  Y.-F. Hsu and S.-F. Chang, “Image splicing detection using camera response function consistency and automatic segmentation,” in Int. conf. on Multimedia and Expo. IEEE, 2007, pp. 28–31.
-  M. Huh, A. Liu, A. Owens, and A. Efros, “Fighting fake news: Image splice detection via learned self-consistency,” arXiv preprint arXiv:1805.04096, 2018.
-  H. Farid, “Image forgery detection,” IEEE Signal processing magazine, vol. 26, no. 2, pp. 16–25, 2009.
-  G. Li, Q. Wu, D. Tu, and S. Sun, “A sorted neighborhood approach for detecting duplicated regions in image forgeries based on dwt and svd,” in Int. conf. on Multimedia and Expo. IEEE, 2007, pp. 1750–1753.
-  D. Cozzolino, G. Poggi, and L. Verdoliva, “Efficient dense-field copy–move forgery detection,” TIFS, vol. 10, no. 11, pp. 2284–2297, 2015.
-  T. Ehret, “Automatic Detection of Internal Copy-Move Forgeries in Images,” IPOL, vol. 8, pp. 167–191, 2018.
-  S. Bayram, H. Sencar, and N. Memon, “An efficient and robust method for detecting copy-move forgery,” in ICASSP. IEEE, 2009, pp. 1053–1056.
-  W. Li and N. Yu, “Rotation robust detection of copy-move forgery,” in ICIP, 2010.
-  I. Amerini, L. Ballan, R. Caldelli, A. Del Bimbo, and G. Serra, “Geometric tampering estimation by means of a SIFT-based forensic analysis,” in ICASSP, 2010.
-  E. Ardizzone, A. Bruno, and G. Mazzola, “Detecting multiple copies in tampered images,” in ICIP. IEEE, 2010.
-  X. Bo, W. Junwen, L. Guangjie, and D. Yuewei, “Image copy-move forgery detection based on surf,” in MINES. IEEE, 2010, pp. 889–892.
-  B. Wen, Y. Zhu, R. Subramanian, T.-T. Ng, X. Shen, and S. Winkler, “Coverage—a novel database for copy-move forgery detection,” in ICIP. IEEE, 2016.
-  D. Lowe, “Object recognition from local scale-invariant features,” in CVPR. IEEE, 1999, vol. 2, pp. 1150–1157.
-  I. Rey Otero and M. Delbracio, “Anatomy of the SIFT Method,” IPOL, vol. 4, pp. 370–396, 2014.
-  R. Grompone von Gioi and V. Pătrăucean, “A contrario patch matching, with an application to keypoint matches validation,” in ICIP. IEEE, 2015.
-  M. Rodríguez and R. Grompone von Gioi, “Affine invariant image comparison under repetitive structures,” in ICIP. IEEE, 2018.
-  A. Desolneux, L. Moisan, and J.-M. Morel, From gestalt theory to image analysis: a probabilistic approach, vol. 34, Springer Science & Business Media, 2007.
-  D. Lowe, Perceptual organization and visual recognition, Kluwer Academic Publishers, 1985.
-  R. Grompone Von Gioi, J. Jakubowicz, J.-M. Morel, and G. Randall, “Lsd: A fast line segment detector with a false detection control,” IEEE PAMI, vol. 32, no. 4, pp. 722–732, 2010.
-  J. Lezama, R. Grompone, G. Randall, and J.-M. Morel, “Finding vanishing points via point alignments in image primal and dual domains,” in CVPR, 2014.
-  A. Davy, T. Ehret, J.-M. Morel, and M. Delbracio, “Reducing anomaly detection in images to detection in noise,” in ICIP. IEEE, 2018.
-  V. Christlein, C. Riess, J. Jordan, C. Riess, and E. Angelopoulou, “An evaluation of popular copy-move forgery detection approaches,” arXiv preprint arXiv:1208.3665, 2012.