
Exposing Fake Images with Forensic Similarity Graphs

by Owen Mayer, et al.

In this paper, we propose new image forgery detection and localization algorithms by recasting these problems as graph-based community detection problems. We define localized image tampering as any locally applied manipulation, including splicing and airbrushing, but not globally applied processes such as compression, whole-image resizing or contrast enhancement. To show this, we propose an abstract, graph-based representation of an image, which we call the Forensic Similarity Graph. In this representation, small image patches are represented by graph vertices, and edges that connect pairs of vertices are assigned according to the forensic similarity between patches. Localized tampering introduces unique structure into this graph, which aligns with a concept called "communities" in graph-theory literature. A community is a subset of vertices that is densely connected by edges within the community, and relatively sparsely connected to other communities. In the Forensic Similarity Graph, communities correspond to the tampered and unaltered regions in the image. As a result, forgery detection is performed by identifying whether multiple communities exist, and forgery localization is performed by partitioning these communities. In this paper, we additionally propose two community detection techniques, adapted from literature, to detect and localize image forgeries. We experimentally show that our proposed community detection methods outperform existing state-of-the-art forgery detection and localization methods.



I Introduction

Determining the authenticity of digital multimedia is an important social problem. Many important institutions rely upon truthful multimedia content, including news services, law firms, and intelligence agencies. As a result, multimedia forensics researchers have developed a variety of approaches to inspect images for evidence of forgery [stamm2013information]. Many of these approaches operate by directly identifying traces of manipulation, such as resampling [popescu2005exposing], double JPEG compression [bianchi2012image, farid2009exposing, ye2007detecting], or contrast enhancement [stamm2010forensic]. Other approaches have found significant success by identifying inconsistencies in imaging traces such as lens aberrations [johnson2006exposing, mayer2018lca], sensor noise [lukavs2006detecting, chen2007imaging] and general residual-based features [cozzolino2015splicebuster, cozzolino2016single]. More recently, deep learning based approaches have been found to be very effective for detecting and localizing image forgeries [bondi2017cvprw, huh2018forensics, salloum2018image, zhou2018learning, cozzolino2018camera, cozzolino2018noiseprint, mayer2018similarity, mayer2019similarity].

A recent deep-learning-based forensics approach called “Forensic Similarity” has been shown to be a promising technique for forgery detection and localization [mayer2018similarity, mayer2019similarity]. Earlier work in [mayer2018similarity] demonstrated utility in forgery localization, and later work [mayer2019similarity] proposed a technique using Forensic Similarity that outperformed prior art in forgery detection. Forensic Similarity is a technique that maps two small image patches to a score indicating whether the two patches contain the same or different forensic traces. These forensic traces are related to the source camera model and/or processing history of the image. Importantly, Forensic Similarity is effective even on “unknown” forensic traces that were not used to train the system. By identifying differences in forensic traces within an image, falsified images are exposed [mayer2018similarity, mayer2019similarity].

While Forensic Similarity has shown encouraging promise for forgery analysis, existing methods have several drawbacks. First, previously proposed forgery localization techniques based on Forensic Similarity require the selection of a reference patch in a region known to be unaltered. This type of approach is sensitive to the selection of the reference patch, and is an unrealistic scenario for a forensic investigator, who may not know a priori which regions are unaltered. Second, the forgery detection approach using Forensic Similarity in [mayer2019similarity] uses a heuristic detection measure. While the heuristic approach achieved high forgery detection rates, performance can be significantly improved by developing more powerful approaches.

To address these problems, in this work we build on prior Forensic Similarity research to more accurately detect image forgeries and localize the falsified image regions. To do this, we propose a new multimedia forensics concept called the Forensic Similarity Graph, which is an abstract graph-based representation of an image that captures important forensic relationships among all image regions. Briefly, we first sample small patches from an image and represent each patch as a vertex of a graph. Then, we assign edge weights between vertices according to the forensic similarity value between each corresponding pair of patches. Localized image tampering (e.g. splicing or localized airbrushing) introduces particular structures, called communities, into this Forensic Similarity Graph. We propose two techniques to identify and partition these communities, adapting community detection techniques from literature [fortunato2010community]. Using our proposed technique we expose localized image tampering with high accuracy, and do so without requiring reference patches to be selected.

The remaining sections of this paper are organized as follows. In Sec. II, we discuss related works on forgery detection and localization, as well as related community detection literature. In Sec. III we first introduce the concept of the Forensic Similarity Graph, and describe how to compute it in an image. In this section we also show how localized tampering introduces identifiable structures, called “communities,” into the Forensic Similarity Graph. In Sec. IV, we propose two community detection techniques which are used to identify and partition these community structures. We show how these techniques are used to detect forged images and, in tampered images, localize the tampered regions. Furthermore, we describe how the community partitions can be converted to pixel-level forgery localization decisions. Finally, in Sec. V and Sec. VI, we conduct a series of experiments testing the efficacy of our proposed approach. Experiments show that our proposed approach outperforms naive approaches that do not consider community structure, improving upon heuristics proposed in prior art. We also show that our approach outperforms prior-art localization approaches on publicly available benchmark databases.

II Background and Related Work

II-A Image Forgery Detection and Localization

Multimedia forensics techniques explicitly draw the distinction between forgery detection and forgery localization. Forgery detection methods are those that determine if an image has been tampered or is alternatively unaltered. Forgery localization methods are those that, given a forged image, determine which regions of an image have been tampered. While these are distinctly different problems, the underlying mechanisms for analysis are similar and, as a result, are often studied in conjunction. In this paper, we propose techniques for both forgery detection and forgery localization.

Recently, researchers have leveraged advances in deep learning to perform both forgery detection and forgery localization with high accuracy. In early work, Bondi et al. used a convolutional neural network (CNN) to generate deep-feature representations of an image’s source camera model, and then detected and localized image forgeries via an iterative k-means clustering approach [bondi2017cvprw]. Subsequently, our work in [mayer2018similarity] showed that the similarity of camera-model related forensic traces can be directly measured using a CNN-based siamese network, which can be used to localize image forgeries. This idea was further refined by our research in [mayer2019similarity], where we proposed a more formal definition of Forensic Similarity, which is the quantifiable measure of similarity of forensic traces related to the source and/or processing history between two image patches, and refined the siamese network technique.

Deep learning forensics research by Cozzolino et al. [cozzolino2018noiseprint, cozzolino2018camera] proposed a CNN that transforms an image to highlight artifacts associated with camera-specific traces. Inconsistencies in the resulting fingerprint map were then used to identify forged regions. Huh et al. developed a deep-learning technique to create a “consistency map” for forged images [huh2018forensics]. In this consistency map approach, regions of the image are highlighted which contain predictions of EXIF-based metadata that are inconsistent with the majority of the image. Huh et al. showed that taking the spatial average of the consistency map and comparing it to a threshold can be used to perform forgery detection.

The work in this paper directly builds from our prior research on Forensic Similarity in [mayer2019similarity]. In our prior work, we showed that tampered regions of an image can be localized by selecting a reference patch, and highlighting the patches in the image that contain different forensic traces. Provided a patch in the unaltered region of the image was selected, the tampered regions are highlighted. Furthermore, we showed state-of-the-art forgery detection accuracy by computing the forensic similarity values between all patches, computing the mean of these values, and comparing that mean to a threshold. In unaltered images, this value would be high, indicating that all patches were forensically similar. Alternatively, in tampered images, this value would be low, indicating that some patches were forensically dissimilar.

There are, however, drawbacks to existing forgery detection and localization approaches using Forensic Similarity. First, the forgery localization approach requires the selection of a reference patch in the image. This is problematic since the investigator may not have knowledge of which regions of the image are unaltered, and the poor selection of a patch may lead to erroneous results. Second, the forgery detection approach, which utilizes the mean forensic similarity value of the image, does not adequately consider the complex forensic relationships that occur in a tampered image. For example, a large tampered region will be self-similar within the region, but dissimilar to the unaltered part of the image. This type of relationship is lost by averaging all similarity values. Furthermore, by utilizing the mean similarity value, the detection statistic becomes inherently tied to the size of the forged region. However, it is important to detect small forgeries with high accuracy. This problem also occurs in the forgery detection approach by Huh et al. [huh2018forensics], which utilizes the spatial average of their EXIF consistency map to detect forgeries.
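The dependence of the mean-similarity statistic on forgery size can be illustrated with a short worked example. It rests on an idealized assumption that is not taken from the paper: every pair of patches within the same region has similarity 1, and every pair that straddles the tampered and unaltered regions has similarity 0. The symbols N (total patches), M (patches in the tampered region) and \bar{S} (mean similarity over all unique pairs) are introduced here only for illustration.

```latex
% Idealized assumption: similarity is 1 within a region and 0 across regions.
% N patches in total, M of them inside the tampered region.
\bar{S}(N, M) = \frac{\binom{M}{2} + \binom{N-M}{2}}{\binom{N}{2}}

% Large forgery, N = 100, M = 50:
\bar{S} = \frac{1225 + 1225}{4950} \approx 0.495

% Small forgery, N = 100, M = 5:
\bar{S} = \frac{10 + 4465}{4950} \approx 0.904
```

Under this assumption, a forgery covering 5 of 100 patches lowers the mean only from 1 to about 0.90, even though the tampered patches are completely disconnected from the rest of the image. A threshold on the mean would therefore have to sit very close to the unaltered value to catch small forgeries.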

In this paper, we propose techniques that address these drawbacks. To do this, we build a graph-based representation of the image, called the Forensic Similarity Graph. This representation uses a graph to capture the forensic relationships among all patches of the image. In images that have been locally tampered, the tampered regions form communities in this graph. As a result, the graph can be analyzed for the existence of multiple communities, indicating tampering, and those communities can be partitioned to localize the tampered regions. By taking this approach, the proposed methods leverage the complex relationships captured by the graph representation to more accurately detect forgeries and localize tampered regions without requiring a reference patch, addressing the drawbacks of the existing Forensic Similarity approaches in [mayer2018similarity] and [mayer2019similarity].

II-B Community Detection in Graphs

A graph is a set of elements, with each element called a “vertex,” with “edges” that connect pairs of vertices. In this work, we propose a graph-based representation of images, with small image patches represented by vertices and edges assigned according to the similarity of their forensic traces. Communities are subsets of vertices with dense edges within their own community [fortunato2010community]. In Sec. III, we will show how to detect and localize forgeries by partitioning structures, called communities, in this graph representation.

To partition communities in general graphs, researchers have developed a number of techniques called “community detection” algorithms [fortunato2010community]. In this paper, we directly build from two popular and successful techniques, namely Modularity Optimization [girvan2002community] and Spectral Clustering [von2007tutorial]. Modularity is a quality index of how well a graph is partitioned into communities [fortunato2010community]. Modularity Optimization is a family of techniques that find high-quality partitions of graphs by optimizing for Modularity. The second technique we build from is Spectral Clustering [von2007tutorial]. Spectral Clustering defines a matrix, called the graph Laplacian, whose eigenvector spectrum has unique properties. In particular, the eigenvalues of the graph Laplacian contain clues about the number of communities that exist, and the component values of the eigenvectors can be used to accurately partition the graph into communities [von2007tutorial]. Further details of each of these algorithms are described in Sec. IV.

Fig. 1: An example of community structure in the forensic similarity graph representation of a forged image. Panels: (a) original image; (b) edited image, with patch indices overlaid; (c) forensic similarity matrix; (d) forensic similarity graph with community partitions. Original image (a) and edited image (b), where water stains were removed using a brush tool, with vertex indices overlaid. Editing credit to user “/u/rombouts.” The forensic similarity matrix in (c) shows the forensic similarity between each sampled image patch. The graph representation in (d) highlights the community structure, showing that the patches associated with the tampered region appear as their own interconnected cluster. The graph in (d) is drawn in igraph [igraph], showing only edges above a similarity threshold and using the Kamada-Kawai force-directed layout method [kamada1989algorithm].

III The Forensic Similarity Graph

In this paper, we propose a new graph-based concept called the “Forensic Similarity Graph” of an image. The Forensic Similarity Graph captures important relationships among small regions of an image. These relationships bear evidence of localized tampering (e.g. splicing). Through analysis of this representation, we detect and localize falsified images. In later sections, we experimentally show that by using this proposed technique, we detect forgeries with higher accuracy than with techniques that do not capture such relationships.

In the Forensic Similarity Graph, each image patch is represented by a vertex. Edges are assigned between each pair of vertices with weight specified by the forensic similarity between those two corresponding image patches. In forged images, tampered regions form communities, where edges within the tampered region are densely connected with high weight, and are disconnected (or at least have low weight) to the regions outside of the tampered region. The term “community” aligns with literature on graph-based community detection [fortunato2010community, newman2004finding, girvan2002community], whose techniques are used to analyze community structures in other domains, such as social networks [yang2013community] and biological networks [sah2014exploring].

The procedure we propose is as follows:

  1. Sample patches from the image

  2. Calculate forensic similarity between all pairs of sampled patches, according to [mayer2019similarity]

  3. Convert the image into its graph representation, with patches as vertices and edges assigned according to the forensic similarity between patches

  4. Perform forgery detection and/or localization by analyzing community structure in the Forensic Similarity Graph

Here, we more formally describe the construction of the Forensic Similarity Graph. To construct the Forensic Similarity Graph of an image, we first sample $N$ patches from the image, $x_1, x_2, \ldots, x_N \in \mathcal{X}$, where $x_i$ is the $i$-th sampled patch and $\mathcal{X}$ is the space of image patches. In this paper we use square patches of a fixed size that are regularly spaced and overlapping, typically with 50% to 75% overlap in both the x and y directions. However, we note that the proposed technique generalizes to irregularly shaped patches and arbitrary sampling locations, provided a reliable similarity measurement can be made.
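The regular, overlapping sampling described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sample_patches`, its parameters, and the patch size of 128 used in the example are assumptions chosen for the sketch.

```python
import numpy as np


def sample_patches(image, patch_size, overlap):
    """Sample square patches on a regular grid (hypothetical helper).

    `overlap` is the fraction shared by neighboring patches, e.g. 0.5 for 50%.
    Returns the patches and the (top, left) corner of each patch.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Distance in pixels between the corners of neighboring patches.
    stride = max(1, int(round(patch_size * (1.0 - overlap))))
    height, width = image.shape[:2]
    patches, corners = [], []
    for top in range(0, height - patch_size + 1, stride):
        for left in range(0, width - patch_size + 1, stride):
            patches.append(image[top:top + patch_size, left:left + patch_size])
            corners.append((top, left))
    return patches, corners


# Placeholder image; the size and patch_size here are illustrative only.
image = np.zeros((256, 384, 3), dtype=np.uint8)
patches, corners = sample_patches(image, patch_size=128, overlap=0.5)
```

Keeping the corner coordinates alongside the patches is a design choice that makes it possible to map patch-level decisions back to pixel locations later.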

We define the graph $G = (V, E)$ with vertex set $V = \{v_1, v_2, \ldots, v_N\}$ and edge set $E$, which connects each unique pairing of vertices. The edge between $v_i$ and $v_j$ has weight equal to the forensic similarity between $x_i$ and $x_j$, given that the similarity is larger than a threshold $t$, and otherwise has weight $0$. That is, the graph has edge weights

$$W_{ij} = \begin{cases} S(x_i, x_j), & \text{if } S(x_i, x_j) > t \\ 0, & \text{otherwise,} \end{cases} \qquad (1)$$

where $W_{ij}$ is the edge weight between vertex $v_i$ and vertex $v_j$, and $S(x_i, x_j)$ is the forensic similarity between image patch $x_i$ and image patch $x_j$. Unless otherwise specified, we use a threshold $t = 0$, meaning that the graph is fully connected, with $N(N-1)/2$ non-zero edges. In some cases, we use a threshold $t > 0$ to improve the visual interpretation of the graph, or to improve algorithm performance. Here, we use forensic similarity as defined in [mayer2019similarity], which quantifies the similarity of the forensic traces across two image patches related to the source and/or processing history. A value of 1 indicates high similarity, and a value of 0 indicates dissimilarity. The edge weights create the edge weight matrix $W$, which we refer to as the Forensic Similarity Matrix. In some literature, an edge weight matrix is referred to as the weighted adjacency matrix [von2007tutorial, fortunato2010community].
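The thresholded edge-weight construction can be sketched in a few lines. This is an illustration under stated assumptions: `forensic_similarity_matrix` is a hypothetical helper, and `toy_similarity` is a stand-in based on mean intensity that only mimics a score in [0, 1]. It is not the learned forensic similarity measure of [mayer2019similarity].

```python
import numpy as np


def forensic_similarity_matrix(patches, similarity_fn, threshold=0.0):
    """Build a symmetric edge weight matrix from pairwise similarities.

    Pairs whose similarity does not exceed `threshold` get weight 0.
    There are no self-edges, so the diagonal stays 0.
    """
    n = len(patches)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            s = float(similarity_fn(patches[i], patches[j]))
            if s > threshold:
                W[i, j] = W[j, i] = s
    return W


def toy_similarity(a, b):
    # Stand-in only: NOT the forensic similarity network from the paper.
    return float(np.exp(-abs(a.mean() - b.mean())))


rng = np.random.default_rng(0)
# Four patches with one "trace" and two patches with a different one.
patches = [rng.normal(0.0, 0.01, (8, 8)) for _ in range(4)]
patches += [rng.normal(5.0, 0.01, (8, 8)) for _ in range(2)]
W = forensic_similarity_matrix(patches, toy_similarity, threshold=0.5)
```

With the threshold at 0.5, the cross-group pairs in this toy example fall below it and receive weight 0, which leaves two groups with no edges between them.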

Community structures in this Forensic Similarity Graph are then used to identify and localize image forgeries. For example, in an image that has been locally tampered (e.g. a spliced image), an edge between a patch in the tampered region and a patch in the unaltered image region will have no-or-low weight, since they have dissimilar forensic traces. However, edges among patches within a tampered region are densely connected with high weight, provided that the entirety of the tampered region contains the same forensic traces, i.e. has undergone the same tampering process (e.g. brush tool, resizing, etc.). Additionally, edges among patches in the unaltered part of the image are similarly densely connected with high weight, since all unaltered patches have the same processing history. As a result, this forms subsets of vertices that have high edge-connectivity within the subset, called a community, and low edge-connectivity to other subsets.

In unaltered images, where every patch was captured by the same camera model and has the same processing history, all edges in the graph are expected to have high weight forming only a single community. In addition, images that have been uniformly/globally tampered will also have all edges of high weight, since all patches will have the same processing history. Examples of uniform tampering include global image resizing, JPEG re-compression, and global non-adaptive contrast enhancement.

Detecting community structures in general graphs has been studied in other domains [fortunato2010community, girvan2002community, newman2004finding, sah2014exploring]. In Sec. IV, we adapt two community detection techniques from literature, and apply them to the Forensic Similarity Graph to perform forgery detection and localization. Later, in Sec. V and VI, we experimentally show that our proposed community detection based approaches achieve higher forgery detection accuracy than approaches that do not consider community structure.

III-1 Example

As an example, in Fig. 1 we show (b) a tampered image, and (a) the corresponding original image. In this tampered image, the person’s shirt has been edited using a brush tool to remove the appearance of rain drops on their shirt. The edited image was downloaded from a social media website, with editing credited to user “/u/rombouts.”

The tampered image is sampled by gridding it into patches with 50% overlap. An index is assigned to each vertex, and the indices are shown in Fig. 1(b) at their corresponding patch locations. The edge weight matrix is shown in Fig. 1(c), where the vertices corresponding to the edited shirt patches have low similarity to the rest of the image patches, and high similarity to other edited shirt patches. Similarly, the non-edited patches have low similarity to the edited patches, and high similarity to other non-edited patches. This suggests that in this tampered image example, there are two forensic “communities.”

In Fig. 1(d), the vertex-edge representation of this graph is shown. For visualization purposes, we only draw edges between vertices that have forensic similarity above a threshold. This visual representation clearly highlights that two communities exist, one associated with the tampered region, and one associated with the non-tampered region. Vertex colors are assigned according to the partitioning determined by a “modularity optimization” method [newman2004finding], with edges between communities grayed out. The graph is arranged according to the Kamada-Kawai force-directed algorithm [kamada1989algorithm]. Layout and partitioning were performed using the igraph software package [igraph].

IV Community Detection

In the previous section, we introduced the Forensic Similarity Graph of an image and provided intuition about how structures in this representation, called “communities,” indicate that the image has been locally tampered. In this section, we introduce techniques to detect and analyze this community structure in graphs in order to accurately detect forgeries and localize those tampered regions.

Forgery detection and forgery localization are two related, but very different problems in multimedia forensics, and are differentiated in literature [bondi2017cvprw, huh2018forensics]. Forgery detection techniques are used to indicate whether tampering has occurred in the image, and output a binary yes/no decision about the image. In this work, we propose community detection techniques that input a forensic similarity graph $G$ and output a forgery detection decision,

$$G \mapsto \{\text{unaltered}, \text{forged}\}.$$

We also propose community detection techniques that input a forensic similarity graph, and output a tampering classification for each patch of the image, which is then used to segment the tampered regions of the image,

$$G \mapsto \{c_1, c_2, \ldots, c_N\}, \quad c_i \in \{1, \ldots, k\},$$

where $c_i$ is the community membership for the $i$-th vertex in the graph, i.e. sampled image patch, and $N$ is the number of sampled image patches. We note that the nature of this type of localization analysis partitions the tampered region from the unaltered, but does not inherently identify which of the partitioned regions is the tampered one or unaltered one. This is why each vertex/patch is mapped into a community identifier $c_i \in \{1, \ldots, k\}$ as opposed to $\{\text{tampered}, \text{unaltered}\}$. In this paper we focus on the case of $k = 2$, since most evaluation databases only differentiate between the tampered and unaltered regions, and don’t differentiate between different tampered regions, if they exist. However, the techniques we propose are general to $k > 2$. Recent work has highlighted the need for techniques that identify more than one tampered region [hosseini2019unsupervised].

In the remainder of this section, we introduce two community detection techniques from literature, namely Spectral Clustering and Modularity Optimization, and adapt them to the problems of image forgery detection and localization. While there exist many different community structure analysis techniques, spectral clustering and modularity optimization are two of the most popular and successful techniques [fortunato2010community].

Additionally, we show how to convert the patch-based localization partitions, which result from the community detection methods, into a pixel-level segmentation of forged and unaltered regions. This is done so that the proposed localization methods can be effectively compared to other forgery localization techniques that operate at the pixel-level, such as those in [cozzolino2015splicebuster, bondi2017cvprw, huh2018forensics].

IV-A Spectral Clustering

In this method, we apply a technique called Spectral Clustering to the forensic similarity graph to detect forgeries and/or localize tampered regions. Spectral clustering is a technique for partitioning graphs into clusters [fortunato2010community, von2007tutorial, chung1996spectral], which leverages properties of the graph Laplacian matrix to determine community structures.

The graph Laplacian matrix, $L$, is defined as

$$L = D - W,$$

where $W$ is the edge weight matrix with elements $W_{ij}$ equal to the forensic similarity between image patches $x_i$ and $x_j$, and $D$ is the degree matrix with values

$$D_{ii} = d_i = \sum_{j=1}^{N} W_{ij}$$

on the diagonal, and zeroes off-diagonal. Additionally, we define the normalized graph Laplacian

$$L_{\text{norm}} = D^{-1/2} L D^{-1/2},$$

which regularizes the matrix.

The graph Laplacian has a number of special properties. In particular, it has a spectrum of non-negative real-valued eigenvalues $0 = \lambda_1 \le \lambda_2 \le \ldots \le \lambda_N$. These eigenvalues have the property that the multiplicity of $\lambda = 0$ is equal to the number of disconnected communities in the graph. Furthermore, the eigenspace of $\lambda = 0$ is spanned by the indicator vectors $\mathbb{1}_{A_1}, \ldots, \mathbb{1}_{A_k}$ of those $k$ communities, where $A_i$ is the vertex membership of the $i$-th community. For the normalized graph Laplacian, this eigenspace is spanned by $D^{1/2}\mathbb{1}_{A_i}$ [von2007tutorial, chung1996spectral]. We use these properties for detecting forgeries and localizing the forged regions, as described in further detail below.
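The zero-eigenvalue property can be checked numerically on a small graph. The sketch below is an illustration under assumptions: `graph_laplacians` is a hypothetical helper, the normalized variant is taken to be the symmetric form D^{-1/2} L D^{-1/2} (consistent with the eigenspace property cited from [von2007tutorial]), and the toy edge weights are made up.

```python
import numpy as np


def graph_laplacians(W):
    """Return the unnormalized and symmetric normalized graph Laplacians."""
    d = W.sum(axis=1)                      # weighted degree of each vertex
    L = np.diag(d) - W
    d_inv_sqrt = np.zeros_like(d)
    nonzero = d > 0
    d_inv_sqrt[nonzero] = 1.0 / np.sqrt(d[nonzero])
    # Equivalent to D^{-1/2} L D^{-1/2} without forming the diagonal matrices.
    L_norm = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    return L, L_norm


# Toy graph with two disconnected communities: vertices {0, 1, 2} and {3, 4}.
W = np.zeros((5, 5))
W[:3, :3] = 0.9
W[3:, 3:] = 0.8
np.fill_diagonal(W, 0.0)

L, L_norm = graph_laplacians(W)
eigvals = np.linalg.eigvalsh(L)            # ascending order for symmetric input
num_zero = int(np.sum(np.abs(eigvals) < 1e-9))
```

Here `num_zero` counts the eigenvalues that are numerically zero, which matches the number of disconnected groups in the toy graph. On real forensic similarity graphs the groups are rarely fully disconnected, so the eigenvalues are small rather than exactly zero.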

Fig. 2: Forgery detection scores on example original and manipulated images: (a) original, (b) edited, (c) original, (d) edited. Low spectral gap values ($\lambda_2$) and high modularity values ($Q$) indicate localized tampering.

IV-A1 Forgery Detection

Here we use the fact that the multiplicity of $\lambda = 0$ is equal to the number of disconnected communities in the graph to decide whether an image has been forged. Typically, this multiplicity is used to determine the number of communities in a graph by determining the number of consecutive eigenvalues less than a threshold, and is sometimes called the “Eigengap” heuristic [von2007tutorial]. However, the problem of forgery detection is different in that we need to know only whether the image is forged or unaltered. In other words, we are concerned only with whether there exists a single community (unaltered) or, alternatively, more than one community (locally tampered).

To do this, in our approach we calculate the second smallest eigenvalue $\lambda_2$, which we refer to as the spectral gap. If this value is low, it is an indication that at least two communities exist, and potentially more. If it is high, then it is an indication that only one community exists, and the image has not been altered. As a result, our forgery detection decision rule becomes

$$\delta_{\lambda}(G) = \begin{cases} \text{forged}, & \lambda_2 < \tau \\ \text{unaltered}, & \lambda_2 \ge \tau, \end{cases}$$

where $G$ is the forensic similarity graph of the image, $\lambda_2$ is the second smallest eigenvalue of the graph Laplacian from Eq. (5), and $\tau$ is the decision threshold determined empirically depending on the desired operating point.
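A minimal sketch of this detection rule follows, assuming the unnormalized Laplacian (degree matrix minus edge weight matrix). The function names, the toy edge weights, and the threshold value of 1.0 are all hypothetical; in practice the threshold is chosen empirically.

```python
import numpy as np


def spectral_gap(W):
    """Second smallest eigenvalue of the unnormalized graph Laplacian of W."""
    L = np.diag(W.sum(axis=1)) - W
    return float(np.linalg.eigvalsh(L)[1])   # eigvalsh sorts ascending


def detect_forgery_spectral(W, tau):
    """Flag the image as forged when the spectral gap falls below tau."""
    return spectral_gap(W) < tau


# Toy "unaltered" graph: all six patches are mutually similar.
unaltered = np.full((6, 6), 0.9)
np.fill_diagonal(unaltered, 0.0)

# Toy "forged" graph: two groups of three, weakly connected to each other.
forged = np.full((6, 6), 0.05)
forged[:3, :3] = 0.9
forged[3:, 3:] = 0.9
np.fill_diagonal(forged, 0.0)

tau = 1.0  # illustrative threshold only
flag_unaltered = detect_forgery_spectral(unaltered, tau)
flag_forged = detect_forgery_spectral(forged, tau)
```

In this toy case the spectral gap is 5.4 for the fully similar graph and 0.3 for the two-group graph, so only the second falls below the illustrative threshold.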

Examples of the spectral gap are shown in Fig. 2 for unaltered and forged images. In the two unaltered images (a) and (c), the spectral gap values of 318.42 and 235.56 are high, indicating that only one forensic community exists and no tampering has occurred. In the two spliced images (b) and (d), the spectral gap values of 76.75 and 4.28 are low, indicating that more than one forensic community exists and tampering has occurred.

IV-A2 Forgery Localization

To perform localization, we use the fact that the eigenspace of $\lambda = 0$ of the graph Laplacian is spanned by the community indicator vectors to partition the graph vertices into forged and unaltered regions. In this paper we consider the case of $k = 2$, in order to partition the tampered versus unaltered regions. To do this, we first calculate the eigenvector $u_2$ associated with the eigenvalue $\lambda_2$,

$$L u_2 = \lambda_2 u_2,$$

where $\lambda_2$ is the second smallest eigenvalue of the graph Laplacian matrix.

Next, we use the sign of each component of $u_2$ to assign each vertex a community membership,

$$\hat{c}_i = \begin{cases} 1, & u_{2,i} \ge 0 \\ 2, & u_{2,i} < 0, \end{cases}$$

where $\hat{c}_i$ is the predicted community membership of the $i$-th graph vertex / image patch. This method approximates the (NP-hard) Ratio Cut algorithm for $k = 2$. When using the normalized graph Laplacian, it approximates the Normalized Cut algorithm [von2007tutorial].
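The sign-based partition can be sketched as follows, again assuming the unnormalized Laplacian. The helper name `localize_spectral`, the 0/1 label values, and the toy graph are illustrative assumptions. Because an eigenvector is only defined up to sign, which group receives which label is arbitrary.

```python
import numpy as np


def localize_spectral(W):
    """Split vertices into two communities by the sign of the eigenvector
    associated with the second smallest Laplacian eigenvalue."""
    L = np.diag(W.sum(axis=1)) - W
    eigvals, eigvecs = np.linalg.eigh(L)    # columns sorted by eigenvalue
    u2 = eigvecs[:, 1]
    labels = (u2 >= 0).astype(int)          # label values are arbitrary
    return labels, u2


# Toy graph: four "unaltered" patches and two "tampered" patches,
# strongly connected within each group and weakly connected across.
W = np.full((6, 6), 0.05)
W[:4, :4] = 0.9
W[4:, 4:] = 0.9
np.fill_diagonal(W, 0.0)

labels, u2 = localize_spectral(W)
```

The output separates the first four vertices from the last two, but does not by itself say which of the two groups is the tampered one.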

An example of this algorithm is shown in Fig. 3, where we partition the image patches from the spliced image in Fig. 2(d) into two communities. In this figure, we show the histogram of the component values of $u_2$. The components associated with patches in the tampered region are shown in blue, and are typically above 0. The components associated with patches in the unaltered regions are shown in orange, and are typically below 0. Ground truth was determined based on the central location of each image patch.

We note that this algorithm can be easily extended to consider $k > 2$. This is done by considering the values of the first $k$ eigenvectors for each vertex as a point in $\mathbb{R}^k$ and performing k-means clustering [von2007tutorial].

Fig. 3: Component values of $u_2$ for the edited image in Fig. 2(d). In this example, components of $u_2$ corresponding to the forged patches are greater than 0, and those corresponding to unaltered patches are less than 0.

IV-B Modularity Optimization

Here, we apply a second community detection technique for forgery detection and localization. In this method, we apply a technique called Modularity Optimization to the Forensic Similarity Graph to detect and localize forgeries. Modularity, $Q$, is an index for the quality of how well a graph has been partitioned into communities [fortunato2010community], defined as:

$$Q = \frac{1}{2m} \sum_{i,j} \left[ W_{ij} - \frac{d_i d_j}{2m} \right] \mathbb{1}(c_i = c_j),$$

where $m$ is the sum of edge weights in the graph, $W$ is the edge weight matrix, $d_i$ is the weighted degree of node $i$, and the indicator function $\mathbb{1}(c_i = c_j)$ is 1 when both node $i$ and node $j$ are assigned to be members of the same community, and 0 otherwise. The term $d_i d_j / 2m$ is related to the expected weight of a randomly occurring connection between vertices $i$ and $j$. Intuitively, if two vertices in the same community have a low connection expectation, but have a high weight between them, then modularity is increased.

Modularity Optimization is a family of techniques used to determine the underlying community memberships in a graph. These techniques operate by assigning vertices to communities such that modularity is maximized,

$$Q_{\text{opt}} = \max_{c_1, \ldots, c_N} Q, \qquad \hat{c} = \arg\max_{c_1, \ldots, c_N} Q.$$

Higher values of modularity indicate that community structure exists, and that a good partitioning has been found [fortunato2010community, newman2004finding, girvan2002community]. Modularity in the case of a single community is zero.
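The modularity index of a given partition can be computed directly from the edge weight matrix. The sketch below uses the standard weighted form of modularity from the community detection literature; the function name `modularity` and the toy graph are assumptions for illustration.

```python
import numpy as np


def modularity(W, labels):
    """Weighted modularity of a partition of a graph with edge weights W.

    W is symmetric with a zero diagonal; labels[i] is the community of vertex i.
    """
    labels = np.asarray(labels)
    two_m = W.sum()                        # each edge is counted twice in W
    d = W.sum(axis=1)                      # weighted degree of each vertex
    same = labels[:, None] == labels[None, :]
    expected = np.outer(d, d) / two_m      # expected weight between i and j
    return float(((W - expected) * same).sum() / two_m)


# Toy graph with two disconnected groups: vertices {0, 1, 2} and {3, 4}.
W = np.zeros((5, 5))
W[:3, :3] = 0.9
W[3:, 3:] = 0.8
np.fill_diagonal(W, 0.0)

q_single = modularity(W, [0, 0, 0, 0, 0])  # everything in one community
q_split = modularity(W, [0, 0, 0, 1, 1])   # the two true groups
```

Placing every vertex in one community gives a modularity of zero, in agreement with the statement above, while the partition that follows the two groups gives a clearly positive value.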

To calculate the optimized modularity, we use a popular modularity optimization technique called the “fast-greedy” method, which agglomeratively forms communities by iteratively combining the communities that yield the greatest modularity increase [newman2004fast, clauset2004finding], and reports the largest modularity value found. While there are a number of variations of modularity optimization techniques, the fast-greedy method is a popular technique [fortunato2010community] and operates on a reasonable timescale on our forensic graphs. Furthermore, we typically use a thresholded edge weight matrix (1), with a threshold greater than zero, for modularity optimization techniques. Using a thresholded edge weight matrix significantly reduces the complexity of the optimization, and we have anecdotally found that doing so produces more reliable modularity scores.
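The agglomerative idea can be illustrated with a deliberately naive sketch. It is not the fast-greedy algorithm of [clauset2004finding], which uses efficient data structures. Here every candidate merge is re-scored from scratch, which is only practical for very small graphs. The function names and the toy graph are assumptions for illustration.

```python
import numpy as np


def modularity(W, labels):
    """Weighted modularity of a partition (standard definition)."""
    two_m = W.sum()
    d = W.sum(axis=1)
    same = labels[:, None] == labels[None, :]
    return float(((W - np.outer(d, d) / two_m) * same).sum() / two_m)


def greedy_modularity(W):
    """Naive agglomerative modularity optimization.

    Start with each vertex in its own community, repeatedly apply the merge
    that gives the highest modularity, and return the best value seen
    together with its partition.
    """
    labels = np.arange(W.shape[0])
    best_q, best_labels = modularity(W, labels), labels.copy()
    while len(np.unique(labels)) > 1:
        communities = np.unique(labels)
        top_q, top_labels = -np.inf, None
        for idx, a in enumerate(communities):
            for b in communities[idx + 1:]:
                trial = np.where(labels == b, a, labels)   # merge b into a
                q = modularity(W, trial)
                if q > top_q:
                    top_q, top_labels = q, trial
        labels = top_labels
        if top_q > best_q:
            best_q, best_labels = top_q, labels.copy()
    return best_q, best_labels


# Toy thresholded graph: groups {0, 1, 2, 3} and {4, 5}, no cross edges.
W = np.zeros((6, 6))
W[:4, :4] = 0.9
W[4:, 4:] = 0.9
np.fill_diagonal(W, 0.0)

q_opt, labels = greedy_modularity(W)
```

For real forensic similarity graphs, an established implementation such as the one in the igraph package cited in this paper would be the more sensible choice. The sketch only shows how merging proceeds and why the largest modularity along the merge sequence is the value that gets reported.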

IV-B1 Forgery Detection

To detect whether the image has been locally tampered, we compare the optimized modularity value, $Q^{*}$, found for the optimal partition to a threshold. If $Q^{*}$ exceeds the threshold, it indicates that there is strong community structure and, as a result, that one or more regions of the image have been tampered with. Our decision measure becomes

$$d(G) = \begin{cases} \text{tampered}, & Q^{*}(G) > \tau \\ \text{unaltered}, & Q^{*}(G) \le \tau, \end{cases}$$

where $G$ is the forensic similarity graph of the image, $Q^{*}(G)$ is the optimized modularity value found on $G$, and $\tau$ is the decision threshold, chosen empirically.

IV-B2 Forgery Localization

Here, we localize tampered regions by using the optimized community memberships, $c^{*}$, to assign each patch into a community,

$$\hat{c}_i = c^{*}_i,$$

where $c^{*}_i$ is the optimized community membership of the $i$-th vertex/image patch determined by the modularity optimization algorithm. As with the Spectral Clustering method, we use two communities to segment the unaltered region from the forged region(s), though we note that this method is trivially extendable to more than two communities.

As an example, the partitioning into the green and red coloring of Fig. 1 was determined by the Modularity Optimization method, using an edge weighting threshold of and .

IV-C Pixel-level localization

The above localization techniques partition image patches into forged and unaltered regions. However, localization is often performed at the pixel level [huh2018forensics, bondi2017cvprw, cozzolino2015splicebuster]. Here, we describe how to convert patch-level community partitions to a pixel-level forgery localization.

We start by building a pixel map of localization predictions, $P_c$, with the same dimensions in $x$ and $y$ as the image,

$$P_c(x, y) = \sum_{i} \mathbb{1}(\hat{c}_i = c)\, \mathbb{1}\big((x, y) \in \mathcal{X}_i\big),$$

where $\hat{c}_i$ is the predicted community membership for patch $i$, and $\mathcal{X}_i$ is the set of coordinates covered by patch $i$. That is, for each pixel we count the patches that contain that pixel and are also predicted to be in community $c$. The investigator can vary $c$ to inspect the various communities. When there are two communities, we choose $c$ to be the smaller of the two, since the largest community is likely to be the unaltered region.

Pixels near the edges and corners of an image are covered by fewer patches, and are more difficult to accurately predict. To address this, we create a normalized pixel map

$$\bar{P}_c(x, y) = \frac{P_c(x, y)}{N(x, y)},$$

where $N$ is the map of total patches that cover each pixel, defined by

$$N(x, y) = \sum_{i} \mathbb{1}\big((x, y) \in \mathcal{X}_i\big).$$

Finally, we smooth the normalized pixel map, using a Gaussian blur kernel, and compare to a threshold, chosen empirically.
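The patch-to-pixel conversion described above can be sketched in pure Python as follows. The function names, the `(x0, y0, size)` patch tuples, the Gaussian `sigma` and `radius` defaults, and the border handling of the blur are all assumptions chosen for illustration; the source does not specify them.

```python
import math
from collections import Counter

def smaller_community(labels):
    """Community with the fewest patches, treated as the candidate forged region."""
    counts = Counter(labels)
    return min(counts, key=counts.get)

def pixel_maps(height, width, patches, labels, community):
    """Return (votes, coverage, normalized) maps.

    patches : list of (x0, y0, size) giving each patch's top-left corner and side
    labels  : predicted community of each patch
    """
    votes = [[0] * width for _ in range(height)]      # patches voting for `community`
    coverage = [[0] * width for _ in range(height)]   # patches covering each pixel
    for (x0, y0, size), label in zip(patches, labels):
        for y in range(y0, min(y0 + size, height)):
            for x in range(x0, min(x0 + size, width)):
                coverage[y][x] += 1
                if label == community:
                    votes[y][x] += 1
    normalized = [[votes[y][x] / coverage[y][x] if coverage[y][x] else 0.0
                   for x in range(width)] for y in range(height)]
    return votes, coverage, normalized

def _blur_rows(img, kernel):
    """1-D blur along each row, renormalizing the kernel at the borders."""
    radius = len(kernel) // 2
    out = []
    for row in img:
        new_row = []
        for x in range(len(row)):
            acc = wsum = 0.0
            for d, w in enumerate(kernel, start=-radius):
                if 0 <= x + d < len(row):
                    acc += w * row[x + d]
                    wsum += w
            new_row.append(acc / wsum)
        out.append(new_row)
    return out

def gaussian_blur(img, sigma=1.0, radius=2):
    """Separable Gaussian blur: rows first, then columns."""
    kernel = [math.exp(-(d * d) / (2 * sigma * sigma))
              for d in range(-radius, radius + 1)]
    horiz = _blur_rows(img, kernel)
    vert = _blur_rows([list(col) for col in zip(*horiz)], kernel)
    return [list(col) for col in zip(*vert)]

def threshold_map(img, thr):
    """Binary forgery mask: True where the smoothed map exceeds thr."""
    return [[v > thr for v in row] for row in img]
```

With 4×4 patches sampled at a stride of 2 on an 8×8 image, the coverage map ranges from 1 at the corners to 4 in the interior, which mirrors the 1-to-4 coverage that arises with 50% overlap.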

In Fig. 4, we show an example. Fig. 4a shows an edited image from the Carvalho database [carvalho2013exposing] and Fig. 4b shows the ground truth mask, with the edited region highlighted in black. Fig. 4c shows the patch-level prediction map, determined using the Spectral Clustering method. Since patches were sampled from the image with 50% overlap, pixels are covered by 1 to 4 patches, as shown in Fig. 4d. Fig. 4e shows the normalized prediction map, in which we can see more confident forgery localization along the corners and edges of the image. Finally, Fig. 4f shows the smoothed and thresholded version of the normalized prediction map, in which the tampered region is localized and matches the ground truth.

Fig. 4: An example showing the conversion from patch-level community partitions to pixel-level forgery localization, described in Sec. IV-C. Panels: (a) Edited Image, (b) Ground Truth, (c) Prediction Map, (d) Patch Coverage Map, (e) Normalized Prediction Map, (f) Smoothing + Threshold.

V Experimental Results: Forgery Detection

Fig. 5: Example forgeries from the benchmark datasets. Panels: (a) Columbia [hsu06crfcheck], (b) Carvalho [carvalho2013exposing], (c) Korus [Korus2017TIFS].

Fig. 6: Forgery detection ROC curves on benchmark datasets. Panels: (a) Columbia (Low PFA), (b) Columbia, (c) Carvalho, (d) Korus.

We conducted a series of experiments to test the efficacy of our proposed approach for forgery detection and localization. In this section, we describe the experimental procedures and results for forgery detection. In Sec. VI, we conduct experiments to test forgery localization performance. In general, we found that the proposed Spectral Gap technique achieved highest forgery detection performance, and outperformed naive methods that do not include community structure.

We performed two sets of experiments for forgery detection. In the first experiment, we evaluated performance on three publicly available benchmark datasets, which contain unaltered and tampered images. In the second experiment, we evaluated performance on “synthetic” forgeries, which were created by copying and pasting a block from one image into another. While not realistic in appearance, the synthetic forgery experiment shows statistical characterizations of the system, controlling for the size of the forged area.

In these experiments, we compared our proposed community detection techniques to several approaches. First, we compared to two naive approaches that use the same forensic similarity procedure as our proposed approach but do not consider community structure. These are 1) “Mean Similarity”, which uses the average forensic similarity edge weight as a detection measure and was used in prior work [mayer2019similarity], and 2) “Minimum Similarity”, which uses the minimum forensic similarity edge weight as a detection measure. A significant contribution of the Forensic Similarity Graph is that it captures complex relationships among regions of the image, in the form of communities, which enables more powerful analysis. The following experiments show that the proposed community-detection-based approaches outperform these naive approaches. These results highlight that identifying community structure is critical to accurately detecting image forgeries.
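The two naive baselines reduce the whole graph to a single statistic of its edge weights. A minimal sketch, assuming a symmetric similarity matrix stored as a list of lists and a hypothetical function name `naive_detection_scores`:

```python
def naive_detection_scores(W):
    """Return (mean_similarity, min_similarity) over all distinct patch pairs.

    W is a symmetric matrix of forensic similarity edge weights; the diagonal
    is ignored. Low values suggest that some patches are forensically dissimilar.
    """
    n = len(W)
    weights = [W[i][j] for i in range(n) for j in range(i + 1, n)]
    return sum(weights) / len(weights), min(weights)
```

Neither statistic uses which patches are dissimilar to which, so both discard the grouping information that the community detection methods exploit.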

In addition, we compared against several state-of-the-art methods. We compared against the work by Huh et al. [huh2018forensics], which computes a deep-learning-based self-consistency map and detects image forgeries by spatially averaging this map and comparing the result to a threshold. They compute three types of self-consistency maps based on camera-model consistency, image consistency, or EXIF consistency. Our approach is most comparable to the “Camera” approach in [huh2018forensics], since the forensic similarity network we used was trained on patches labeled according to their source camera model. We also compared to the work by Bondi et al. [bondi2017cvprw], in which an 18-element feature vector is extracted with a CNN trained for camera model identification on 64×64 pixel patches. A simple clustering procedure is iteratively applied to separate the background from the foreground.

In these experiments, we calculated the forensic similarity of image patches using the deep-learning approach described in [mayer2019similarity], trained on four million image patches from 80 camera models. Two networks were tested, one with a forensic patch size of 128×128, and a second with a size of 256×256. Patches were sampled spanning the image with 50% overlap. The forensic similarity graph was constructed for each testing image as described in Sec. III. Finally, a forgery decision was rendered based on the proposed detection methods in Sec. IV.
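Sampling patches with a fixed overlap amounts to stepping a window across the image with a stride smaller than the patch size. The sketch below uses a hypothetical helper `patch_grid` and, as a simplifying assumption, ignores any leftover border strip when the image size is not a multiple of the stride.

```python
def patch_grid(height, width, size, overlap=0.5):
    """Top-left corners of size x size patches sampled with the given overlap.

    Returns a list of (x0, y0, size) tuples in row-major order.
    """
    stride = max(1, int(round(size * (1 - overlap))))
    ys = range(0, height - size + 1, stride)
    xs = range(0, width - size + 1, stride)
    return [(x, y, size) for y in ys for x in xs]
```

For example, a 768×512 image sampled with 256×256 patches at 50% overlap yields a 5×3 grid of 15 patches, so the corresponding graph has 15 vertices and 105 vertex pairs.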

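TABLE I: AP on Benchmarking Databases

| Method | Patch Size | Columbia | Carvalho | Korus |
|---|---|---|---|---|
| Bondi et al. [bondi2017cvprw] | - | 0.70 | 0.76 | 0.53 |
| Huh “Camera” [huh2018forensics] | - | 0.70 | 0.73 | 0.15 |
| Huh “Image” [huh2018forensics] | - | 0.97 | 0.75 | 0.58 |
| Huh “EXIF” [huh2018forensics] | - | 0.98 | 0.87 | 0.55 |
| Spectral Gap | 128 | 0.95 | 0.95 | 0.65 |
| Spectral Gap | 256 | 0.94 | 0.97 | 0.59 |
| Modularity Opt. | 128 | 0.95 | 0.90 | 0.60 |
| Modularity Opt. | 256 | 0.92 | 0.95 | 0.57 |
| Min. Sim. | 128 | 0.94 | 0.92 | 0.65 |
| Min. Sim. | 256 | 0.89 | 0.96 | 0.59 |
| Mean Sim. | 128 | 0.95 | 0.91 | 0.60 |
| Mean Sim. | 256 | 0.92 | 0.95 | 0.57 |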
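TABLE II: $P_D$ at $P_{FA} = 0.01$ on Benchmarking Databases

| Method | Patch Size | Columbia | Carvalho | Korus |
|---|---|---|---|---|
| Bondi et al. [bondi2017cvprw] | - | 0.07 | 0.03 | 0.01 |
| Huh “EXIF” [huh2018forensics] | - | 0.51 | 0.17 | 0.01 |
| Spectral Gap | 128 | 0.56 | 0.43 | 0.16 |
| Spectral Gap | 256 | 0.60 | 0.80 | 0.05 |
| Modularity Opt. | 128 | 0.46 | 0.41 | 0.02 |
| Modularity Opt. | 256 | 0.64 | 0.38 | 0.03 |
| Min. Sim. | 128 | 0.39 | 0.49 | 0.11 |
| Min. Sim. | 256 | 0.36 | 0.75 | 0.07 |
| Mean Sim. | 128 | 0.23 | 0.46 | 0.01 |
| Mean Sim. | 256 | 0.45 | 0.51 | 0.01 |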
Fig. 7: Synthetic forgery detection results. Plot (a) shows ROC curves for the proposed and naive methods on synthetic forgeries with a forgery block size of 256×256. Plot (b) shows the forgery detection probability for each method for different forgery block sizes. Plot (c) shows the ROC area-under-the-curve at challenging forgery block sizes using the spectral gap method with two different forensic patch sizes. In (c), forgery block size is displayed as relative area to the forensic patch size. Panels: (a) ROC, Forgery Size 256×256; (b) $P_D$ at $P_{FA} = 0.05$ by Forgery Size; (c) AUC by Relative Forgery Size.

V-A Performance on Benchmark Datasets

In this experiment, we evaluated the performance of our approach on publicly available benchmarking datasets. These databases are 1) the “Columbia Uncompressed Splicing Database” [hsu06crfcheck] containing 180 spliced and 183 authentic TIFF images ranging in size from 757×568 to 1152×768, 2) the “Carvalho DSO-1 Database” [carvalho2013exposing] containing 100 spliced and 100 authentic PNG images of size 2048×1536, and 3) the “Korus Realistic Tampering Dataset” [Korus2017TIFS] containing 220 tampered and 220 corresponding original TIFF images of size 1920×1080. Examples of spliced images from each database are shown in Fig. 5.

Fig. 6 shows receiver operating characteristic (ROC) curves for our proposed approaches, as well as the methods we compare against. The ROC curve shows the trade-off between the probability of correct detection ($P_D$) of a forged image and the probability of false alarm ($P_{FA}$), i.e. misclassification, of an unaltered image. In the majority of cases, our proposed approaches achieved the highest performance, with the Spectral Gap method achieving the highest detection rates. We note that the Huh et al. EXIF method achieves higher detection rates at high false alarm rates in the Columbia dataset, but was outperformed by our approaches at low false alarm rates. This is important because in many realistic conditions, forensic investigators must operate at low false alarm rates: in scenarios such as criminal or intelligence investigations, there can be a high cost to misidentifying an unaltered image as falsified.

Table I shows the mean-Average-Precision (mAP) scores for our proposed approaches as well as the methods we compared against. The mAP measure was chosen to compare to the reported scores in [huh2018forensics]. All of our proposed approaches achieved higher mAP scores than the existing Bondi et al. and Huh “Camera” approaches. The Huh “EXIF” method achieved a higher mAP score of 0.98 on the Columbia dataset. However, our proposed Spectral Gap method outperformed it on the more challenging Carvalho and Korus datasets, with mAP scores of 0.97 using a 256×256 forensic patch size on Carvalho, and 0.65 using a 128×128 forensic patch size on the Korus dataset.
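For readers unfamiliar with the measure, one common way to compute average precision from per-image detection scores is sketched below. It assumes the standard non-interpolated definition (mean of the precision at each correctly ranked forged image) and that higher scores mean "more likely forged"; the exact variant used to produce Table I is not stated in the text, and the function name is hypothetical.

```python
def average_precision(scores, labels):
    """Non-interpolated average precision.

    scores : detection score per image (higher = more likely forged)
    labels : 1 for forged images, 0 for authentic images
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, is_forged) in enumerate(ranked, start=1):
        if is_forged:
            hits += 1
            total += hits / rank      # precision at this recall point
    return total / hits if hits else 0.0
```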

Table II shows the detection rate ($P_D$) at the low false alarm rate of $P_{FA} = 0.01$ for the tested approaches. Operating at low false alarm rates is important for investigators, due to the high cost of falsely identifying authentic content as tampered, and due to the high volume of unaltered content that they encounter. For the Columbia database, the Modularity Optimization approach with a 256×256 patch size achieved the highest detection rate of 0.64. The Spectral Gap approach achieved the highest rates of 0.80 on the Carvalho dataset with a patch size of 256×256, and 0.16 on the Korus dataset with a patch size of 128×128.
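A detection rate at a fixed false alarm rate can be read off by setting the threshold from the authentic images' scores alone and then counting how many forged images exceed it. The following is a minimal sketch under the assumptions that higher scores indicate forgery and that ties are resolved conservatively; the function name `pd_at_pfa` is hypothetical.

```python
import math

def pd_at_pfa(forged_scores, authentic_scores, pfa=0.01):
    """Detection rate at the threshold whose false alarm rate does not exceed pfa.

    Returns (detection_rate, threshold). An image is declared forged when its
    score is strictly greater than the threshold.
    """
    ranked = sorted(authentic_scores, reverse=True)
    # Number of authentic images that may be wrongly flagged.
    allowed = math.floor(pfa * len(ranked) + 1e-9)
    tau = ranked[allowed] if allowed < len(ranked) else float("-inf")
    detected = sum(score > tau for score in forged_scores)
    return detected / len(forged_scores), tau
```

With 100 authentic images and a target of 0.01, exactly one authentic image is allowed above the threshold, which illustrates why small benchmark datasets give coarse estimates at low false alarm rates.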

Relative to the naive approaches, the Spectral Gap method tends to outperform both the “Mean” and “Minimum” similarity methods in terms of mAP. These improvements are more noticeable in $P_D$ at low $P_{FA}$ rates. This finding shows that detecting community structure is important for detecting image forgeries, especially at low false alarm rates.

The Modularity Optimization approach performed very well on the Columbia database, achieving the highest $P_D$ of 0.64 at $P_{FA} = 0.01$. However, performance degraded significantly on the other databases. One possible reason is that modularity optimization techniques are known to have difficulties detecting communities that are small relative to the size of the graph [fortunato2007resolution]. The forged regions in the Columbia dataset are much larger relative to the size of the image than in the Carvalho and Korus datasets.

Fig. 8: Localization examples from the benchmarking datasets. Localization is performed using comparison methods, and our proposed Spectral Clustering method. Example images are from the (a) Columbia, (b-e) Carvalho, and (f-i) Korus datasets.

The results of this experiment show that our proposed technique exceeds prior-art forgery detection performance on the three tested benchmark datasets, and significantly outperforms these approaches on challenging datasets. The experiment also shows that community-structure-aware approaches provide a significant benefit at low false alarm rates.

V-B Characterizations on Synthetic Forgeries

In the above experiment, we observed that some forgeries were more challenging to detect than others. In this experiment, we studied the impact of the tampered region size on forgery detection performance. To do this, we started with a database of images from the VISION database [shullani2017vision], using the “natural” images from the 35 cameras in the database. This database was chosen since the camera models in the database were different from those used to train our proposed system, representing a more practical scenario. We then created synthetic forgeries by copying a small block from one image into another at random locations. We refer to the size of this block as the forgery block size. We tested detection rates at different forgery block sizes, and for a given forgery block size, we created 1000 such forgeries. To control for the size of the image, we ensured that the host image was of size 3264×2448, the most common size in the dataset. For unaltered images, we used 1000 randomly chosen images of size 3264×2448. To ensure an appropriate similarity signal was present, the donor and host images were captured by different camera models. We then calculated the proposed detection measures on each forged and authentic image to evaluate performance.
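The synthetic forgery procedure can be sketched as a block copy between two images. In the sketch below, images are plain lists of pixel rows, the function name `make_synthetic_forgery` is hypothetical, and choosing the source location in the donor uniformly at random is an assumption, since the text only states that blocks are pasted at random locations.

```python
import random

def make_synthetic_forgery(host, donor, block, rng=random):
    """Paste a block x block region of `donor` into a copy of `host`.

    host, donor : images as lists of rows (each row a list of pixel values)
    Returns (forged_image, ground_truth_mask), where the mask is 1 inside
    the pasted region and 0 elsewhere. The host image is not modified.
    """
    h, w = len(host), len(host[0])
    dh, dw = len(donor), len(donor[0])
    sy, sx = rng.randrange(dh - block + 1), rng.randrange(dw - block + 1)  # source corner
    ty, tx = rng.randrange(h - block + 1), rng.randrange(w - block + 1)    # target corner
    forged = [row[:] for row in host]
    mask = [[0] * w for _ in range(h)]
    for r in range(block):
        forged[ty + r][tx:tx + block] = donor[sy + r][sx:sx + block]
        mask[ty + r][tx:tx + block] = [1] * block
    return forged, mask
```

Returning the mask alongside the forged image keeps the ground truth available for later localization scoring.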

Fig. 7 shows forgery detection performance for the two community approaches and two naive approaches, using an input patch size of 256×256. Fig. 7(a) shows ROC curves for these approaches at the 256×256 forgery block size. In this case, the Spectral Gap and Minimum Similarity methods demonstrated improved performance over the Modularity Optimization and Mean Similarity approaches. Notably, the Spectral Gap method showed significant improvement over the community-naive Minimum Similarity method at low false alarm rates between 0 and 0.10.

Fig. 7(b) shows the positive detection rates of the four approaches at a false alarm rate of 0.05. In general, all approaches performed poorly at the smallest forgery size of 256×256, corresponding to the size of the CNN input, and improved with forgery block size. All approaches achieved with the forgery block size of . This experiment demonstrates that forgery size has a significant effect on detection performance. In particular, it impacted the Modularity Optimization method. Modularity optimization methods are well known to have difficulty in detecting communities much smaller than the size of the graph [fortunato2007resolution]. The largest forgery block size of 1024×1024 is approximately 13% of the image size, whereas 256×256 is less than 1% of the image size. Further experimentation on the impact of forgery size relative to the image size may lend further insight into this finding.
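As a quick check of the relative sizes quoted above, the ratio of forgery block area to host image area can be worked out directly. This assumes the largest and smallest forgery blocks are 1024×1024 and 256×256 pixels, values that are consistent with the stated percentages and the 3264×2448 host image size but are an inference here.

$$\frac{1024 \times 1024}{3264 \times 2448} = \frac{1{,}048{,}576}{7{,}990{,}272} \approx 0.131, \qquad \frac{256 \times 256}{3264 \times 2448} = \frac{65{,}536}{7{,}990{,}272} \approx 0.0082.$$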

We further experimented using the Spectral Gap method at more challenging forgery block sizes. Fig. 7(c) shows the ROC area-under-the-curve (AUC) of forgery block sizes from to , for forensic patch sizes of and . The Spectral Gap method achieved an AUC for forgery block sizes above , when using an input forensic patch size of , and achieved an AUC for forgery block sizes above , when using an input forensic patch size of . We normalized the x-axis of the result relative to the forgery size, equal to the area of the forgery size divided by the area of the analysis patch size. We see that Spectral Gap performance steeply improves for forgery sizes larger than the analysis patch size.

The results of this experiment highlight the strong dependence of the proposed approaches on the size of the tampered region. The Spectral Gap method performed well when forgery sizes were equal to or greater than the analysis patch size. At challenging forgery sizes, the Spectral Gap method showed performance improvements over the naive Minimum Similarity approach, demonstrating the importance of identifying community structures captured by the Forensic Similarity Graph. The Modularity Optimization method worked well only when the forgery size was large, demonstrating that this approach depends on the size of the tampered region relative to the size of the whole image.

VI Experimental Results: Forgery Localization

In this set of experiments, we tested the efficacy of our proposed forgery localization approaches. To do this, we performed forgery localization on tampered images from the three publicly available datasets used in the prior experiments. We evaluated forgery localizations using a variety of scoring measures, and compared to existing-art methods. The methods we compare to include localization methods by Bondi et al. [bondi2017cvprw] and Huh et al. [huh2018forensics], which were compared against in the previous forgery detection experiments. In addition, we compared to two localization-only techniques, the rich-model-based SpliceBuster [cozzolino2015splicebuster] and deep-learning-based NoisePrint [cozzolino2018noiseprint] methods by Cozzolino et al. The results of our experiments show that our proposed approaches outperform these existing-art methods in the majority of scenarios.

To construct the forensic similarity graph for each forged image, we first sampled the image using patches of size 128×128, with 75% overlap in each dimension. The smaller patch size and higher overlap were chosen to increase spatial resolution. Then, we calculated the forensic similarity between image patches using the deep-learning approach described in [mayer2019similarity], trained on four million image patches from 80 camera models. The forensic similarity graph was constructed for each forged image as described in Sec. III. Then, forgery localization was performed according to the proposed methods in Sec. IV. For the Spectral Clustering methods, we tested both the non-normalized Laplacian and the normalized Laplacian. For the modularity optimization approach, we used an edge threshold of $t = 0.7$. Finally, the pixel-level forgery prediction map was created according to the method in Sec. IV-C using a Gaussian smoothing function with window size of .

Fig. 9: Spectral clustering localization examples on images downloaded from the social media website Reddit. Image editing credit to Reddit users “artunitinc” (a), “ene_due_rabe” (b-c), and “Hordon_Gayward” (d).

Fig. 8 shows un-thresholded localization examples on tampered images from the benchmark databases. From left to right, we show the edited image, the ground truth mask, and the localization results from the SpliceBuster algorithm [cozzolino2015splicebuster], the Huh et al. algorithm [huh2018forensics], the Noiseprint algorithm [cozzolino2018noiseprint], and our proposed Spectral Clustering algorithm. Row (a) corresponds to an example tampered image from the Columbia dataset, rows (b)-(e) correspond to tampered images from the Carvalho dataset, and rows (f)-(i) correspond to tampered images from the Korus dataset. These examples show the power of our proposed approach. In many cases, our proposed approach accurately localized the tampered region, even in scenarios where the other algorithms had difficulty such as in the Korus dataset.

In addition, in Fig. 9 we show localization results from images downloaded from a photo-manipulation competition forum on Reddit, a social media website. These images contain highly varying and complex tampering techniques. Still, our proposed Spectral Clustering localization algorithm accurately localized the tampered regions of the images. These examples show that our proposed technique is effective on realistically tampered images, including those downloaded from social media websites.

To quantify localization performance, we used several scoring measures. Namely, we used the Matthews Correlation Coefficient (MCC), the $F_1$ score, and the Area Under the Curve (AUC) of the receiver operating characteristic (ROC) curve. The MCC and $F_1$ scores require a choice of threshold. We evaluated both the case where the threshold was chosen for each image, according to the approach in [huh2018forensics], and the case where the threshold was chosen for the entire database. Setting a single threshold for the entire database is a more realistic scenario, since the forensic investigator does not have ground truth data available for each image under investigation and must choose a threshold using outside information. The MCC and $F_1$ scores were calculated for each image and then averaged for each benchmark database, while the AUC score was calculated using the ROC determined by all pixels in the database.
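Both the MCC and the $F_1$ score follow from the pixel-level confusion counts between a thresholded prediction mask and the ground truth mask. The sketch below uses the standard definitions of the two measures; the function names and the nested-list mask format are illustrative assumptions.

```python
import math

def confusion_counts(pred, truth):
    """Count (tp, fp, tn, fn) over two binary masks given as lists of rows."""
    tp = fp = tn = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif not p and t:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient; defined as 0 when the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def f1_score(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

A per-image threshold corresponds to evaluating these scores over a range of thresholds for each image and keeping the best one, whereas a per-database threshold fixes one value for all images before averaging.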

TABLE III: MCC, per image threshold

| Method | Columbia | Carvalho | Korus |
|---|---|---|---|
| SpliceBuster [cozzolino2015splicebuster] | 0.68 | 0.56 | 0.34 |
| Bondi et al. [bondi2017cvprw] | 0.50 | 0.35 | 0.22 |
| Huh et al. “EXIF” [huh2018forensics] | 0.87 | 0.46 | 0.23 |
| NoisePrint [cozzolino2018noiseprint] | 0.74 | 0.72 | 0.31 |
| Modularity Opt. (t=0.7) | 0.85 | 0.72 | 0.26 |
| Spectral Clustering | 0.86 | 0.80 | 0.38 |
| Normed Spectral Clust. | 0.84 | 0.77 | 0.25 |
TABLE IV: MCC, per database threshold

| Method | Columbia | Carvalho | Korus |
|---|---|---|---|
| SpliceBuster [cozzolino2015splicebuster] | 0.63 | 0.53 | 0.28 |
| Bondi et al. [bondi2017cvprw] | 0.45 | 0.29 | 0.14 |
| Huh et al. “EXIF” [huh2018forensics] | 0.78 | 0.38 | 0.17 |
| NoisePrint [cozzolino2018noiseprint] | 0.68 | 0.67 | 0.24 |
| Modularity Opt. (t=0.7) | 0.80 | 0.65 | 0.19 |
| Spectral Clustering | 0.82 | 0.75 | 0.31 |
| Normed Spectral Clust. | 0.79 | 0.70 | 0.20 |
TABLE V: $F_1$, per image threshold

| Method | Columbia | Carvalho | Korus |
|---|---|---|---|
| SpliceBuster [cozzolino2015splicebuster] | 0.78 | 0.62 | 0.36 |
| Bondi et al. [bondi2017cvprw] | 0.64 | 0.44 | 0.24 |
| Huh et al. “EXIF” [huh2018forensics] | 0.90 | 0.53 | 0.25 |
| NoisePrint [cozzolino2018noiseprint] | 0.81 | 0.75 | 0.32 |
| Modularity Opt. (t=0.7) | 0.88 | 0.73 | 0.26 |
| Spectral Clustering | 0.89 | 0.82 | 0.39 |
| Normed Spectral Clust. | 0.88 | 0.78 | 0.26 |
TABLE VI: $F_1$, per database threshold

| Method | Columbia | Carvalho | Korus |
|---|---|---|---|
| SpliceBuster [cozzolino2015splicebuster] | 0.74 | 0.59 | 0.30 |
| Bondi et al. [bondi2017cvprw] | 0.59 | 0.39 | 0.18 |
| Huh et al. “EXIF” [huh2018forensics] | 0.82 | 0.45 | 0.20 |
| NoisePrint [cozzolino2018noiseprint] | 0.76 | 0.70 | 0.26 |
| Modularity Opt. (t=0.7) | 0.83 | 0.66 | 0.20 |
| Spectral Clustering | 0.86 | 0.77 | 0.32 |
| Normed Spectral Clust. | 0.83 | 0.70 | 0.20 |
TABLE VII: Area Under the Curve (AUC)

| Method | Columbia | Carvalho | Korus |
|---|---|---|---|
| SpliceBuster [cozzolino2015splicebuster] | 0.79 | 0.68 | 0.56 |
| Bondi et al. [bondi2017cvprw] | 0.77 | 0.64 | 0.60 |
| Huh et al. “EXIF” [huh2018forensics] | 0.96 | 0.72 | 0.57 |
| NoisePrint [cozzolino2018noiseprint] | 0.85 | 0.76 | 0.58 |
| Modularity Opt. (t=0.7) | 0.95 | 0.94 | 0.73 |
| Spectral Clustering | 0.93 | 0.89 | 0.60 |
| Normed Spectral Clust. | 0.95 | 0.97 | 0.72 |

Table III shows MCC scores for our proposed and comparison approaches on the three benchmark datasets, using per-image thresholds, meaning that a different threshold was chosen for each image. For the Columbia dataset, our proposed methods achieved high scores of 0.84 and above, with the Spectral Clustering method achieving the highest score among the proposed methods, 0.86, just under the 0.87 achieved by Huh et al. For the more challenging Carvalho dataset, all three of our proposed methods matched or outperformed the comparison methods. The Spectral Clustering algorithm achieved an MCC score of 0.80, which is significantly higher than the highest-scoring comparison method. For the most challenging Korus dataset, our proposed Spectral Clustering method achieved the highest MCC score of 0.38.

Table IV shows MCC scores using per-database thresholds, meaning that a single threshold was used for each database, which is a more practical scenario. In all three benchmark datasets, our proposed Spectral Clustering algorithm achieved the highest scores. In the Columbia dataset, all three of our proposed methods outperform the Huh et al. method, suggesting that our proposed technique is more consistent and does not require threshold tuning for each image.

Similar trends were found for the $F_1$ scores, which are shown with per-image thresholds in Table V and with per-database thresholds in Table VI. The proposed Spectral Clustering method achieved the highest scores in the scenario where a single threshold was chosen for each database. For the per-image scenario, the Spectral Clustering method achieved the highest scores on the more challenging Carvalho and Korus datasets, and on the Columbia dataset came close to the highest score of 0.90, achieved by the Huh et al. method.

Table VII shows the area under the curve (AUC) scores for each method. In the Columbia dataset, both the Modularity Optimization and Normalized Spectral Clustering methods achieved high AUC scores of 0.95. The highest score of 0.96 was achieved by the Huh et al. method. All three of the proposed methods outperform all comparison methods in the more challenging Carvalho and Korus datasets. Notably, the Normed Spectral Clustering method achieved a 0.97 AUC on the Carvalho dataset, whereas the next highest score by a comparison method was 0.72. The Modularity Optimization method achieved an AUC of 0.73 on the Korus dataset, whereas the next highest score by a comparison method was 0.60. While the Spectral Clustering method does not achieve the highest AUC in the Carvalho and Korus datasets, it does outperform prior art.

The results of this experiment show that our proposed community detection techniques, and in particular the Spectral Clustering method, consistently outperform existing art, demonstrating the power of our proposed Forensic Similarity Graph based approach. This was shown on several benchmarking forgery databases with a variety of scoring measures. These and the forgery detection experiments highlight the importance of considering community structures when analyzing image forgeries.

VII Conclusion

We proposed an abstract, graph-based representation of an image, which we call the Forensic Similarity Graph. This representation can be used to analyze images for evidence of localized tampering with high accuracy. Small image patches are represented by graph vertices with edges assigned according to their forensic similarity. Tampered and unaltered regions form unique structures that align with a concept called “communities” in graph-theory literature. As a result, forgery detection is performed by identifying multiple communities, and forgery localization is performed by partitioning these communities. We experimentally showed that this approach significantly outperforms naive implementations that do not consider this community structure, including prior art.