Integrity researchers have long reported threats to scientific image integrity [12, 18, 10, 3]. This improbity has reached even areas such as cancer research and, more dramatically, is exploited by 'paper mill' services.
Although this serious problem is pressing for the scientific community, to the best of our knowledge, few forensic works are dedicated to this topic. So far, no richly annotated forensic benchmark containing tampered scientific images has been published. We believe that a large dataset would foster the forensic community to work more actively on this subject and would assist state-of-the-art forensic techniques that might require large training datasets.
In this sense, we tried to collect known doctored scientific images, but we faced two main issues, legal and practical, that made us avoid this methodology. To publish a dataset with real tampering cases, we would have to deal with copyright and the legal aspects of pointing to third-party works retracted due to suspicious manipulation. Even if we decided to manage these legal aspects, we would have to be guided by the retraction notice relative to the issued images. However, after reading several retraction notices, we realized that many of them are not precise enough to pinpoint the issued regions at pixel level, which would not lead to an accurate ground-truth. The example of a real retraction notice due to an honest error, depicted in Fig. 1, shows this imprecision: the highlighted words 'some lanes' and 'not the appropriate ones' translate to an ambiguous region and cause.
On the other hand, we noticed that a representative number of images were retracted due to duplication and basic image processing operations that could be recreated automatically. Therefore, this work presents the RECOD Scientific Image Integrity Library (RSIIL), which enables creating a synthetic scientific image tampering dataset with an enriched pixel-wise ground-truth and without any associated legal issue. With this library, we created the RECOD Scientific Image Integrity Dataset (RSIID) with the most common image operations reported by scientific integrity researchers [18, 3]. To create this benchmark, we doctored 2,923 figures from Creative Commons sources, resulting in 39,423 tampered figures (26,496 for training and 12,927 for testing). In addition, we propose a new metric to evaluate copy-move forgery detection on scientific images using an enriched ground-truth map, which asserts a consistent detection match between a cloned region and its source. Finally, using this new dataset and metric, we evaluate the performance of state-of-the-art copy-move forgery detectors [7, 6, 21], establishing a baseline and setting the ground for future investigation.
We organize the remainder of this paper into seven sections: Section II presents related work, while Section III details the proposed library, RSIIL. Section IV presents the dataset RSIID, while Section V introduces a new evaluation metric aimed at a more consistent copy-move detection evaluation. Section VI presents an analysis of state-of-the-art copy-move forgery detectors on the proposed dataset, setting the ground for future research, while Section VII presents the conclusions and future work directions.
II Related Work
To the best of our knowledge, few works try to design a tampering benchmark focused on scientific images. So far, we were only able to find two works that address scientific image integrity datasets. The first is from Xiang and Acuna, who created a synthetic tampering dataset of scientific images from the web. They doctored microscopy and western blot images using three types of manipulations that they claim to be the most common causes of problems in scientific papers: cleaning an image region with a single color or noise (Cleaning); copying an alien content region into the image (Splicing); and applying visual adjustments to the image content (Retouching). Their dataset contains 747 manually manipulated scientific images, of which 616 are dedicated to Cleaning. As we were only able to find the pre-print version of their work, we could not find any released data. Because of this, the quality of their manipulations and the dataset license remain unclear. Although the authors manually constructed the dataset to create a more realistic scenario, it is still limited to a small size that might not represent the diversity of scientific images. Besides, the dataset is highly concentrated on Cleaning, preventing one from properly evaluating the robustness of a forensic method across all modalities.
The second is the work of Koker et al., named the Bio-Image Near-Duplicate Examples Repository (BINDER), which had the pioneering idea of using scientific images free of legal issues for an integrity dataset. This dataset is limited to near-duplicate images, aiming to find image re-use across scientific publications. Their dataset has 10,179 non-overlapping tiled patches. To create their dataset, they gathered microscopy images from the following public repositories: the NYU Mouse Embryo Tracking Database (METD, http://celltracking.bio.nyu.edu, last access May 2021), the Broad Bioimage Benchmark Collection (BBBC, https://bbbc.broadinstitute.org, last access May 2021), the Adiposoft Image Dataset (AID, https://imagej.net/Adiposoft, last access May 2021), and the Open Microscopy Image Data Resource (IDR, https://idr.openmicroscopy.org, last access May 2021). They also applied some geometric, brightness/contrast, and compression transformations to some of the images. However, their dataset is still not as realistic as the figures presented in scientific publications. Although scientific images often embed graphical elements and captions, hampering re-use detection, the authors did not add these elements to the images. In addition, they did not apply any local (region-level) tampering, which is also a typical manipulation in inappropriate image re-use.
In addition to these works, we also found two scientific integrity initiatives that collect real cases of retracted papers. The first is the Retraction Watch Database (http://retractiondatabase.org, last access May 2021), maintained by Retraction Watch, a non-profit organization affiliated with the Center for Scientific Integrity and dedicated to reporting and discussing cases of retracted papers and related issues. This database holds metadata for more than 20,000 retracted, corrected, or concerned papers. The metadata presents the paper's title, retraction reason, authors, and Digital Object Identifier (DOI), among other fields. Although this database is not dedicated to image integrity issues, it is possible to filter the retracted papers to this category. However, only the paper's metadata can be retrieved; due to legal aspects, it is not possible to retrieve the article's PDF, figures, or retraction notice, which is a drawback of this database.
The second is the HEADT Centre Image Integrity Database (https://headt.eu/Image-Integrity-Database, last access May 2021), an initiative focused on researchers and developers working on scientific image manipulation detection. Their database contains metadata for more than 500 images from papers retracted due to image manipulation. In addition to the basic information of a paper (title, authors, publisher, journal), they also add a text description of each manipulation, including the figure's panel in which the manipulation occurs and its category (e.g., copy-move). This text description is based on the retraction notice associated with the figure; therefore, some descriptions also present ambiguity, as depicted in Figure 1. Despite this text description, we could not find any pixel-level manipulation map in this dataset, which we believe is needed to properly evaluate a detection method.
III RECOD Scientific Image Integrity Library - RSIIL
Before working with synthetic data, we tried to gather real-world problematic scientific images. To avoid any bias on our side, we relied on papers retracted due to image problems, given that they have a retraction notice resulting from an integrity investigation. However, to publish an accessible benchmark for forensic research, we would possibly have to deal with some legal aspects (e.g., figure copyright and possible defamation claims).
As reported by Adam Marcus, a retracted paper can make its authors feel their reputation is harmed and lead them to sue journals for defamation. Azoulay et al. also indicate that retraction due to misconduct (which involves the most important papers to include in a forensic benchmark) carries a significant reputation penalty for the authors. Even co-authors who might not be involved in the image manipulation, and who have already suffered severe consequences, could be affected by such a benchmark, since it would promote their association with the retracted paper.
Besides this legal aspect, we also faced practical issues regarding data annotation. When manually annotating the problematic figures' regions following their retraction notices, we experienced an absence of standards, including vague sentences (as illustrated in Figure 1), resulting in unreliable ground-truths.
Because of these issues, we decided to avoid using real-world problematic scientific images and instead created a photorealistic dataset using the library introduced in this section. This section presents the types of forgeries implemented in the library (Sub-Section A), explains how the library mimics realistic figures as they usually appear in scientific documents (Sub-Section B), and addresses the manipulation ground-truth (Sub-Section C). Finally, it also discusses how the proposed library is amenable to extension with new image manipulation types (Sub-Section D).
III-A Library functionalities
The goal of the library is to implement the most common image manipulations reported in the scientific community. Although we are aware of more complex tools for image manipulation, for example, the creation of scientific images using artificial intelligence (AI) algorithms, we suspect that these tools are not widely used yet due to their complexity. Therefore, while designing the library, we based the forgery functions on the most common image processing operations accessible to a non-expert in AI or Computer Vision. We also designed each block of the library so that it can be extended to other, more complex operations in the future.
Following the research of Bik et al. and Rossner and Yamada, we selected three main types of manipulation that can be recreated using common image processing software (e.g., Adobe Photoshop):
Retouching: The process of image beautification leading to an experiment misreading. This modality implements contrast, brightness, and blurring adjustments that highlight or obfuscate an image region. Figure 2 depicts an image retouched with our library. In Figure 2b, we used a Gaussian filter within the selected objects to obfuscate their content. Figure 2e illustrates an image with contrast and brightness adjustment, in which the method changes the selected object's pixel intensities to cause an experimental misreading.
Fig. 2: Example of image Retouching forgery implemented in the library. (a) and (d) are original images without any manipulation; (b) is the manipulated version of (a) using blurring retouching; (e) is the brightness/contrast-manipulated version of (d); (c) and (f) are the ground-truth maps that indicate the manipulated regions of (b) and (e).
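As a rough illustration of the brightness/contrast variant, the masked adjustment can be sketched in a few lines of NumPy. The function name `retouch_contrast` and its default gains are ours for illustration, not the library's API:

```python
import numpy as np

def retouch_contrast(image, mask, alpha=1.5, beta=20):
    """Brightness/contrast retouching restricted to a masked object.

    image: 2-D uint8 grayscale experiment image.
    mask:  boolean map, True on the selected object's pixels.
    alpha: contrast gain; beta: brightness offset (illustrative defaults).
    """
    out = image.astype(np.float32)
    out[mask] = out[mask] * alpha + beta      # adjust only the object pixels
    return np.clip(out, 0, 255).astype(np.uint8)

# The tampering ground-truth map is simply the mask itself.
img = np.full((8, 8), 100, dtype=np.uint8)
msk = np.zeros((8, 8), dtype=bool)
msk[2:5, 2:5] = True
forged = retouch_contrast(img, msk)
```

The blurring variant works the same way, replacing the affine adjustment with a Gaussian filter applied inside the mask.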
Cleaning: The result of obfuscating a foreground object using a background region. For this modality, we use inpainting and a brute-force routine. For the inpainting, we use the method of Criminisi et al. implemented by Moura (code available at https://github.com/igorcmoura/inpaint-object-remover, last access March 2021). For the brute-force routine, we developed an in-house method that mimics the forgery procedure of a person seeking to cover an object using the background. To implement this routine, we select a foreground object; then, using brute force, we find the background region whose color histogram is most similar to this object's; finally, we copy that background region over the object and blur the border of the copy, smoothing (feathering) the difference between the copied region and its neighborhood. Figure 3b depicts the result of inpainting on the top-right cell of the image, and Figure 3e depicts the result of the brute-force routine.
Fig. 3: Example of image Cleaning forgery implemented in the library. (a)-(c) depict the inpainting method added to the library. (d)-(g) depict the brute-force cleaning routine. (g) indicates the background regions of (d) selected to cover (clean) the cells indicated in (f). Each color in (f) and (g) represents a different ID that helps track the regions involved in the forgery.
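The brute-force search can be sketched as an exhaustive histogram comparison over candidate background windows. This is a simplified grayscale version; `find_cover_patch` and its parameters are illustrative, not the library's interface:

```python
import numpy as np

def find_cover_patch(image, bbox, bg_mask, bins=16):
    """Exhaustively search for the background window whose intensity
    histogram is closest (L1 distance) to the target region's.

    image:   2-D uint8 grayscale image.
    bbox:    (y, x, h, w) of the object to be covered.
    bg_mask: boolean map, True on background pixels only.
    Returns the (y, x) corner of the best background window.
    """
    y, x, h, w = bbox
    t_hist = np.histogram(image[y:y + h, x:x + w], bins=bins, range=(0, 256))[0]
    best, best_d = None, np.inf
    H, W = image.shape
    for yy in range(H - h + 1):
        for xx in range(W - w + 1):
            if not bg_mask[yy:yy + h, xx:xx + w].all():
                continue                       # window must lie fully on background
            hist = np.histogram(image[yy:yy + h, xx:xx + w],
                                bins=bins, range=(0, 256))[0]
            d = np.abs(hist - t_hist).sum()
            if d < best_d:
                best, best_d = (yy, xx), d
    return best
```

The actual routine then copies the chosen window over the object and feathers the border; both steps are omitted here for brevity.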
Duplication: The action of copying and pasting a region of an image within the same or into another image, with or without post-processing operations. Note that this definition includes both copy-move and splicing. We organized this category into three sub-categories:
Copy-Move Forgery: Duplication of a region within the same image using geometric transformations (translation, rotation, flip, and scaling) and post-processing (e.g., retouching). All transformations can be combined with one another. Because scaling applied alone would cover the source object region, we always combine it with another operation. Besides these transformations, we also implemented a random object-to-background copy-move (which we named Random). This routine copies a random object to a background region with the same shape as the object.
Overlap Forgery: Creation of two images with an overlapping region from a single one. From a source image, we select different regions that share an overlap area and create two images from these regions. Either of these new images can undergo post-processing to obfuscate its source. Figure 5 depicts the creation of an image pair with an overlap area.
Splicing: Creation of an image composition that pastes a donor figure's elements into a host one. Figure 6 depicts a Splicing forgery.
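Two of the duplication sub-categories above, copy-move and overlap, can be sketched as follows. The function names, the flip-only transform, and the simple ID scheme (1 = source, 2 = copy) are ours for illustration, not the library's implementation:

```python
import numpy as np

def copy_move(image, src_bbox, dst_xy, flip=False):
    """Duplicate a region inside the same image, optionally flipped, and
    return the forged image plus an ID map (1 = source, 2 = copy)."""
    y, x, h, w = src_bbox
    patch = image[y:y + h, x:x + w].copy()
    if flip:
        patch = patch[:, ::-1]                # horizontal flip before pasting
    dy, dx = dst_xy
    forged = image.copy()
    forged[dy:dy + h, dx:dx + w] = patch
    gt = np.zeros(image.shape[:2], dtype=np.uint8)
    gt[y:y + h, x:x + w] = 1                  # source region ID
    gt[dy:dy + h, dx:dx + w] = 2              # pasted copy ID
    return forged, gt

def make_overlap_pair(image, shift, size):
    """Crop two images from one source so that they share an overlap band
    of (size - shift) columns."""
    a = image[:, :size]
    b = image[:, shift:shift + size]
    return a, b, size - shift
```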
Although all cases are generated without any human interaction, the resulting images may confuse even an attentive person. To produce forgeries as realistic as possible, some functions of the library require an object map (segmentation map) as input. The object map locates each object inside the image and helps a forgery function execute the falsification more like a human would.
III-B Realistic Scientific Figures
As our key objective is to create scientific figures, we include in the library two features frequently present in such figures: captions/indicative letters and compound figures.
Caption/indicative letters: Scientific figures often present indicative letters or captions that overlay the image's content. This overlay is, in effect, a splicing of a letter or word into the experiment image that could raise a false alarm during forgery detection. Therefore, we added to the library the ability to mimic this overlay behavior as it appears in scientific papers. We include three levels of indicative verbosity. Level 1 includes only indicative letters around each panel of the figure. Level 2 includes the features of Level 1 plus a random word around each panel. Level 3 includes all features of Level 2 plus an indicative letter inside each panel. Figure 7 depicts all these verbosity levels.
Compound figures: A compound figure is a composition of multiple images organized in panels. These figures usually appear in articles to give an overview of an experiment. To avoid creating unrealistic compound figures, we use figure templates based on real cases. These templates are image masks that inform the method of each panel's location, as well as its type (e.g., graph, photo).
To create compound figures, we implemented a routine that receives as input a set of realistic compound-figure templates, a dataset of scientific images (to be included in the compound figure), a source image (not in that dataset), and a forgery function (to be applied to the source image). Figure 8 illustrates this routine. The method selects a template with at least one panel whose aspect ratio is similar to that of the source image. Then, the routine applies the forgery to the source image, creating its forged version. Next, a figure with the same size as the template is created, and the forged image is resized and placed in the template panel with the most similar aspect ratio. Finally, all other panels of the template are filled with different images from the dataset that have aspect ratios similar to those panels, or with fake graphics.
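The panel-selection step can be sketched as a search for the closest aspect ratio. The `(x, y, w, h)` box layout below is a hypothetical simplification of the library's image-mask templates:

```python
def best_panel(templates, img_wh):
    """Return (template index, panel index) of the panel whose aspect
    ratio is closest to the forged image's.

    templates: list of templates, each a list of (x, y, w, h) panel boxes.
    img_wh:    (width, height) of the image to place.
    """
    target = img_wh[0] / img_wh[1]
    best, best_d = None, float("inf")
    for t_idx, panels in enumerate(templates):
        for p_idx, (_, _, w, h) in enumerate(panels):
            d = abs(w / h - target)           # aspect-ratio distance
            if d < best_d:
                best, best_d = (t_idx, p_idx), d
    return best
```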
III-C Data Annotation
Despite the importance of a reliable ground-truth to evaluate a forensic method, to the best of our knowledge, there is no scientific forgery image dataset that presents an enriched ground-truth. Hence, all tampering operations implemented in the library provide detailed maps indicating the manipulated regions. Each object involved in a tampering operation is indicated with a different ID in the ground-truth, which helps pinpoint the object's exact location before and after the forgery, as depicted in Figure 3g. The library also enables the creation of a JSON file containing metadata related to the forgery. This metadata includes the source images, the method and arguments used, and the location of each panel inside the compound figure. As the metadata includes the source images and the forgery methods applied, one can also evaluate provenance analysis using this information as reference.
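A metadata file produced for a copy-move inside a compound figure might look like the following. All field names and values here are illustrative; the library defines its own schema:

```json
{
  "source_images": ["BBBC038_example.png"],
  "forgery": {
    "method": "copy_move",
    "args": {"transform": "flip", "post_processing": "retouching"}
  },
  "panels": [
    {"id": "A", "bbox": [0, 0, 256, 256], "type": "microscopy"},
    {"id": "B", "bbox": [256, 0, 256, 256], "type": "graph"}
  ]
}
```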
III-D Library Extension
Given that scientific image tampering improves over time to convince even researchers, a tampering-detection benchmark should also include cutting-edge forgery techniques. In this sense, to facilitate the inclusion of new manipulations in RSIIL, we implemented a high-level routine that receives a forgery function as one of its arguments and applies it to an image. This routine regulates the application of any new manipulation, enforcing the guidelines for the ground-truth and the metadata associated with the forgery. Because of this, any new forgery function capable of returning the forged image along with its manipulation map (a pixel-wise map locating the forgery inside the image) can easily be added to the library to generate Simple or Compound tampered figures.
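A possible shape for such a high-level wrapper, with a toy forgery function plugged in, is sketched below. All names are illustrative; the actual RSIIL routine also handles compound-figure generation and the full metadata schema:

```python
import numpy as np

def apply_forgery(image, forgery_fn):
    """High-level wrapper: any forgery function returning (forged image,
    pixel-wise manipulation map) can be plugged in."""
    forged, manipulation_map = forgery_fn(image)
    if forged.shape[:2] != manipulation_map.shape:
        raise ValueError("manipulation map must be pixel-wise")
    metadata = {
        "method": getattr(forgery_fn, "__name__", "custom"),
        "tampered_pixels": int((manipulation_map > 0).sum()),
    }
    return forged, manipulation_map, metadata

def invert_region(img):
    """Toy forgery used only for illustration: invert a 2x2 corner."""
    out = img.copy()
    out[:2, :2] = 255 - out[:2, :2]
    gt = np.zeros(img.shape, dtype=np.uint8)
    gt[:2, :2] = 1
    return out, gt
```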
IV RECOD Scientific Image Integrity Dataset - RSIID
In addition to the library introduced in the previous section, we created a dataset to serve as a future benchmark for the area. For that, we selected the most frequently retracted types of images from the biomedical area, following the observations of Bucci and Bik et al., who report a high manipulation rate in images from western blot techniques and microscopy imagery. With this in mind, we downloaded real scientific images from diverse sources to apply the forgeries.
To avoid any legal issues in creating manipulated images, and aiming to publish the dataset under a Creative Commons license, we only downloaded data available under the Public Domain (PD, https://creativecommons.org/publicdomain/zero/1.0, last access May 2021) or Creative Commons Attribution (CC-BY, https://creativecommons.org/licenses/by/4.0, last access May 2021) licenses. These licenses allow us to remix, transform, and reuse the images without asking for the authors' authorization.
We use the following data sources to gather the image collection:
Broad Bioimage Benchmark Collection (BBBC, https://bbbc.broadinstitute.org, last access May 2021): A collection of freely downloadable microscopy image sets. From this source, we selected the datasets BBBC038, BBBC039, and BBBC019. The first two are dedicated to segmented nuclei images and have object masks, which are needed for object-level forgeries. The last dataset (BBBC019) is dedicated to cell migration, which we use for Overlap forgeries.
PubMed Central (PMC, https://www.ncbi.nlm.nih.gov/pmc, last access May 2021): PMC is a free article archive of biomedical and life sciences literature. To download each figure from this repository, we used an API made available by PubMed (https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist, last access May 2021), with which we could select only images under PD or CC-BY licenses. We chose to include published western blot images; after downloading the PMC figures, we manually extracted the panels containing western blots from each figure. The template images used for creating compound figures were based on real figures retrieved from this repository.
Table I shows the number of source figures for each collection by type (microscopy or western blot), indicating whether they have object-mask annotations, and the number of template images created based on the figures from PMC.
[Table I: number of source figures per collection (Source Image Dataset) and number of compound-figure templates; templates were created based on the figures from the PMC collection.]
IV-A Dataset Construction
While designing the dataset, we planned it to evaluate a forensic tool on different tasks with different complexities. Because of this, the dataset is organized so that a user can easily find the data and its annotation for each forgery modality. We divided the dataset into two figure-complexity types: Simple and Compound.
Simple Scientific Figures: Figures of this complexity consist of a single experiment image (e.g., Figure 2a). To include a tampered figure of this type, we forge an original figure using the Retouching, Cleaning, or Duplication techniques implemented in our library. To avoid inexpressive forgeries, we only include doctored figures with at least 500 manipulated pixels. In addition to the tampered figures, we also reserve a pristine directory with the original images, so that a user can easily evaluate false positives. Figure 9 illustrates the organization of Simple figures in our dataset.
We divided the Compound figures into two tampering types: Intra-Panel and Inter-Panel.
Intra-Panel forgeries are present in just one panel of the figure. To create this tampering type, we add a Simple forgery as one of the figure's panels. Forgeries that need more than one source image (e.g., splicing) or that generate more than one doctored figure (e.g., overlap) are not included in this modality.
Inter-Panel forgeries involve two or more panels in the manipulation process. This modality aims to evaluate duplications among two or more panels within the same figure. These duplications can be at object, region, or panel level. At object level, the objects from a donor panel are copied into a host panel using a splicing operation. At region level, an overlap forgery creates two panels with overlapping areas. At panel level, the entire panel is duplicated with or without post-processing (e.g., retouching, cleaning, or geometric transformations).
To assist machine learning forensic techniques, we further divided the dataset into training and test sets. Tables II and III list the number of manipulated figures in each modality. Note that the overlap forgery appears only in the test set; since this modality is similar to copy-move, this protocol forces a forensic tool to generalize across methods.
[Tables II and III: number of forged figures per modality and per source, with totals, for the training and test sets.]
V Copy-Move Forgery Detection Proposed Metric
Popular metrics used in Copy-Move Forgery Detection (CMFD) (e.g., F1-score and Precision) build on the pixel-level True Positive (TP), False Negative (FN), False Positive (FP), and True Negative (TN) detection concepts, as described in Table IV.
| Ground-Truth \ Predicted (Detection Map) | Positive (Suspect Pixels) | Negative (Non-Suspect Pixels) |
| Positive (Tampered Pixel) | TP | FN |
| Negative (Pristine Pixel) | FP | TN |
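For two boolean maps (prediction and ground-truth), these pixel-level scores follow directly from the counts above. The helper below is a generic sketch, not tied to any particular detector:

```python
import numpy as np

def pixel_scores(pred, gt):
    """Pixel-level Precision, Recall, and F1 from two boolean maps."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```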
As a drawback, these metrics cannot assert whether both regions of a copy-move (the source and its copy) are in the ground-truth, since there is no consistency check. Because of this, contradictions might occur during evaluation. For instance, Figure 11 illustrates a detection map in which the detected copied objects are inconsistent with the ground-truth: for each object, only the source or only its copy overlaps the ground-truth, whereas both (object and copy) are expected to be included in the detection. Nevertheless, when evaluating this detection map with the traditional true-positive score, both regions would be counted as true-positive hits.
To mitigate this drawback, this section introduces a new metric that takes advantage of the dataset's enriched pixel-wise ground-truth. The proposed metric changes how a pixel in a detection map is considered a true positive; we name it the Consistent True Positive score (CTP) and define it as follows:
Given a ground-truth map G with n manipulated regions and a detection map D with m copy-pasted regions, each region G_i in G consists of connected components (the source object and all its copies), and each region D_j in D likewise consists of connected components. Let D_j be a detected region from D and G_i a tampered region indicated by the ground-truth, and let p be a pixel from D_j with p ∈ G_i. Then p is a consistent true positive if there exists a G_i such that at least two connected components of D_j intersect G_i.
In other words, to count a region of the detection map as a consistent true positive, at least two of its connected components (the source and at least one of its copies) have to intersect the ground-truth.
As Figure 12 depicts, a region from the detection map can overlap two or more regions from the ground-truth. Given that the goal of CTP is consistency, we only consider as G_i the ground-truth region with the maximum intersection area with the detected region. Hence, CTP ≤ TP. As a result, an inconsistent detection incurs a higher penalty on the Precision and F1-score metrics when they are calculated with CTP.
Thus, Precision and F1-score using CTP become Precision_CTP = CTP / (CTP + FP) and F1_CTP = 2 · CTP / (2 · CTP + FP + FN).
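Under this definition, the CTP count can be sketched with a small connected-component labeller. This is our simplified reading of the metric (detection clone groups carry unique IDs, as described later for the evaluated detectors), not the paper's reference implementation:

```python
import numpy as np
from collections import deque

def _components(mask):
    """Label 4-connected components of a boolean mask (tiny BFS labeller)."""
    lab = np.zeros(mask.shape, dtype=int)
    n = 0
    for sy, sx in zip(*np.nonzero(mask)):
        if lab[sy, sx]:
            continue
        n += 1
        lab[sy, sx] = n
        q = deque([(sy, sx)])
        while q:
            y, x = q.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not lab[ny, nx]):
                    lab[ny, nx] = n
                    q.append((ny, nx))
    return lab, n

def consistent_tp(det_ids, gt_ids):
    """Count consistent-true-positive pixels.

    det_ids: int map, one ID per detected clone group (0 = background).
    gt_ids:  int map, one ID per ground-truth region; a region's source and
             copies share the ID but are spatially disjoint (0 = pristine).
    """
    ctp = 0
    for j in np.unique(det_ids[det_ids > 0]):
        d = det_ids == j
        ids, counts = np.unique(gt_ids[d & (gt_ids > 0)], return_counts=True)
        if ids.size == 0:
            continue                      # no overlap with the ground-truth
        gi = ids[np.argmax(counts)]       # region with maximum intersection
        comp_lab, _ = _components(d)
        hit = np.unique(comp_lab[(gt_ids == gi) & (comp_lab > 0)])
        if hit.size >= 2:                 # source AND at least one copy hit
            ctp += int((d & (gt_ids == gi)).sum())
    return ctp
```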
VI Evaluating CMFD Methods
Duplication of scientific images is one of the most frequently reported and studied threats in the literature [3, 18, 20]; it includes copy-move, a well-studied forgery in the digital forensics field. Although this field presents multiple CMFD solutions for natural images, we could not find any study that evaluates their performance in the scientific image domain. Thus, to provide a baseline for future forensic methods, we investigated the performance of popular CMFD solutions for natural images when applied to the RSIID. In addition, we checked the difference in F1-score between the proposed consistent-true-positive metric and the regular true-positive one. For this, we chose the following CMFD methods:
Efficient Dense-Field from Cozzolino et al. During the evaluation, we used the implementation of Ehret, who released two versions using Zernike and SIFT features. To distinguish these methods from the others, we named them Zernike-PM and SIFT-PM, since these detectors use the PatchMatch algorithm to match similar block contents.
CMFD library implemented by Christlein et al. We selected the SIFT and SURF methods from this library, since the others were not efficient enough to be explored on such a large dataset. To distinguish them from the previous CMFD detectors, we named them SIFT-NN and SURF-NN, since they use a regular approximate nearest-neighbor approach to match similar blocks.
To evaluate SIFT-PM, Zernike-PM, SIFT-NN, and SURF-NN using CTP, we modified their implementations, adding a routine that assigns a unique ID to each detected object and its copies. For the sake of reproducibility, we released the methods' source code with this modification in the same repository as RSIIL. To evaluate Busternet, we normalized its output to [0, 255] and then binarized it, setting all pixels greater than 100 to 1 and the others to 0. As Busternet is based on neural networks, we could not find an explainable methodology that would track the matching among different objects and their copies. Thus, for the CTP metric, all detected and ground-truth objects are set with the same ID. Consequently, CTP cannot properly check inconsistencies in figures with more than one tampered object for Busternet's output; however, it is still valid and useful to check whether Busternet's output overlaps with two or more connected components of the ground-truth.
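The normalization and thresholding applied to Busternet's raw output amounts to the following (threshold 100, as stated above; the function name is ours):

```python
import numpy as np

def binarize_output(raw, threshold=100):
    """Normalize a raw detection map to [0, 255] and binarize it at the
    given threshold."""
    raw = raw.astype(np.float32)
    span = raw.max() - raw.min()
    norm = (raw - raw.min()) / span * 255 if span else np.zeros_like(raw)
    return (norm > threshold).astype(np.uint8)
```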
As a baseline, the evaluation protocol consists of running all methods without any training or fine-tuning and measuring their output with the CTP-based scores. During the evaluation, we use all figures from the test set applicable to CMFD (i.e., images with duplicated areas within the same image). We group the baseline results into Simple and Compound scientific figures, divided by modality. All copy-move modalities presented in Figures 13 and 14 can also include scaling; since scaling cannot be applied alone, we do not indicate when this operation is combined with others.
Figure 13 presents a radar-graph visualization in which the forgery modalities are arranged on the radial axes. Each CMFD method's result is represented with a different color in the radar chart. In this visualization, we plot the score of each method along the modality axis (e.g., copy-move with flip), which runs from the radar center (score zero) to its border (highest score); thus, the farther a method's point is from the center, the better the method is for that copy-move modality. After plotting all of a method's points, we connect them into a polygon: the larger the polygon's area, the better the method's robustness across forgery modalities. By comparing each detector's robustness to the operations, this visualization also helps identify possible complementary behaviors among different methods. As an example, consider the left panel of Figure 13a. In this case, five modalities are compared (e.g., copy-move with flip, cleaning with brute force, and copy-move with translation). The chart shows the results of five methods, each represented by a different polygon color (see the legend on the right of the figure). The best method in this figure is Busternet (in orange), while the two worst methods (SURF-NN and Zernike-PM) are superimposed at the center (smallest areas). In the following, we discuss the forgery evaluation for each modality.
VI-A Simple Figure Forgery Baseline
In this modality, we tested the chosen methods on Simple figures forged with Cleaning (brute force) and Copy-Move. Although the chosen CMFD detectors achieve high efficacy on natural-image benchmarks, their performance drastically decreases when applied to our scientific dataset. As Figure 13a shows, the best CMFD method in the Simple-figure evaluation was Busternet, despite its modest scores.
For this modality, we also compare each method's F1-score computed with the regular TP and with CTP. We notice a difference between these scores for all methods, represented by the reduction in their polygons' areas, indicating the existence of copy-move inconsistencies in their detection maps, which is also depicted in Figure 14. The second row of this figure shows an example of an inconsistent detection map, in which Busternet activates just one of the connected components involved in the manipulation, yielding no consistent true positive. In the same row, SIFT-NN detects both regions (object and copy), yielding a consistent detection.
VI-B Inter-Panel Figure Baseline
We evaluate the Inter-Panel tampered figures for all indicative-verbosity levels on Copy-Move (at panel level), Splicing, and Overlap forgeries. Figure 13b shows the results for the Inter-Panel forgery evaluation using CTP. In this modality, the radar visualization allowed us to notice some complementary performance among the chosen detectors. For instance, SURF-NN and Zernike-PM show complementary behavior on copy-move with rotation and retouching. We believe this complementary aspect indicates that a fusion/ensemble technique might enhance their individual robustness. For the Inter-Panel scenario, flipped copy-move, Splicing, and Overlap proved to be the most challenging forgeries. In addition, the indicative letters show a perceptible impact in this scenario, reducing the score by up to seven points from level 1 to level 3 for some detectors. Although Busternet achieves the best performance in the Simple-figure modality, when applied to compound figures it produces a higher false-positive rate, as depicted in Figure 14 by its activation of the entire image.
We also noticed that graphs and indicative letters are the most common causes of false alarms in the Compound Figure scenario, as illustrated by the third, fourth, and fifth rows of Figure 14, in which SIFT-PM wrongly activates letter and graph regions.
These findings help indicate where researchers should focus when tackling the scientific image forgery detection problem.
VI-C Intra-Panel Figure Baseline
We evaluate the Intra-Panel tampered figures for all levels of indicative verbosity in Cleaning (Brute-Force) and Copy-Move (at object-level).
As presented in Figure 13, this is the most challenging scenario: the detectors scored fewer than four points for all evaluated operations. A possible explanation is that the percentage of doctored pixels in these figures is lower than in the other modalities. The detectors' low performance does not allow us to properly measure the impact of the verbosity levels in this setting. However, as Figure 14 shows, graphs and indicative letters are also a cause of false alarms in this modality.
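The low-doctored-pixel hypothesis is easy to quantify from the ground-truth masks. A trivial helper, our own illustration rather than part of RSIIL's published interface:

```python
import numpy as np

def tampered_ratio(gt_mask):
    """Fraction of pixels marked as tampered in a ground-truth mask
    (any nonzero value = tampered). Very low ratios, typical of
    intra-panel object-level edits, leave detectors little signal."""
    gt_mask = np.asarray(gt_mask)
    return float(np.count_nonzero(gt_mask)) / gt_mask.size
```

Comparing this ratio across the Simple, Inter-Panel, and Intra-Panel subsets would make the claimed imbalance explicit.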
VII Conclusion
In addition to the daunting scenario of fraud in science, aggravated by the increase in image misconduct cases, there are legal issues related to copyright and the judicial implications of pointing to third-party works that prevent one from creating a large collection of fraudulent scientific images, even for an in-depth forensic study to benchmark and drive the development of appropriate detection methods.
Therefore, this work introduced a library and a dataset to help the scientific integrity and forensic communities overcome this legal hurdle. We believe that, by presenting a large dataset to the forensic community, we foster the development of more complex and robust detection tools (e.g., AI-based models).
The proposed library implements the most common image manipulation forgeries described by scientific integrity researchers, and it is extendable to more complex tampering operations. As a special feature, the library generates an enriched ground-truth addressing all regions affected before and after applying a tampering function, assigning a unique ID to each region involved (when applicable). Using this library on Creative Commons-licensed scientific images, we created a freely available dataset with 39,423 manipulated figures.
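An enriched ground-truth of this kind can be approximated with off-the-shelf connected-component labeling. The sketch below is our own illustration, not RSIIL's actual implementation; in particular, pairing source and clone regions by label order is an assumption that holds only when both masks enumerate regions in the same order.

```python
import numpy as np
from scipy import ndimage

def enriched_gt(source_mask, clone_mask):
    """Assign a unique integer ID to every connected region affected by a
    copy-move forgery and return the (source_id, clone_id) pairs.

    source_mask / clone_mask : binary (H, W) arrays marking the original
    object and its pasted copy, respectively.
    """
    src_labels, n_src = ndimage.label(source_mask)   # source IDs 1..n_src
    dst_labels, n_dst = ndimage.label(clone_mask)    # clone IDs 1..n_dst
    gt = np.zeros(np.asarray(source_mask).shape, dtype=np.int32)
    gt[src_labels > 0] = src_labels[src_labels > 0]
    gt[dst_labels > 0] = dst_labels[dst_labels > 0] + n_src  # offset clone IDs
    pairs = [(i + 1, i + 1 + n_src) for i in range(min(n_src, n_dst))]
    return gt, pairs
```

The resulting ID map and pair list are exactly what a consistency-aware CMFD evaluation needs: an evaluator can check that a detector activates both sides of each pair.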
Leveraging the dataset’s enriched ground-truth, we proposed a metric that penalizes inconsistent detections during CMFD evaluation. Using this metric, we evaluated popular CMFD methods on our dataset. Although we chose highly cited CMFD tools that are effective on natural images, all solutions performed worse when transferred to the scientific image domain. This is not a fault of these algorithms, as they were not designed for this specific setup. However, these findings reveal an important lack of methods and a tremendous research opportunity for new specialized detectors aimed at finding forgeries in scientific images. In addition, we notice that some of the chosen algorithms present complementary performance and might benefit from a fusion approach.
Notwithstanding the large size and diversity of the proposed dataset, we believe that more sophisticated tampering operations will be reported in science in the near future, as integrity researchers have warned. Thus, we also advocate for further legally unencumbered, freely available scientific integrity datasets with complex, enhanced, and realistic tampering modalities, aiding the design of more robust detectors.
Therefore, as future work, in addition to investigating robust forensic solutions using AI-based or fusion-based methods, we believe that studies on automated realistic scientific forgeries would also assist the forensic community in fighting scientific misconduct. Furthermore, the detailed pixel-wise ground-truth of RSIID opens a research opportunity to explore eXplainable AI solutions that might assist analysts in sensitive cases, such as misconduct investigations.
Acknowledgments
This research was supported by São Paulo Research Foundation (FAPESP), under the thematic project DéjàVu, grants 2017/12646-3 and 2020/02211-2.
References
- (2017) The career effects of scandal: evidence from scientific retractions. Research Policy 46 (9), pp. 1552–1569.
- (2009) PatchMatch: a randomized correspondence algorithm for structural image editing. ACM Transactions on Graphics (ToG) 28, pp. 24.
- (2016) The prevalence of inappropriate image duplication in biomedical research publications. mBio 7 (3).
- (2018) Automatic detection of image manipulations in the biomedical literature. Cell Death & Disease 9 (3), pp. 400.
- (2020) A single ‘paper mill’ appears to have churned out 400 papers, sleuths find. Science.
- (2012) An evaluation of popular copy-move forgery detection approaches. IEEE Transactions on Information Forensics and Security 7 (6), pp. 1841–1854.
- (2015) Efficient dense-field copy-move forgery detection. IEEE Transactions on Information Forensics and Security 10 (11), pp. 2284–2297.
- (2004) Region filling and object removal by exemplar-based image inpainting. IEEE Transactions on Image Processing 13 (9), pp. 1200–1212.
- (2018) Automatic detection of internal copy-move forgeries in images. Image Processing On Line 8, pp. 167–191.
- (2009) Science journals crack down on image manipulation. Nature.
- On identification and retrieval of near-duplicate biological images: a new dataset and protocol. In International Conference on Pattern Recognition (ICPR).
- (2002) Forensic examination of questioned scientific images. Accountability in Research 9 (2), pp. 105–125.
- (2019) Pitt researchers sue journal for defamation following retraction.
- (2013) The collective consequences of scientific fraud: an analysis of biomedical research. In Proceedings of ISSI 2013, International Conference on Scientometrics and Informetrics, pp. 1897–1899.
- (2018) Image provenance analysis at scale. IEEE Transactions on Image Processing 27 (12), pp. 6109–6123.
- (2017) Nuclei segmentation in histopathology images using deep neural networks. In IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017).
- Emerging concern of scientific fraud: deep learning and image manipulation. bioRxiv.
- (2004) What’s in a picture? The temptation of image manipulation. The Journal of Cell Biology 166 (1), pp. 11–15.
- (2002) German task force outraged by changes to science fraud report. Nature 415 (6867), pp. 3.
- (2021) Scientific integrity is threatened by image duplications. American Journal of Respiratory Cell and Molecular Biology 64 (2), pp. 271–272.
- (2018) BusterNet: detecting image copy-move forgery with source/target localization. In European Conference on Computer Vision (ECCV).
- (2020) Scientific image tampering detection based on noise inconsistencies: a method and datasets. arXiv preprint arXiv:2001.07799.