
Printing and Scanning Attack for Image Counter Forensics

04/27/2020
by Hailey James, et al.

Examining the authenticity of images has become increasingly important as manipulation tools become more accessible and advanced. Recent work has shown that while CNN-based image manipulation detectors can successfully identify manipulations, they are also vulnerable to adversarial attacks, ranging from simple double JPEG compression to advanced pixel-based perturbation. In this paper, we explore another highly plausible attack: printing and scanning. We demonstrate the vulnerability of two state-of-the-art models to this type of attack. We also propose a new machine learning model that performs comparably to these state-of-the-art models when trained and validated on printed and scanned images. Of the three models, our proposed model performs the best when trained and validated on images from a single printer. To facilitate this exploration, we create a dataset of over 6,000 printed and scanned image blocks. Further analysis suggests that variation between images produced by different printers is significant, large enough that good validation accuracy on images from one printer does not imply similar validation accuracy on images from a different printer.



1 Introduction

Determining the authenticity of an image is becoming increasingly important for legal proceedings, criminal investigations, and verifying identity-supporting documents. In recent years, Convolutional Neural Networks have been employed to detect image manipulations, ranging from identifying splicing and copy-move forgeries [30] to manipulations such as contrast enhancement [32, 38], resampling [29], JPEG compression [14], Gaussian blurring [19, 6], median filtering [5], and additive white Gaussian noise [2]. Of this latter group, some may be innocuously applied but are often maliciously included in post-processing to disguise other manipulations [4].

More recently, research in image forensics has included the presence of an adversary, a situation in which the vulnerability of CNNs has been well studied [7]. With regard to image forensics, consideration of these adversarial attacks has been primarily limited to pixel-based adversarial examples and double JPEG compression [1, 31]. In pixel-based attacks, an adversary with knowledge of the CNN model in deployment can craft an "attacked" image which appears visually identical to the original image but is mislabeled by the CNN [16]. This problem is well known in computer vision and has been at the forefront of recent work in the field. However, this type of attack demands a certain level of expertise from the adversary and is unlikely to be employed in the majority of image forensics cases. Even for skilled adversaries, constructing pixel-based adversarial attacks is often labor-intensive, and recent work has cast doubt on the transferability of adversarial attacks in image forensics applications [18].

While pixel-based adversarial attacks require at least some knowledge of the model, a low-level adversarial manipulation such as double JPEG compression requires no such knowledge [1]. In this type of attack, the images are simply JPEG compressed after the manipulation has been applied, hampering the model's ability to correctly identify post-processing methods such as additive white Gaussian noise or median filtering [1]. For this reason, building models robust to low-level, simple adversarial manipulations such as double JPEG compression, to which several manipulation detection models have been found to be vulnerable [1, 31], is particularly important. The goal of this paper is to investigate the vulnerability of state-of-the-art models to another kind of low-level adversarial manipulation: printing and scanning. To our knowledge, this is the first investigation into adversarial attacks on digital image manipulation detectors through printing and scanning.

Figure 1: Pristine image before (left) and after (right) printing and scanning on Xerox1 (Xerox Altalink C8070 Multifunction Printer). We note that there is significant variation between the two images, similar to that introduced by the global manipulation methods with which we experimented.

In physical forgery, repeated printing and scanning can be used to obscure manipulations or watermarks. A document may be modified, usually non-digitally, and then repeatedly printed and scanned to disguise the manipulation artifacts. While scanning a printed document is not always related to forgery, it is reasonable to expect that state-of-the-art models be impervious to this type of post-processing, as is noted in related work in double JPEG compression [1, 14]. In addition, unlike complex pixel-based adversarial attacks, simply printing and scanning an image is both low-cost and requires little expertise, similar to JPEG compression.

In this paper, we limit our investigation to globally-applied manipulations, such as Gaussian blurring, additive white Gaussian noise, and median filtering, rather than local manipulations such as copy-move or splicing, as in related work [2]. We construct printed and scanned datasets from three different printers and experiment with two state-of-the-art models, as well as our own novel model. Related to our work is research on identifying camera models [37]; we additionally report results for identifying the printer model. Our main contributions include the following:

  • We conduct the first analysis into the vulnerability of image manipulation detectors to printing and scanning, demonstrating that at least two state-of-the-art models are vulnerable to this type of highly plausible and inexpensive attack

  • We propose a novel model architecture which performs comparably to the state-of-the-art models when trained and evaluated on printed and scanned images, including performing 5% better when trained on images from a single printer

  • We conduct an in-depth analysis on the relationship between CNN-based image manipulation detectors, including training on composite datasets, and plan to share our dataset of over 6,000 printed and scanned images with the community to facilitate further investigation

The rest of the paper is organized as follows. In Section 2, we give context and background through related work. In Section 3, we describe our novel model architecture, as well as those of the models we used for comparison. In Section 4, we describe the datasets used for training and validation. In Section 5, we explain the experiments conducted, and in Section 6, we discuss the results of these experiments. The paper ends in Section 7, where we summarize our conclusions and suggest areas of future research.

2 Related Work

Figure 2: Examples of manipulations before and after printing and scanning. The six manipulations refer to additive white Gaussian noise (AWGN), Gaussian blurring (GB), JPEG compression (JPEG), median filtering (MF), pristine or no manipulation (PR) and bilinear resampling (RS). We note that due to the algorithms employed, JPEG compression and resampling might be reasonably similar to the printing and scanning process. For this reason, we additionally train and evaluate the models on a restricted set of four classes only, excluding JPEG and bilinear resampling. See Table 2 for details on the parameters used for each manipulation.

As the scope of this paper is limited to CNN-based global image manipulation detectors, we provide background on convolutional neural networks and CNN-based detectors. We similarly provide context on adversarial attacks on CNNs in general as well as on CNN-based image manipulation detectors specifically. Related to this work is research on detecting manipulations through inconsistencies in lighting [20] and across various compression qualities [13]. [20] contributes significantly to this problem area, though without examining models that leverage deep learning. [13] similarly advances the area, but without specifically addressing the problem of printing and scanning in relation to CNN-based detectors, and is thus complementary to this work.

2.1 Convolutional Neural Networks

Convolutional neural networks, and deep learning approaches more generally, are an extension of neural networks. The architecture is defined by a set of parameters used to design the network and generally consists of stacked layers. This approach has proven effective in extracting higher-order features, particularly in object identification, but recent work has also demonstrated that these types of networks, with some modifications, are useful for extracting the low-level features needed for detecting global image manipulations. In a Convolutional Neural Network (CNN), the first layer is a set of convolutional feature extractors that convolve across the image in parallel using a set of learnable filters. The filters are initialized randomly, and their values are updated iteratively through back-propagation [23, 22]. The filters use a sliding window approach to reach all regions of the input image, producing feature maps. The following hidden layers function similarly, convolving over the resulting feature maps to produce higher-level feature maps. Each convolutional layer is followed by an activation function to provide non-linearity. Additionally, convolutional layers are often followed by pooling layers (max, average, stochastic, etc.) to reduce the dimensionality of the resulting features and reduce over-fitting. The output of the final convolutional layer is fed into a fully-connected neural network for classification.
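As a purely illustrative sketch of the pattern described above, a minimal PyTorch model with convolution, ReLU activation, pooling, and a fully-connected classifier might look as follows; the layer widths and kernel sizes are arbitrary placeholders rather than any architecture used in this paper.

```python
import torch
import torch.nn as nn

class MinimalCNN(nn.Module):
    """Toy CNN: convolution -> ReLU -> pooling, repeated, then a fully-connected classifier."""
    def __init__(self, num_classes: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5),   # learnable filters slide over the input image
            nn.ReLU(),                         # non-linearity after the convolution
            nn.MaxPool2d(2),                   # pooling reduces feature-map dimensionality
            nn.Conv2d(16, 32, kernel_size=3),  # deeper layers convolve over the feature maps
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),           # global average pooling
        )
        self.classifier = nn.Linear(32, num_classes)  # fully-connected classification layer

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Example: six-way manipulation classification of a batch of two 256x256 RGB blocks.
logits = MinimalCNN()(torch.randn(2, 3, 256, 256))
print(logits.shape)  # torch.Size([2, 6])
```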

2.2 Deep Learning for Image Forensics

Recent methods in image forensics leveraging deep learning have reached impressive performance. In 2015, a CNN-based classifier was proposed for detecting median filtering in images [9]. Building on this work, [2] proposed a CNN-based model with the addition of a "constrained convolutional layer", a layer constrained to learn the high-pass features of an image by attempting to predict a central pixel based on its neighbors. This serves to suppress the image content while learning the manipulation fingerprint, drawing inspiration from the spatial rich model (SRM) filters used in steganalysis [15].

More recently, it has been shown that third-order subtractive pixel adjacency matrix (S3SPAM) features can be learned by a simple shallow CNN, which can employ transfer learning to achieve good performance with little training data [12]. In addition to directly detecting manipulations, a deep learning method for analyzing the image processing history has been proposed as an important component of image forensics, as the processing history pipeline can affect the accuracy of other forensic tools [3].

Dataset Name | Description | Size | Labels
Original | IFSTC dataset after six manipulations | 198,624 | awgn, gb, mf, pr, (jpeg), (rs)
Xerox1 | Images from the IFSTC dataset with manipulations, after being printed and scanned on Xerox1 | 2,142 | awgn, gb, mf, pr, (jpeg), (rs)
Composite Printers | Combined set of images from each printer (balanced) | 6,426 | awgn, gb, mf, pr, (jpeg), (rs)
Composite Full | Combined set of images from each printer plus original IFSTC images (balanced) | 8,568 | awgn, gb, mf, pr, (jpeg), (rs)
Printer Identification | Pristine images after being printed and scanned by all three printers | 3,213 | Dell, X1, X2
JPEG Compression | IFSTC dataset with JPEG compression (QF=80) applied to all images | 198,624 | awgn, gb, mf, pr, (jpeg), (rs)

Table 1: Descriptions and sizes of each dataset used for training and validation. Size refers to the number of 256x256 or 299x299 image blocks in each dataset, which is then split into 75% training and 25% validation. X1 and X2 refer to the Xerox1 and Xerox2 printers (Section 4.1), respectively. The labels refer to the classes used when training and evaluating on each dataset.

2.3 Adversarial Attacks on CNNs

The vulnerability of CNNs to adversarial attacks has been well documented [16, 26]. Adversarial noise can be designed in such a way that, when added to the image, it retains visual quality while misleading the classifier. For example, the Fast Gradient Sign Method (FGSM) [16] leverages the differentiability of the loss function, assumed to be known to the adversary. The method proposes altering each pixel based on the gradient of the loss with respect to the original pixels in the input image. These changes are small enough that the resulting image is visually nearly identical to the original, but are cumulatively large enough to increase the loss and impair the classification. Similarly, projected gradient descent (PGD) [24] seeks a perturbation that maximizes the loss on a specific input while keeping the perturbation size smaller than a given epsilon. DeepFool [25] uses a local linearization of the classifier to approximate the decision boundary and alter the images accordingly. The Jacobian-based Saliency Map Attack (JSMA) [26] uses a greedy iterative procedure, altering only the pixels which contribute most to the correct classification as identified by a saliency map. Each of these pixel-based adversarial attacks, while effective, requires at least partial knowledge of the network used for image manipulation detection. In contrast, low-level adversarial attacks such as JPEG compression or printing and scanning, the subject of this paper, require no such knowledge.
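For concreteness, a minimal FGSM sketch in PyTorch is given below; it assumes a differentiable classifier `model` and cross-entropy loss, and the epsilon value is an arbitrary illustration rather than a setting used in any of the cited attacks.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.01):
    """One-step FGSM: move every pixel by epsilon in the direction that increases the loss."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adversarial = image + epsilon * image.grad.sign()  # sign of the loss gradient per pixel
    return adversarial.clamp(0.0, 1.0).detach()        # keep the result a valid image
```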

2.4 Adversarial Attacks in Image Forensics

While CNN-based classifiers have achieved high performance on benchmark image forensic tasks, recent research in computer vision has demonstrated that CNN-based detectors, like CNNs more broadly, are highly vulnerable to adversarial attacks. For example, in [21], the authors demonstrate that a GAN-based architecture can conceal 3x3 median filtering, causing a detector to label the image as non-manipulated, including the CNN-based detectors proposed in [2] and [8]. Additionally, a method of adversarial attack based on small pixel-based distortions has been proposed for fooling global image manipulation detectors [35]. However, [36] notes that unlike in most pattern recognition tasks, pixel-based adversarial attacks such as the Fast Gradient Sign Method (FGSM) [17] and the Jacobian-based Saliency Map Attack (JSMA) [27] are, for the most part, not transferable between models.

Recent work has explored the vulnerability of image manipulation detectors to low-resolution median filtering [34] and JPEG compression [1, 31, 10]. To our knowledge, ours is the first paper to examine model vulnerability to printing and scanning.

3 Models

Here we describe our proposed model architecture for improved performance on printed and scanned images. We compare our model to the model proposed in [2], the inspiration for the constrained convolutional layer, as well as to XceptionNet (Xception) [11], which has a nearly identical number of parameters and a similar architecture to our proposed model, so the difference in performance cannot be attributed to increased network capacity.

Figure 3: Proposed network architecture. The first layer is a constrained convolutional layer to extract SRM features, followed by a deep architecture with separable convolutional layers to improve generalization. The last layer is either a 4x1 vector (for four classes) or a 6x1 vector (for all six classes).

3.1 Proposed Model

Our proposed architecture consists of one constrained convolutional layer [2], one standard convolutional layer, 34 separable convolutional layers, 5 pooling layers (4 max pooling, 1 global average pooling), and a final fully connected layer (see Figure 3). Each convolutional layer is followed by ReLU activation, and max pooling is performed with a stride of 2x2.

In the constrained convolutional layer, a 5x5 filter is employed in which the sum of all the weights is constrained to be zero [2]. The center pixel is predicted by the rest of the pixels in the field, and the output of the filter can be interpreted as the prediction error, as suggested by research in steganalysis [3]. Specifically, the weights in the filter are constrained such that:

w(0,0) = -1, \qquad \sum_{(l,m) \neq (0,0)} w(l,m) = 1

where w(l,m) refers to the weight at coordinates (l,m) in the filter, and (0,0) indexes the central weight.

The purpose of the constrained convolutional layer is to constrain the model to learn image manipulation fingerprints, rather than image content and higher order features, such as those useful for object recognition and classification tasks. The prediction error fields are then used as low-level forensic trace features by the rest of the network to assist in classifying global image manipulation detection.
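A minimal sketch of one way to enforce this constraint in PyTorch is shown below (re-projecting the filter weights before each forward pass); this is our illustrative reading of the constraint in [2], not the authors' released implementation.

```python
import torch
import torch.nn as nn

class ConstrainedConv2d(nn.Conv2d):
    """5x5 convolution projected onto the prediction-error constraint:
    central weight fixed to -1, remaining weights rescaled to sum to 1."""
    def forward(self, x):
        with torch.no_grad():
            w = self.weight                       # (out_channels, in_channels, 5, 5)
            w[:, :, 2, 2] = 0.0                   # temporarily zero the central weight
            w /= w.sum(dim=(2, 3), keepdim=True)  # off-center weights now sum to 1
            w[:, :, 2, 2] = -1.0                  # central weight set to -1
        return super().forward(x)

# Example: three prediction-error feature maps from a grayscale input.
layer = ConstrainedConv2d(1, 3, kernel_size=5, padding=2, bias=False)
```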

For the separable convolutional layers, a spatial convolution is performed independently for each channel and is followed by a point-wise, or 1x1, convolution, as proposed in [11]. These components decrease the number of free parameters, allowing the deep network to learn effectively even with a small training set, making it particularly appropriate for our investigation.
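The depthwise-separable pattern itself is simple to express; the sketch below follows the formulation in [11] using PyTorch's grouped convolutions, with placeholder channel counts.

```python
import torch.nn as nn

class SeparableConv2d(nn.Module):
    """Depthwise spatial convolution followed by a 1x1 point-wise convolution."""
    def __init__(self, in_channels, out_channels, kernel_size=3, padding=1):
        super().__init__()
        # Spatial convolution applied to each input channel independently.
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=padding, groups=in_channels, bias=False)
        # 1x1 convolution that mixes channels; the parameter savings come from
        # factorizing the full convolution into these two cheaper steps.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))
```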

In this approach, we hope to leverage both the SRM-like features produced by the convolutional layer as well as the improved generalization ability provided by the added depth and separable layers.

3.2 Bayar2016

Proposed in 2016, the constrained convolution method of image manipulation detection, hereafter referred to as Bayar2016, uses a three-layer CNN (including the initial constrained convolutional layer), with two max-pooling layers and three fully-connected layers [2]. This model demonstrated impressive results in discerning between the six manipulations investigated in this paper using the dataset described in the next section, achieving 99.9% validation accuracy.

3.3 Xception

In addition to the shallow Bayar2016 network, recent work has demonstrated that increasing network depth can dramatically improve model generalization. To compare with a model of similar depth that also uses separable convolutional layers, we experiment with XceptionNet, a deep network comprising 42 layers, including separable convolutional layers [11]. The network design is built upon the Inception architecture [33], with the innovation of separable filters.

4 Datasets

Manipulation | Hyperparameters
Additive White Gaussian Noise (AWGN) | σ = 2.0
Gaussian Blurring (GB) | kernel size = (5,5), σ = 1.1
JPEG Compression (JPEG) | quality = 70
Median Filtering (MF) | kernel size = (5,5)
Pristine (PR) | none
Bilinear Resampling (RS) | ratio = 1.5

Table 2: Parameter specifications for each manipulation type. We used the same parameters as in [2] for fair comparison. See Section 4.2 for details on the manipulations.

Following the procedure described in [2], we use images from the first IEEE IFSTC Image Forensics Challenge [28], obtaining 3,334 images of size 1024x768, which were further split into training, validation, and testing data. The dataset comprises many images of both indoor and outdoor scenes captured from several different digital cameras.

4.1 Printing and Scanning

We used three different printers and one scanner to create a dataset of printed and scanned images: one Dell S3845CDN Laser Multifunction Printer, one Xerox Altalink C8070 Multifunction Printer, and one Xerox WorkCentre 7970 Multifunction Printer, which we refer to hereafter as Dell, Xerox1, and Xerox2, respectively. We printed 50 images of each manipulation type on each printer and used the Dell scanner to scan each image (see Figure 1). After scanning and extracting the images from the resulting PDFs, each image was 1700 x 2200 pixels and was center-cropped to 1536 x 1792 pixels to remove the white border added by the scanning process. Each image was then split into 42 blocks of 256x256 pixels (or 299x299 blocks for our proposed model and for Xception), resulting in 2,142 image blocks of each class from each printer (see Figure 2). We limited our data creation to 900 full-page color images both for budget constraints and environmental concerns, presenting synthetic simulation of printing and scanning as an avenue of possible future work.
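A rough sketch of the crop-and-split step is given below using Pillow and NumPy; the function name and file handling are illustrative, not the exact scripts used to build the dataset.

```python
import numpy as np
from PIL import Image

def extract_blocks(scan_path, block_size=256):
    """Center-crop a 1700x2200 scanned page to 1536x1792 and split it into
    non-overlapping 256x256 blocks (7 x 6 = 42 blocks per page)."""
    img = np.asarray(Image.open(scan_path))      # roughly (2200, 1700, 3) after scanning
    h, w = img.shape[:2]
    crop_h, crop_w = 1792, 1536                  # removes the white border from the scanner
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    img = img[top:top + crop_h, left:left + crop_w]

    blocks = []
    for y in range(0, crop_h, block_size):
        for x in range(0, crop_w, block_size):
            blocks.append(img[y:y + block_size, x:x + block_size])
    return blocks
```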

4.2 Manipulations

We post-processed each image with additive white Gaussian noise (AWGN), Gaussian blurring (GB), JPEG compression (JPEG), median filtering (MF) and re-sampling (RS) as well as retaining the pristine image (PR), yielding six classes, per the procedure described in [2].

Additive white Gaussian noise constructs a noise matrix of the same shape as the image according to a normal distribution with a given sigma value and adds this matrix to the original image; the result is then normalized to values between 0 and 255. Gaussian blurring blurs the image by convolving it with a Gaussian filter of a given kernel size. JPEG compression is a lossy compression method which compresses the image by converting the color map, down-sampling, and applying the Discrete Cosine Transform (DCT). Median filtering replaces each pixel with the median value of the neighboring pixels within a given kernel area. Bilinear resampling works similarly, resizing the image using the distance-weighted average of the neighboring pixels to estimate each new pixel value. See Table 2 for parameter details.
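These manipulations can be approximated with standard NumPy, SciPy, and Pillow calls; the sketch below uses the Table 2 parameters, but it is our reconstruction of the preprocessing rather than the exact scripts used, and the AWGN normalization in particular is one plausible reading of "normalized to values between 0 and 255".

```python
import io
import numpy as np
from PIL import Image
from scipy import ndimage

def awgn(img, sigma=2.0):
    """Additive white Gaussian noise, with the result rescaled to [0, 255]."""
    noisy = img.astype(np.float64) + np.random.normal(0.0, sigma, img.shape)
    noisy -= noisy.min()
    return (255.0 * noisy / noisy.max()).astype(np.uint8)

def gaussian_blur(img, sigma=1.1):
    """Gaussian blurring applied per color channel."""
    return ndimage.gaussian_filter(img, sigma=(sigma, sigma, 0))

def jpeg_compress(img, quality=70):
    """Lossy JPEG compression via an in-memory encode/decode round trip."""
    buf = io.BytesIO()
    Image.fromarray(img).save(buf, format="JPEG", quality=quality)
    return np.asarray(Image.open(buf))

def median_filter(img, size=5):
    """Median filtering over a 5x5 neighborhood, per color channel."""
    return ndimage.median_filter(img, size=(size, size, 1))

def bilinear_resample(img, ratio=1.5):
    """Bilinear resampling of the image by the given ratio."""
    h, w = img.shape[:2]
    out = Image.fromarray(img).resize((int(w * ratio), int(h * ratio)), Image.BILINEAR)
    return np.asarray(out)
```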

5 Experiments

We trained each model (our proposed model, Bayar2016, and Xception) on a variety of training sets and evaluated each trained model on a variety of validation datasets (See Table 1).

We first evaluated the extent to which our selected models can correctly classify the validation images after printing and scanning. We trained each model on the original dataset (before printing and scanning) with all six classes: additive white Gaussian noise (AWGN), Gaussian blurring (GB), JPEG compression (JPEG), median filtering (MF), bilinear resampling (RS), and pristine or no manipulation (PR). For a more complete analysis, we then removed the bilinear resampling (RS) and JPEG compression (JPEG) classes from the training and validation sets and retrained the models, as these two classes could intuitively be considered similar to changes introduced during the printing and scanning process (see Table 3).

Second, we explored countering this vulnerability by training on the printed and scanned image blocks [17]. We trained each model on the printed and scanned image blocks from a single printer. The dataset (see Table 1, Xerox1) consists of 50 full images (1700 x 2200 pixels), which were divided into 256x256 blocks for the Bayar2016 model or 299x299 blocks for Xception and our proposed model. This resulted in 2,142 image blocks for each dataset, which were divided into training and validation sets of size 1,722 and 420 respectively, using only the central blocks to avoid including border artifacts from the scanning process.

Third, we created composite datasets, one consisting of all printed and scanned image blocks (from all three printers), and the other consisting of all printed and scanned image blocks as well as a number of image blocks from the original dataset (before printing and scanning), at a size equivalent to those from one of the three printers. The first composite dataset, which we refer to as Composite Printers, consists of 6,426 image blocks (printed and scanned only), while the second consists of 8,568 image blocks (75% printed and scanned, 25% original). The goal of this experiment was to evaluate if the poor accuracy fitting the printed and scanned data could be mitigated by dramatically increasing the size of the training set.

Finally, we evaluated the performance of each of the models on identifying the printer of printed and scanned images (see Table 1, Printer Identification).

For Bayar2016, we used a batch size of 64, an initial learning rate of 0.01, Stochastic Gradient Descent (SGD) with momentum 0.95, weight decay 0.0005, gamma 0.7, and step size 6.

We used similar hyperparameters for Xception and our proposed model. Specifically, for both models, we used the pre-trained weights from the network as trained on ImageNet. We again used SGD, inferring the batch size at training time from the number of GPUs and using 0.01 for the initial learning rate. We used momentum 0.9 and weight decay 0.0005. For learning rate decay, we used polynomial decay as described in [39]. For each model, we trained until the validation accuracy plateaued or began to fall.
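A sketch of this optimizer and learning-rate schedule is shown below; `model` and `total_steps` are placeholders, and the decay power is an assumption, with [39] describing the exact polynomial schedule.

```python
import torch

def make_optimizer(model, base_lr=0.01, momentum=0.9, weight_decay=0.0005):
    """SGD with momentum and weight decay, matching the hyperparameters above."""
    return torch.optim.SGD(model.parameters(), lr=base_lr,
                           momentum=momentum, weight_decay=weight_decay)

def poly_lr(base_lr, step, total_steps, power=0.9):
    """Polynomial decay: lr = base_lr * (1 - step / total_steps) ** power."""
    return base_lr * (1.0 - step / total_steps) ** power
```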

Following the original methodology for Bayar2016, we retain only the green color layer of each image and divide it into 256x256 non-overlapping blocks, retaining the nine central blocks. For our proposed model and for Xception, we retain all three color channels and split the images into 299x299 non-overlapping blocks, matching the input size of the original architecture.
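A short sketch of the Bayar2016-style preprocessing (green channel only, nine central 256x256 blocks) follows; the exact block-selection indices are an assumption on our part.

```python
import numpy as np
from PIL import Image

def bayar_blocks(image_path, block=256, grid=3):
    """Keep the green channel and return the nine central non-overlapping 256x256 blocks."""
    green = np.asarray(Image.open(image_path))[:, :, 1]  # green color layer only
    h, w = green.shape
    top = (h - grid * block) // 2                         # center a 3x3 grid of blocks
    left = (w - grid * block) // 2
    return [green[top + r * block: top + (r + 1) * block,
                  left + c * block: left + (c + 1) * block]
            for r in range(grid) for c in range(grid)]
```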

6 Results and Discussion

6.1 Print-Scan Manipulation

To evaluate the general vulnerability of each of the models to printed and scanned images, we trained on the original IFSTC dataset (before printing and scanning) and evaluated each model on validation sets from each of the three printers. When we evaluated the models on the printed and scanned validation sets, we found that each model performed only slightly better than random.

We additionally removed the bilinear resampling and JPEG compression classes, and found that the resulting models are similarly unable to correctly classify the remaining four manipulations, still performing at or below random. We additionally note that the models perform worse on the printed and scanned validation images than on the validation images after JPEG compression, a known vulnerability of these types of models, indicating that printing and scanning may be more effective at masking the manipulations [13] (see Table 3).

Validation set | Bayar2016 | Xception | Proposed Model
Original (6c) | 0.9979 | 0.9916 | 0.993
Dell (6c) | 0.1643 | 0.1632 | 0.1673
Xerox1 (6c) | 0.1976 | 0.201 | 0.1827
Xerox2 (6c) | 0.1972 | 0.202 | 0.1953
Original (4c) | 0.9948 | 0.9954 | 0.997
Dell (4c) | 0.2571 | 0.223 | 0.2347
Xerox1 (4c) | 0.2411 | 0.246 | 0.2367
Xerox2 (4c) | 0.2387 | 0.255 | 0.2393
JPEG (4c) | 0.4255 | 0.5126 | 0.4825

Table 3: Validation accuracy for various validation sets after training on the IFSTC dataset. We note that although all three models perform exceptionally well on the original IFSTC dataset, each performs little better than random when evaluated on images from any of the three printers. Because JPEG compression and bilinear resampling could be reasonably inferred to be similar to printing and scanning, we remove these classes and train and evaluate on a restricted set of four classes (4c) (see Section 5). Despite this restricted set of manipulations, however, the models perform no better than random.
Validation set | Bayar2016 | Xception | Proposed Model
Xerox1 (4c) | 0.7036 | 0.666 | 0.753
Dell (4c) | 0.3018 | 0.482 | 0.456
Original (4c) | 0.2342 | 0.3873 | 0.3649
Xerox2 (4c) | 0.4738 | 0.611 | 0.572
JPEG (4c) | 0.2418 | 0.3848 | 0.364

Table 4: Validation accuracy for various validation sets after training on the Xerox1 dataset (see Table 1, Xerox1), i.e. on images printed and scanned on the first Xerox printer. We find that while no model is able to perfectly fit the printed and scanned dataset, our proposed model significantly outperforms the current state-of-the-art models. We also note that transferability to other printers remains weak, indicating significant variance between the printers. Here 4c indicates that we used the restricted set of manipulations (AWGN, GB, MF, and PR) (see Section 5).
Validation set | Bayar2016 | Xception | Proposed Model
Dell (4c) | 0.6506 | 0.649 | 0.713
Xerox1 (4c) | 0.7001 | 0.626 | 0.683
Xerox2 (4c) | 0.5381 | 0.623 | 0.663
JPEG (4c) | 0.2643 | 0.2902 | 0.2601
Original (4c) | 0.2617 | 0.2847 | 0.2449

Table 5: Validation accuracy for various validation sets after training on the Composite Printers dataset. One possible explanation for the poor validation accuracy on a single printer could be the small size of the dataset. To investigate this, we combine the images from all three printers for training, but note that performance on a single printer does not improve. Here 4c indicates that we used the restricted set of manipulations (AWGN, GB, MF, and PR) (see Section 5).
Validation set | Bayar2016 | Xception | Proposed Model
Dell (4c) | 0.6339 | 0.662 | 0.654
Xerox1 (4c) | 0.6982 | 0.632 | 0.674
Xerox2 (4c) | 0.5637 | 0.602 | 0.696
JPEG (4c) | 0.519 | 0.6374 | 0.4972
Original (4c) | 0.8063 | 0.9259 | 0.9629

Table 6: Validation accuracy for various validation sets after training on the Composite Full dataset. For a complete analysis, we add additional image blocks (blocks before printing and scanning) to the composite dataset, but again find that performance does not improve. Here 4c indicates that we used the restricted set of manipulations (AWGN, GB, MF, and PR) (see Section 5).
Task | Bayar2016 | Xception | Proposed Model
Printer Identification | 0.9048 | 0.956 | 0.9533

Table 7: Validation accuracy for printer identification by model. We investigate the variation of the images between printers by training each model to discern between printers. The high accuracy indicates that the images produced by each printer vary significantly.

6.2 Cross-Training on Printed and Scanned Examples

We additionally trained each model on printed and scanned images from an individual printer (Xerox1) (See Section 4.1).

We note that Bayar2016 and Xception achieve accuracies of 70.4% and 66.6% respectively, while our proposed model achieves an accuracy of 75.3%. It also appears that training on one printer does not lend itself to similar validation accuracy on examples from another printer, even one of the same make (see Table 4).

6.3 Composite Training

To compensate for the small size of the dataset for each printer alone, we created a composite dataset, consisting of all of the printed and scanned examples (total size 6,426 blocks), which we refer to as Composite Printers. However, we found that training on this composite dataset did not improve validation performance on any single printer compared with training on images from that printer alone. While this is possibly due to a still insufficiently large training dataset, it also likely provides further evidence that the difference between printers and scanners may be significant enough to preclude fitting a general printed and scanned dataset (see Table 5).

For completeness, we additionally created another composite dataset, which we refer to as Composite Full, which consists of the same composition as Composite Printers plus an equivalent number of examples from the original dataset (total size 8,568 blocks), and found similar results (see Table 6).

6.4 Printer Identification

For comparison with work on camera model identification, we additionally experimented with printer identification on each of the three printers using the discussed models, and found that the models could distinguish between images from the printers with up to 95% accuracy. This is particularly impressive considering the accuracies were achieved using a relatively small set of training data (2,410 image blocks) and without any additional metadata (see Table 7), indicating significant variance between the artifacts introduced by each printer [37].

7 Summary

We investigated the robustness of current state-of-the-art image manipulation detection models in the context of printing and scanning, and found that these models perform poorly on printed and scanned image data. We proposed a novel model architecture, which performs approximately 5% better than the state-of-the-art models when trained and evaluated on images from a single printer. We constructed a dataset of over 6,000 printed and scanned image blocks which we plan to release to the community for further investigation.

That current state-of-the-art models are vulnerable to printing and scanning is an important finding given the availability and ease of printing and scanning images versus constructing complex adversarial examples.

Further analysis suggests that the variability between images produced by each printer is large: significant enough for the models to easily distinguish between printers, and for models trained on a single printer to generalize poorly to images from another printer. This may create additional challenges in designing models robust to printing and scanning, and sets the problem apart from work on creating models robust to the more uniform and predictable JPEG compression. Future work may include developing methods to simulate printing and scanning in order to create larger datasets for training the models.

Figure 4: Confusion matrix for Bayar2016 trained on Original IFSTC, evaluated on Xerox1 (see Table 1, Xerox1). We note that despite the high reported validation accuracy on the original dataset, the model struggles to distinguish between the classes after printing and scanning.
Figure 5: Confusion matrix for Bayar2016 trained on Original IFSTC (without RS and JPEG), evaluated on Xerox1 (See Table 1). We investigate the model’s performance after removing bilinear resampling and JPEG compression, but find that it still performs little better than random.

References

  • [1] M. Barni, A. Costanzo, E. Nowroozi, and B. Tondi (2018-10) Cnn-based detection of generic contrast adjustment with jpeg post-processing. 2018 25th IEEE International Conference on Image Processing (ICIP). External Links: ISBN 9781479970612, Link, Document Cited by: §1, §1, §2.4.
  • [2] B. Bayar and M. C. Stamm (2016) A deep learning approach to universal image manipulation detection using a new convolutional layer. ACM Workshop on Information Hiding and Multimedia Security, pp. 5–10. Cited by: §1, §1, §2.2, §2.4, §3.1, §3.1, §3.2, §3, §4.2, Table 2, §4.
  • [3] M. Boroumand and J. J. Fridrich (2018) Deep learning for detecting processing history of images. In Media Watermarking, Security, and Forensics, Cited by: §2.2, §3.1.
  • [4] G. Cao, Y. Zhao, R. Ni, L. Yu, and H. Tian (2010) Forensic detection of median filtering in digital images. In 2010 IEEE International Conference on Multimedia and Expo, Vol. , pp. 89–94. Cited by: §1.
  • [5] G. Cao, Y. Zhao, R. Ni, L. Yu, and H. Tian (2010) Forensic detection of median filtering in digital images. In 2010 IEEE International Conference on Multimedia and Expo, pp. 89–94. Cited by: §1.
  • [6] G. Cao, Y. Zhao, and R. Ni (2010) Edge-based blur metric for tamper detection. Journal of Information Hiding and Multimedia Signal Processing 1 (1), pp. 20–27. Cited by: §1.
  • [7] N. Carlini and D. Wagner (2017) Adversarial examples are not easily detected: bypassing ten detection methods. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 3–14. Cited by: §1.
  • [8] J. Chen, X. Kang, Y. Liu, and Z. Wang (2015-11) Median filtering forensics based on convolutional neural networks. Signal Processing Letters, IEEE 22, pp. 1849–1853. External Links: Document Cited by: §2.4.
  • [9] J. Chen, X. Kang, Y. Liu, and Z. Wang (2015-11) Median filtering forensics based on convolutional neural networks. Signal Processing Letters, IEEE 22, pp. 1849–1853. External Links: Document Cited by: §2.2.
  • [10] Y. Chen, X. Kang, Y.Q. Shi, and Z. Wang (2019-03) A multi-purpose image forensic method using densely connected convolutional neural networks. Journal of Real-Time Image Processing 16, pp. . External Links: Document Cited by: §2.4.
  • [11] F. Chollet (2017-07) Xception: deep learning with depthwise separable convolutions. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). External Links: ISBN 9781538604571, Link, Document Cited by: §3.1, §3.3, §3.
  • [12] D. Cozzolino, G. Poggi, and L. Verdoliva (2017) Recasting residual-based local descriptors as convolutional neural networks. Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security - IHMMSec ’17. External Links: ISBN 9781450350617, Link, Document Cited by: §2.2.
  • [13] M. Dejean-Servières, K. Desnos, K. Abdelouahab, W. Hamidouche, L. Morin, and M. Pelcat (2017) Study of the impact of standard image compression techniques on performance of image classification with a convolutional neural network. Cited by: §2, §6.1.
  • [14] H. Farid (2009) Exposing digital forgeries from jpeg ghosts. IEEE transactions on information forensics and security 4 (1), pp. 154–160. Cited by: §1, §1.
  • [15] J. Fridrich and J. Kodovsky (2012-06) Rich models for steganalysis of digital images. IEEE Transactions on Information Forensics and Security 7, pp. 868–882. External Links: Document Cited by: §2.2.
  • [16] I. J. Goodfellow, J. Shlens, and C. Szegedy (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Cited by: §1, §2.3.
  • [17] I. J. Goodfellow, J. Shlens, and C. Szegedy (2014) Explaining and harnessing adversarial examples. External Links: 1412.6572 Cited by: §2.4, §5.
  • [18] D. Gragnaniello, F. Marra, G. Poggi, and L. Verdoliva (2018-09) Analysis of adversarial attacks against cnn-based image forgery detectors. 2018 26th European Signal Processing Conference (EUSIPCO). External Links: ISBN 9789082797015, Link, Document Cited by: §1.
  • [19] D. Hsiao and S. Pei (2005) Detecting digital tampering by blur estimation. In First International Workshop on Systematic Approaches to Digital Forensic Engineering (SADFE’05), pp. 264–278. Cited by: §1.
  • [20] M. K. Johnson and H. Farid (2007) Exposing digital forgeries in complex lighting environments. IEEE Transactions on Information Forensics and Security 2 (3), pp. 450–461. Cited by: §2.
  • [21] D. Kim, H. Jang, S. Mun, S. Choi, and H. Lee (2018) Median filtered image restoration and anti-forensics using adversarial networks. IEEE Signal Processing Letters 25, pp. 278–282. Cited by: §2.4.
  • [22] Y. A. LeCun, L. Bottou, G. B. Orr, and K. Müller (2012) Efficient backprop. In Neural networks: Tricks of the trade, pp. 9–48. Cited by: §2.1.
  • [23] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel (1989) Backpropagation applied to handwritten zip code recognition. Neural computation 1 (4), pp. 541–551. Cited by: §2.1.
  • [24] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2017) Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083. Cited by: §2.3.
  • [25] S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard (2016) Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2574–2582. Cited by: §2.3.
  • [26] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami (2016) The limitations of deep learning in adversarial settings. In 2016 IEEE European symposium on security and privacy (EuroSP), pp. 372–387. Cited by: §2.3.
  • [27] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami (2016-03) The limitations of deep learning in adversarial settings. 2016 IEEE European Symposium on Security and Privacy (EuroSP). External Links: ISBN 9781509017522, Link, Document Cited by: §2.4.
  • [28] A.C. Popescu and H. Farid (2005-02) Exposing digital forgeries by detecting traces of resampling. Trans. Sig. Proc. 53 (2), pp. 758–767. External Links: ISSN 1053-587X Cited by: §4.
  • [29] A. Popescu and H. Farid (2005-02) Exposing digital forgeries by detecting traces of re-sampling. IEEE Transactions on Signal Processing 53, pp. 758–767. External Links: Document Cited by: §1.
  • [30] Y. Rao and J. Ni (2016) A deep learning approach to detection of splicing and copy-move forgeries in images. In 2016 IEEE International Workshop on Information Forensics and Security (WIFS), Vol. , pp. 1–6. Cited by: §1.
  • [31] W. Shan, Y. Yi, J. Qiu, and A. Yin (2019-01) Robust median filtering forensics using image deblocking and filtered residual fusion. IEEE Access PP, pp. 1–1. External Links: Document Cited by: §1, §2.4.
  • [32] M. Stamm and K. R. Liu (2008) Blind forensics of contrast enhancement in digital images. In 2008 15th IEEE International Conference on Image Processing, pp. 3112–3115. Cited by: §1.
  • [33] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich (2014) Going deeper with convolutions. External Links: 1409.4842 Cited by: §3.3.
  • [34] H. Tang, R. Ni, Y. Zhao, and X. Li (2018) Median filtering detection of small-size image based on cnn. Journal of Visual Communication and Image Representation 51, pp. 162 – 168. External Links: ISSN 1047-3203, Document, Link Cited by: §2.4.
  • [35] B. Tondi (2018-08) Pixel-domain adversarial examples against cnn-based manipulation detectors. Electronics Letters 54, pp. . External Links: Document Cited by: §2.4.
  • [36] M. Barni, K. Kallas, E. Nowroozi, and B. Tondi (2020) On the transferability of adversarial examples against cnn-based image forensics. External Links: Link Cited by: §2.4.
  • [37] A. Tuama, F. Comby, and M. Chaumont (2016) Camera model identification with the use of deep convolutional neural networks. In 2016 IEEE International workshop on information forensics and security (WIFS), pp. 1–6. Cited by: §1, §6.4.
  • [38] P. Yang, R. Ni, Y. Zhao, G. Cao, H. Wu, and W. Zhao (2018) Robust contrast enhancement forensics using convolutional neural networks. CoRR abs/1803.04749. External Links: Link, 1803.04749 Cited by: §1.
  • [39] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia (2017-07) Pyramid scene parsing network. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). External Links: ISBN 9781538604571, Link, Document Cited by: §5.