Domain Generalization for Document Authentication against Practical Recapturing Attacks

by Changsheng Chen, et al.
Shenzhen University

Recapturing attacks can be employed as a simple but effective anti-forensic tool against digital document images. Inspired by the document inspection process that compares a questioned document against a reference sample, we propose a document recapture detection scheme that employs a Siamese network to compare a questioned document image with a reference and extract distinctive recapturing features. The proposed algorithm takes advantage of both metric learning and image forensic techniques. Instead of adopting a Euclidean distance-based loss function, we integrate the forensic similarity function with a triplet loss and a normalized softmax loss. After training with the proposed triplet selection strategy, the resulting feature embedding clusters the genuine samples near the reference while pushing the recaptured samples apart. In the experiments, we consider practical domain generalization problems, such as variations in printing/imaging devices, substrates, recapturing channels, and document types. To evaluate the robustness of different approaches, we benchmark several popular off-the-shelf machine learning-based approaches, a state-of-the-art document image detection scheme, and the proposed schemes with different network backbones under various experimental protocols. Experimental results show that the proposed schemes with different network backbones consistently outperform the state-of-the-art approaches under different experimental settings. Specifically, under the most challenging scenario in our experiment, i.e., evaluation across different types of documents produced by different devices, we have achieved less than 5.00% APCER (Attack Presentation Classification Error Rate) and 5.56% BPCER (Bona Fide Presentation Classification Error Rate) with the proposed network with a ResNeXt101 backbone.








1 Introduction

Authentication of hardcopy documents with digitally acquired document images is a forensic research topic of broad interest. Due to the COVID-19 pandemic, we have observed an unprecedented demand for online document authentication in e-commerce and e-government applications. Many important document images are uploaded to online platforms for various purposes. However, security loopholes in the existing authentication schemes put these systems at risk. As shown in Fig. 1 (a), some texts have been added to an identity document (ID) image to prevent illegal usage. However, the image can be tampered with photo editing software. To cover the editing traces, the edited ID image is reacquired through a print-and-scan cycle. The resulting image in Fig. 1 (b) is therefore more realistic than one edited in the digital domain. It should be noted that such recapture attacks can also be launched against other important documents, such as business licenses and certificates. Attacks with recaptured images have posed a new threat to document authentication systems. Worse still, with the rapid advancement of deep learning-based techniques, there are recent works on editing characters and words in document images with convolutional neural networks wu2019editing; roy2020stefann; yang2020swaptext in an end-to-end fashion.

Figure 1: An example of illegal use of an identity document (ID) image. (a) An authentic ID image with texts to prevent illegal usage. (b) A tampered ID image obtained from (a) with a print-and-scan operation. The printing and scanning devices used in generating (b) are Epson L805 and Kyocera M2530dn, respectively.

Existing document authentication techniques with digital images have found applications in various fields. These techniques can be divided into active and passive categories. As an example of the active forensic techniques, digital watermarking cox2007digital can be applied to a certificate as protection against illegal alterations or re-acquisition. However, active techniques require control over the document generation process, which limits their application to documents from various parties. In contrast, passive techniques have no such requirement. For instance, a questioned document image can be examined through some inherent characteristics of the printing and acquisition processes chiang2009printer; mayer2020forensic for tampering detection. However, the existing passive forensic techniques on digital images have not considered a low-cost and popular attack, i.e., the recapture attack. Under such an attack, the original or tampered image of a given document is printed and then re-acquired with an imaging device to generate a recaptured version of the image. It should be noted that the recaptured document image has been through a complete image acquisition chain, and no post-processing (or forgery) is carried out after the acquisition steps. By definition, the image will therefore be considered an original copy by the existing passive tampering detection techniques. To fight against such attacks, image recapturing detection has attracted research attention worldwide. However, most of the existing recapture detection schemes focus on natural images; only a few works consider recapturing attacks on hardcopy documents. Moreover, to the best of our knowledge, there is currently no research on detecting recaptured document images forged with the latest deep learning-based approaches wu2019editing; roy2020stefann; yang2020swaptext.

In this work, we aim at evaluating the difficulty of the recaptured document detection problem. To investigate the problem, a high-quality recaptured document image database is established. The dataset consists of 1104 document images (including 132 captured document images and 972 recaptured document images) collected with 14 different device combinations. It should be noted that the two datasets involve two different sets of devices. To evaluate the performance of existing machine learning-based classifiers, a generic framework for document spoofing detection with image-based features extracted from both deep learning-based and handcrafted descriptors is considered. The effectiveness of this detection framework is evaluated under both intra-dataset and cross-dataset experimental protocols on our database. Experimental results reveal the risks of existing document recapture detection algorithms under uncontrolled application scenarios.

2 A High Quality Captured and Recaptured Image Database of Identity Documents

Figure 2: The block diagram of collecting genuine document images, recaptured document images and the forge-and-recapture document images.

To investigate the problem of document recapturing detection, a high-quality database consisting of captured and recaptured document images is needed. First and foremost, the document content should be chosen carefully. Some legal documents (e.g., passports, ID cards, certificates) contain sensitive private information and are not suitable for public sharing. Instead, student ID cards from 5 universities are synthesized with Adobe CorelDRAW and serve as the original document images in our experiment.

Figure 3: The original ID images synthesized with Adobe CorelDRAW for our experiment.

Our database contains two datasets. Dataset I collects 1104 document images which are captured or recaptured by 14 different combinations of devices. As shown in Fig. 2, the original document is printed by an authorized party to generate the genuine document, which is then scanned/captured to yield the captured document images. To collect the recaptured document images, the captured document images are printed and re-acquired (by scanner or camera). Dataset II follows the same data collection procedure but with a different set of devices. As shown in Table 1, we have employed 4 phones, 3 scanners and 1 printer in collecting dataset I, while 2 phones (including a high-quality camera phone, Oppo Reno, with a resolution of 48 MP), 2 scanners (including a high-end scanner, Epson V850, with an optical resolution of 6400 DPI), and 2 printers (including a high-end printer, Epson L805, with a resolution of 5760 × 1440 DPI) are employed in collecting dataset II. Thus, dataset II considers attacks with very high-quality devices.

To collect a high-quality dataset, we follow a few rules of thumb in our experiment.

  • Camera phone: set to the highest supported resolution; the captured images are saved in JPEG format with the highest quality factor;

  • Illuminance: the environmental light is controlled by a lamp to avoid introducing distortions (such as shadowing, geometric distortion, and defocusing);

  • Scanner: set to a resolution of 1200 DPI, except the Epson V850, which remains at its default value of 3200 DPI;

  • Printer: set to color mode with the finest printing resolution;

  • Printing substrate: paper with a weight of 120 g/m².
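The acquisition rules above are straightforward to enforce programmatically when ingesting new samples. Below is a minimal sketch, not from the paper; the record fields, the `PROTOCOL` dictionary and the `check_record` helper are hypothetical names introduced purely for illustration.

```python
# Hypothetical protocol constants taken from the collection rules above.
PROTOCOL = {
    "image_format": "JPEG",
    "scanner_dpi": 1200,
    "scanner_dpi_exceptions": {"Epson V850": 3200},  # keeps its default DPI
    "paper_weight_gsm": 120,
}

def check_record(record: dict) -> list:
    """Return a list of protocol violations for one acquisition record."""
    issues = []
    if record.get("format") != PROTOCOL["image_format"]:
        issues.append("image not saved in JPEG format")
    if record.get("device_type") == "scanner":
        expected = PROTOCOL["scanner_dpi_exceptions"].get(
            record.get("device"), PROTOCOL["scanner_dpi"])
        if record.get("dpi") != expected:
            issues.append(f"scanner resolution {record.get('dpi')} DPI, expected {expected}")
    if record.get("paper_gsm") != PROTOCOL["paper_weight_gsm"]:
        issues.append("wrong printing substrate weight")
    return issues
```

A record describing an Epson V330 scan at 1200 DPI on 120 g/m² paper passes, while a non-JPEG Epson V850 scan at the wrong resolution is flagged twice.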

Set I
  1st imaging devices: Phones (XiaoMi 8, RedMi Note 5, Huawei P9, Apple iPhone 6); Scanners (Brother DCP-1519, Epson V330, Benq K810)
  Printer: HP OfficeJet 258
  2nd imaging devices: same as the 1st imaging devices

Set II
  1st imaging devices: Phones (Apple iPhone 6s, Oppo Reno); Scanners (Epson V850, HP Laserjet m176n)
  Printers: HP LJ m176n, Epson L805
  2nd imaging devices: same as the 1st imaging devices

Table 1: The devices used for collecting datasets I and II.

3 Experimental Results

To study the challenges of forge-and-recapture attacks, we construct a generic framework for document spoofing detection with a single image, following the network architecture in agarwal2018diverse. Some popular CNNs, including ResNet 34/50/101/152 he2016deep, ResNeXt 50/101 xie2017aggregated, DenseNet 121/169/201 huang2017densely, VGG 16/19 simonyan2014very, MobileNet howard2017mobilenets, and Inception V3 szegedy2016rethinking, are considered in our experiment. These CNNs serve as feature extractors in our recapture detection framework. The models pre-trained on the ImageNet database are adopted and frozen, and only the parameters in the fully connected (FC) layers of our framework are trainable. The dimensions of both FC layers are 256. The batch size is set to 128. The learning rate is initially set to
and the number of training iterations is 20 epochs. Cross-entropy loss and the Adam optimizer are chosen in our implementation. The generic framework is implemented with TensorFlow 1.13.1 and PyTorch 1.10 and runs on an NVIDIA 2080Ti GPU.
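The frozen-backbone design described above can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' code: a toy convolutional backbone stands in for the ImageNet-pretrained CNN, and the learning rate shown is illustrative since the paper's value is not reproduced here.

```python
import torch
import torch.nn as nn

class RecaptureDetector(nn.Module):
    """Frozen CNN feature extractor followed by trainable FC layers."""
    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int = 2):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False          # freeze the pre-trained weights
        self.head = nn.Sequential(           # only these layers are trained
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),  # both FC layers are 256-dimensional
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():                # no gradients through the backbone
            feats = self.backbone(x).flatten(1)
        return self.head(feats)

# A toy backbone stands in for the pre-trained CNN here; in practice one
# would pass e.g. a torchvision ResNet with its classifier layer removed.
toy_backbone = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2), nn.AdaptiveAvgPool2d(1))
model = RecaptureDetector(toy_backbone, feat_dim=8)
logits = model(torch.randn(4, 3, 64, 64))    # a batch of 4 RGB patches
# Cross-entropy loss and Adam over the trainable head only (lr is illustrative).
optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-4)
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 0, 1]))
```

Passing only `model.head.parameters()` to the optimizer mirrors the setup above, where the backbone is adopted as-is and excluded from training.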

Referring to the literature on face spoofing detection, there are prior works tirunagari2015detection; boulkenafet2016face; patel2016secure that detect spoofed face images without using depth information. In these works, the local binary pattern (LBP) ojala2002multiresolution descriptor has been included as a benchmarking feature. To provide a more complete picture of the performance of different features, LBP with SVM (both linear and RBF kernels) is chosen as a representative machine learning scheme with handcrafted features in our recapture detection framework.
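The LBP + SVM baseline can be sketched as below. This is a simplified stand-in rather than the benchmarked implementation: it uses a basic 3×3 LBP with a 256-bin histogram instead of the multiresolution LBP of Ojala et al., and random patches as placeholder data.

```python
import numpy as np
from sklearn.svm import SVC

def lbp_histogram(img: np.ndarray) -> np.ndarray:
    """Normalized 256-bin histogram of 3x3 local binary patterns."""
    img = np.asarray(img, dtype=np.int64)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    code = np.zeros(centre.shape, dtype=np.int64)
    # 8 neighbours of each centre pixel contribute one bit each
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code += (nb >= centre).astype(np.int64) << bit
    hist = np.bincount(code.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

# Usage: train an SVM on LBP histograms of captured (0) vs. recaptured (1)
# patches; random patches are used here purely as stand-in data.
rng = np.random.default_rng(0)
X = np.stack([lbp_histogram(rng.integers(0, 256, (32, 32))) for _ in range(20)])
y = np.array([0, 1] * 10)
clf = SVC(kernel="rbf").fit(X, y)   # kernel="linear" for the linear-SVM variant
```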

Figure 4: The generic framework based on CNNs for document spoofing detection with a single image.
Methods | EER AUC (intra, 8:1:1) | EER AUC (intra, 8:1:1) | EER AUC (intra, 8:1:1) | EER AUC (cross) | EER AUC (cross)
LBP+SVM (Linear) 0.2143 0.8782 0.0001 1.0000 0.3363 0.7485 0.4242 0.6745 0.2308 0.8204
LBP+SVM (RBF) 0.2857 0.8535 0.1250 0.9615 0.3363 0.7685 0.4478 0.6260 0.2500 0.8145
DenseNet121 0.0001 1.0000 0.0001 1.0000 0.0001 1.0000 0.1548 0.9222 0.0001 1.0000
DenseNet169 0.0001 1.0000 0.0001 1.0000 0.1429 0.9387 0.2165 0.8529 0.0001 1.0000
DenseNet201 0.0001 1.0000 0.0001 1.0000 0.0476 0.9866 0.2083 0.8605 0.0001 1.0000
MobileNet 0.0293 0.9974 0.0001 1.0000 0.3373 0.7811 0.2440 0.8017 0.1268 0.9234
ResNeXt101 0.0001 1.0000 0.0001 1.0000 0.0555 0.9897 0.2024 0.8753 0.0001 1.0000
ResNeXt50 0.0001 1.0000 0.0001 1.0000 0.0039 0.9998 0.2529 0.8252 0.0001 1.0000
ResNet152 0.0001 1.0000 0.0001 1.0000 0.0357 0.9957 0.2440 0.8538 0.0001 1.0000
ResNet101 0.0001 1.0000 0.0001 1.0000 0.0001 1.0000 0.2202 0.8703 0.0001 1.0000
ResNet50 0.0001 1.0000 0.0001 1.0000 0.0238 0.9987 0.2232 0.8612 0.0001 1.0000
ResNet34 0.0001 1.0000 0.0001 1.0000 0.0953 0.9774 0.2687 0.8309 0.0001 1.0000
VGG16 0.0294 0.9991 0.0001 1.0000 0.2221 0.8785 0.1696 0.9077 0.1267 0.9471
VGG19 0.0588 0.9948 0.0001 1.0000 0.2499 0.8460 0.1458 0.9122 0.1208 0.9090
InceptionV3 0.0001 1.0000 0.0001 1.0000 0.0426 0.9941 0.1696 0.9140 0.0244 0.9973
Table 2: Experimental results on dataset I. The column pairs report (EER, AUC) for the subsets of samples captured (in the last imaging process) by mobile phones and by scanners, respectively. The notation 8:1:1 means that the samples in a subset are divided into 80%, 10% and 10% for the training, validation and testing sets, respectively.

As shown in Table 2, the performance of the generic recapture detection framework is satisfactory in the scenarios where training and testing data are sampled from the same subset. For example, a majority of the CNN-based schemes achieve EER = 0.0001 and AUC = 1.0000 in the experiments conducted within the phone and scanner subsets, respectively. It should also be noted that the LBP-based classifiers perform less accurately than the CNN-based approaches.

However, the recapture detection performance degrades significantly when the training and testing data are inhomogeneous. Such degradation is shown under the cross-subset conditions in Table 2, as well as the cross-dataset conditions in Table 3.
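For reference, the EER and AUC figures reported in Tables 2 and 3 can be computed from classifier scores as in the following sketch. This is not the authors' code; the labels and scores are toy values, and the EER is taken at the ROC point where the false-positive and false-negative rates are (approximately) equal.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def eer_auc(labels: np.ndarray, scores: np.ndarray):
    """Equal error rate and area under the ROC curve from raw scores."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    eer = fpr[np.nanargmin(np.abs(fpr - fnr))]   # closest FPR = FNR crossing
    return eer, roc_auc_score(labels, scores)

# Usage with toy scores: a perfectly separating detector gives EER = 0, AUC = 1.
labels = np.array([0, 0, 0, 1, 1, 1])            # 0: captured, 1: recaptured
scores = np.array([0.1, 0.2, 0.3, 0.8, 0.9, 0.95])
eer, auc = eer_auc(labels, scores)
```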

Methods | EER AUC | EER AUC
LBP+SVM (Linear) 0.3333 0.7934 0.3939 0.6157
LBP+SVM (RBF) 0.3157 0.7509 0.4149 0.6615
DenseNet121 0.1250 0.9378 0.0561 0.9844
DenseNet169 0.1536 0.9130 0.0714 0.9822
DenseNet201 0.2031 0.9024 0.1139 0.9538
MobileNet 0.2500 0.7953 0.3809 0.6732
ResNeXt101 0.1094 0.9655 0.0680 0.9878
ResNeXt50 0.2057 0.8514 0.1054 0.9689
ResNet152 0.2552 0.7910 0.0374 0.9946
ResNet101 0.2318 0.8550 0.0527 0.9905
ResNet50 0.1172 0.9463 0.0867 0.9740
ResNet34 0.1666 0.8831 0.1190 0.9532
VGG16 0.2499 0.8227 0.2772 0.7627
VGG19 0.2499 0.8914 0.3793 0.6270
InceptionV3 0.1979 0.8914 0.1246 0.9398
Table 3: Cross-dataset evaluation between datasets I and II.