Domain Generalization for Document Authentication against Practical Recapturing Attacks

by Changsheng Chen, et al.
Shenzhen University

Recapturing attacks can be employed as a simple but effective anti-forensic tool against digital document images. Inspired by the document inspection process that compares a questioned document against a reference sample, we propose a document recapture detection scheme that employs a Siamese network to compare a questioned document image with a reference and extract distinctive recapturing features. The proposed algorithm takes advantage of both metric learning and image forensic techniques. Instead of adopting a Euclidean distance-based loss function, we integrate the forensic similarity function with a triplet loss and a normalized softmax loss. After training with the proposed triplet selection strategy, the resulting feature embedding clusters the genuine samples near the reference while pushing the recaptured samples apart. In the experiments, we consider practical domain generalization problems, such as variations in printing/imaging devices, substrates, recapturing channels, and document types. To evaluate the robustness of different approaches, we benchmark several popular off-the-shelf machine learning-based approaches, a state-of-the-art document image detection scheme, and the proposed schemes with different network backbones under various experimental protocols. Experimental results show that the proposed schemes with different network backbones consistently outperform the state-of-the-art approaches under different experimental settings. Specifically, under the most challenging scenario in our experiment, i.e., evaluation across different types of documents produced by different devices, we have achieved less than 5.00% APCER (Attack Presentation Classification Error Rate) and 5.56% BPCER (Bona Fide Presentation Classification Error Rate) with the proposed network with a ResNeXt101 backbone.








1 Introduction

Authentication of hardcopy documents with digitally acquired document images is a forensic research topic of broad interest. Due to the COVID-19 pandemic, we have observed an unprecedented demand for online document authentication in e-commerce and e-government applications. Many important document images are uploaded to online platforms for various purposes. However, security loopholes in the existing authentication schemes put these systems at risk. As shown in Fig. 1 (a), some texts have been added to an identity document (ID) image to prevent illegal usage. However, the image can be tampered with photo editing software. To cover the editing traces, the edited ID image is reacquired through a print-and-scan cycle. The resulting image in Fig. 1 (b) is therefore more realistic than one edited in the digital domain. It should be noted that such recapture attacks can also be launched against other important documents, such as business licenses and certificates. Attacks with recaptured images have posed a new threat to document authentication systems. Worse still, with the rapid advancement of deep learning-based techniques, there are recent works on editing characters and words in document images with convolutional neural networks wu2019editing; roy2020stefann; yang2020swaptext in an end-to-end fashion.

Figure 1: An example of illegal use of an identity document (ID) image. (a) An authentic ID image with texts to prevent illegal usage. (b) A tampered ID image obtained from (a) with a print-and-scan operation. The printing and scanning devices used in generating (b) are Epson L805 and Kyocera M2530dn, respectively.

Existing document authentication techniques with digital images have found applications in various fields. These techniques can be divided into active and passive categories. As an example of the active forensic techniques, digital watermarking cox2007digital can be applied to a certificate as protection against illegal alterations or re-acquisition. However, active techniques require control over the document generation process, which limits their application to documents from various parties. In contrast, passive techniques have no such requirement. For instance, a questioned document image can be examined through some inherent characteristics of the printing and acquisition processes chiang2009printer; mayer2020forensic for tampering detection. However, the existing passive forensic techniques on digital images have not considered a low-cost and popular attack, i.e., the recapture attack. Under such an attack, the original or tampered image of a given document is printed and then re-acquired with an imaging device to generate a recaptured version of the image. It should be noted that the recaptured document image has been through a complete image acquisition chain, and no post-processing (or forgery) is carried out after the acquisition steps. By definition, the image will therefore be considered an original copy by the existing passive tampering detection techniques. To fight against such attacks, image recapturing detection has attracted research attention worldwide. However, most of the existing recapture detection schemes focus on natural images; only a few works consider recapturing attacks on hardcopy documents. Moreover, to the best of our knowledge, there is currently no research on detecting recaptured document images forged with the latest deep learning-based approaches wu2019editing; roy2020stefann; yang2020swaptext.

In this work, we aim at evaluating the difficulty of the recaptured document detection problem. To investigate the problem, a high-quality recaptured document image database is established. The dataset consists of 1104 document images (including 132 captured document images and 972 recaptured document images) collected with 14 different device combinations. It should be noted that the two datasets involve two different sets of devices. To evaluate the performance of existing machine learning-based classifiers, a generic framework for document spoofing detection with image-based features extracted from both deep learning-based and handcrafted descriptors is considered. The effectiveness of this detection framework is evaluated under both intra-dataset and cross-dataset experimental protocols on our database. Experimental results reveal the risks of existing document recapture detection algorithms under uncontrolled application scenarios.

2 A High Quality Captured and Recaptured Image Database of Identity Documents

Figure 2: The block diagram of collecting genuine document images, recaptured document images and the forge-and-recapture document images.

To investigate the problem of document recapturing detection, a high-quality database consisting of captured and recaptured document images is needed. First and foremost, the document content should be chosen carefully. Some legal documents (e.g., passports, ID cards, certificates) contain sensitive private information and are not suitable for public sharing. Instead, student ID cards from 5 universities are synthesized with Adobe CorelDRAW and serve as the original document images in our experiment.

Figure 3: The original ID images synthesized with Adobe CorelDRAW for our experiment.

Our database contains two datasets. Dataset I collects 1104 document images which are captured or recaptured by 14 different combinations of devices. As shown in Fig. 2, the original document is printed by an authorized party to generate the genuine document, which is then scanned/captured to yield the captured document images. To collect the recaptured document images, the captured document images are printed and re-acquired (by scanner or camera). Dataset II follows the same data collection procedure but with a different set of devices. As shown in Table 1, we have employed 4 phones, 3 scanners and 1 printer in collecting dataset I, while 2 phones (including a high-quality camera phone, Oppo Reno, with a resolution of 48 MP), 2 scanners (including a high-end scanner, Epson V850, with an optical resolution of 6400 DPI), and 2 printers (including a high-end printer, Epson L805, with a resolution of 5760 × 1440 DPI) are employed in collecting dataset II. Thus, dataset II considers attacks with very high-quality devices.

To collect a high-quality dataset, we follow a few rules of thumb in our experiment.

  • Camera phone: set to the highest supported resolution; the captured images are saved in JPEG format with the highest quality factor;

  • Illuminance: the environmental light is controlled by a lamp to avoid introducing distortions (such as shadowing, geometric distortion, and defocusing);

  • Scanner: set to a resolution of 1200 DPI, except the Epson V850, which remains at its default value of 3200 DPI;

  • Printer: set to color mode with the finest printing resolution;

  • Printing substrate: paper with a weight of 120 g/m².
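The acquisition rules above are straightforward to enforce programmatically when ingesting new samples. Below is a minimal sketch, not from the paper; the record fields, the `PROTOCOL` dictionary and the `check_record` helper are hypothetical names introduced purely for illustration.

```python
# Hypothetical protocol constants taken from the collection rules above.
PROTOCOL = {
    "image_format": "JPEG",
    "scanner_dpi": 1200,
    "scanner_dpi_exceptions": {"Epson V850": 3200},  # keeps its default DPI
    "paper_weight_gsm": 120,
}

def check_record(record: dict) -> list:
    """Return a list of protocol violations for one acquisition record."""
    issues = []
    if record.get("format") != PROTOCOL["image_format"]:
        issues.append("image not saved in JPEG format")
    if record.get("device_type") == "scanner":
        expected = PROTOCOL["scanner_dpi_exceptions"].get(
            record.get("device"), PROTOCOL["scanner_dpi"])
        if record.get("dpi") != expected:
            issues.append(f"scanner resolution {record.get('dpi')} DPI, expected {expected}")
    if record.get("paper_gsm") != PROTOCOL["paper_weight_gsm"]:
        issues.append("wrong printing substrate weight")
    return issues
```

A record describing an Epson V330 scan at 1200 DPI on 120 g/m² paper passes, while a non-JPEG Epson V850 scan at the wrong resolution is flagged twice.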

Set I
  1st imaging devices: Phones (XiaoMi 8, RedMi Note 5, Huawei P9, Apple iPhone 6); Scanners (Brother DCP-1519, Epson V330, Benq K810)
  Printer: HP OfficeJet 258
  2nd imaging devices: same as the 1st imaging devices

Set II
  1st imaging devices: Phones (Apple iPhone 6s, Oppo Reno); Scanners (Epson V850, HP Laserjet m176n)
  Printers: HP LJ m176n, Epson L805
  2nd imaging devices: same as the 1st imaging devices

Table 1: The devices used for collecting datasets I and II.

3 Experimental Results

To study the challenges of forge-and-recapture attacks, we construct a generic framework for document spoofing detection with a single image, following the network architecture in agarwal2018diverse. Some popular CNNs, including ResNet 34/50/101/152 he2016deep, ResNeXt 50/101 xie2017aggregated, DenseNet 121/169/201 huang2017densely, VGG 16/19 simonyan2014very, MobileNet howard2017mobilenets, and Inception V3 szegedy2016rethinking, are considered in our experiment. These CNNs serve as feature extractors in our recapture detection framework. The models pre-trained on the ImageNet database are adopted and frozen, and only the parameters in the fully connected (FC) layers of our framework are trainable. The dimensions of both FC layers are 256. The batch size is set to 128. The learning rate is initially set to
and the number of training iterations is 20 epochs. Cross-entropy loss and the Adam optimizer are chosen in our implementation. The generic framework is implemented with TensorFlow 1.13.1 and PyTorch 1.10 and runs on an NVIDIA 2080Ti GPU.
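The frozen-backbone design described above can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' code: a toy convolutional backbone stands in for the ImageNet-pretrained CNN, and the learning rate shown is illustrative since the paper's value is not reproduced here.

```python
import torch
import torch.nn as nn

class RecaptureDetector(nn.Module):
    """Frozen CNN feature extractor followed by trainable FC layers."""
    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int = 2):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False          # freeze the pre-trained weights
        self.head = nn.Sequential(           # only these layers are trained
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),  # both FC layers are 256-dimensional
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():                # no gradients through the backbone
            feats = self.backbone(x).flatten(1)
        return self.head(feats)

# A toy backbone stands in for the pre-trained CNN here; in practice one
# would pass e.g. a torchvision ResNet with its classifier layer removed.
toy_backbone = nn.Sequential(nn.Conv2d(3, 8, 3, stride=2), nn.AdaptiveAvgPool2d(1))
model = RecaptureDetector(toy_backbone, feat_dim=8)
logits = model(torch.randn(4, 3, 64, 64))    # a batch of 4 RGB patches
# Cross-entropy loss and Adam over the trainable head only (lr is illustrative).
optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-4)
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 0, 1]))
```

Passing only `model.head.parameters()` to the optimizer mirrors the setup above, where the backbone is adopted as-is and excluded from training.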

Referring to the literature on face spoofing detection, there are prior works tirunagari2015detection; boulkenafet2016face; patel2016secure that detect spoofed face images without using depth information. In these works, the local binary pattern (LBP) ojala2002multiresolution descriptor has been included as a benchmarking feature. To provide a more complete picture of the performance of different features, LBP with SVM (both linear and RBF kernels) is chosen as a representative machine learning scheme with handcrafted features in our recapture detection framework.
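The LBP + SVM baseline can be sketched as below. This is a simplified stand-in rather than the benchmarked implementation: it uses a basic 3×3 LBP with a 256-bin histogram instead of the multiresolution LBP of Ojala et al., and random patches as placeholder data.

```python
import numpy as np
from sklearn.svm import SVC

def lbp_histogram(img: np.ndarray) -> np.ndarray:
    """Normalized 256-bin histogram of 3x3 local binary patterns."""
    img = np.asarray(img, dtype=np.int64)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    code = np.zeros(centre.shape, dtype=np.int64)
    # 8 neighbours of each centre pixel contribute one bit each
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code += (nb >= centre).astype(np.int64) << bit
    hist = np.bincount(code.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

# Usage: train an SVM on LBP histograms of captured (0) vs. recaptured (1)
# patches; random patches are used here purely as stand-in data.
rng = np.random.default_rng(0)
X = np.stack([lbp_histogram(rng.integers(0, 256, (32, 32))) for _ in range(20)])
y = np.array([0, 1] * 10)
clf = SVC(kernel="rbf").fit(X, y)   # kernel="linear" for the linear-SVM variant
```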

Figure 4: The generic framework based on CNNs for document spoofing detection with a single image.
Methods | EER AUC (intra, 8:1:1) | EER AUC (intra, 8:1:1) | EER AUC (intra, 8:1:1) | EER AUC (cross) | EER AUC (cross)
LBP+SVM (Linear) 0.2143 0.8782 0.0001 1.0000 0.3363 0.7485 0.4242 0.6745 0.2308 0.8204
LBP+SVM (RBF) 0.2857 0.8535 0.1250 0.9615 0.3363 0.7685 0.4478 0.6260 0.2500 0.8145
DenseNet121 0.0001 1.0000 0.0001 1.0000 0.0001 1.0000 0.1548 0.9222 0.0001 1.0000
DenseNet169 0.0001 1.0000 0.0001 1.0000 0.1429 0.9387 0.2165 0.8529 0.0001 1.0000
DenseNet201 0.0001 1.0000 0.0001 1.0000 0.0476 0.9866 0.2083 0.8605 0.0001 1.0000
MobileNet 0.0293 0.9974 0.0001 1.0000 0.3373 0.7811 0.2440 0.8017 0.1268 0.9234
ResNeXt101 0.0001 1.0000 0.0001 1.0000 0.0555 0.9897 0.2024 0.8753 0.0001 1.0000
ResNeXt50 0.0001 1.0000 0.0001 1.0000 0.0039 0.9998 0.2529 0.8252 0.0001 1.0000
ResNet152 0.0001 1.0000 0.0001 1.0000 0.0357 0.9957 0.2440 0.8538 0.0001 1.0000
ResNet101 0.0001 1.0000 0.0001 1.0000 0.0001 1.0000 0.2202 0.8703 0.0001 1.0000
ResNet50 0.0001 1.0000 0.0001 1.0000 0.0238 0.9987 0.2232 0.8612 0.0001 1.0000
ResNet34 0.0001 1.0000 0.0001 1.0000 0.0953 0.9774 0.2687 0.8309 0.0001 1.0000
VGG16 0.0294 0.9991 0.0001 1.0000 0.2221 0.8785 0.1696 0.9077 0.1267 0.9471
VGG19 0.0588 0.9948 0.0001 1.0000 0.2499 0.8460 0.1458 0.9122 0.1208 0.9090
InceptionV3 0.0001 1.0000 0.0001 1.0000 0.0426 0.9941 0.1696 0.9140 0.0244 0.9973
Table 2: Experimental results on dataset I. The column pairs report (EER, AUC) for the subsets of samples captured (in the last imaging process) by mobile phones and by scanners, respectively. The notation 8:1:1 means that the samples in a subset are divided into 80%, 10% and 10% for the training, validation and testing sets, respectively.

As shown in Table 2, the performance of the generic recapture detection framework is satisfactory in the scenarios where training and testing data are sampled from the same subset. For example, a majority of the CNN-based schemes achieve EER = 0.0001 and AUC = 1.0000 in the experiments conducted within the phone and scanner subsets, respectively. It should also be noted that the LBP-based classifiers perform less accurately than the CNN-based approaches.

However, the recapture detection performance degrades significantly when the training and testing data are inhomogeneous. Such degradation is shown under the cross-subset conditions in Table 2, as well as the cross-dataset conditions in Table 3.
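For reference, the EER and AUC figures reported in Tables 2 and 3 can be computed from classifier scores as in the following sketch. This is not the authors' code; the labels and scores are toy values, and the EER is taken at the ROC point where the false-positive and false-negative rates are (approximately) equal.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def eer_auc(labels: np.ndarray, scores: np.ndarray):
    """Equal error rate and area under the ROC curve from raw scores."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    eer = fpr[np.nanargmin(np.abs(fpr - fnr))]   # closest FPR = FNR crossing
    return eer, roc_auc_score(labels, scores)

# Usage with toy scores: a perfectly separating detector gives EER = 0, AUC = 1.
labels = np.array([0, 0, 0, 1, 1, 1])            # 0: captured, 1: recaptured
scores = np.array([0.1, 0.2, 0.3, 0.8, 0.9, 0.95])
eer, auc = eer_auc(labels, scores)
```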

Methods | EER AUC | EER AUC
LBP+SVM (Linear) 0.3333 0.7934 0.3939 0.6157
LBP+SVM (RBF) 0.3157 0.7509 0.4149 0.6615
DenseNet121 0.1250 0.9378 0.0561 0.9844
DenseNet169 0.1536 0.9130 0.0714 0.9822
DenseNet201 0.2031 0.9024 0.1139 0.9538
MobileNet 0.2500 0.7953 0.3809 0.6732
ResNeXt101 0.1094 0.9655 0.0680 0.9878
ResNeXt50 0.2057 0.8514 0.1054 0.9689
ResNet152 0.2552 0.7910 0.0374 0.9946
ResNet101 0.2318 0.8550 0.0527 0.9905
ResNet50 0.1172 0.9463 0.0867 0.9740
ResNet34 0.1666 0.8831 0.1190 0.9532
VGG16 0.2499 0.8227 0.2772 0.7627
VGG19 0.2499 0.8914 0.3793 0.6270
InceptionV3 0.1979 0.8914 0.1246 0.9398
Table 3: Cross-dataset evaluation between datasets I and II.