1 Introduction
Digital images can now be easily captured by cameras and distributed over the Internet. Manipulations such as recapturing are direct threats to image credibility. Images used as evidence therefore require rigorous validation of their originality to serve as dependable testimony [2]. According to [4], human beings have difficulty discriminating between recaptured and original images. Recapture detection forensics are therefore required to expose recapture frauds.

In recapture detection tasks, wavelet statistical distributions [7, 2, 21] and texture distributions [4] are exploited as detection features. According to [13], texture distribution is considered a solid basis for this task. Physical traits such as specularity, blurriness and chromaticity [15, 9, 14] are also effective discriminative cues. DCT coefficients [24, 26], image quality [27] and the gray level co-occurrence matrix (GLCM) [3] are modeled as well. In addition, neural network methods [25, 13, 1, 5] have been proposed to further improve detection accuracy. However, when a recaptured image is compressed, the consequent deformation of texture feature patterns decreases classification accuracy [17, 10, 19]. Moreover, building a robust recapture detection system requires collecting different datasets for model training, which introduces domain shift in the properties of input images, including scale, illumination and color [23]. The distribution biases introduced by the dataset collection process are therefore a practical challenge in recapture detection. All of the above methods perform well only in single-domain scenarios. Hence, as illustrated in Figure 1, the cross-domain recapture detection task is proposed to learn a shared feature space that preserves the most generalized classification cues and is robust to intra-domain and inter-domain scale variances.

Domain generalization (DG) methods are direct solutions for this task. Here a domain $\mathcal{D} = \{\mathcal{X}, P(X)\}$ is defined by a feature space $\mathcal{X}$ and a marginal distribution $P(X)$ [18], and each single database is considered an independent domain in this paper. In a similar task, face anti-spoofing, DG methods such as MADDG [20] and SSDG [12] achieve promising performance in multi-domain scenarios. However, these DG methods are highly customized and consequently not directly applicable to recapture detection. First, recaptured images are obtained only from screens, so recaptured features can be aggregated in feature space across different domains. Furthermore, global relationships [6] such as scale variances can be exploited to enhance the discriminability of the representations.
In this paper, as illustrated in the left part of Figure 1, the shared feature space is learned from the source domains. This paper makes the following contributions: (1) we introduce a competition between a feature generator and a domain discriminator for domain generalization; (2) in the training phase, scale alignment (SA) operations are performed in each category across all source domains to aggregate embedded features of different scale levels but the same capture category; (3) to improve local representation compactness and further enhance the discriminability of the generalized features, a triplet mining strategy is incorporated into the framework.
2 Method
2.1 Overview
This framework consists of three modules: (1) a domain discriminator competing with the feature generator; (2) a global scale alignment loss on the classification outputs of large and small scales, alongside the cross-entropy loss; (3) a triplet loss applied to the feature space as a local constraint. Details are described in the following sections.

2.2 Adversarial Learning
Suppose we have $N$ source domains $\mathcal{D} = \{D_1, D_2, \ldots, D_N\}$ with corresponding labels $\mathcal{Y} = \{Y_1, Y_2, \ldots, Y_N\}$. There are $C = 2$ categories in each domain, where $Y = 0/1$ represents single capture/recapture. Our goal is to generalize from $\mathcal{D}$ and $\mathcal{Y}$ to an unseen target domain $D_t$; labels in the target domain are not required, which suits practical purposes. In recapture detection tasks, we postulate that common discriminative cues exist in both categories, given the identical nature of data collection in each domain and each class. To this end, we introduce an adversarial learning method in the embedded feature space to exploit generalized discriminative information and minimize the distribution bias of any specific source domain.
For a feature generator $G$, the network inputs $X_r$ (recaptured images) and $X_s$ (single captured images) are transformed into embedded features $Z_r$ and $Z_s$:

$$Z_r = G(X_r), \qquad Z_s = G(X_s) \tag{1}$$
A domain discriminator $D$ is applied to $Z_r$ and $Z_s$ to determine their corresponding source domains:

$$\hat{Y}_r = D(Z_r), \qquad \hat{Y}_s = D(Z_s) \tag{2}$$
There is a competition between the domain discriminator $D$ and the feature generator $G$, where $G$ is trained to fool $D$ so that the domain label is indistinguishable from the shared discriminative feature space. The domain discriminator and feature generator are trained simultaneously and adversarially across all source domains and categories in the training phase. Furthermore, in order to optimize the domain discriminator and the feature generator in the same backpropagation step, a gradient reverse layer (GRL) [12, 8] is inserted between them. The task of the domain discriminator is effectively multiclass classification, so we use a cross-entropy loss to measure the performance of $G$ and $D$:

$$\mathcal{L}_{adv}(G, D) = -\sum_{i} \sum_{n=1}^{N} \mathbb{1}[y_{d,i} = n] \log D_n(G(x_i)) \tag{3}$$

where $y_{d,i} \in \mathcal{Y}_D$ is the domain label of sample $x_i$, $\mathcal{Y}_D$ is the set of domain labels, and $D_n(\cdot)$ denotes the predicted probability for domain $n$.
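To make the GRL concrete, the following is a minimal PyTorch sketch of a gradient reverse layer and the adversarial cross-entropy loss of Equation 3. The module names, the 512-dimensional feature size (matching a ResNet-18 backbone) and the discriminator architecture are illustrative assumptions, not the exact implementation.

```python
# Sketch: gradient reverse layer (GRL) and adversarial loss of Eq. (3).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies gradients by -lam backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

class DomainDiscriminator(nn.Module):
    """N-way classifier over source domains, fed with reversed gradients."""
    def __init__(self, feat_dim=512, n_domains=3):  # sizes are assumptions
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(inplace=True),
            nn.Linear(256, n_domains))

    def forward(self, z, lam=1.0):
        return self.mlp(grad_reverse(z, lam))

def adversarial_loss(discriminator, z, domain_labels, lam=1.0):
    # Cross-entropy of Eq. (3); the GRL makes G ascend while D descends,
    # so one backward pass updates both players adversarially.
    logits = discriminator(z, lam)
    return F.cross_entropy(logits, domain_labels)
```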
2.3 Scale Alignment Clustering
Images from different domains, or even the same domain, appear at different scales [17], which adversely affects generalization performance. Inspired by the global class alignment objective in MASF [6], we introduce a scale alignment objective on the distribution of classification outputs and structure the feature space with an explicit regularization. Our preliminary experiments demonstrated that the scale relationship is better represented in the classification outputs than in the feature space; therefore, scale alignment is performed on the task network outputs, which differs from [6]. For each class $c$, the concept of scale is modeled by $p_c^l$ for large scale and $p_c^s$ for small scale:
$$p_c^{l,j} = T(G(x_c^{l,j})), \qquad p_c^{s,j} = T(G(x_c^{s,j})) \tag{4}$$

where $T$ and $G$ represent the task network and the feature generator, and $j$ is the index of the image in the synthesized training dataset. We define the scale alignment loss on the distributions of $p_c^l$ and $p_c^s$:

$$\mathcal{L}_{sa}(T, G) = \frac{1}{C} \sum_{c=1}^{C} \mathrm{SKL}\left(p_c^l \,\|\, p_c^s\right) \tag{5}$$

where the symmetric Kullback-Leibler (KL) divergence is $\mathrm{SKL}(p \,\|\, q) = \frac{1}{2}\left[D_{KL}(p \,\|\, q) + D_{KL}(q \,\|\, p)\right]$. The discrepancy between the distributions of $p_c^l$ and $p_c^s$ is measured by the symmetric KL divergence across the two classes.
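The sketch below shows how the symmetric KL scale alignment loss of Equations 4 and 5 could be computed in PyTorch over paired large- and small-scale crops of the same class; the pairing convention is an assumption.

```python
# Sketch: symmetric KL scale alignment loss, Eqs. (4)-(5).
import torch
import torch.nn.functional as F

def symmetric_kl(p_logits, q_logits):
    """Symmetric KL divergence between two batches of classifier outputs."""
    p_log = F.log_softmax(p_logits, dim=1)
    q_log = F.log_softmax(q_logits, dim=1)
    p, q = p_log.exp(), q_log.exp()
    kl_pq = (p * (p_log - q_log)).sum(dim=1)
    kl_qp = (q * (q_log - p_log)).sum(dim=1)
    return 0.5 * (kl_pq + kl_qp)

def scale_alignment_loss(task_net, generator, x_large, x_small):
    # x_large / x_small: paired crops of the same class at two scale levels.
    logits_l = task_net(generator(x_large))
    logits_s = task_net(generator(x_small))
    return symmetric_kl(logits_l, logits_s).mean()
```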
2.4 Triplet Mining
To preserve local feature compactness [6] in the feature space, we add a triplet loss to aggregate intra-class samples and separate inter-class samples for a clearer decision boundary. Triplet mining is also used by Jia et al. [12] and Shao et al. [20] to structure the feature space. However, in the recapture detection context, images are recaptured only from screens; in contrast, we therefore enforce feature compactness in both classes regardless of domain or scale. Specifically, we assume there are three source domains in the training phase. In the triplet mining procedure, recaptured and single captured images are collected from all source domains.
The two objectives of triplet mining are to pull recapture samples apart from single capture samples, and to aggregate each class respectively. In the backpropagation step, the feature generator $G$ is optimized by:

$$\mathcal{L}_{trip}(G) = \sum_{a, p, n} \left[ \left\| G(x_i^a) - G(x_j^p) \right\|_2^2 - \left\| G(x_i^a) - G(x_j^n) \right\|_2^2 + \alpha \right]_+ \tag{6}$$

where $G$ denotes the feature generator, superscripts $a$ and $n$ indicate samples from different classes while $a$ and $p$ stem from the same class, and subscripts $i$ and $j$ indicate that there is no restriction on the domain or scale of the samples. $\alpha$ is a pre-defined positive margin. As a result, samples from different domains or scales but the same category are forced to be more compact in the feature space.
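A possible PyTorch realization of this objective is sketched below, using batch-hard triplet mining over capture labels only (ignoring domain and scale); the mining scheme and margin value are assumptions rather than the exact training recipe.

```python
# Sketch: triplet objective of Eq. (6) with batch-hard mining.
import torch
import torch.nn.functional as F

def triplet_loss(embeddings, labels, margin=0.3):
    # Pairwise Euclidean distances between all embedded features in the batch.
    dist = torch.cdist(embeddings, embeddings, p=2)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # same capture class
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    # Hardest positive: same class, any domain/scale, excluding the anchor.
    hardest_pos = (dist * (same & ~eye).float()).max(dim=1).values
    # Hardest negative: closest sample from the other class.
    hardest_neg = dist.masked_fill(same, float('inf')).min(dim=1).values
    return F.relu(hardest_pos - hardest_neg + margin).mean()
```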
2.5 Scale Invariant Domain Generalization
We formulate the integrated framework into a single optimization objective:

$$\mathcal{L}_{SADG} = \mathcal{L}_{cls} + \lambda_1 \mathcal{L}_{adv} + \lambda_2 \mathcal{L}_{sa} + \lambda_3 \mathcal{L}_{trip} \tag{7}$$

where $\mathcal{L}_{cls}$ is a task-specific loss function; because recapture detection is a classification task, the framework is optimized with a cross-entropy loss. $\lambda_1$, $\lambda_2$ and $\lambda_3$ are pre-defined parameters that balance the four losses.
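For illustration, a single training step under Equation 7 might look as follows, reusing the loss sketches above; the batch layout, dictionary keys and module names are assumptions.

```python
# Sketch: one end-to-end optimization step under Eq. (7).
import torch.nn.functional as F

def train_step(generator, task_net, discriminator, optimizer, batch,
               lambda1=0.1, lambda2=0.2, lambda3=0.1):
    x, y, d = batch['images'], batch['labels'], batch['domains']
    z = generator(x)
    logits = task_net(z)

    loss_cls = F.cross_entropy(logits, y)                 # task loss
    loss_adv = adversarial_loss(discriminator, z, d)      # Eq. (3), via GRL
    loss_sa = scale_alignment_loss(task_net, generator,
                                   batch['x_large'], batch['x_small'])  # Eq. (5)
    loss_trip = triplet_loss(z, y)                        # Eq. (6)

    loss = loss_cls + lambda1 * loss_adv + lambda2 * loss_sa + lambda3 * loss_trip
    optimizer.zero_grad()
    loss.backward()   # the GRL flips the adversarial gradient into G here
    optimizer.step()
    return loss.item()
```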
As is illustrated in Figure 2, the framework is trained in an end-to-end manner. After training, $G$ yields a more generalized feature space that is robust to domain shift and scale variances.

3 Experiments
3.1 Experimental Settings
3.1.1 Dataset
Four recapture detection datasets are collected to simulate real-life scenarios and to evaluate our proposed method against the baseline methods. The selected datasets are BJTU-IIS [15] (B for short), ICL-COMMSP [22, 16] (I for short), mturk [1] (M for short) and NTU-ROSE [4, 11] (N for short); the numbers of recaptured/single captured images and recapture devices are shown in Table 1.
| Dataset | Recapture Device Count | Recaptured Image Count | Single Captured Image Count |
|---|---|---|---|
| B | 2 | 706 | 636 |
| I | 8 | 1440 | 905 |
| M | 119 | 1369 | 1368 |
| N | 9 | 2776 | 2712 |
These datasets were collected for different purposes: B targets evaluation on high-resolution images; images in I are controlled by the distance between camera and screen; images in N are captured in a lighting-controlled room; and M crowd-sourced the collection of images. The contents, illumination, scales and resolutions of the pictures differ both across datasets and within a single dataset.
3.1.2 Implementation Details
Our work is implemented in PyTorch. The input images are cropped in the RGB color space, and ResNet-18 is used as the backbone of the feature generator. For better generalization performance, the Adam optimizer is used with a learning rate of 1e-4 and a batch size of 8 per domain, giving a total batch size of 24 with three source domains and 16 in the limited-source experiments. We set the hyperparameters $\lambda_1$, $\lambda_2$ and $\lambda_3$ in Equation 7 to 0.1, 0.2 and 0.1, respectively.
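A minimal sketch of this per-domain batch assembly is given below; the loader construction and domain-id tagging are illustrative assumptions about how the stated batch sizes could be realized.

```python
# Sketch: 8 images per source domain, merged into one batch of 24 (or 16
# with two source domains), each sub-batch tagged with its domain id.
import torch
from torch.utils.data import DataLoader

def make_domain_loaders(domain_datasets, per_domain_batch=8):
    return [DataLoader(ds, batch_size=per_domain_batch, shuffle=True,
                       drop_last=True) for ds in domain_datasets]

def merged_batches(loaders):
    # Draw one sub-batch from every source domain per step.
    for parts in zip(*loaders):
        xs = torch.cat([x for x, _ in parts])
        ys = torch.cat([y for _, y in parts])
        ds = torch.cat([torch.full((len(x),), i, dtype=torch.long)
                        for i, (x, _) in enumerate(parts)])
        yield xs, ys, ds

# Adam with the stated learning rate of 1e-4 over all trainable modules:
# optimizer = torch.optim.Adam(
#     list(generator.parameters()) + list(task_net.parameters())
#     + list(discriminator.parameters()), lr=1e-4)
```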
Each time, one domain is chosen as the target and the remaining three serve as source domains, yielding four experimental tasks in total.

3.2 Experimental Comparison
3.2.1 Baseline Methods
We compare our proposed SADG framework with several state-of-the-art recapture detection methods, multi-scale learning methods and domain generalization algorithms: Multi-scale LBP (MS-LBP) [4], Choi-CNN [5], Multi-scale CNN (MS-CNN) [17], mturk [1], MADDG [20] and SSDG [12]. MS-LBP and MS-CNN are multi-scale methods, while Choi-CNN and mturk are deep learning methods for recapture detection. MADDG and SSDG are two algorithms for the face anti-spoofing task; we compare them with SADG because, to the best of our knowledge, no recapture detection method has addressed domain generalization.
3.2.2 Comparison Results
As shown in Tables 2 and 3, our algorithm outperforms all compared state-of-the-art methods except on HTER in the I&M&N to B experiment, where it trails SSDG by only 0.01%. Subsampling amplifies the scale variances in the target domains, and the two tables show that scale variances are crucial to detection performance. In the second column of Table 1, the recapture device counts differ significantly across datasets. Furthermore, the variation of scale and resolution is larger in M than in B; consequently, all methods perform better in the I&M&N to B experiment than in the B&I&N to M experiment, yet SADG outperforms all other methods, which demonstrates the effectiveness of our scale alignment loss and domain generalization strategy. As shown in Figures 3 and 4, the proposed method outperforms traditional methods because they pay no attention to domain shift and cannot achieve a generalized feature space. Moreover, although SSDG and MADDG are domain generalization methods, neither focuses on scale variances, whether within a single domain or among different domains. The domain shift effects introduced by scale variances can be addressed by adversarial learning, but scale variances also exist within a single domain; these are resolved by the global alignment and clustering operations of SADG in a pairwise manner.
| Method | B&I&M to N | | B&I&N to M | | B&M&N to I | | I&M&N to B | |
|---|---|---|---|---|---|---|---|---|
| | HTER(%) | AUC(%) | HTER(%) | AUC(%) | HTER(%) | AUC(%) | HTER(%) | AUC(%) |
| MS-LBP [4] | 33.07 | 69.85 | 39.50 | 65.09 | 35.86 | 81.08 | 20.56 | 87.25 |
| Choi-CNN [5] | 24.87 | 73.60 | 47.70 | 71.60 | 37.99 | 87.20 | 30.39 | 73.47 |
| MS-CNN [17] | 32.50 | 74.43 | 28.59 | 70.90 | 38.91 | 72.35 | 15.00 | 85.72 |
| mturk [1] | 32.81 | 74.41 | 35.16 | 69.32 | 36.88 | 73.72 | 18.25 | 85.88 |
| MADDG [20] | 19.74 | 88.39 | 26.15 | 81.72 | 20.40 | 87.64 | 18.25 | 89.40 |
| SSDG [12] | 20.06 | 88.41 | 22.37 | 83.67 | 18.43 | 90.59 | 15.12 | 90.48 |
| Ours(SADG) | 15.95 | 90.28 | 22.20 | 85.60 | 15.93 | 92.03 | 15.13 | 91.65 |
| Method | B&I&M to N | | B&I&N to M | | B&M&N to I | | I&M&N to B | |
|---|---|---|---|---|---|---|---|---|
| | HTER(%) | AUC(%) | HTER(%) | AUC(%) | HTER(%) | AUC(%) | HTER(%) | AUC(%) |
| MADDG [20] | 18.75 | 90.06 | 10.85 | 96.09 | 14.64 | 93.32 | 12.99 | 94.17 |
| SSDG [12] | 16.61 | 91.58 | 12.66 | 94.93 | 12.17 | 93.84 | 11.02 | 94.85 |
| Ours(SADG) | 16.45 | 91.42 | 10.03 | 96.14 | 11.18 | 95.28 | 8.55 | 94.85 |
3.3 Discussion
3.3.1 Ablation Study
| Method | B&I&M to N | | B&I&N to M | | B&M&N to I | | I&M&N to B | |
|---|---|---|---|---|---|---|---|---|
| | HTER(%) | AUC(%) | HTER(%) | AUC(%) | HTER(%) | AUC(%) | HTER(%) | AUC(%) |
| SADG wo/ad | 21.05 | 87.73 | 25.49 | 82.16 | 23.87 | 83.23 | 15.46 | 92.66 |
| SADG wo/trip | 24.18 | 83.17 | 25.16 | 82.92 | 23.84 | 85.58 | 16.77 | 91.63 |
| SADG wo/sa | 18.75 | 88.27 | 23.25 | 84.15 | 17.59 | 90.69 | 15.46 | 91.88 |
| Ours(SADG) | 15.95 | 90.28 | 22.20 | 85.60 | 15.93 | 92.03 | 15.13 | 91.65 |
Because each component of the SADG framework (adversarial learning, triplet mining and scale alignment clustering) is independent of the domain settings, we conduct an ablation study on the aforementioned four sets of domain generalization experiments, eliminating the effect of one component at a time. SADG wo/ad denotes the SADG framework without adversarial learning, where we disengage the GRL and domain discriminator from the backpropagation procedure; this network does not explicitly exploit the shared information in the feature space. SADG wo/trip denotes the SADG framework without triplet mining, where the effects of the triplet loss are canceled; in this case, the framework does not utilize the local clustering objective as a regularization. SADG wo/sa denotes the SADG framework without scale alignment clustering, where the global relationship alignment between different scales is removed from the feature space.
Table 4 shows that the performance of each incomplete SADG framework degrades on every set of domain generalization experiments. As expected, this verifies that each component of SADG contributes to performance through global and local alignment and clustering operations, and that the intact version of the proposed scheme achieves the best performance.
3.3.2 Stages Comparison
| Method | B&I&M to N | | B&I&N to M | | B&M&N to I | | I&M&N to B | |
|---|---|---|---|---|---|---|---|---|
| | HTER(%) | AUC(%) | HTER(%) | AUC(%) | HTER(%) | AUC(%) | HTER(%) | AUC(%) |
| feature-SADG | 18.43 | 85.98 | 23.68 | 83.16 | 16.93 | 91.44 | 15.12 | 91.87 |
| task-SADG | 16.94 | 87.76 | 23.85 | 83.37 | 16.45 | 91.15 | 17.43 | 87.39 |
| Ours(SADG) | 15.95 | 90.28 | 22.20 | 85.60 | 15.93 | 92.03 | 15.13 | 91.65 |
According to MASF [6], the concept of a class can be represented by the average of the embedded features in the feature space. However, our preliminary experiments indicated that the KL divergence output alignment strategy performs better for scale alignment. An average strategy can also be deployed on the task network classification scores. We therefore conduct an extensive study of these different strategies. The first strategy is feature scale alignment. Following [6], the large-scale ($l$) concept of class $c$ is:
$$\bar{z}_c^{\,l} = \frac{1}{|D_c^l|} \sum_{x \in D_c^l} G(x) \tag{8}$$

where $G$ is the feature generator and $D_c^l$ denotes the large-scale samples of class $c$. We can then define the soft label distribution and the corresponding loss:

$$p_c^{\,k} = \operatorname{softmax}\left(T(\bar{z}_c^{\,k}) / \tau\right), \quad k \in \{l, s\} \tag{9}$$

$$\mathcal{L}_{feat} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{SKL}\left(p_c^{\,l} \,\|\, p_c^{\,s}\right) \tag{10}$$
where $l$ and $s$ indicate large scale and small scale respectively, and $\tau$ is a softening temperature. The second strategy is task network classification score alignment; the average classification score for the large scale and the corresponding scale alignment loss are:

$$\bar{p}_c^{\,l} = \frac{1}{|D_c^l|} \sum_{x \in D_c^l} T(G(x)) \tag{11}$$

$$\mathcal{L}_{task} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{SKL}\left(\bar{p}_c^{\,l} \,\|\, \bar{p}_c^{\,s}\right) \tag{12}$$
The third strategy is the KL divergence output alignment described in Equations 4 and 5, where the KL divergence is calculated in a pairwise manner.
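To contrast the three strategies, the sketch below expresses each as a PyTorch function over embedded features of the two scales. The softmax temperature and the averaging details are assumptions based on [6]; `task_net` is the classification head acting on embeddings.

```python
# Sketch: the three scale alignment strategies compared in Table 5.
import torch
import torch.nn.functional as F

def symmetric_kl_prob(p, q, eps=1e-8):
    log_p, log_q = (p + eps).log(), (q + eps).log()
    return 0.5 * ((p * (log_p - log_q)).sum(-1) + (q * (log_q - log_p)).sum(-1))

def feature_alignment(task_net, z_large, z_small, tau=2.0):
    # Strategy 1 (feature-SADG): average embeddings per scale first, then
    # compare softened class-concept distributions, Eqs. (8)-(10).
    p_l = F.softmax(task_net(z_large.mean(dim=0, keepdim=True)) / tau, dim=1)
    p_s = F.softmax(task_net(z_small.mean(dim=0, keepdim=True)) / tau, dim=1)
    return symmetric_kl_prob(p_l, p_s).mean()

def score_alignment(task_net, z_large, z_small):
    # Strategy 2 (task-SADG): average the classification scores per scale
    # before taking the divergence, Eqs. (11)-(12).
    p_l = F.softmax(task_net(z_large), dim=1).mean(dim=0, keepdim=True)
    p_s = F.softmax(task_net(z_small), dim=1).mean(dim=0, keepdim=True)
    return symmetric_kl_prob(p_l, p_s).mean()

def pairwise_alignment(task_net, z_large, z_small):
    # Strategy 3 (ours): symmetric KL between per-sample output pairs,
    # Eqs. (4)-(5), without averaging away intra-scale variation.
    p_l = F.softmax(task_net(z_large), dim=1)
    p_s = F.softmax(task_net(z_small), dim=1)
    return symmetric_kl_prob(p_l, p_s).mean()
```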
Table 5 shows that the last strategy outperforms the other two except in the I&M&N to B experiment, where the proposed method is 0.01% behind the feature scale alignment method. Our KL divergence alignment strategy is therefore the most robust.
3.3.3 Limited Source Domains
| Method | M&N to B | | M&N to I | |
|---|---|---|---|---|
| | HTER(%) | AUC(%) | HTER(%) | AUC(%) |
| MS-LBP [4] | 24.83 | 83.15 | 36.33 | 81.58 |
| Choi-CNN [5] | 45.70 | 81.30 | 48.05 | 72.24 |
| MS-CNN [17] | 24.34 | 83.48 | 30.13 | 76.95 |
| SSDG [12] | 22.03 | 85.38 | 23.93 | 84.08 |
| Ours(SADG) | 18.56 | 88.81 | 22.94 | 84.31 |
We further conduct an experimental comparison in a limited-source scenario (e.g., only two source domains are available), which is a common case in real-world practice. The scale variance between mturk (M for short) and NTU-ROSE (N for short) is more significant than that of the other two domains, so we choose M and N as source domains and the remaining two as target domains. In Table 6, our proposed method outperforms the other methods significantly. Compared with Table 2, our proposed method performs better by exploiting the scale variance information. Therefore, our method achieves a more generalized feature space even in an extremely limited source scenario.
3.4 Conclusion
To address two challenges, scale variances and domain shift, we propose a scale alignment domain generalization (SADG) framework. Unlike existing recapture detection methods, our SADG framework exploits generalized discriminative information in a shared feature space. Moreover, we apply global and local regularization to the embedded features: the global relationship between different scales is aligned and utilized for optimization, while a triplet loss is incorporated as a further constraint for class clustering and a clearer decision boundary. Extensive experiments on public databases validate the effectiveness of the proposed method and show that our SADG framework achieves state-of-the-art results in domain generalization recapture detection.
Acknowledgements
Portions of the research in this paper used the ROSE Recaptured Image Dataset made available by the ROSE Lab at the Nanyang Technological University, Singapore.
References
- [1] Agarwal, S., Fan, W., Farid, H.: A diverse large-scale dataset for evaluating rebroadcast attacks. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 1997–2001. IEEE (2018)
- [2] Anjum, A., Islam, S.: Recapture detection technique based on edge-types by analysing high-frequency components in digital images acquired through lcd screens. Multimedia Tools and Applications pp. 1–21 (2019)
- [3] Awati, C., Alzende, N.H.: Classification of singly captured and recaptured images using sparse dictionaries. International Journal 5(7) (2017)
- [4] Cao, H., Kot, A.C.: Identification of recaptured photographs on lcd screens. In: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing. pp. 1790–1793. IEEE (2010)
- [5] Choi, H.Y., Jang, H.U., Son, J., Kim, D., Lee, H.K.: Content recapture detection based on convolutional neural networks. In: International Conference on Information Science and Applications. pp. 339–346. Springer (2017)
- [6] Dou, Q., Castro, D.C., Kamnitsas, K., Glocker, B.: Domain generalization via model-agnostic learning of semantic features. In: Advances in Neural Information Processing Systems (NeurIPS). vol. 32. Vancouver, BC, Canada (2019)
- [7] Farid, H., Lyu, S.: Higher-order wavelet statistics and their application to digital forensics. In: 2003 Conference on Computer Vision and Pattern Recognition Workshop. vol. 8, pp. 94–94. IEEE (2003)
- [8] Ganin, Y., Lempitsky, V.: Unsupervised domain adaptation by backpropagation. In: International Conference on Machine Learning. pp. 1180–1189. PMLR (2015)
- [9] Gao, X., Ng, T.T., Qiu, B., Chang, S.F.: Single-view recaptured image detection based on physics-based features. In: 2010 IEEE International Conference on Multimedia and Expo. pp. 1469–1474. IEEE (2010)
- [10] Gluckman, J.: Scale variant image pyramids. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06). vol. 1, pp. 1069–1075. IEEE (2006)
- [11] Hong, C.: Statistical image source model identification and forgery detection. Ph.D. thesis, Nanyang Technological University (2011)
- [12] Jia, Y., Zhang, J., Shan, S., Chen, X.: Single-side domain generalization for face anti-spoofing. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 8484–8493 (2020)
- [13] Li, H., Wang, S., Kot, A.C.: Image recapture detection with convolutional and recurrent neural networks. Electronic Imaging 2017(7), 87–91 (2017)
- [14] Li, J., Wu, G.: Image recapture detection through residual-based local descriptors and machine learning. In: International Conference on Cloud Computing and Security. pp. 653–660. Springer (2017)
- [15] Li, R., Ni, R., Zhao, Y.: An effective detection method based on physical traits of recaptured images on lcd screens. In: International Workshop on Digital Watermarking. pp. 107–116. Springer (2015)
- [16] Muammar, H., Dragotti, P.L.: An investigation into aliasing in images recaptured from an lcd monitor using a digital camera. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. pp. 2242–2246. IEEE (2013)
- [17] Noord, N.V., Postma, E.: Learning scale-variant and scale-invariant features for deep image classification. Pattern Recognition 61, 583–592 (2017)
- [18] Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22(10), 1345–1359 (2010)
- [19] Park, D., Ramanan, D., Fowlkes, C.: Multiresolution models for object detection. In: European Conference on Computer Vision. pp. 241–254. Springer (2010)
- [20] Shao, R., Lan, X., Li, J., Yuen, P.C.: Multi-adversarial discriminative deep domain generalization for face presentation attack detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10023–10031 (2019)
- [21] Sun, Y., Shen, X., Lv, Y., Liu, C.: Recaptured image forensics algorithm based on multi-resolution wavelet transformation and noise analysis. International Journal of Pattern Recognition and Artificial Intelligence 32(02), 1854003 (2018)
- [22] Thongkamwitoon, T., Muammar, H., Dragotti, P.L.: An image recapture detection algorithm based on learning dictionaries of edge profiles. IEEE Transactions on Information Forensics and Security 10(5), 953–968 (2015)
- [23] Torralba, A., Efros, A.A.: Unbiased look at dataset bias. In: CVPR 2011. pp. 1521–1528. IEEE (2011)
- [24] Yang, P., Li, R., Ni, R., Zhao, Y.: Recaptured image forensics based on quality aware and histogram feature. In: International Workshop on Digital Watermarking. pp. 31–41. Springer (2017)
- [25] Yang, P., Ni, R., Zhao, Y.: Recapture image forensics based on laplacian convolutional neural networks. In: International Workshop on Digital Watermarking. pp. 119–128. Springer (2016)
- [26] Yin, J., Fang, Y.: Markov-based image forensics for photographic copying from printed picture. In: Proceedings of the 20th ACM international conference on Multimedia. pp. 1113–1116 (2012)
- [27] Zhu, N., Li, Z.: Recaptured image detection through enhanced residual-based correlation coefficients. In: International Conference on Cloud Computing and Security. pp. 624–634. Springer (2018)