Adversarial Unsupervised Domain Adaptation Guided with Deep Clustering for Face Presentation Attack Detection

02/13/2021 ∙ by Yomna Safaa El-Din, et al. ∙ The American University in Cairo

Face Presentation Attack Detection (PAD) has drawn increasing attention as a means of securing the face recognition systems that are widely used in many applications. Conventional face anti-spoofing methods assume that testing data come from the same domain used for training, so they cannot generalize well to unseen attack scenarios: the trained models tend to overfit to the acquisition sensors and attack types available in the training data. In light of this, we propose an end-to-end learning framework based on Domain Adaptation (DA) to improve PAD generalization capability. Labeled source-domain samples are used to train the feature extractor and classifier via cross-entropy loss, while unlabeled data from the target domain are used in an adversarial DA approach that makes the model learn domain-invariant features. Using DA alone in face PAD fails to adapt well to a target domain that is acquired under different conditions, with different devices and attack types than the source domain. Therefore, to preserve the intrinsic properties of the target domain, deep clustering of target samples is performed. Training and deep clustering are performed end-to-end, and experiments on several public benchmark datasets validate that our proposed Deep Clustering guided Unsupervised Domain Adaptation (DCDA) learns more generalized information than the state of the art, achieving lower classification error on the target domain.




1 Introduction

Face detection and recognition is an important topic in computer vision; it is used in many applications, of which authentication is the most sensitive. With the widespread use of smart mobile devices and the incorporation of the latest vision technologies into them, end users find it more convenient to authenticate with their biometric data than by typing classic passwords. On the other hand, this ease of use makes it easier for an attacker to spoof the authentication system using pre-recorded biometric samples of the device user. Hence, interest in developing reliable anti-spoofing, or Presentation Attack Detection (PAD), techniques is increasing. Over the past years, several approaches have been developed in the literature, starting from basic methods relying on image processing and hand-engineered features, up to approaches depending on features learnt automatically by deep learning.

These approaches have succeeded in obtaining perfect attack detection results in intra-dataset scenarios, where a single dataset is split into training and testing subsets, so both subsets come from the same sensor model and acquisition environment. However, the main drawback of such methods is their lack of generalization to different environments and attack scenarios. The performance of the learnt representations in distinguishing attacks from bona-fide (real) presentations degrades significantly when test data is captured by a different sensor, in different settings or under different illumination conditions. In view of this, Domain Adaptation (DA) [10] and Domain Generalization (DG) [21] were recently introduced in the PAD field. The goal of DG is to learn representations that are robust across different domains, given samples from several source domains, as in [19], [30], [15]. DA, in contrast, aims at adapting a model trained on a labeled source domain to a different target domain. Unsupervised DA (UDA) uses labeled samples from a source domain and unlabeled samples from a target domain, with the goal of achieving low classification error on the target domain, even though its samples are unlabeled, by learning domain-invariant features.

For example, [20] experimented with both hand-crafted and deep learnt features in DA; however, their approach was not end-to-end and the deep features did not generalize well, so they achieved their best results using a combination of hand-crafted features. Adversarial training was used in DA for face PAD in [33] to learn an embedding space shared by both the source and target domain models. The training process is still not end-to-end: source pre-training, embedding adaptation and target classification are performed separately.

In this paper, we focus on developing an end-to-end trainable solution for PAD based on DA, which improves the generalization of the model in cross-dataset testing without requiring several labeled source domains as in DG. Existing DA-based solutions solely aim to align the distribution of an unlabeled target domain with that of a different source domain, neglecting the specific nature of the target domain. In face PAD, the target domain is a different PAD dataset, usually captured with a different authentication device and containing different attack types under different illumination conditions. Solely trying to align the distribution of such different attack scenarios with the distribution of attack scenarios in the labeled source dataset is therefore unlikely to succeed, especially when the device used for authentication in one domain is similar to the one used for attacks in the other domain, e.g. a mobile device. We therefore propose an approach that uses DA for PAD generalization to a different domain without neglecting the intrinsic properties of this target domain. We incorporate clustering based on deeply extracted features to guide the feature extraction network to generate features that are domain-invariant, yet maintain the class-wise separability of the target dataset.

The main contributions of this work are: (1) proposing a novel end-to-end DA-based training architecture for generalizing face PAD; (2) using deep embedding clustering of the target domain to guide the DA process; and (3) showing substantial improvement over the state of the art (SOTA) in cross-dataset evaluation on public benchmark face PAD datasets, with close to 0% cross-dataset error. The rest of the paper is organized as follows: Section 2 reviews recent literature on face PAD and domain adaptation. Our proposed algorithm is explained in Section 3, followed by the experiments, benchmark datasets and results in Section 4, and conclusions in Section 5.

2 Related Work

2.1 CNN-Based Face PAD

Recent software-based face presentation attack detection methods can mainly be categorized into texture-based and temporal-based techniques. Texture-based methods extract features from the frames that identify whether the presented image is fake or bona-fide. The features can be hand-crafted, such as color texture [2], SIFT [25] or SURF [3], which obtained good results in differentiating real from fake presentations. However, hand-crafted features are often sensitive to varying acquisition conditions, such as camera devices, lighting conditions and Presentation Attack Instruments (PAIs). Hence the need to automatically learn and extract meaningful features directly from the data using deep representations, as in [24, 8].

In addition to texture-based features, temporal-based models utilize the temporal information in face videos for better detection of attack presentations. Frame differences were combined with deep features in [26]. In [9], image quality information and motion information from optical flow were combined with a neural network for classification. An LSTM-CNN architecture was used in [40], and in [37] multiple RGB frames were used to estimate face depth information, after which two modules were used to extract short- and long-term motion.

These methods obtain excellent results in intra-dataset testing, yet still fail to generalize to unseen environments and acquisition conditions. They show high cross-dataset errors, which motivates incorporating domain adaptation techniques to reduce the discrepancy between the distributions of the training and deployment domains.

2.2 Unsupervised Domain Adaptation

Recently, Domain Adaptation (DA) has been introduced in computer vision to tackle the problem of domain shift when models trained on a certain (source) domain are applied to another (target) domain. Several methods, such as [10], rely on adversarial training [11] to guide the feature extraction module to generate domain-invariant features that make it harder for a domain discriminator to determine the original domain of a sample. Specifically, unsupervised DA uses labeled samples from the source domain in addition to unlabeled samples from the target domain to train a model that reduces the classification error on the unlabeled target domain.

Inspired by the success of DA in image classification [27], [22], [29], [28], [18], [41], [32], [16], we believe that it can be used to address the problem of generalization in face PAD. A model fine-tuned on a small face PAD dataset fails to generalize when tested on other PAD domains, because the learnt features become specific to the subjects or sensors available in the source dataset. By using domain adaptation in face PAD, the model is guided to learn domain-invariant features that can differentiate between bona-fide and attack face videos regardless of their origin. However, learning domain-invariant features can hurt classification of the target face PAD dataset by ignoring the fine-level class-wise structure of the target, since its attack samples are generated with different instruments and its bona-fide samples may be captured by different sensors. Hence, we propose to incorporate deep clustering of target samples to constrain the model to keep the discriminative structure of both classes in the target dataset.

2.3 Deep Unsupervised Clustering

Clustering aims at categorizing unlabeled data into groups (clusters). Deep learning has been adopted for clustering deep visual features since Deep Embedded Clustering (DEC) [39], a method that jointly learns feature representations and cluster assignments: a neural network is first pre-trained as an autoencoder and then fine-tuned by jointly optimizing the cluster centroids in the output space and the underlying feature representation through Kullback-Leibler divergence minimization. Later, variants of DEC emerged, such as [12], which adds data augmentation.
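To illustrate the DEC refinement step, the following is a minimal pure-Python sketch of DEC's auxiliary target distribution, computed from a matrix of soft assignments (rows are samples, columns are clusters). In DEC's own notation, Q holds the network's soft assignments and P the sharpened target, and training minimizes KL(P‖Q). The function name `dec_target_distribution` is hypothetical and the code is an illustration, not the authors' implementation.

```python
from typing import List

Matrix = List[List[float]]


def dec_target_distribution(q: Matrix) -> Matrix:
    """Hypothetical helper: DEC auxiliary target
    p_ij = (q_ij^2 / f_j) / sum_j' (q_ij'^2 / f_j'), with f_j = sum_i q_ij."""
    num_clusters = len(q[0])
    # Soft cluster frequencies; dividing by them keeps large clusters from dominating.
    freq = [sum(row[j] for row in q) for j in range(num_clusters)]
    p = []
    for row in q:
        # Squaring sharpens each sample's assignment toward its most confident cluster.
        unnorm = [row[j] ** 2 / freq[j] for j in range(num_clusters)]
        total = sum(unnorm)
        p.append([v / total for v in unnorm])
    return p


if __name__ == "__main__":
    q = [[0.7, 0.3], [0.3, 0.7], [0.6, 0.4]]
    for row in dec_target_distribution(q):
        print([round(v, 3) for v in row])
```

Each row of the result is more confident than the corresponding row of `q`, which is what lets DEC bootstrap from its own high-confidence assignments.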

Unlike DEC, which requires layer-wise pretraining as well as non-joint embedding and clustering learning, DEeP Embedded RegularIzed ClusTering (DEPICT) [6] uses end-to-end optimization to train all network layers simultaneously with unified clustering and reconstruction loss functions. DEPICT consists of a multi-layer convolutional autoencoder followed by a multinomial logistic regression function. The clustering objective function uses relative entropy (KL divergence) minimization, regularized by a prior on the frequency of cluster assignments, and it is optimized with an alternating strategy that updates the network parameters and estimates the cluster assignments in turn. A reconstruction loss is employed in the autoencoder to prevent the deep embedding function from overfitting.

Recently, clustering has been introduced in several domain adaptation methods. [36] proposed a method to alleviate the effects of negative transfer in adversarial domain matching between source and target representations, by simultaneously learning tightly clustered target representations while encouraging each cluster to be assigned to a unique and different class from the source. In [31], structural domain similarity is assumed and the clustering solution is constrained using structural source regularization: the KL divergence between the predictive label distribution of the network and an introduced auxiliary distribution is minimized, and replacing the auxiliary distribution with the one formed by the ground-truth labels of source data implements the structural source regularization through a simple strategy of joint network training.

Figure 1: Architecture of the proposed Deep Clustering-guided-Domain Adaptation (DCDA) for face PAD. F: Feature extraction network, D: Domain Discriminator, GRL: Gradient Reverse Layer, C: Categories Classifier, s: Source, t: Target. Bona-fide images are highlighted with a green border, while attack images are highlighted with a red border. Deep Features Clustering: predicts target pseudo-labels ŷ^t and cluster centers μ. Cluster Assignment: assigns target features to clusters based on Student's t-distribution.

2.4 DA in Face PAD

Domain Adaptation (DA) and Domain Generalization (DG) have recently been used to reduce the gap between the target and source domains in face PAD. [30] focuses on improving the generalization ability of face PAD methods from the perspective of domain generalization: adversarial learning is used to train multiple feature extractors to learn a generalized feature space, and an auxiliary face depth supervision further enhances the generalization ability. Later, an end-to-end Single-Side Domain Generalization (SSDG) framework was proposed [15]. It learns a generalized feature space in which the feature distribution of real faces is compact, while that of fake faces is dispersed among domains but compact within each domain.

One of the first works exploring DA for face PAD is [20], where both hand-crafted features and features learned by deep neural networks are adopted and compared in DA. [20] found that deep learning based methods may not generalize well under cross-database testing scenarios; their best results were achieved using concatenated CoALBP and LPQ features in the HSV and YCbCr color spaces.

A 3D CNN architecture tailored for the spatial-temporal input is proposed by [19] for enhancing the generalization capability of the network. A robust representation across different face spoofing domains is presented by introducing the generalization loss as the regularization term. Given training samples from several domains, the network is optimized such that the Maximum Mean Discrepancy (MMD) distances among different domains can be minimized. They performed the experiments by combining three publicly available face PAD datasets to create 10 protocols. In each protocol, data from one camera is set aside as the unseen target domain, and a subset of the remaining cameras are used as source domains.
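The MMD penalty described above can be illustrated with a short sketch. It is a minimal pure-Python version, assuming a Gaussian kernel with a fixed bandwidth `sigma` and the biased (V-statistic) estimator of squared MMD; the kernel, bandwidth and the helper names `mmd2` and `multi_domain_mmd` are assumptions for illustration, not the exact formulation of [19].

```python
import math
from itertools import combinations
from typing import List, Sequence

Sample = Sequence[float]


def gaussian_kernel(a: Sample, b: Sample, sigma: float = 1.0) -> float:
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(xs: List[Sample], ys: List[Sample], sigma: float = 1.0) -> float:
    """Hypothetical helper: biased estimate of the squared MMD between two feature sets."""
    def mean_kernel(p: List[Sample], q: List[Sample]) -> float:
        return sum(gaussian_kernel(a, b, sigma) for a in p for b in q) / (len(p) * len(q))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def multi_domain_mmd(domains: List[List[Sample]], sigma: float = 1.0) -> float:
    """Assumption: the multi-domain penalty is the sum of pairwise squared MMDs."""
    return sum(mmd2(a, b, sigma) for a, b in combinations(domains, 2))


if __name__ == "__main__":
    near_a = [[0.0, 0.0], [0.1, 0.2]]
    near_b = [[0.1, 0.1], [0.0, 0.2]]
    far = [[3.0, 3.0], [3.1, 2.9]]
    print(mmd2(near_a, near_b), mmd2(near_a, far))
```

Feature sets drawn from similar distributions give a value close to zero, while well-separated sets give a larger value, so minimizing the sum pulls the domains' feature distributions together.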

ADA [33] is the first to incorporate adversarial domain adaptation into a learning approach to improve face PAD generalization capability. A source model optimized with triplet loss is first pre-trained on the source domain, and adversarial adaptation is then used to train a target model to learn an embedding space shared by the source and target domain models. Finally, target images are mapped with the target model to the embedding space and classified with a k-nearest-neighbors classifier. However, as a first attempt at using adversarial training for domain adaptation, it is not trained end-to-end. In [23], the authors relied only on bona-fide samples of the target domain for DA. They hypothesize that, in a CNN trained for PAD on a source domain, some of the filters learned in the initial layers are robust filters that generalize well to the target dataset, whereas others are more specific to the source dataset. They propose to prune the filters that do not generalize well from one dataset to another in order to improve the performance of the network on the target dataset. A Feature Divergence Measure (FDM) is computed to quantify the level of domain shift at a given layer of a CNN.


Another work proposed disentangled representation learning for cross-domain face PAD. Its approach consists of Disentangled Representation learning (DR-Net) and Multi-Domain feature learning (MD-Net). DR-Net learns a pair of encoders via generative models that can disentangle PAD-informative features from subject-discriminative features. The disentangled features from different domains are fed to MD-Net, which learns domain-independent features for the final cross-domain face PAD task. The authors tested single-source to single-target cross-domain PAD as well as multi-source to multi-target settings, and obtained state-of-the-art results on four public datasets. Their later work, DR-UDA [35], consists of three modules: ML-Net, UDA-Net and DR-Net. ML-Net uses the labeled source domain face images to learn a discriminative feature representation. UDA-Net performs unsupervised adversarial domain adaptation to optimize the source and target domain encoders jointly and obtain a common feature space shared by both domains. Furthermore, DR-Net disentangles the features irrelevant to specific domains by reconstructing the source and target domain face images from the common feature space.

3 Methodology

In this section, we introduce the frameworks of unsupervised DA and unsupervised clustering. Then, we present our proposed model for UDA in face PAD. Figure 1 shows a brief overview of the proposed architecture.

Since the most common target platform is mobile devices, we follow [7] and use the latest MobileNet architecture, MobileNetV3 [14], instead of the commonly used ResNet-50 [13]. MobileNet is tuned for mobile phone CPUs, which helps preserve battery life by reducing power consumption. With fewer parameters, MobileNetV3 achieves ImageNet accuracy comparable to ResNet-50 with reduced inference time.

3.1 Deep Unsupervised Domain Adaptation

Unsupervised Domain Adaptation (UDA) relies on a set of labeled source samples $\mathcal{D}_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$ and another set of unlabeled samples from the target domain, $\mathcal{D}_t = \{x_i^t\}_{i=1}^{n_t}$. The goal is to train a model that achieves low classification error on the unlabeled target domain, guided by the labeled source samples. The feature extraction module is trained to extract features that benefit category classification without revealing the domain origin of the sample.

As in DANN [10], adversarial training is incorporated to guide the feature extraction module, $F$, to generate features that confuse a domain discriminator, $D$, so that it cannot determine the domain of the input features. The category (task) classifier, $C$, is then trained on top of these domain-invariant features, using the labeled source samples, to decide the final classification label.

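The task classification loss over all labeled source samples is calculated as

$$\mathcal{L}_{cls}^{s} = \frac{1}{n_s}\sum_{i=1}^{n_s} \mathcal{L}_{ce}\big(C(F(x_i^s)),\, y_i^s\big) \qquad (1)$$

where $\mathcal{L}_{ce}$ is the categorical cross-entropy loss, $F$ is the feature extractor network and $C$ is the task classifier. Similarly, the domain discrimination loss is

$$\mathcal{L}_{d} = \frac{1}{n_s + n_t}\sum_{x_i \in \mathcal{D}_s \cup \mathcal{D}_t} \mathcal{L}_{ce}\big(D(F(x_i)),\, d_i\big) \qquad (2)$$

where $\mathcal{L}_{ce}$ is the categorical cross-entropy loss and $d_i$ is the domain label, zero for source samples and one otherwise. This loss is minimized over the parameters of $D$ and maximized over the parameters of $F$ via the gradient reverse layer (GRL).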
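To make the role of the gradient reverse layer concrete, the following is a minimal pure-Python sketch on a toy scalar model, not the authors' PyTorch implementation. The feature extractor is assumed to be $F(x) = w x$ and the discriminator $D(z) = \sigma(v z)$, trained with binary cross-entropy on the domain label; the names `domain_loss` and `grl_step` are hypothetical. The GRL is the identity in the forward pass and multiplies the incoming gradient by $-\lambda$ in the backward pass, so a single ordinary descent step lowers the domain loss with respect to $D$ while raising it with respect to $F$.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def domain_loss(w: float, v: float, x: float, d: int) -> float:
    """Toy setup (assumption): BCE of D(z) = sigmoid(v * z) on z = F(x) = w * x.
    d is the domain label: 0 for source, 1 for target."""
    p = sigmoid(v * w * x)
    return -(d * math.log(p) + (1 - d) * math.log(1.0 - p))


def grl_step(w: float, v: float, x: float, d: int, lr: float = 0.1, lam: float = 1.0):
    """Hypothetical helper: one plain gradient-descent step with a GRL between F and D."""
    z = w * x                   # forward through F; the GRL is the identity here
    p = sigmoid(v * z)
    dloss_dlogit = p - d        # gradient of BCE w.r.t. the discriminator logit
    grad_v = dloss_dlogit * z   # D receives the ordinary gradient
    grad_z = dloss_dlogit * v   # gradient flowing back from D toward F
    grad_z = -lam * grad_z      # GRL backward: multiply the gradient by -lambda
    grad_w = grad_z * x         # chain rule into F's parameter
    return w - lr * grad_w, v - lr * grad_v


if __name__ == "__main__":
    w0, v0, x, d = 0.5, 1.0, 1.0, 1
    w1, v1 = grl_step(w0, v0, x, d)
    print("loss before:", domain_loss(w0, v0, x, d))
    print("after D update only:", domain_loss(w0, v1, x, d))  # lower: D improves
    print("after F update only:", domain_loss(w1, v0, x, d))  # higher: F confuses D
```

In PyTorch, the same behaviour is usually written as a custom `torch.autograd.Function` whose `backward` returns the negated, scaled gradient, which lets $F$ and $D$ be updated by a single optimizer step.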

3.2 Proposed DC-guided UDA for Face PAD

For handling the problem of generalization in face PAD, we propose to use UDA in combination with Deep Embedded Clustering (DEC) of the unlabeled target samples during training. The motivation for UDA is to alleviate the shift between the source and target domains; however, we do not want to lose the target properties of each class.

Aligning the source and target domains in face PAD, when they come from different sensors and attack instruments, might lead to target samples being misclassified and shifted towards the wrong class. For example, a target mobile attack instance can be assigned to the closest source sample, which might belong to the bona-fide class if the bona-fide samples of the source dataset were captured with the same instrument (a mobile device). The motivation for adding target clustering is therefore to preserve the class-wise separation of target domain samples; together with adversarial DA, it guides the network to generate features that reduce domain shift without corrupting the class-wise separability of the target domain.

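  Let {θ_F, θ_C, θ_D} be the learnable parameters of the feature extractor F, the classifier C and the domain discriminator D.
  Let {μ_bf, μ_a} be the learnable cluster centers for the bona-fide and attack classes, respectively.
  Input:
      Labeled source videos D_s and unlabeled target videos D_t
      Batch size: B
      Feature extractor: F
  Deep Discriminative Clustering:
      Fix model parameters
      z_i^s ← F(x_i^s) for all x_i^s ∈ D_s; compute the source class centers c_bf^s, c_a^s using the labels y_i^s
      z_i^t ← F(x_i^t) for all x_i^t ∈ D_t
      k-means clustering of {z_i^t} using the source class centers as initial centers, giving pseudo-labels ŷ^t and target centers c_bf^t, c_a^t
      for k ∈ {bf, a} do μ_k ← (c_k^s + c_k^t) / 2
  while not converged do
     Run Deep Discriminative Clustering
     for each training iteration of the epoch do
        Draw a random batch {(x_i^s, y_i^s)} of size B from D_s and a random batch {x_i^t} of size B from D_t
        Update θ_F, θ_C, θ_D and μ_bf, μ_a by back-propagating the source classification, domain, KL-clustering and target classification losses
     end for
     Update target pseudo-labels ŷ^t based on distance to μ_bf and μ_a
  end while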
Algorithm 1 Training of DCDA: Deep Clustering-guided-Domain adaptation for face PAD

3.2.1 Deep Clustering for DA

Our training follows the unsupervised deep clustering methods [39], [6], which alternate between cluster assignment with the model parameters fixed, and model update with these cluster assignments fixed. At the start of each epoch, k-means clustering is performed on the deep features generated by $F$ to produce pseudo-labels, $\hat{y}^t$, for the unlabeled target samples. Then, during the epoch iterations, two losses based on Kullback-Leibler (KL) divergence [39] are minimized to update the parameters of $F$ and $C$ and the cluster centroids $\mu$ via back-propagation.

These learnable centroids for the bona-fide and attack classes are re-estimated at the start of each epoch while the model parameters are fixed. Guided by the labels of the source samples and the source features generated by the current $F$, the cluster centers for the source domain, $c^s$, can be obtained in the embedding space. For the unlabeled target samples, on the other hand, k-means clustering is applied to the latent features generated for all target samples. This yields both pseudo-labels for all target training instances, $\hat{y}^t$, and cluster centers for the target domain, $c^t$. Finally, the learnable cluster center of each class is updated to the mean of $c^s$ and $c^t$.
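The epoch-start center update described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: class 0 is bona fide and class 1 is attack, distances are squared Euclidean, and the target k-means is initialized with the source class centers so that cluster k can be read as class k. That initialization, the number of k-means iterations and all helper names (`source_class_centers`, `kmeans`, `update_cluster_centers`) are hypothetical, not taken from the authors' code.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def source_class_centers(feats, labels, num_classes: int = 2) -> List[Vector]:
    """Class means of labeled source features (0 = bona fide, 1 = attack)."""
    return [mean([f for f, y in zip(feats, labels) if y == k]) for k in range(num_classes)]


def kmeans(feats, init_centers, iters: int = 10) -> Tuple[List[int], List[Vector]]:
    """Plain Lloyd's k-means starting from the given centers."""
    centers = [list(c) for c in init_centers]
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda k: sq_dist(f, centers[k])) for f in feats]
        for k in range(len(centers)):
            members = [f for f, y in zip(feats, labels) if y == k]
            if members:  # keep the previous center if a cluster becomes empty
                centers[k] = mean(members)
    return labels, centers


def update_cluster_centers(src_feats, src_labels, tgt_feats):
    """Hypothetical epoch-start step: mu_k = (c_k^s + c_k^t) / 2."""
    c_src = source_class_centers(src_feats, src_labels)
    # Assumption: initialize target k-means with the source class centers so that
    # cluster k corresponds to class k when producing pseudo-labels.
    pseudo_labels, c_tgt = kmeans(tgt_feats, init_centers=c_src)
    mu = [[(a + b) / 2.0 for a, b in zip(cs, ct)] for cs, ct in zip(c_src, c_tgt)]
    return pseudo_labels, mu


if __name__ == "__main__":
    src = [[0.0, 0.0], [0.2, -0.2], [4.0, 4.0], [4.2, 3.8]]
    src_y = [0, 0, 1, 1]
    tgt = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [4.8, 5.2]]  # shifted target domain
    print(update_cluster_centers(src, src_y, tgt))
```

Averaging the source and target centers places each learnable centroid between the two domains, which is consistent with the aim of aligning the domains while keeping the two target classes apart.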

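During the training iterations of an epoch, target samples are used to minimize KL divergence through two losses. The loss to be minimized can be written as

$$\mathcal{L} = \mathrm{KL}(Q\,\|\,P) + \mathrm{KL}(f\,\|\,u) = \frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} q_{ik}\log\frac{q_{ik}}{p_{ik}} + \sum_{k=1}^{K} f_k \log\frac{f_k}{u_k} \qquad (3)$$

where $P$ holds the predicted cluster assignments of the target samples, $Q$ is an auxiliary target distribution, $N$ is the number of target samples and $K = 2$ is the number of clusters (bona-fide and attack). The purpose of the first KL term is to decrease the distance between the model prediction $P$ and the distribution $Q$. The second term follows [17] to incorporate class balance and avoid degenerate solutions, where $f_k = P(y = k) = \frac{1}{N}\sum_{i} q_{ik}$ is the empirical frequency of cluster $k$ and $u$ is a uniform prior.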
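The clustering objective can be evaluated with a few lines of code. The sketch below is a minimal pure-Python illustration, assuming $P$ and $Q$ are row-stochastic $N \times K$ matrices (rows are target samples, columns are clusters), that $f_k$ is computed from $Q$, and that $u$ is the uniform prior, following DEPICT [6]; `clustering_objective` is a hypothetical name.

```python
import math
from typing import List

Matrix = List[List[float]]


def clustering_objective(p: Matrix, q: Matrix) -> float:
    """Hypothetical helper: mean KL(Q || P) over samples plus the balance term KL(f || u)."""
    n, k = len(p), len(p[0])
    kl_qp = sum(
        qv * math.log(qv / pv)
        for prow, qrow in zip(p, q)
        for pv, qv in zip(prow, qrow)
        if qv > 0.0  # 0 * log 0 is taken as 0
    ) / n
    f = [sum(row[j] for row in q) / n for j in range(k)]  # empirical cluster frequencies
    u = 1.0 / k                                           # assumption: uniform prior
    kl_fu = sum(fj * math.log(fj / u) for fj in f if fj > 0.0)
    return kl_qp + kl_fu


if __name__ == "__main__":
    balanced = [[0.9, 0.1], [0.1, 0.9]]
    collapsed_q = [[1.0, 0.0], [1.0, 0.0]]
    print(clustering_objective(balanced, balanced))
    print(clustering_objective([[0.9, 0.1], [0.9, 0.1]], collapsed_q))
```

The balance term is what penalizes the degenerate case: when every target sample is assigned to the same cluster, $f$ becomes one-hot and $\mathrm{KL}(f\,\|\,u)$ reaches $\log K$, even if $Q$ and $P$ agree well.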

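As in [6], optimization of the loss in Equation (3) alternates between updating the auxiliary distribution $Q$ and then using $Q$ to update the model parameters. $Q$ is calculated in closed form from the predicted assignments $P$ as

$$q_{ik} = \frac{p_{ik} \,/\, \big(\sum_{i'} p_{i'k}\big)^{1/2}}{\sum_{k'} p_{ik'} \,/\, \big(\sum_{i'} p_{i'k'}\big)^{1/2}} \qquad (4)$$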

For further regularization of target clustering, the previously estimated target pseudo-labels $\hat{y}^t$ are also used as part of $Q$.
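The closed-form update of the auxiliary distribution can be sketched directly. This minimal pure-Python version follows DEPICT's closed form, $q_{ik} \propto p_{ik} / (\sum_{i'} p_{i'k})^{1/2}$ with each row normalized; it does not include the pseudo-label regularization mentioned above, and `auxiliary_distribution` is a hypothetical name.

```python
import math
from typing import List

Matrix = List[List[float]]


def auxiliary_distribution(p: Matrix) -> Matrix:
    """Hypothetical helper: DEPICT-style closed-form Q, q_ik ∝ p_ik / sqrt(sum_i' p_i'k)."""
    k = len(p[0])
    # Square root of each cluster's total predicted mass over all target samples.
    col_norm = [math.sqrt(sum(row[j] for row in p)) for j in range(k)]
    q = []
    for row in p:
        unnorm = [row[j] / col_norm[j] for j in range(k)]
        total = sum(unnorm)
        q.append([v / total for v in unnorm])
    return q


if __name__ == "__main__":
    p = [[0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]  # cluster 0 is over-predicted
    for row in auxiliary_distribution(p):
        print([round(v, 3) for v in row])
```

Dividing by the square root of each cluster's total mass shifts probability toward under-represented clusters, which is how the closed form carries the class-balance prior into $Q$.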

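Then, using the calculated auxiliary distribution $Q$ and predicted assignments $P$, the network parameters and cluster centroids are updated by minimizing

$$\mathcal{L}_{KL} = \mathrm{KL}(Q\,\|\,P) = \frac{1}{N}\sum_{i=1}^{N}\sum_{k} q_{ik}\log\frac{q_{ik}}{p_{ik}} \qquad (5)$$

over the $N$ target samples.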

As mentioned earlier, we use KL divergence minimization with target domain samples in two losses, both of which update the parameters of the feature extraction module $F$ via back-propagation. The first loss additionally updates the classifier $C$, and the second loss updates the cluster centroids $\mu$. For the first loss, we set $P$ to the classifier prediction probabilities after softmax, $p_{ik} = \mathrm{softmax}\big(C(F(x_i^t))\big)_k$, so that it behaves like a cross-entropy classification loss on pseudo-labeled target samples.
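Why the first KL loss behaves like cross-entropy can be checked numerically. Since $\mathrm{KL}(Q\,\|\,P) = H(Q, P) - H(Q)$ and the entropy $H(Q)$ does not depend on the network once $Q$ is fixed, the gradients of the KL loss with respect to the classifier outputs equal those of the soft cross-entropy; when $Q$ is one-hot (a hard pseudo-label), the KL term is exactly the usual cross-entropy. The sketch below is a minimal pure-Python check with hypothetical helper names.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(q: Sequence[float], p: Sequence[float]) -> float:
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


def soft_cross_entropy(q: Sequence[float], p: Sequence[float]) -> float:
    return -sum(qi * math.log(pi) for qi, pi in zip(q, p) if qi > 0.0)


def entropy(q: Sequence[float]) -> float:
    return -sum(qi * math.log(qi) for qi in q if qi > 0.0)


if __name__ == "__main__":
    p = softmax([2.0, -1.0])  # hypothetical classifier logits for one target sample
    q = [0.8, 0.2]            # hypothetical auxiliary target for that sample
    print(kl(q, p), soft_cross_entropy(q, p) - entropy(q))
    print(kl([1.0, 0.0], p), -math.log(p[0]))  # one-hot Q: KL equals cross-entropy
```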

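For the second loss, $P$ is estimated using the Student's t-distribution to measure the similarity between the target features $z_i = F(x_i^t)$ and the cluster centroids $\mu_k$, as in [39]:

$$p_{ik} = \frac{\big(1 + \|z_i - \mu_k\|^2/\alpha\big)^{-\frac{\alpha+1}{2}}}{\sum_{k'} \big(1 + \|z_i - \mu_{k'}\|^2/\alpha\big)^{-\frac{\alpha+1}{2}}} \qquad (6)$$

where $\alpha$ is the number of degrees of freedom of the Student's t-distribution (DEC uses $\alpha = 1$).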
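The Student's t soft assignment can be computed as follows. This is a minimal pure-Python sketch for a single feature vector, with $\alpha = 1$ as the default, as in DEC; `student_t_assignment` is a hypothetical name.

```python
from typing import List, Sequence


def student_t_assignment(
    z: Sequence[float], centers: List[Sequence[float]], alpha: float = 1.0
) -> List[float]:
    """Hypothetical helper: soft assignment of feature z to each centroid (DEC-style kernel)."""
    weights = []
    for mu in centers:
        sq_dist = sum((a - b) ** 2 for a, b in zip(z, mu))
        # Heavy-tailed kernel: similarity decays polynomially with distance.
        weights.append((1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0))
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    centers = [[0.0, 0.0], [2.0, 0.0]]  # e.g. bona-fide and attack centroids
    print(student_t_assignment([0.0, 0.0], centers))
    print(student_t_assignment([1.0, 0.0], centers))
```

A feature lying on a centroid still gives some probability to the other cluster, because the heavy-tailed kernel never reaches zero. Gradients of the KL loss therefore reach both centroids.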

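Finally, the estimated pseudo-labels of the target samples are used to update the parameters of both the feature extractor and the classifier by minimizing the following task classification loss:

$$\mathcal{L}_{cls}^{t} = \frac{1}{n_t}\sum_{i=1}^{n_t} \mathcal{L}_{ce}\big(C(F(x_i^t)),\, \hat{y}_i^t\big) \qquad (7)$$

where $\mathcal{L}_{ce}$ is the categorical cross-entropy loss.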
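Pseudo-labeling and the target classification loss can be sketched together. In the sketch below, each target feature is assigned to its nearest cluster center, following Algorithm 1's rule that pseudo-labels are updated based on the distance to the bona-fide and attack centers. Squared Euclidean distance is an assumption, and the helper names `nearest_center_labels` and `target_classification_loss` are hypothetical.

```python
import math
from typing import List, Sequence


def nearest_center_labels(feats: List[Sequence[float]], centers: List[Sequence[float]]) -> List[int]:
    """Hypothetical helper: label each target feature with its closest center
    (0 = bona fide, 1 = attack in this sketch)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centers)), key=lambda k: sq_dist(f, centers[k])) for f in feats]


def target_classification_loss(probs: List[Sequence[float]], pseudo_labels: List[int]) -> float:
    """Mean cross-entropy of classifier probabilities against target pseudo-labels."""
    return -sum(math.log(p[y]) for p, y in zip(probs, pseudo_labels)) / len(pseudo_labels)


if __name__ == "__main__":
    centers = [[0.6, 0.4], [4.5, 4.5]]
    labels = nearest_center_labels([[0.1, 0.0], [4.4, 4.6]], centers)
    print(labels, target_classification_loss([[0.9, 0.1], [0.2, 0.8]], labels))
```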

3.2.2 Complete Model learning

The complete end-to-end training methodology of our proposed DC-guided-DA for face PAD is listed in Algorithm 1. We use only one frame per video.

Database | PAI | Sensor used for authentication
Replay-Attack | 1) PR (A4); 2) VR on iPhone; 3) VR on iPad | 1) Webcam in MacBook laptop
MSU-MFSD | 1) PR (A3); 2) high-def VR on iPad; 3) VR on iPhone | 1) Webcam in MacBook Air; 2) FC of Google Nexus5 Mobile
Replay-Mobile | 1) PR (A4); 2) VR on matte-screen | 1) FC of iPad Mini2 Tablet; 2) FC of LG-G4 Mobile
FC: Front-Camera, PR: Hard-copy print of high-res photo, VR: Video replay
Table 1: Number of samples per class per subset for each PAD dataset used.
train → test | RA→M | RA→RM | M→RA | M→RM | RM→RA | RM→M | Average
Source-only | 34 | 49.8 | 39.4 | 15.6 | 42.3 | 42 | 37.18
DA w/o clustering | 29.6 | 47.2 | 49.25 | 11.35 | 45 | 2.9 | 30.88
DCDA w/o target classification loss | 18.35 | 49.2 | 10.40 | 2.25 | 11.65 | 37.80 | 19.94
DCDA | 0 | 0 | 0.15 | 1.6 | 1.15 | 1.65 | 0.76
RA: Replay-Attack, M: MSU-MFSD, RM: Replay-Mobile
Table 2: Results of the proposed DC-guided-DA for face PAD, in ACER (%).
Method | RA→M | M→RA | Average
KSA [20]* | 18.6 | 23.3 | 20.95
ADA [33] | 30.5 | 5.1 | 17.8
PAD-GAN [34] | 23.2 | 8.7 | 15.95
SSDG [15] | 7.38 | 11.7 | 9.54
DCDA (Proposed) | 0 | 0.15 | 0.08
* On concatenated CoALBP and LPQ features in HSV and YCbCr color spaces.
Source domain includes two other datasets.
Table 3: Comparison with SOTA in HTER (%).
Figure 2: t-SNE visualization analysis. Upper row: DA without clustering; bottom row: proposed DC-guided-DA. Panels: (a), (c) RA→M; (b), (d) RA→RM; (e), (g) M→RA; (f), (h) M→RM; (i), (k) RM→RA; (j), (l) RM→M. Points are distinguished by domain (source, target) and class (bona-fide, attack). Best viewed in color.

4 Experiments and Results

4.1 Face PAD datasets

Table 1 summarizes the total number of samples present in each subset of the datasets used, in addition to the Presentation Attack Instruments (PAI) used and the sensors used in recording videos for authentication.

Replay-Attack [4] is one of the earliest datasets presented in the literature for the problem of face spoofing. It consists of 1200 short videos from 50 different subjects. Attack scenarios include “hard-copy print attack”, “mobile-photo attack” and “high-definition screen attack”. Attacks are presented to the sensor (a regular webcam) either with a “fixed” tripod or by an attacker holding the presenting device (printed paper or replay device) in his/her “hand”.

The MSU Mobile Face Spoofing Database (MSU-MFSD) [38] targets the problem of face spoofing on smartphones. The dataset includes real and spoofed videos from 35 subjects. Two devices were used: the webcam of a MacBook Air and the front-facing camera of a smartphone. Three attack scenarios are used: a print attack on A3 paper, a video replay attack on the screen of an iPad, and a video replay attack on a smartphone.

Replay-Mobile [5] was released by the same research institute that released Replay-Attack. It has 1200 short videos from 40 subjects, captured by two mobile devices. Each subject has ten bona-fide accesses and 16 attack videos under different attack modes. Two types of attack are present: photo-print and matte-screen attacks displaying a digital photo or video.

4.2 Experimental setup

Our experiments were performed on an NVIDIA GeForce 840M GPU with CUDA version 11.0. The Bob package [1] was used for dataset management, and PyTorch was used for models and training. Evaluation follows the ISO/IEC 30107-3:2017 PAD metrics: the Attack Presentation Classification Error Rate (APCER), the Bona-fide Presentation Classification Error Rate (BPCER) and their average, the Average Classification Error Rate, ACER = (APCER + BPCER)/2, which is used for reporting results in the tables.
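These error rates can be computed from classifier scores with a short function. The sketch below is a minimal pure-Python illustration under stated assumptions: the score is the classifier's probability that a presentation is bona fide, a score at or above the threshold is accepted as bona fide, and the default threshold of 0.5 is an assumption. APCER is pooled over all attack types, which simplifies ISO/IEC 30107-3, where APCER is defined per PAI species. `pad_error_rates` is a hypothetical name.

```python
from typing import Sequence, Tuple


def pad_error_rates(
    scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5
) -> Tuple[float, float, float]:
    """Hypothetical helper returning (APCER, BPCER, ACER).

    labels: 1 = bona fide, 0 = attack. Assumption: score >= threshold means
    "accepted as bona fide"; APCER is pooled over all attack types.
    """
    attack = [s for s, y in zip(scores, labels) if y == 0]
    bona_fide = [s for s, y in zip(scores, labels) if y == 1]
    apcer = sum(s >= threshold for s in attack) / len(attack)       # attacks accepted
    bpcer = sum(s < threshold for s in bona_fide) / len(bona_fide)  # bona fide rejected
    return apcer, bpcer, (apcer + bpcer) / 2.0


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1]
    labels = [1, 1, 1, 0, 0, 0]
    print(pad_error_rates(scores, labels))
```

The HTER reported in Table 3 is conventionally the mean of the false acceptance and false rejection rates, so on a two-class problem it is computed in the same way.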

4.3 Results and Discussion

Table 2 presents the results of our proposed DC-guided UDA for face PAD on the three benchmark face datasets used. Results are reported as the average ACER (%) over three runs; ACER is calculated on the test subset of the target dataset. The first row shows the results obtained by fine-tuning a MobileNetV3 classification network on the source dataset only, without domain adaptation. We performed experiments to study the influence of each model component on the overall performance of the algorithm. When the clustering components and losses were removed and only domain adaptation was performed, the results in the second row of Table 2 show only a slight improvement over the source-only models. Adding the clustering components, with target pseudo-label estimation and the target clustering loss but without updating the classifier with the target classification loss, yielded a significant decrease in target classification error on most datasets, as shown in the third row. However, even though the feature extraction network is trying to learn domain-invariant features, the classifier trained on source samples only still fails to achieve low errors on some target datasets. For example, the classifier trained on the Replay-Attack dataset fails to discriminate between attack and bona-fide samples of the Replay-Mobile dataset.

Finally, the last row shows the results obtained by our full DCDA framework, which achieves near-perfect classification of the unlabeled target samples. A comparison with state-of-the-art DA-based face PAD solutions is provided in Table 3, showing the superiority of our proposed DC-guided-DA framework. Furthermore, a t-SNE visualization analysis is presented in Figure 2, comparing our proposed architecture with models trained using domain adaptation only. The visualizations show that our framework aligns the classification boundaries of the source and target datasets. They also show the diversity of attack and sensor types within the same dataset, which form separate clusters within the same class, for example for Replay-Attack in Figure 2(c), 2(d), 2(g) and 2(k).

5 Conclusion and Future Work

In this paper, we proposed an approach that exploits unsupervised adversarial domain adaptation guided by target clustering to improve the generalization ability of face PAD. Specifically, our framework uses UDA to learn domain-invariant features that leverage the labeled source samples to classify the unlabeled samples from the target domain, while preserving the intrinsic properties of the target domain via deep clustering of the target embedding features. Our approach is trained end-to-end and achieves near-perfect adaptation to the target domain on public benchmark datasets, with cross-dataset errors of only 0-2%. Future work will focus on evaluating on more diverse datasets, and on reducing the model's dependency on target domain samples from both classes during training, so that the model learns mainly from bona-fide samples with minimal contribution from attack samples.


  • [1] A. Anjos, L. E. Shafey, R. Wallace, M. Günther, C. McCool, and S. Marcel (2012-10)

    Bob: a free signal processing and machine learning toolbox for researchers

    In 20th ACM Conference on Multimedia Systems (ACMMM), Nara, Japan, Cited by: §4.2.
  • [2] Z. Boulkenafet, J. Komulainen, and A. Hadid (2016-08) Face spoofing detection using colour texture analysis. IEEE Transactions on Information Forensics and Security 11 (8), pp. 1818–1830. External Links: ISSN 1556-6013 Cited by: §2.1.
  • [3] Z. Boulkenafet, J. Komulainen, and A. Hadid (2017-02)

    Face antispoofing using speeded-up robust features and fisher vector encoding

    IEEE Signal Processing Letters 24 (2), pp. 141–145. External Links: ISSN 1070-9908 Cited by: §2.1.
  • [4] I. Chingovska, A. Anjos, and S. Marcel (2012-Sep.) On the effectiveness of local binary patterns in face anti-spoofing. In 2012 BIOSIG - Proceedings of the International Conference of Biometrics Special Interest Group (BIOSIG), pp. 1–7. External Links: ISSN 1617-5468 Cited by: §4.1.
  • [5] A. Costa-Pazo, S. Bhattacharjee, E. Vazquez-Fernandez, and S. Marcel (2016-Sep.) The replay-mobile face presentation-attack database. In 2016 International Conference of the Biometrics Special Interest Group (BIOSIG), pp. 1–7. Cited by: §4.1.
  • [6] K. G. Dizaji, A. Herandi, C. Deng, W. Cai, and H. Huang (2017) Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization.. In IEEE international conference on Computer Vision, pp. 5747–5756. Cited by: §2.3, §3.2.1, §3.2.1.
  • [7] Y. S. El-Din, M. N. Moustafa, and H. Mahdi (2020) On the effectiveness of adversarial unsupervised domain adaptation for iris presentation attack detection in mobile devices. In ICMV’20. Cited by: §3.
  • [8] Y. S. El-Din, M. N. Moustafa, and H. Mahdi (2020-09) Deep convolutional neural networks for face and iris presentation attack detection: survey and case study. IET Biometrics 9, pp. 179–193. ISSN 2047-4938. Cited by: §1, §2.1.
  • [9] L. Feng, L. Po, Y. Li, X. Xu, F. Yuan, T. C. Cheung, and K. Cheung (2016) Integration of image quality and motion cues for face anti-spoofing: a neural network approach. Journal of Visual Communication and Image Representation 38, pp. 451–460. ISSN 1047-3203. Cited by: §2.1.
  • [10] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky (2016-01) Domain-adversarial training of neural networks. J. Mach. Learn. Res. 17 (1), pp. 2096–2030. ISSN 1532-4435. Cited by: §1, §2.2, §3.1.
  • [11] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.), pp. 2672–2680. Cited by: §2.2.
  • [12] X. Guo, E. Zhu, X. Liu, and J. Yin (2018, 14–16 Nov) Deep embedded clustering with data augmentation. J. Zhu and I. Takeuchi (Eds.), Proceedings of Machine Learning Research, Vol. 95, pp. 550–565. Cited by: §2.3.
  • [13] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778. Cited by: §3.
  • [14] A. Howard, M. Sandler, B. Chen, W. Wang, L. Chen, M. Tan, G. Chu, V. Vasudevan, Y. Zhu, R. Pang, H. Adam, and Q. Le (2019) Searching for mobilenetv3. In 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1314–1324. Cited by: §3.
  • [15] Y. Jia, J. Zhang, S. Shan, and X. Chen (2020) Single-side domain generalization for face anti-spoofing. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Cited by: §1, §2.4, Table 3.
  • [16] G. Kang, L. Jiang, Y. Wei, Y. Yang, and A. G. Hauptmann (2020) Contrastive adaptation network for single-and multi-source domain adaptation. Cited by: §2.2.
  • [17] A. Krause, P. Perona, and R. G. Gomes (2010) Discriminative clustering by regularized information maximization. In Advances in Neural Information Processing Systems 23, J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta (Eds.), pp. 775–783. Cited by: §3.2.1.
  • [18] V. K. Kurmi and V. P. Namboodiri (2019-07) Looking back at labels: a class based domain adaptation technique. In International Joint Conference on Neural Networks (IJCNN). Cited by: §2.2.
  • [19] H. Li, P. He, S. Wang, A. Rocha, X. Jiang, and A. C. Kot (2018) Learning generalized deep feature representation for face anti-spoofing. IEEE Transactions on Information Forensics and Security 13 (10), pp. 2639–2652. Cited by: §1, §2.4.
  • [20] H. Li, W. Li, H. Cao, S. Wang, F. Huang, and A. C. Kot (2018) Unsupervised domain adaptation for face anti-spoofing. IEEE Transactions on Information Forensics and Security 13 (7), pp. 1794–1809. Cited by: §1, §2.4, Table 3.
  • [21] H. Li, S. J. Pan, S. Wang, and A. C. Kot (2018) Domain generalization with adversarial feature learning. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5400–5409. Cited by: §1.
  • [22] M. Long, Z. Cao, J. Wang, and M. I. Jordan (2018) Conditional adversarial domain adaptation. In Advances in Neural Information Processing Systems 31, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), pp. 1640–1650. Cited by: §2.2.
  • [23] A. Mohammadi, S. Bhattacharjee, and S. Marcel (2020) Domain adaptation for generalization of face presentation attack detection in mobile settengs with minimal information. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1001–1005. Cited by: §2.4.
  • [24] C. Nagpal and S. R. Dubey (2018) A performance evaluation of convolutional neural networks for face anti spoofing. CoRR abs/1805.04176. Cited by: §2.1.
  • [25] K. Patel, H. Han, and A. K. Jain (2016-10) Secure face unlock: spoof detection on smartphones. IEEE Transactions on Information Forensics and Security 11 (10), pp. 2268–2283. ISSN 1556-6013. Cited by: §2.1.
  • [26] K. Patel, H. Han, and A. K. Jain (2016) Cross-database face antispoofing with robust feature representation. In Biometric Recognition, Z. You, J. Zhou, Y. Wang, Z. Sun, S. Shan, W. Zheng, J. Feng, and Q. Zhao (Eds.), Cham, pp. 611–619. ISBN 978-3-319-46654-5. Cited by: §2.1.
  • [27] Z. Pei, Z. Cao, M. Long, and J. Wang (2018) Multi-adversarial domain adaptation. In AAAI Conference on Artificial Intelligence. Cited by: §2.2.
  • [28] K. Saito, Y. Ushiku, T. Harada, and K. Saenko (2018) Adversarial dropout regularization. In International Conference on Learning Representations. Cited by: §2.2.
  • [29] K. Saito, K. Watanabe, Y. Ushiku, and T. Harada (2018) Maximum classifier discrepancy for unsupervised domain adaptation. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3723–3732. Cited by: §2.2.
  • [30] R. Shao, X. Lan, J. Li, and P. C. Yuen (2019) Multi-adversarial discriminative deep domain generalization for face presentation attack detection. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10015–10023. Cited by: §1, §2.4.
  • [31] H. Tang, K. Chen, and K. Jia (2020) Unsupervised domain adaptation via structurally regularized deep clustering. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Cited by: §2.3.
  • [32] H. Tang and K. Jia (2020) Discriminative adversarial domain adaptation. CoRR abs/1911.12036. Cited by: §2.2.
  • [33] G. Wang, H. Han, S. Shan, and X. Chen (2019) Improving cross-database face presentation attack detection via adversarial domain adaptation. In 2019 International Conference on Biometrics (ICB), pp. 1–8. Cited by: §1, §2.4, Table 3.
  • [34] G. Wang, H. Han, S. Shan, and X. Chen (2020) Cross-domain face presentation attack detection via multi-domain disentangled representation learning. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6677–6686. Cited by: §2.4, Table 3.
  • [35] G. Wang, H. Han, S. Shan, and X. Chen (2021) Unsupervised adversarial domain adaptation for cross-domain face presentation attack detection. IEEE Transactions on Information Forensics and Security 16, pp. 56–69. Cited by: §2.4.
  • [36] R. Wang, G. Wang, and R. Henao (2019) Discriminative clustering for robust unsupervised domain adaptation. CoRR abs/1905.13331. Cited by: §2.3.
  • [37] Z. Wang, C. Zhao, Y. Qin, Q. Zhou, and Z. Lei (2018) Exploiting temporal and depth information for multi-frame face anti-spoofing. arXiv:1811.05118. Cited by: §2.1.
  • [38] D. Wen, H. Han, and A. K. Jain (2015-04) Face spoof detection with image distortion analysis. IEEE Transactions on Information Forensics and Security 10 (4), pp. 746–761. ISSN 1556-6013. Cited by: §4.1.
  • [39] J. Xie, R. Girshick, and A. Farhadi (2016, 20–22 Jun) Unsupervised deep embedding for clustering analysis. In International Conference on Machine Learning, M. F. Balcan and K. Q. Weinberger (Eds.), Proceedings of Machine Learning Research, Vol. 48, New York, New York, USA, pp. 478–487. Cited by: §2.3, §3.2.1.
  • [40] Z. Xu, S. Li, and W. Deng (2015-11) Learning temporal features using lstm-cnn architecture for face anti-spoofing. In 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), pp. 141–145. ISSN 2327-0985. Cited by: §2.1.
  • [41] Y. Zhang, H. Tang, K. Jia, and M. Tan (2019) Domain-symmetric networks for adversarial domain adaptation. Cited by: §2.2.