
Unsupervised Face Morphing Attack Detection via Self-paced Anomaly Detection

by   Meiling Fang, et al.

Supervised morphing attack detection (MAD) solutions achieve outstanding success in dealing with attacks from known morphing techniques and known data sources. However, given the variations in morphing attacks, the performance of supervised MAD solutions drops significantly due to the insufficient diversity and quantity of the existing MAD datasets. To address this concern, we propose a completely unsupervised MAD solution via self-paced anomaly detection (SPL-MAD), leveraging existing large-scale face recognition (FR) datasets and the unsupervised nature of convolutional autoencoders. Training an autoencoder on general FR datasets that might unintentionally contain unlabeled manipulated samples leads to diverse reconstruction behavior for attack and bona fide samples. We analyze this behavior empirically to provide a solid theoretical ground for designing our unsupervised MAD solution. Building on this, we propose integrating an adapted self-paced learning paradigm to enhance the separability of the reconstruction errors of bona fide and attack samples in a completely unsupervised manner. Our experimental results on a diverse set of MAD evaluation datasets show that the proposed unsupervised SPL-MAD solution outperforms the overall performance of a wide range of supervised MAD solutions and generalizes better to unknown attacks.




1 Introduction

Face recognition technology has witnessed remarkable progress in recent years. A variety of face recognition methods [3, 19] have been proposed in the literature and applied in practice with very high accuracy. However, these methods are vulnerable to several attacks [7, 14, 21, 34], one of which is the face morphing attack. A morphing attack is a face image that is purposefully manipulated to match probe images of more than one identity. As a result, morphing attack detection (MAD) solutions are of crucial importance to building reliable face recognition systems. Conventional single-image and differential MAD solutions [13, 22, 25, 40, 42, 43] require two classes of data, bona fide (i.e., not attack) and morphing attack samples, to train a supervised MAD model. However, this ties the MAD performance to the size and diversity of the training data. Most of the existing MAD datasets [11, 13, 44] are limited in diversity and quantity due to factors such as the labor-intensive pair selection, the morphing process itself, and the limited amount of bona fide source data that has suitable properties (e.g., ICAO compliance [26]) and is shareable in a privacy-aware frame. Moreover, due to ethical and legal issues, only a few MAD datasets [11, 13, 44] are publicly available for the development of MAD solutions. Given the lack of diversity of MAD datasets and the possibility of facing attacks created by unknown methods, supervised MAD solutions [13, 22, 40, 42, 43] commonly generalize poorly to unknown morphing attacks or data sources. To the best of our knowledge, only one single-image-based [8] and one differential-based MAD method [25] were proposed to detect morphing attacks as anomalies by training a one-class classifier. Despite their improved performance on unknown attacks in comparison to supervised MAD approaches, training the one-class classifier still relies on the prior knowledge that all training samples are bona fide, and thus it is not a completely unsupervised approach.

To target the lack of large-scale, labelled, and diverse MAD datasets, along with the low generalizability of supervised MADs on unknown attacks, we leverage existing large-scale face recognition datasets to train our proposed unsupervised learning-based model. Most publicly available face recognition datasets were collected from the web and might unintentionally include unlabeled manipulated samples. To alleviate this issue, we model our design as a self-paced learning (SPL) paradigm. The SPL paradigm is inspired by the cognitive learning order in human curricula, where samples enter the training phase from easy to hard [1]. In this case, the training data is evaluated and selected automatically based on the training loss, without any human prior knowledge. Recently, researchers have investigated the potential of the SPL paradigm [20, 30] and demonstrated that it yields a significant performance gain [51, 53]. The SPL paradigm is an effective learning strategy to suppress the side effects of noisy samples or outliers by adjusting the sample weights. Given this property, and our understanding of the limitations of supervised MAD solutions, we adapt an SPL paradigm and incorporate it in our unsupervised MAD learning.

This work makes the following main contributions: 1) We first study the behavior of unsupervised anomaly detection through reconstruction error analyses on MAD data to ensure that our unsupervised MAD solution is developed on the basis of solid empirical analyses. Our study reveals that morphing attacks are more straightforward to reconstruct than bona fide samples when the reconstruction is learned on general face data; 2) We leverage this observation to present a novel unsupervised MAD solution via adapted self-paced anomaly detection, namely SPL-MAD. The adapted SPL paradigm proves helpful in neglecting the suspicious unlabeled data in training, thus enhancing the reconstruction gap between bona fide and attack samples and improving the generalizability of the MAD model; 3) The experimental results demonstrate that our SPL-MAD solution not only reaches the performance of supervised MAD solutions but also outperforms well-established supervised MAD methods, and presents more generalizable performance over a diverse set of unknown attacks included in eight MAD datasets.

2 Related work

A number of studies [22, 45] pointed out that face recognition systems are vulnerable to morphing attacks. To target this problem, several MAD solutions [13, 43, 48] were proposed. MAD solutions can be categorized into two groups based on the application scenario requirements: single-image MAD and differential MAD, where the latter requires an investigated image and an additional live capture of the individual [5], which limits its applicability in many scenarios. In our case, we focus on the single-image-based MAD scenario, where only the investigated image is analyzed. Most single-image MAD solutions [13, 22, 40, 42, 43] are based on supervised learning that relies on data annotations. For example, Raghavendra [43] proposed a handcrafted-feature-based solution where textural features were extracted across scale-space and classified using collaborative representation. Damer [13] proposed a pixel-wise MAD (PW-MAD) solution where a network is trained to classify each pixel of the image as attack or not, rather than producing only one binary label for the whole image. These supervised-learning-based solutions achieved good MAD performance, typically on attacks with properties known during training. Variations in the attacks strongly affect MAD performance; such variations can be related to the face morphing approach [6, 8, 10, 16, 54], the pairing protocols of morphed images [12, 15], image compression [32], the source of bona fide images [46], and re-digitization of the images [23, 40, 42], among others. In addition to the relatively low generalizability of supervised MAD, its optimal training requires large-scale labelled databases with variation in the attacks, which is very challenging given the data-creation efforts and the legal limitations on using, sharing, and re-using biometric-based personal data [50].

Besides supervised MAD approaches, a single work [8] proposed to detect morphing attacks from single images by leveraging anomaly detection techniques, however, with very limited success. Damer [8] studied detecting attacks as anomalies via one-class classifiers. However, the performance of the one-class model on unknown attacks was still low, with roughly 50% detection error rates [8]. Moreover, training the one-class classifier in fact relies on the prior knowledge that all training samples are assumed to be bona fide, even if they include contamination, and thus it is not a completely unsupervised approach. Also using a one-class classifier, but for differential morphing attack detection, which is out of our scope as it requires a bona fide (live) image in its operation, a recent work [25] proposed a solution that also requires bona fide labelled images for training.

In addition to the lack of large-scale, labelled, and diverse datasets, which leads to poor generalizability of MAD models, most of the existing morphing attack samples in MAD datasets [11, 13, 44] are created from a small set of bona fide samples. This is due to the insufficient number of identities in suitable (ICAO compliant [26]) and publicly available datasets. Compared to face recognition datasets [52], and even face presentation attack (spoofing) databases [56], existing MAD datasets are hundreds of times smaller. Besides, only a few datasets (detailed information on eight MAD datasets can be found in Section 4.1) are available for MAD development research. To enhance the generalizability on unknown morphing attacks and avoid the need for diverse morphing development databases, we leverage publicly available face recognition datasets and present an unsupervised face MAD solution, the SPL-MAD.

3 Methodology

Figure 1: The curves of reconstruction error on two MAD test sets (SMDD [9] and MorGAN-LMA [11], details in Section 4.1) by models trained without and with the SPL paradigm. The x-axis refers to the training epoch, and the y-axis is the average reconstruction error of all data. The green curve denotes the error of bona fide data, and the red curve presents the error of attack data. It can be seen that attacks are easier to reconstruct than bona fide samples, thus resulting in lower reconstruction error, which we leverage in our proposed MAD solution. The SPL paradigm also leads to a higher gap in the reconstruction error between bona fide and attacks.

3.1 Preliminaries

Convolutional autoencoder (CAE): The autoencoder (AE) is a branch of unsupervised learning techniques and has been widely used for anomaly detection [47, 57]. An AE consists of an encoder representing the input in a latent domain and a decoder reconstructing the data from this latent feature. A convolutional autoencoder (CAE) is designed by stacking convolutional layers: the encoder (denoted $E$) is composed of several convolutional layers, and the decoder (denoted $D$) is composed of transposed convolutional layers. Given the $i$-th input $x_i$ in the dataset, the CAE with model parameters $\theta = \{\theta_E, \theta_D\}$ can be formulated as:

$$\hat{x}_i = D(E(x_i; \theta_E); \theta_D), \quad (1)$$

where $\theta_E$ and $\theta_D$ are the parameters of the encoder and the decoder, respectively, and $\hat{x}_i$ is the reconstructed data. To measure the reconstruction quality, the most commonly used loss function for anomaly detection is the mean square error (MSE), i.e., $\ell_i = \|x_i - \hat{x}_i\|_2^2$. Thus, the training objective of the CAE can be constructed as:

$$\min_{\theta} \sum_{i=1}^{n} \|x_i - D(E(x_i; \theta_E); \theta_D)\|_2^2. \quad (2)$$
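As an illustration, a CAE of this form can be sketched in PyTorch. The layer widths and depths below are illustrative assumptions, not the seven-block architecture used in Section 4.2:

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Minimal convolutional autoencoder: strided-conv encoder E,
    transposed-conv decoder D, x_hat = D(E(x))."""
    def __init__(self, channels=3):
        super().__init__()
        # Encoder E: each stride-2 convolution halves the spatial resolution.
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder D: transposed convolutions restore the input resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, channels, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def reconstruction_loss(x, x_hat):
    """Per-sample MSE reconstruction loss l_i = ||x_i - x_hat_i||^2."""
    return ((x - x_hat) ** 2).flatten(1).mean(dim=1)
```

Keeping the loss per-sample (rather than averaged over the batch) is what later allows individual sample weights to be attached to it.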
Self-paced learning (SPL): We first introduce the conventional SPL as preliminaries, based on [20, 30]. Consider a training dataset $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n}$ with $n$ samples, where $x_i$ is the $i$-th sample and $y_i$ is the learning target (i.e., ground-truth label). In our case, we use a fully convolutional autoencoder (CAE) to reconstruct the input image, that is, $y_i$ is equal to the input $x_i$. A learned model is denoted as $f(\cdot; \theta)$, where $\theta$ is the model parameter. Let $\mathcal{L}(f(x_i; \theta), y_i)$ denote the loss function that computes the cost between the estimated data $f(x_i; \theta)$ and the target $y_i$ of the $i$-th sample. The sample weights are denoted as $v = [v_1, \dots, v_n] \in [0, 1]^n$. The objective of SPL can then be presented as a union of a weighted loss term on all samples and a general self-paced regularizer imposed on the sample weights, expressed as the following minimization problem:

$$\min_{\theta, v} \sum_{i=1}^{n} v_i \, \mathcal{L}(f(x_i; \theta), y_i) + g(v; \lambda), \quad (3)$$

where $g(v; \lambda)$ is the self-paced regularizer with a penalty parameter $\lambda$ that controls the learning pace. The alternative search strategy (ASS) [30] is generally used for solving Eq. 3; it alternately optimizes the model weights $\theta$ and the sample weights $v$ while keeping the other fixed. For example, given a fixed $v$, the minimization over $\theta$ is a weighted loss minimization problem which is independent of the regularizer $g(v; \lambda)$. Through such joint learning of $\theta$ and $v$ by ASS, with a gradually increasing value of $\lambda$, more samples are automatically included in the training process from easy to hard in a self-paced manner, based on their training losses.
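As a minimal illustration of one ASS round, the classical hard regularizer $g(v; \lambda) = -\lambda \sum_i v_i$ admits the closed-form weight update $v_i = 1$ if $\ell_i < \lambda$, else $0$ (this is the conventional rule, not the adapted one used later; the numbers are made up):

```python
import numpy as np

def hard_spl_weights(losses, lam):
    """Closed-form weight update for the hard SPL regularizer
    g(v; lam) = -lam * sum(v): easy samples (loss < lam) receive
    weight 1, hard samples are excluded from this round."""
    return (losses < lam).astype(float)

# One ASS round: with theta fixed, update v in closed form; with v fixed,
# the weighted loss below is what the model update would minimize.
losses = np.array([0.1, 0.5, 2.0, 0.3])
v = hard_spl_weights(losses, lam=1.0)      # -> [1., 1., 0., 1.]
weighted_loss = float((v * losses).sum())  # only easy samples contribute
```

Raising `lam` between rounds admits progressively harder samples, which is the "easy-to-hard" schedule described above.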

3.2 Unsupervised SPL-MAD

Reconstruction behaviour exploration: In the context of anomaly detection, AE/CAE models trained on normal data are expected to produce a higher reconstruction error for abnormal data than for normal data [24]. However, this assumption does not always hold, as the reconstruction behavior of anomalous inputs is unclear when no anomalies exist in the training set. Zong [57] observed that abnormal data can obtain a lower reconstruction error than normal data. Hence, we first analyze the reconstruction behavior in the MAD task by training a CAE on bona fides and a limited number of morphing attacks. As discussed in Section 2, the insufficient number of identities in the existing MAD datasets is one possible reason for the poor generalization of MAD performance. To address this issue, we utilize the in-the-wild CASIA-WebFace dataset [52] as our assumed-to-be "mostly normal" samples, considering its diversity in capture environment, sensor, and identity. Specifically, the CASIA-WebFace dataset [52] consists of 494,414 images across 10,575 identities collected from the web and is used for face verification and identification tasks. Then, we use an additional MAD dataset, namely SMDD [9], together with CASIA-WebFace for the exploration of reconstruction behavior. We select the SMDD dataset due to its privacy-friendly property and diversity in identity. Detailed information on SMDD is presented later in Section 4.1. To verify that our conclusions hold on other datasets, we also did the same using the MorGAN-LMA (landmark-based morphs) dataset [11].

Figure 1 (a) and (c) illustrate the average reconstruction errors on the MorGAN-LMA and SMDD test sets by a vanilla CAE model trained on the unlabeled combined dataset at each epoch. It can be clearly observed that the average reconstruction errors of attacks are lower than those of bona fides. Meanwhile, the error gap between bona fides and attacks consistently persists as the training continues. This finding is in stark contrast to the common assumption [24] in general anomaly detection, indicating that morphing attacks are easier to reconstruct. The possible reasons are: 1) the artefacts resulting from various morphing processes induce an ambiguity in some of the image details, which might make the image similar to a wider range of reconstructions, as they make it similar to faces of multiple identities, and thus result in a lower reconstruction error; 2) the blending artefacts existing in some morphed images might be easier to decode (less sharp information) from the attack encoding. Overall, such persistent error gaps demonstrate that morphs can be detected in an unsupervised manner even if the model is trained on data polluted with attack samples.

SPL-MAD: Despite the error gaps between bona fides and attacks observed in Figure 1, the gaps gradually become smaller as the training continues. This is probably caused by the model learning anomalous patterns from the polluted attack samples, which degrades the model's ability to single out anomalies. To address this concern, we propose to incorporate the SPL paradigm into the training, aiming to continually remove the suspicious attacks in the training phase. As introduced before, the SPL paradigm consists of a problem-specific weighted term on all samples and an SPL regularizer on the sample weights. Due to this ability to adjust weights, SPL can enhance the robustness and generalizability of the model on polluted data. Therefore, we incorporate the SPL paradigm into our unsupervised MAD learning by assigning smaller weights to suspicious morphing attacks. Our SPL-MAD solution is defined by:

$$\min_{\theta, v} \sum_{i=1}^{n} v_i \, \|x_i - \hat{x}_i\|_2^2 + g(v; \lambda), \quad (4)$$

where $\hat{x}_i$ is the reconstructed image and $\ell_i = \|x_i - \hat{x}_i\|_2^2$ is the reconstruction loss (MSE in our case). Eq. 4 is optimized by ASS. First, when the sample weights $v$ in the regularizer are fixed, the minimization over $\theta$ is a weighted loss minimization problem, and the optimal model is determined as:

$$\theta^* = \arg\min_{\theta} \sum_{i=1}^{n} v_i \, \|x_i - \hat{x}_i\|_2^2. \quad (5)$$
Eq. 5 is solved by gradient descent in the training phase. Alternately, given the model parameters $\theta$, the optimal weight of the $i$-th sample is computed by:

$$v^* = \arg\min_{v \in [0,1]^n} \sum_{i=1}^{n} v_i \, \ell_i + g(v; \lambda). \quad (6)$$
Based on the observation in Figure 1 that morphs achieve lower reconstruction losses than bona fides at the beginning of training, and that the difference between bona fides and attacks shrinks in the later epochs, we adapted the general SPL rules to meet our needs: 1) $v_i^*$ is monotonically increasing w.r.t. $\ell_i$, the reconstruction loss of the $i$-th sample, which guides the model to select potential bona fide samples with larger losses in favor of suspicious morphs (morphs, or images that have properties similar to a morphed image) with smaller losses. 2) $v_i^*$ is monotonically decreasing w.r.t. $\lambda$, which means that a larger $\lambda$ has a higher tolerance to the losses (i.e., smaller sample weights) and can remove more suspicious samples. 3) $g(v; \lambda)$ is convex w.r.t. $v$ to ensure the soundness of the SPL regularizer for optimization. Therefore, we selected the linear SPL regularizer proposed in [28] and modified its closed-form optimal solution as:

$$v_i^* = \min\left(\max\left(\frac{\ell_i - \lambda}{\lambda}, \, 0\right), \, 1\right). \quad (7)$$

Thus, Eq. 7 helps our unsupervised training process: the sample weight of data with a loss smaller than $\lambda$ (suspicious samples) is set to zero, while data with a large loss is assigned a relatively large sample weight, which encourages the CAE model to focus on learning the potentially normal (assumed to be bona fide) samples.

Furthermore, we determine a self-adaptive $\lambda$ by considering the reconstruction errors in each training step. We gradually increase $\lambda$ to its maximum with the increasing training step $t$ by following the equation:

$$\lambda = \mu - \max(r - s \cdot t, \, 1) \cdot \sigma, \quad (8)$$

where $\mu$ and $\sigma$ denote the mean and standard deviation of the reconstruction losses of all data in the current training step $t$, $r$ is the constant initial standard deviation range, and $s$ is the constant shrink rate. To be consistent with our modified SPL learning needs, $\lambda$ is initially set to a small value, so that only a few easy samples with very low reconstruction losses are removed at the early training stage. In our case, we assume that the majority of (or all) training samples are normal (assumed to be bona fide) samples. Therefore, the coefficient of $\sigma$ is limited to a minimum of 1 (i.e., the $\max(\cdot, 1)$ in Eq. 8) to maintain the majority of the data. Our SPL-MAD paradigm is presented in Algorithm 1.
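The weight rule and the λ schedule can be sketched in a few lines. Note that the closed forms below follow our reading of Eq. 7 and Eq. 8 as described in the text (zero weight below λ, growing weight above it, λ rising from μ − r·σ toward μ − σ), so they are a hedged reconstruction rather than a verbatim implementation:

```python
import numpy as np

def adaptive_lambda(losses, t, r=4.0, s=5e-3):
    """Self-adaptive threshold, Eq. 8 as we read it:
    lambda = mu - max(r - s*t, 1) * sigma, where mu/sigma are the mean and
    std of the reconstruction losses at training step t. The coefficient of
    sigma is floored at 1 so the majority of samples is always kept."""
    mu, sigma = losses.mean(), losses.std()
    return mu - max(r - s * t, 1.0) * sigma

def adapted_weights(losses, lam):
    """Adapted (mirrored) linear rule, Eq. 7 as we read it: samples with loss
    below lambda (suspected morphs, easy to reconstruct) get weight 0; the
    weight grows with the loss and is capped at 1."""
    return np.clip((losses - lam) / max(lam, 1e-8), 0.0, 1.0)
```

Early in training (small `t`), λ sits several standard deviations below the mean loss, so almost no sample is removed; as `t` grows, λ rises toward μ − σ and more suspiciously easy samples receive zero weight.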

Input : Unlabeled data $\mathcal{D}$, a CAE model with parameters $\theta$, training step counter $t$
Output : Updated model parameters $\theta$
1 Initialize the model parameters $\theta$, the sample weights $v$, and $\lambda$; repeat
2       Randomly sample a batch of data from $\mathcal{D}$;
3       Calculate the reconstruction losses $\ell_i$ by forward propagation;
4       Compute $\lambda$ based on step $t$ by Eq. 8;
5       $t \leftarrow t + 1$;
6       Update the sample weights $v$ by Eq. 7;
7       Update the model parameters $\theta$ by Eq. 5;
until convergence;
Algorithm 1 The proposed SPL-MAD algorithm
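Algorithm 1 can be sketched as a PyTorch training loop. The weight and λ updates follow our reading of Eq. 7 and Eq. 8, and the warm-up handling, loss aggregation, and default hyper-parameters are illustrative assumptions:

```python
import torch

def spl_mad_train(model, loader, epochs=2, warmup=1, r=4.0, s=5e-3, lr=1e-5):
    """Sketch of Algorithm 1: alternate between updating the sample weights v
    (Eq. 7, using the self-adaptive lambda of Eq. 8) and the model parameters
    theta (Eq. 5, via SGD on the weighted reconstruction loss)."""
    opt = torch.optim.SGD(model.parameters(), lr=lr,
                          momentum=0.9, weight_decay=5e-4)
    t = 0  # global training step counter
    for epoch in range(epochs):
        for x in loader:  # randomly sampled batches from the unlabeled data D
            x_hat = model(x)
            # Per-sample MSE reconstruction loss l_i.
            losses = ((x - x_hat) ** 2).flatten(1).mean(dim=1)
            if epoch < warmup:
                # Warm-up epochs: train without SPL (all weights equal to 1).
                v = torch.ones_like(losses)
            else:
                # Eq. 8 (our reading): lambda = mu - max(r - s*t, 1) * sigma.
                mu, sigma = losses.mean(), losses.std()
                lam = mu - torch.clamp(torch.tensor(r - s * t), min=1.0) * sigma
                # Eq. 7 (our reading): zero weight below lambda, growing with
                # the loss, capped at 1; detached so weights are treated as
                # constants when updating theta.
                v = torch.clamp((losses - lam) / lam.clamp(min=1e-8), 0.0, 1.0)
            t += 1
            loss = (v.detach() * losses).mean()  # Eq. 5: weighted MSE
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```

Detaching `v` mirrors the alternating search strategy: the weight update and the parameter update are separate sub-problems, each solved with the other quantity held fixed.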

4 Experimental setup

4.1 Datasets

To evaluate the generalizability of our MAD solutions on unknown attacks, we use eight publicly available MAD datasets: LMA-DRD (PS) [13], LMA-DRD (D) [13], MorGAN-LMA [11], MorGAN-GAN [11], FRLL-Morphs [44], FERET-Morphs [44], FRGC-Morphs [44], and one synthetic morphing attack detection development dataset (SMDD) [9]. It should be noted that our model is trained on the CASIA-WebFace dataset [52], which is collected for face recognition tasks and does not contain any information on whether its images have been manipulated (morphed, beautified, re-digitized, and so on). Furthermore, the privacy-friendly SMDD training set is added to CASIA-WebFace to explore the effect of unlabeled attack contamination in the unsupervised training phase.

LMA-DRD (D) and LMA-DRD (PS): The bona fide samples in LMA-DRD (D) and LMA-DRD (PS) are selected from the VGGFace2 dataset [4], and morphing attack samples are created by OpenCV morphing [33] following the parametrization in [41], in two ways: "D" refers to digital and "PS" refers to re-digitized (print and scan). In our case, we only use the test set for unknown MAD, which consists of 123 bona fide samples and 88 morphs for each of D and PS.

MorGAN-LMA and MorGAN-GAN: The bona fide samples in MorGAN are selected from CelebA [31], and morphs are created by either OpenCV landmark-based morphing [33] (denoted as MorGAN-LMA) or the GAN-based solution presented in [11] (denoted as MorGAN-GAN). Each of the test sets contains 750 bona fides and 500 morphs. Note that the faces generated by the GAN-based solution in this dataset are of relatively low resolution.

FRLL-, FERET-, and FRGC-Morphs: The morph samples in the FRLL-Morphs [44] dataset are generated based on the publicly available Face Research London Lab dataset [18]. FRLL-Morphs covers five different morphing methods: OpenCV [33], FaceMorpher [39], StyleGAN2 [29, 49], WebMorph [17], and AMSL [35]. FRLL-Morphs is created for the evaluation of attack vulnerability and MAD performance (i.e., it contains only a test set) and is thus suitable in our case for evaluating our solution. Each test set contains 204 bona fides and 1,222 attacks (per morphing type). The FERET-Morphs [44] dataset contains bona fide images from FERET [37] and three types of morphing attacks generated with OpenCV [33], FaceMorpher [39], and StyleGAN2 [29, 49]. The test set contains 1,361 bona fides and between 523 and 525 attacks per morph type. Similarly, the FRGC-Morphs [44] dataset contains bona fide images from FRGC v2.0 [38] and three types of morphing attacks generated with OpenCV [33], FaceMorpher [39], and StyleGAN2 [29, 49]. The test set contains 3,069 bona fides and between 961 and 963 attacks per morph type.

SMDD: The SMDD [9] dataset is a synthetic MAD database in which bona fide images are created by StyleGAN2-ADA [29, 49] and morphing attacks are generated from these bona fide samples using the OpenCV morphing technique [33]. The training set and test set contain 25,000 bona fides and 15,000 morphs each. The extensive results in [9] showed that SMDD can serve as an effective training set for MAD models, even when the MAD model encounters attacks created with unknown morphing techniques or data sources. Moreover, considering the privacy-friendly characteristics and diversity of the SMDD dataset, we use the SMDD training set for the further attack contamination exploration.

For the datasets that were also explored as the training data for supervised MAD methods in the experiments, only their respective train set was used for training, and only their test sets were used for evaluation. Both sets are identity-disjoint in all the datasets.

4.2 Implementation details

We use a CAE model consisting of seven convolutional blocks as our SPL-MAD backbone and train it on the large-scale face recognition dataset CASIA-WebFace [52]. All images are cropped using landmark points obtained from the Multi-task Cascaded Convolutional Networks (MTCNN) [55], and all crops are resized to match the network input size for training. In the training phase, we use the stochastic gradient descent (SGD) optimizer with a momentum of 0.9 and a weight decay of 5e-4, and the initial learning rate is 1e-5. The decay of the learning rate is controlled by an exponential learning-rate scheduler with a gamma of 0.98. The batch size in our experiments is 64, and the number of training epochs is 25. The first five epochs are used for warm-up without SPL. The implementation is based on the PyTorch toolbox [36]. The initial standard deviation range $r$ and the shrink rate $s$ are set to 4 and 5e-3, respectively.
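The optimizer and scheduler configuration above maps directly onto standard PyTorch components; a sketch (the function name is ours):

```python
import torch

def build_optimizer(model):
    """Optimizer and scheduler with the hyper-parameters stated in Section 4.2:
    SGD with momentum 0.9, weight decay 5e-4, initial learning rate 1e-5."""
    opt = torch.optim.SGD(model.parameters(), lr=1e-5,
                          momentum=0.9, weight_decay=5e-4)
    # Exponential decay of the learning rate with gamma = 0.98 per scheduler step.
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.98)
    return opt, sched
```

With `sched.step()` called once per epoch, the learning rate after epoch $k$ is $10^{-5} \cdot 0.98^{k}$.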

4.3 Evaluation metrics

We follow the standard definitions in ISO/IEC 30107-3 [27] to evaluate MAD performance, that is, Attack Presentation Classification Error Rate (APCER) and Bona fide presentation Classification Error Rate (BPCER). APCER is the proportion of attack samples misclassified as bona fides, and BPCER is the proportion of bona fide samples misclassified as attacks. To report the overall MAD performance and for comparison with other well-established MAD solutions, we report the equal error rate (EER) value where APCER and BPCER are equal. Furthermore, we plot receiver operating characteristic (ROC) curves where the x-axis is APCER, and the y-axis is (1-BPCER) at different operation points to give a visual evaluation on a wider range.
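For illustration, APCER, BPCER, and EER can be computed from score lists as follows. The convention that a higher score means "more bona fide" is an assumption, and sweeping the observed scores as candidate thresholds is one common way to approximate the EER:

```python
import numpy as np

def apcer_bpcer(bona_scores, attack_scores, threshold):
    """APCER: fraction of attack samples classified as bona fide;
    BPCER: fraction of bona fide samples classified as attacks.
    Assumed convention: score >= threshold => classified as bona fide."""
    apcer = float(np.mean(attack_scores >= threshold))
    bpcer = float(np.mean(bona_scores < threshold))
    return apcer, bpcer

def eer(bona_scores, attack_scores):
    """Equal error rate: the operating point where APCER and BPCER are
    (closest to) equal, found by sweeping observed scores as thresholds."""
    best_gap, best_err = float("inf"), 1.0
    for th in np.unique(np.concatenate([bona_scores, attack_scores])):
        apcer, bpcer = apcer_bpcer(bona_scores, attack_scores, th)
        gap = abs(apcer - bpcer)
        if gap < best_gap:
            best_gap, best_err = gap, (apcer + bpcer) / 2
    return best_err
```

Perfectly separable score distributions yield an EER of 0; an ROC curve corresponds to plotting (APCER, 1 − BPCER) over the same threshold sweep.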

5 Results

In this section, we first analyze the experimental results of our developed SPL-MAD approach from two aspects: the contribution of the adapted SPL paradigm and the robustness of our approach on morphing attack contamination. Then, to put the performance of our unsupervised solution in the perspective of the supervised MAD performance, we compare our experimental results with three diverse supervised and well-performing MAD methods trained on five different sets of training data, totalling 15 supervised MAD solutions.

Test data \ Train data | CASIA-WebFace: (-) / SPL | CASIA-WebFace + SMDD: (-) / SPL
LMA-DRD D 31.81 18.18 22.73 20.45
PS 26.14 22.73 29.54 29.54
MorGAN LMA 42.95 14.46 15.86 15.06
GAN 61.04 40.56 44.78 39.35
FRLL-Morphs OpenCV 35.36 3.63 4.71 5.78
FaceMorpher 31.66 2.98 2.87 4.67
StyleGAN2 34.69 15.14 12.11 12.92
WebMorph 37.51 12.29 11.22 15.72
AMSL 36.22 11.22 8.87 12.09
FERET-Morphs OpenCV 45.24 32.13 36.14 30.21
FaceMorpher 39.81 27.69 36.14 25.76
StyleGAN2 41.52 32.57 34.28 28.95
FRGC-Morphs OpenCV 48.23 36.11 21.62 19.54
FaceMorpher 47.24 23.99 19.67 18.42
StyleGAN2 46.62 36.79 19.63 15.57
SMDD 29.59 7.60 7.01 9.19
Average performance 39.73 21.13 20.45 18.95
Table 1: The MAD performance in terms of EER (%) for the ablation study on the SPL paradigm and on data contamination. The bold numbers indicate the lowest EER values in two training data protocols: the face recognition dataset CASIA-WebFace [52], and CASIA-WebFace with the MAD dataset SMDD [9]. '-' refers to the baseline model trained without the SPL paradigm, and 'SPL' refers to training with SPL.
Figure 2: ROC curves achieved on different test sets and four different training settings. In most cases, the curves indicate better MAD performance when including our modified SPL paradigm and when including data contamination by SMDD.

5.1 Ablation study on SPL paradigm

To illustrate the effectiveness of the SPL paradigm within our SPL-MAD, we trained an unsupervised CAE model without SPL (denoted as baseline) and with SPL (i.e., SPL-MAD), respectively. Table 1 shows the comparison results on various test sets. The lowest EER values for each training data protocol (i.e., CASIA-WebFace and CASIA-WebFace with SMDD [9]) are shown in bold. From the results, we can observe that: 1) Including SPL in the training paradigm (i.e., SPL-MAD) consistently outperformed training without SPL when training on uncontaminated data. 2) Similarly, including SPL in the training paradigm also outperformed the model trained without SPL on most test sets when the training data was contaminated with the SMDD data. 3) Overall, a model trained with the SPL paradigm yields better average performance than one trained without SPL. For example, the EER decreases from 39.73% for the baseline model to 21.13% for SPL-MAD, and from 20.45% to 18.95%, in the two training data protocols, respectively. Such observations point out the significant contribution of the SPL paradigm to the improvement of the MAD performance. In addition to the quantitative results, the larger reconstruction error gaps between bona fides and attacks in Figure 1 support our conclusion and our motivation for using the SPL paradigm. As shown in Figure 1, the SPL-MAD model achieved a larger gap between attack and bona fide than the baseline model on both the MorGAN-LMA (error gap increases from 0.62 to 1.46) and SMDD datasets (error gap increases from 3.26 to 5.07). Furthermore, we plot the ROC curves for additional visual analysis. The ROC curves in Figure 2 show that the blue and red curves of the SPL-MAD model are placed above the green and yellow curves of the baseline models (no SPL) in most cases.
Only for the FRLL-based attacks, and only in the case of contaminated training data, the benefit of the SPL is not as clear as both with SPL and without SPL training paradigms achieve very close performances. These graphical observations are consistent with the previous quantitative findings. In general, the SPL module plays an important role in our unsupervised MAD learning and enhances the overall improvement of MAD performance, as theorized earlier in Section 3.

5.2 Ablation study on data contamination

To increase the diversity of bona fide identities and thus enhance the generalizability of MAD models, we leverage publicly available face recognition datasets. However, most face recognition datasets were collected in the wild and might unintentionally include manipulated images. As a result, we provide two training data protocols, adding contaminating attacks to further demonstrate the robustness and effectiveness of our method trained purely on unlabeled data. One training set is the pure face recognition dataset CASIA-WebFace [52], and the other is a combined training set consisting of the face recognition dataset and a face MAD training set (including both bona fides and attacks), that is, CASIA-WebFace [52] and SMDD [9]. The ratio of bona fide to attack data in the combined dataset is around 35:1. We stress again that our unsupervised training does not consider any labels, even for the contamination data. Table 1 shows the comparison results in terms of the EER values obtained by the models trained on both training data protocols. It can be observed that the baseline and SPL-MAD models trained with contaminated data gain 19.33% and 2.187% overall performance improvement, respectively, over the models trained on uncontaminated CASIA-WebFace. The reason behind this improvement might be that more data is included in the training phase, while the majority of the training data are still bona fides. This observation suggests that the performance of our unsupervised MAD method remains stable irrespective of the attack contamination. Moreover, the ROC curves in Figure 2 confirm our quantitative findings and indicate that our unsupervised model is effective even under the data contamination scenario in most cases.

Test data \ Train data | Supervised | Unsupervised
MixFaceNet | PW-MAD | Inception (each trained on five different training sets) | SPL-MAD
LMA-DRD D 15.68 18.03 17.06 25.01 19.42 20.8 25.1 22.34 40.21 17.06 7.64 17.06 15.68 50.77 15.11 20.45
PS 21.77 18.44 27.05 27.05 23.72 26.48 23.72 29.41 44.11 20.39 11.37 12.75 22.34 38.42 19.01 29.54
MorGAN LMA 39.42 22.89 10.61 46.42 30.12 34.2 34.14 9.71 34.37 27.31 38.55 31.73 8.43 40.16 28.51 15.06
GAN 53.01 50.44 42.57 24.9 42.64 52.04 46.59 42.8 8.84 43.78 50.84 38.79 27.41 0.4 44.34 39.35
FRLL-Morphs OpenCV 8.82 13.22 8.91 17.66 4.39 17.33 15.69 13.96 45.59 2.42 13.72 10.76 6.86 55.89 5.38 5.78
FaceMorpher 7.80 10.97 7.34 15.65 3.87 13.88 15.14 10.92 44.57 2.20 16.62 15.81 6.32 66.14 3.17 4.67
StyleGAN2 20.07 15.29 13.41 23.51 8.89 29.97 27.64 18.11 48.53 16.64 37.24 19.58 20.56 55.03 11.37 12.92
WebMorph 25.97 29.04 20.61 30.39 12.35 33.78 28.51 35.75 52.43 16.65 57.38 58.32 30.88 77.42 9.86 15.72
AMSL 24.53 27.59 19.24 30.03 15.18 36.25 32.95 34.38 48.52 15.18 49.02 61.44 9.80 86.49 10.79 12.09
FERET-Morphs OpenCV 28.12 32.19 31.57 33.86 31.74 37.27 45.29 34.27 43.11 39.93 6.39 7.23 42.12 13.62 59.32 30.21
FaceMorpher 22.57 29.48 27.9 31.81 23.69 35.16 44.3 28.24 40.4 29.41 5.17 6.91 36.53 18.36 46.94 25.76
StyleGAN2 29.57 29.02 35.46 39.41 39.85 44.25 45.3 29.7 42.47 47.2 9.03 7.12 35.29 15.09 60.05 28.95
FRGC-Morphs OpenCV 23.81 25.04 31.62 21.11 20.67 57.06 48.6 29.74 53.55 26.45 34.32 13.65 36.17 59.66 19.63 19.54
FaceMorpher 22.83 23.54 29.38 19.98 18.10 56 50.7 30.49 51.61 23.4 34.96 19.71 35.1 56.91 16.06 18.42
StyleGAN2 32.71 28.68 21.7 21.95 11.62 37.38 38.42 16.43 26.62 14.32 41.14 25.85 36.19 47.03 15.26 15.57
SMDD 10.34 9.26 5.11 11.69 2.51 15.7 13.45 7.82 36.25 0.79 6.42 21.88 12.49 38.38 0.42 9.19
Average performance 24.76 24.31 22.60 26.37 20.42 35.12 34.12 25.62 43.49 22.82 27.48 23.72 24.92 49.17 24.32 18.95
Table 2: The MAD performance in terms of EER (%) in comparison with results of three supervise-learning-based and well-performing MAD solutions: MixFaceNet [2], PW-MAD [13], Inception [43] trained on five datasets. The results under intra-dataset evaluation are marked with , and the best MAD performance among 16 MAD models is denoted in bold. Note that for a fair comparison, the intra-dataset results are neglected when calculating the average performance, and our unsupervised SPL-MAD results are reported under the data contamination scenario.
Figure 3: The box and whisker plot of the performance variation per model. Each box represents a model trained on a different dataset. The x-axis lists the training datasets (D, PS, LMA, GAN, S, and Ours represent LMA-DRD (D), LMA-DRD (PS), MorGAN-LMA, MorGAN-GAN, SMDD, and our unsupervised SPL-MAD trained on a combined set of CASIA-WebFace and SMDD, respectively). Four colors indicate the used MAD models, i.e., MixFaceNet, PW-MAD, Inception, and our proposed SPL-MAD. The y-axis is the EER (%) value; the red line within each box refers to the mean value of that model on all test sets (as in Table 2), the box range represents the standard deviation, and the "o" signs are extreme outliers. Note the low average EER value of our SPL-MAD, along with its low deviation and lack of extreme outlier performances.

5.3 Comparison to supervised MAD

Furthermore, to put our unsupervised SPL-MAD in the perspective of supervised MAD performances, Table 2 presents the performance comparison of our method (i.e., SPL-MAD trained on contaminated data) with three other supervised single-image MAD solutions [9, 13, 43]. The results under the intra-dataset scenario are marked with † (such setups are expected to perform unrealistically well), and the best performance (lowest EER) among all models is in bold. It should be noted that, for a fair comparison, we only consider the results in the cross-dataset setting when calculating the average performance. From the results, we can conclude that our unsupervised SPL-MAD achieves results comparable to the supervised methods on most test sets. When considering the average performance over all datasets, our method obtains the lowest EER value (18.95%), while the second-lowest EER is 20.42%. This improvement in MAD performance can be attributed to the large-scale training data enabled by the unsupervised nature of the approach and the adaptation of the SPL paradigm. In Figure 3, we plot the mean and standard deviation of the EER values achieved by our proposed SPL-MAD and the 15 reported supervised MAD models on all the testing datasets (excluding intra-dataset tests). The plot stresses that the average MAD performance of the proposed approach is comparable to, and better than, that of the supervised methods. More importantly, as expected from an unsupervised approach, it performs more consistently (low deviation with no extreme outliers) than the supervised methods, which commonly fail when facing unknown morphing attacks.
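The consistency discussed above rests on the self-paced weighting of training samples. As a simplified illustration, the hard-weighting SPL variant [30] admits a closed-form per-sample weight in which only samples whose current loss falls below an age parameter lambda contribute to the update; the loss values and lambda schedule below are illustrative assumptions, not the paper's exact adapted formulation:

```python
# Simplified hard self-paced learning (SPL) weights. Samples with high
# reconstruction error (e.g. hidden attacks in an unlabeled training
# set) are excluded from the update until lambda grows. Values are
# illustrative only.

def spl_weights(losses, lam):
    """Closed-form hard SPL solution: v_i = 1 if loss_i < lambda else 0."""
    return [1 if loss < lam else 0 for loss in losses]

# Toy per-sample reconstruction losses; the two large values stand in
# for latent attack samples contaminating the unlabeled training data.
losses = [0.10, 0.20, 0.90, 0.15, 1.20]
for lam in (0.3, 1.0):  # the "model age" lambda grows during training
    print(lam, spl_weights(losses, lam))
# → 0.3 [1, 1, 0, 1, 0]
# → 1.0 [1, 1, 1, 1, 0]
```

In the full method, gradually increasing lambda admits harder samples later in training, which keeps the autoencoder fitted mainly to the bona fide majority and thus widens the reconstruction-error gap between bona fide and attack samples.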

6 Conclusion

In this work, we proposed a novel, completely unsupervised MAD solution via self-paced anomaly detection (SPL-MAD), which benefits from the unsupervised nature of autoencoder training and the property of self-paced learning that automatically evaluates the training data without any prior knowledge. First, to address the lack of diversity and quantity in MAD datasets, we leverage existing large-scale face recognition datasets to train our unsupervised model. In addition, we build a solid theoretical foundation for designing our unsupervised solution by empirically analyzing the reconstruction behavior on MAD data. This also led us to propose integrating an adapted self-paced learning paradigm to improve the separability of the reconstruction errors between bona fide and attack samples in a fully unsupervised manner. The experimental results on a diverse set of MAD data indicate the higher generalizability of our unsupervised SPL-MAD solution, in comparison to a wide range of supervised MAD solutions, in dealing with morphing attacks created with unknown morphing techniques or from unknown data sources.


This research work has been funded by the German Federal Ministry of Education and Research and the Hessen State Ministry for Higher Education, Research and the Arts within their joint support of the National Research Center for Applied Cybersecurity ATHENE.


  • [1] Y. Bengio, J. Louradour, R. Collobert, and J. Weston. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, Montreal, Quebec, Canada, June 14-18, 2009, pages 41–48, 2009.
  • [2] F. Boutros, N. Damer, M. Fang, F. Kirchbuchner, and A. Kuijper. Mixfacenets: Extremely efficient face recognition networks. In International IEEE Joint Conference on Biometrics, IJCB 2021, Shenzhen, China, August 4-7, 2021, pages 1–8. IEEE, 2021.
  • [3] F. Boutros, N. Damer, F. Kirchbuchner, and A. Kuijper. Elasticface: Elastic margin loss for deep face recognition. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2022, page 1, 2022.
  • [4] Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman. Vggface2: A dataset for recognising faces across pose and age. In 13th IEEE International Conference on Automatic Face & Gesture Recognition, FG 2018, Xi’an, China, May 15-19, 2018, pages 67–74. IEEE Computer Society, 2018.
  • [5] N. Damer, V. Boller, Y. Wainakh, F. Boutros, P. Terhörst, A. Braun, and A. Kuijper. Detecting face morphing attacks by analyzing the directed distances of facial landmarks shifts. In Pattern Recognition - 40th German Conference, GCPR 2018, Stuttgart, Germany, October 9-12, 2018, Proceedings, volume 11269 of Lecture Notes in Computer Science, pages 518–534. Springer, 2018.
  • [6] N. Damer, F. Boutros, A. M. Saladie, F. Kirchbuchner, and A. Kuijper. Realistic dreams: Cascaded enhancement of gan-generated images with an example in face morphing attacks. In BTAS, pages 1–10. IEEE, 2019.
  • [7] N. Damer and K. Dimitrov. Practical view on face presentation attack detection. In BMVC. BMVA Press, 2016.
  • [8] N. Damer, J. H. Grebe, S. Zienert, F. Kirchbuchner, and A. Kuijper. On the generalization of detecting face morphing attacks as anomalies: Novelty vs. outlier detection. In BTAS, pages 1–5. IEEE, 2019.
  • [9] N. Damer, C. A. F. López, M. Fang, N. Spiller, M. V. Pham, and F. Boutros. Privacy-friendly synthetic data for the development of face morphing attack detectors. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2022, page 1, 2022.
  • [10] N. Damer, K. B. Raja, M. Süßmilch, S. Venkatesh, F. Boutros, M. Fang, F. Kirchbuchner, R. Ramachandra, and A. Kuijper. Regenmorph: Visibly realistic GAN generated face morphing attacks by attack re-generation. In Advances in Visual Computing - 16th International Symposium, ISVC 2021, Virtual Event, October 4-6, 2021, Proceedings, Part I, volume 13017 of Lecture Notes in Computer Science, pages 251–264. Springer, 2021.
  • [11] N. Damer, A. M. Saladie, A. Braun, and A. Kuijper. MorGAN: Recognition vulnerability and attack detectability of face morphing attacks created by generative adversarial network. In BTAS, pages 1–10. IEEE, 2018.
  • [12] N. Damer, A. M. Saladie, S. Zienert, Y. Wainakh, P. Terhörst, F. Kirchbuchner, and A. Kuijper. To detect or not to detect: The right faces to morph. In ICB, pages 1–8. IEEE, 2019.
  • [13] N. Damer, N. Spiller, M. Fang, F. Boutros, F. Kirchbuchner, and A. Kuijper. PW-MAD: pixel-wise supervision for generalized face morphing attack detection. In Advances in Visual Computing - 16th International Symposium, ISVC 2021, Virtual Event, October 4-6, 2021, Proceedings, Part I, volume 13017 of Lecture Notes in Computer Science, pages 291–304. Springer, 2021.
  • [14] N. Damer, Y. Wainakh, V. Boller, S. von den Berken, P. Terhörst, A. Braun, and A. Kuijper. Crazyfaces: Unassisted circumvention of watchlist face identification. In BTAS, pages 1–9. IEEE, 2018.
  • [15] N. Damer, S. Zienert, Y. Wainakh, A. M. Saladie, F. Kirchbuchner, and A. Kuijper. A multi-detector solution towards an accurate and generalized detection of face morphing attacks. In 22nd International Conference on Information Fusion, FUSION 2019, Ottawa, ON, Canada, July 2-5, 2019, pages 1–8. IEEE, 2019.
  • [16] L. Debiasi, N. Damer, A. M. Saladie, C. Rathgeb, U. Scherhag, C. Busch, F. Kirchbuchner, and A. Uhl. On the detection of gan-based face morphs using established morph detectors. In Image Analysis and Processing - ICIAP 2019 - 20th International Conference, Trento, Italy, September 9-13, 2019, Proceedings, Part II, volume 11752 of Lecture Notes in Computer Science, pages 345–356. Springer, 2019.
  • [17] L. DeBruine. debruine/webmorph: Beta release 2. Jan. 2018.
  • [18] L. DeBruine and B. Jones. Face research lab london set, May 2017.
  • [19] J. Deng, J. Guo, N. Xue, and S. Zafeiriou. Arcface: Additive angular margin loss for deep face recognition. In CVPR, pages 4690–4699. Computer Vision Foundation / IEEE, 2019.
  • [20] Y. Fan, R. He, J. Liang, and B. Hu. Self-paced learning: An implicit regularization perspective. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, pages 1877–1883, 2017.
  • [21] M. Fang, N. Damer, F. Kirchbuchner, and A. Kuijper. Real masks and spoof faces: On the masked face presentation attack detection. Pattern Recognit., 123:108398, 2022.
  • [22] M. Ferrara, A. Franco, and D. Maltoni. On the effects of image alterations on face recognition accuracy. In T. Bourlai, editor, Face Recognition Across the Imaging Spectrum, pages 195–222. Springer, 2016.
  • [23] M. Ferrara, A. Franco, and D. Maltoni. Face morphing detection in the presence of printing/scanning and heterogeneous image sources. IET Biometrics, 10(3):290–303, 2021.
  • [24] M. Hasan, J. Choi, J. Neumann, A. K. Roy-Chowdhury, and L. S. Davis. Learning temporal regularity in video sequences. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016, pages 733–742, 2016.
  • [25] M. Ibsen, L. J. Gonzalez-Soler, C. Rathgeb, P. Drozdowski, M. Gomez-Barrero, and C. Busch. Differential anomaly detection for facial images. In 2021 IEEE International Workshop on Information Forensics and Security (WIFS), pages 1–6, 2021.
  • [26] International Civil Aviation Organization, ICAO. Machine readable passports – part 9 – deployment of biometric identification and electronic storage of data in eMRTDs. Civil Aviation Organization (ICAO), 2015.
  • [27] International Organization for Standardization. ISO/IEC DIS 30107-3:2016: Information Technology – Biometric presentation attack detection – P. 3: Testing and reporting, 2017.
  • [28] L. Jiang, D. Meng, T. Mitamura, and A. G. Hauptmann. Easy samples first: Self-paced reranking for zero-example multimedia search. In Proceedings of the ACM International Conference on Multimedia, MM ’14, Orlando, FL, USA, November 03 - 07, 2014, pages 547–556. ACM, 2014.
  • [29] T. Karras, M. Aittala, J. Hellsten, S. Laine, J. Lehtinen, and T. Aila. Training generative adversarial networks with limited data. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
  • [30] M. P. Kumar, B. Packer, and D. Koller. Self-paced learning for latent variable models. In Advances in Neural Information Processing Systems 23: 24th Annual Conference on Neural Information Processing Systems 2010. Proceedings of a meeting held 6-9 December 2010, Vancouver, British Columbia, Canada, pages 1189–1197, 2010.
  • [31] Z. Liu, P. Luo, X. Wang, and X. Tang. Deep learning face attributes in the wild. In 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7-13, 2015, pages 3730–3738. IEEE Computer Society, 2015.
  • [32] A. Makrushin, T. Neubert, and J. Dittmann. Automatic generation and detection of visually faultless facial morphs. In VISIGRAPP (6: VISAPP), pages 39–50. SciTePress, 2017.
  • [33] S. Mallick. Face morph using opencv — c++ / python. LearnOpenCV, 1(1), 2016.
  • [34] F. V. Massoli, F. Carrara, G. Amato, and F. Falchi. Detection of face recognition adversarial attacks. Comput. Vis. Image Underst., 202:103103, 2021.
  • [35] T. Neubert, A. Makrushin, M. Hildebrandt, C. Kraetzer, and J. Dittmann. Extended stirtrace benchmarking of biometric and forensic qualities of morphed face images. IET Biometrics, 7(4):325–332, 2018.
  • [36] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. Pytorch: An imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. dAlché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc., 2019.
  • [37] P. Phillips, H. Wechsler, J. Huang, and P. J. Rauss. The feret database and evaluation procedure for face-recognition algorithms. In Image and Vision Computing, volume 16, pages 295–306, 1998.
  • [38] P. J. Phillips, P. J. Flynn, W. T. Scruggs, K. W. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. J. Worek. Overview of the face recognition grand challenge. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 20-26 June 2005, San Diego, CA, USA, pages 947–954, 2005.
  • [39] A. Quek. Facemorpher. 2019.
  • [40] R. Raghavendra, K. B. Raja, and C. Busch. Detecting morphed face images. In BTAS, pages 1–7. IEEE, 2016.
  • [41] R. Raghavendra, K. B. Raja, S. Venkatesh, and C. Busch. Face morphing versus face averaging: Vulnerability and detection. In IJCB, pages 555–563. IEEE, 2017.
  • [42] R. Raghavendra, K. B. Raja, S. Venkatesh, and C. Busch. Transferable deep-cnn features for detecting digital and print-scanned morphed face images. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2017, Honolulu, HI, USA, July 21-26, 2017, pages 1822–1830. IEEE Computer Society, 2017.
  • [43] R. Ramachandra, S. Venkatesh, K. B. Raja, and C. Busch. Detecting face morphing attacks with collaborative representation of steerable features. In CVIP (1), volume 1022 of AISC, pages 255–265. Springer, 2018.
  • [44] E. Sarkar, P. Korshunov, L. Colbois, and S. Marcel. Vulnerability analysis of face morphing attacks from landmarks and generative adversarial networks. arXiv preprint, Oct. 2020.
  • [45] U. Scherhag, R. Raghavendra, K. B. Raja, M. Gomez-Barrero, C. Rathgeb, and C. Busch. On the vulnerability of face recognition systems towards morphed face attacks. In IWBF, pages 1–6. IEEE, 2017.
  • [46] U. Scherhag, C. Rathgeb, and C. Busch. Performance variation of morphed face image detection algorithms across different datasets. In IWBF, pages 1–6. IEEE, 2018.
  • [47] S. Soleymani, A. Dabouei, F. Taherkhani, J. Dawson, and N. M. Nasrabadi. Mutual information maximization on disentangled representations for differential morph detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1731–1741, 2021.
  • [48] S. Soleymani, A. Dabouei, F. Taherkhani, J. M. Dawson, and N. M. Nasrabadi. Mutual information maximization on disentangled representations for differential morph detection. In IEEE Winter Conference on Applications of Computer Vision, WACV 2021, Waikoloa, HI, USA, January 3-8, 2021, pages 1730–1740. IEEE, 2021.
  • [49] S. Venkatesh, H. Zhang, R. Ramachandra, K. B. Raja, N. Damer, and C. Busch. Can GAN generated morphs threaten face recognition systems equally as landmark based morphs? - vulnerability and detection. In 8th International Workshop on Biometrics and Forensics, IWBF 2020, Porto, Portugal, April 29-30, 2020, pages 1–6. IEEE, 2020.
  • [50] P. Voigt and A. v. d. Bussche. The EU General Data Protection Regulation (GDPR): A Practical Guide. Springer, 1st edition, 2017.
  • [51] L. Xiang, G. Ding, and J. Han. Learning from multiple experts: Self-paced knowledge distillation for long-tailed classification. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part V, pages 247–263, 2020.
  • [52] D. Yi, Z. Lei, S. Liao, and S. Z. Li. Learning face representation from scratch. CoRR, abs/1411.7923, 2014.
  • [53] D. Zhang, J. Han, L. Zhao, and D. Meng. Leveraging prior-knowledge for weakly supervised object detection under a collaborative self-paced curriculum learning framework. Int. J. Comput. Vis., 127(4):363–380, 2019.
  • [54] H. Zhang, S. Venkatesh, R. Ramachandra, K. B. Raja, N. Damer, and C. Busch. MIPGAN - generating strong and high quality morphing attacks using identity prior driven GAN. IEEE Trans. Biom. Behav. Identity Sci., 3(3):365–383, 2021.
  • [55] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Process. Lett., 23(10):1499–1503, 2016.
  • [56] Y. Zhang, Z. Yin, Y. Li, G. Yin, J. Yan, J. Shao, and Z. Liu. Celeba-spoof: Large-scale face anti-spoofing dataset with rich annotations. In Computer Vision - ECCV 2020 - 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XII, volume 12357 of Lecture Notes in Computer Science, pages 70–85. Springer, 2020.
  • [57] B. Zong, Q. Song, M. R. Min, W. Cheng, C. Lumezanu, D. Cho, and H. Chen. Deep autoencoding gaussian mixture model for unsupervised anomaly detection. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, 2018.