An Experimental Evaluation on Deepfake Detection using Deep Face Recognition

10/04/2021
by Sreeraj Ramachandran, et al.
Wichita State University

Significant advances in deep learning have achieved hallmark accuracy rates for various computer vision applications. However, advances in deep generative models have also led to the generation of very realistic fake content, known as deepfakes, posing a threat to privacy, democracy, and national security. Most current deepfake detection methods treat the task as a binary classification problem, distinguishing authentic images or videos from fake ones using two-class convolutional neural networks (CNNs). These methods rely on detecting visual artifacts and temporal or color inconsistencies produced by deep generative models. However, they require a large amount of real and fake data for model training, and their performance drops significantly in cross-dataset evaluation with samples generated using advanced deepfake generation techniques. In this paper, we thoroughly evaluate the efficacy of deep face recognition in identifying deepfakes, using different loss functions and deepfake generation techniques. Experimental investigations on the challenging Celeb-DF and FaceForensics++ deepfake datasets suggest the efficacy of deep face recognition in identifying deepfakes over two-class CNNs and the ocular modality. The results show a maximum Area Under the Curve (AUC) of 0.98 and an Equal Error Rate (EER) of 7.1 on the Celeb-DF dataset. This EER is 16.6 lower than the EER obtained by the two-class CNN and the ocular modality on the Celeb-DF dataset. Further, on the FaceForensics++ dataset, an AUC of 0.99 and an EER of 2.04 were obtained. The use of biometric facial recognition technology has the advantage of bypassing the need for a large amount of fake data for model training and obtaining better generalizability to evolving deepfake creation techniques.

I Introduction

Fig. 1: Illustration of deepfake detection using deep face recognition. Deep face recognition models pretrained on large-scale face recognition datasets are used to detect deepfakes. Detection is performed by comparing the authentic template of the subject with the corresponding deepfakes, as in face verification.

Synthesized media, particularly deepfakes, has become a major source of concern in recent years. The now-familiar term “deepfake”, a combination of the words “deep learning” and “fake”, was first introduced in late 2017 and refers to deep adversarial models (for instance, generative adversarial networks (GANs) [12]) that generate fake videos by swapping a person’s face with the face of another person [9], using methods such as FaceSwap and FaceShifter [18]. The current line of research [32] also includes expression swapping among the deepfake generation techniques, with methods such as Face2Face [30] and NeuralTextures [31]. Since then, the amount of media generated using deepfakes has increased rapidly, owing to rapid developments in computer vision and increasingly accessible and capable hardware. Deepfakes have been a carrier of misinformation and malice since their inception, posing a political and social threat [6].

This use of deep learning and computer vision to generate high-fidelity synthetic media has thus become a major security risk and has been flagged as a top AI threat [15, 5]. As a result, there has been a resurgence in research into detecting facial modifications using both data-driven deep learning and biometric anti-spoofing techniques [16, 25, 27].

Current CNN-based deepfake detection methods (using networks such as ResNet, MesoInceptionNet, and XceptionNet) are mainly based on detecting visual artifacts created by the resolution and color inconsistency between the warped face area and the surrounding context during the image blending operation [19, 20, 1, 29]. In [19], the authors proposed a general method called face X-ray to detect forgery by detecting the blending boundary of a forged image, using a two-class CNN model trained end-to-end with a classification loss and a loss associated with the face X-ray. This model outperformed most existing CNN models in deepfake detection. Studies in [4, 7] used the optical flow of facial expressions as a detection cue for altered video footage.

Most of the aforementioned deepfake detection methods [19, 20, 1, 29, 4, 7] are training-based and therefore obtain very high performance in intra-dataset evaluation. However, they generalize poorly to high-quality fakes produced by evolving deepfake generation techniques, and their AUC drops substantially in cross-dataset evaluation.

Media articles such as [5, 23] suggest the use of biometric technology in identifying deepfakes. Recently, in 2020, Nguyen and Derakhshani [25] used ocular-based biometric matching to distinguish between real and fake images of identities, using CNNs (namely, LightCNN, ResNet, DenseNet, and SqueezeNet) trained for ocular-based user recognition. In [22], biometric-tailored loss functions (such as Center, ArcFace, and A-Softmax) are used to train two-class CNNs for deepfake detection. Worth mentioning, the study in [22] used only face-recognition-tailored loss functions in CNN training, but not face recognition technology itself, in detecting deepfakes.

With the advances in deepfake generation techniques, visual artifacts and other inconsistencies become progressively indistinguishable in high-quality, high-resolution deepfakes, but facial features are still corrupted by the swapping and image blending operations. Therefore, face recognition may be preferable for detecting fake images, by matching the corrupted feature vectors of the swapped face against the original templates of the target identity. A minimal analysis was carried out in [17] on a first-generation dataset using 129 measures of face image quality, including signal-to-noise ratio, specularity, and blurriness, along with a Support Vector Machine for deepfake detection. Studies in [3, 2, 11, 8] have used behavioral biometrics, i.e., facial expression and head and body movement, for deepfake detection, but performed only limited evaluation on identity-swapping-based deepfake generation techniques.

This paper aims to evaluate the efficacy of deep face recognition (the terms face recognition and face verification are used interchangeably in this study), trained using different loss functions, in identifying deepfakes generated using various methods. This study provides insight into which deepfake generation techniques can be effectively detected by face recognition technology and into the optimum loss functions to use. This approach has the advantage of bypassing the need for model training on fake images. However, identity labels are required for the biometric comparison (template and query image comparison) in deepfake detection. Figure 1 illustrates deepfake detection using deep face recognition.

In summary, the main contributions of this paper are as follows:

  • To evaluate the efficiency of face recognition in identifying deepfakes generated using various methods. To this aim, deep face recognition models based on ResNet-50, pretrained on the large-scale face recognition datasets MS1M-ArcFace [10] and WebFace12M [36] using six different loss functions (Softmax, ArcFace, Combined Loss, SphereFace, CosFace, and Triplet loss), are used for deepfake detection.

  • Use of explainable AI based t-distributed stochastic neighbor embedding (t-SNE) [33] in visualizing the effectiveness of facial recognition technology in detecting deepfakes.

  • Thorough experimental investigation on the commonly adopted Celeb-DF [21] and FaceForensics++ [28] datasets across five different deepfake generation methods, namely FaceSwap, Face2Face, FaceShifter, NeuralTextures, and Deepfakes (the method of using autoencoders for deepfake generation is simply termed Deepfakes in the existing literature [32, 24]).

This paper is organized as follows: Section II discusses the methods for deepfake generation. Section III presents the datasets and the experimental protocol. Results and discussion are presented in Section IV. Conclusions and future work are discussed in Section V.

II Methods for Deepfake Generation

In this section, we will discuss various methods used for deepfake creation [32, 24]. Depending on the level of manipulation, these deepfake creation methods can be broadly categorized as Identity Swap and Expression Swap [32].

  1. Identity Swap: Consists of replacing the face of a person in a video with the face of another person. FaceSwap (https://github.com/deepfakes/faceswap), FaceShifter [18], and Deepfakes [26] are examples of identity swapping methods for deepfake creation, explained as follows:

    1. FaceSwap: An identity swapping method that transfers the face region from a source video to a target video using a graphics-based approach based on detected facial landmarks. To swap the face of the source person to the target person, it uses face alignment, Gauss-Newton optimization, and image blending.

    2. FaceShifter: FaceShifter [18] is an identity swapping method that uses a generator with Adaptive Attentional Denormalization (AAD) layers to adaptively integrate the identity and the attributes for face synthesis, together with an attribute encoder that extracts multi-level target face attributes.

    3. Deepfakes: In this method, two autoencoders that share an encoder are trained to reconstruct the source and target faces. To generate the fake image, the shared encoder and the source face’s trained decoder are applied to the target face, and the autoencoder’s output is then blended using Poisson image editing. (A minimal sketch of this shared-encoder design is given after this list.)

  2. Expression Swapping: Consists of facial reenactment by modifying the facial expression of the person.

    1. Face2Face: In Face2Face [30], a temporary face identity is established using the first few frames, and the expression is then tracked over the remaining frames. Fake videos are created by transferring the source expression parameters of each frame to the target video.

    2. NeuralTextures: NeuralTextures [31] employs a rendering network with a patch-based GAN loss to learn a neural texture of the target individual from the source video. Only the mouth-related facial expression is altered, making it extremely difficult to detect.
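
To make the shared-encoder idea of the Deepfakes method (item 3 above) concrete, below is a minimal PyTorch sketch. The layer sizes, 64×64 input resolution, and latent dimension are illustrative assumptions, not the settings of any specific deepfake tool, and the Poisson blending step is omitted.

```python
import torch
import torch.nn as nn

class SharedEncoderFaceSwap(nn.Module):
    """Sketch of the shared-encoder / dual-decoder autoencoder behind
    Deepfakes-style identity swapping. Layer sizes are illustrative."""

    def __init__(self, latent_dim: int = 256):
        super().__init__()
        # A single encoder is shared by both identities, so it learns
        # identity-agnostic structure (pose, lighting, expression).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(latent_dim),
        )
        # One decoder per identity reconstructs that person's face.
        self.decoder_src = self._make_decoder(latent_dim)
        self.decoder_tgt = self._make_decoder(latent_dim)

    @staticmethod
    def _make_decoder(latent_dim: int) -> nn.Module:
        return nn.Sequential(
            nn.Linear(latent_dim, 128 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (128, 16, 16)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor, identity: str) -> torch.Tensor:
        z = self.encoder(x)
        decoder = self.decoder_src if identity == "src" else self.decoder_tgt
        return decoder(z)

# At inference, the swap encodes the *target* face and decodes it with the
# *source* decoder; the result would then be blended into the target frame.
model = SharedEncoderFaceSwap()
fake = model(torch.rand(1, 3, 64, 64), identity="src")  # -> 1x3x64x64
```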

III Dataset and Experiment Protocol

In this section, we discuss the datasets used and the experimental protocol followed for deepfake detection using face recognition technology.

III-A Datasets

In order to evaluate the efficiency of face recognition technology in identifying high-quality deepfakes without visual artifacts, generated using various methods, we tested our model on the high-quality deepfake datasets Celeb-DF [21] and FaceForensics++ [28], which contain the most realistic fake videos. These commonly adopted high-quality deepfake datasets also provide subject identity labels, which facilitates face recognition.

These datasets are described as follows:

  • Celeb-DF: The Celeb-DF [21] deepfake forensic dataset includes genuine videos of celebrities as well as deepfake videos. Celeb-DF, in contrast to other datasets, has essentially no splicing borders, color mismatch, inconsistencies in face orientation, or other evident deepfake visual artifacts.

  • FaceForensics++: FaceForensics++ [28] is an automated benchmark for facial manipulation detection. It consists of several manipulated videos created using methods from two categories: identity swapping (FaceSwap, FaceSwap-Kowalski, FaceShifter, Deepfakes) and expression swapping (Face2Face and NeuralTextures). We used the FaceForensics++ dataset’s compressed test set, which has a curated list of videos for each of these deepfake creation methods.

III-B Experimental Protocol

We used the popular ResNet [14] architecture, as it is widely used for face recognition. ResNet is short for residual network and is based on the idea of the “identity shortcut connection”, in which input features may skip certain layers [14]. In this study, we used ResNet-50 [14], trained from scratch on the MS1M-ArcFace [13, 10] and WebFace12M [36] datasets. The MS1M-ArcFace dataset [10] is a cleaned version of the MS1M dataset [13], containing around 5.8 million images from 85K subjects. The WebFace12M dataset [36] is a cleaned subset of the complete WebFace260M dataset [36], containing around 12 million face images from 600K identities. The face images were detected and aligned using MTCNN [35], which utilizes a cascaded-CNN framework for joint face detection and alignment. The aligned images were then resized to 112×112 for both training and evaluation. The model was trained using six different loss functions, i.e., ArcFace [10], CosFace [34], SphereFace [34], Combined Margin [10], Triplet loss [10], and Softmax. The ArcFace, SphereFace, and CosFace loss functions learn intra-class similarity and inter-class diversity for performance enhancement.
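
For reference, these margin-based softmax losses can be written in the unified combined-margin form of ArcFace [10], where $s$ is the feature scale, $\theta_j$ is the angle between the embedding of sample $i$ and the class-$j$ weight vector, and $(m_1, m_2, m_3)$ are multiplicative-angular, additive-angular, and additive-cosine margins; SphereFace, ArcFace, and CosFace correspond to $(m_1, 0, 0)$, $(1, m_2, 0)$, and $(1, 0, m_3)$, respectively:

```latex
\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}
  \log\frac{e^{\,s\,(\cos(m_1\theta_{y_i}+m_2)-m_3)}}
           {e^{\,s\,(\cos(m_1\theta_{y_i}+m_2)-m_3)}+\sum_{j\neq y_i} e^{\,s\cos\theta_j}}
```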

For the ResNet-50 model, a batch-normalization layer followed by a final fully connected embedding layer of size 512 and an output layer equal to the number of classes (subjects) were used. The angular margin penalty hyper-parameters for the combined margin were set following the original ArcFace [10] implementation. The networks were trained using a Stochastic Gradient Descent (SGD) optimizer; the learning rate was divided in steps at fixed iteration milestones, with momentum and weight decay, again following the original ArcFace [10] implementation. All experiments were conducted on an Intel Xeon processor and two Nvidia Quadro RTX GPUs.
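
As a minimal sketch, the optimizer setup described above might look as follows in PyTorch. Since the batch size, epoch count, and schedule milestones are not stated here, the numeric values below are the defaults of the public ArcFace reference implementation and should be read as assumptions rather than the authors' reported settings.

```python
import torch
import torchvision

# ResNet-50 backbone with a 512-D embedding head (the margin-based
# classification layer used during training is omitted for brevity).
model = torchvision.models.resnet50(num_classes=512)

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,             # assumed initial learning rate (ArcFace default)
    momentum=0.9,       # assumed (ArcFace default)
    weight_decay=5e-4,  # assumed (ArcFace default)
)

# Step decay: divide the learning rate at fixed iteration milestones.
# The milestones below are placeholders for the paper's schedule.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[100_000, 160_000], gamma=0.1
)
```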

For the subject-disjoint evaluation of the pretrained deep face recognition models for deepfake detection on Celeb-DF [21] and FaceForensics++ [28], the gallery set consists of real face images (frames extracted from videos) for each subject. The subject-disjoint protocol means that the subject identities used in training the face recognition model do not overlap with the identities used for deepfake evaluation. Recall that the MS1M-ArcFace and WebFace12M datasets were used for training the ResNet-50 based face recognition model. The probe set consists of real and fake face images randomly selected for each subject. Depending on the dataset, multiple videos per subject (Celeb-DF, DFD (a subset of FF++)) or a single video per subject (FF++) are used to create pairs of frames. From the gallery and probe sets of all subjects, deep features of size 512 are extracted using the pretrained ResNet-50 model. The genuine match score is calculated between frames extracted from real videos and gallery face images of the corresponding subject, and the imposter score is calculated between frames extracted from fake videos and gallery face images of the target identity, using the cosine similarity metric. We used Equal Error Rate (EER) and Area Under the Curve (AUC) as the performance metrics for deepfake detection using face recognition technology.
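
The verification-style scoring and metrics above can be sketched as follows. The 512-D feature size matches the embedding layer, but pooling each probe frame's score as the maximum similarity over the subject's gallery is an illustrative assumption, since the score aggregation is not spelled out here; random vectors stand in for the ResNet-50 features.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def cosine_scores(gallery: np.ndarray, probes: np.ndarray) -> np.ndarray:
    """Cosine similarity between L2-normalized probe and gallery embeddings."""
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    p = probes / np.linalg.norm(probes, axis=1, keepdims=True)
    return p @ g.T  # shape: (num_probes, num_gallery)

def eer_and_auc(genuine: np.ndarray, imposter: np.ndarray):
    """EER is the point where false-accept and false-reject rates cross."""
    labels = np.concatenate([np.ones_like(genuine), np.zeros_like(imposter)])
    scores = np.concatenate([genuine, imposter])
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    eer = fpr[np.nanargmin(np.abs(fnr - fpr))]
    return eer, roc_auc_score(labels, scores)

# Toy example: random 512-D embeddings stand in for ResNet-50 features.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(10, 512))                        # real gallery frames
real_probe = gallery[:5] + 0.1 * rng.normal(size=(5, 512))  # genuine queries
fake_probe = rng.normal(size=(5, 512))                      # deepfake queries

genuine = cosine_scores(gallery, real_probe).max(axis=1)    # genuine match scores
imposter = cosine_scores(gallery, fake_probe).max(axis=1)   # imposter scores
eer, auc = eer_and_auc(genuine, imposter)
print(f"EER={eer:.3f}, AUC={auc:.3f}")
```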

IV Results and Discussion

(a) FaceSwap
(b) Deepfakes
(c) NeuralTextures
Fig. 2: t-SNE visualization of the deep feature embeddings of genuine and fake images for the FaceSwap, Deepfakes, and NeuralTextures techniques, for randomly selected identities, using the model trained on the MS1M-AF [10] dataset. A similar observation was noted for the other identity and expression swapping methods and for the WebFace12M [36] dataset.
(a) FaceSwap
(b) Deepfakes
(c) NeuralTextures
Fig. 3: Genuine and deepfake score distributions obtained by comparing the templates with the genuine queries and deepfakes, respectively, for the FaceSwap, Deepfakes, and NeuralTextures techniques, using the model trained on the MS1M-AF [10] dataset. A similar observation was noted for the other identity and expression swapping methods and for the WebFace12M [36] dataset.
[Table I. Rows: six loss functions (Softmax, ArcFace, Combined, SphereFace, CosFace, Triplet) for each training dataset (MS1M-AF, WebFace12M), plus the baselines of Nguyen and Derakhshani [25] (ocular recognition, Softmax) and Li et al. [19] (Face X-ray, Softmax). Columns: AUC and EER for Celeb-DF (Deepfakes) and for the FaceForensics++ identity swapping methods (Deepfakes, FaceShifter, FaceSwap-K, FaceSwap) and expression swapping methods (Face2Face, NeuralTextures).]
TABLE I: The AUC and EER scores obtained using the ResNet-50 face recognition model to detect deepfakes generated by various techniques on the high-quality Celeb-DF [21] and FaceForensics++ [28] datasets. Results are included for both the MS1M-AF [10] and WebFace12M [36] datasets used for training the ResNet-50 based face recognition model. Comparison is made with existing methods based on the ocular region [25] and the best performing Face X-ray based CNN model [19] for deepfake detection.

Table I shows the deepfake detection performance in terms of Equal Error Rate (EER) and AUC evaluated on the Celeb-DF and FaceForensics++ datasets, using the face recognition ResNet-50 model pretrained on the MS1M-ArcFace and WebFace12M datasets. The top performance results are highlighted in bold for both training datasets and across the different deepfake generation techniques.

As can be seen from the table, the identity swapping methods, namely FaceSwap, FaceShifter, and FaceSwap-Kowalski, obtain near-perfect AUC and low EER on average, demonstrating the efficiency of face recognition in detecting fakes created using these methods. The deepfake method based on autoencoders also obtained a high AUC score. This demonstrates that even high-quality deepfakes without apparent visual artifacts, such as those in the Celeb-DF dataset, have their facial features corrupted by the blending operation used in face-swapping techniques. These corrupted facial features are efficiently detected by a face recognition algorithm matching original templates to the deepfake images of the target identity. The results obtained using face recognition technology significantly outperformed ocular recognition for deepfake detection [25] under subject-disjoint evaluation on Celeb-DF. The obtained results are also better than those of the popular two-class CNN-based Face X-ray model [19] on the Celeb-DF dataset (see Table I). The Face X-ray model was chosen for comparison because it outperformed other CNN models in [19]. Note that the existing studies [3, 2, 11, 8] on using behavioral biometrics (such as facial expression and head-pose movement) for deepfake detection performed a very limited evaluation on identity-swapping-based deepfake generation techniques and reported only AUC scores. Therefore, we did not use these studies for cross-comparison.

The expression swapping methods, namely Face2Face and NeuralTextures, obtained the lowest performance, with a best-case EER noticeably higher than that of the identity swapping techniques. The NeuralTextures approach obtained even lower performance than Face2Face, because it primarily alters the facial expression corresponding to the mouth region, resulting in less deformation of discriminative facial features. Face2Face uses a re-targeting and warping procedure to swap expressions, leaving many of the original facial features intact, so the face recognition model fails to detect these deepfakes. Thus, face recognition technology is not effective in detecting deepfakes generated using expression swapping methods, primarily because only the expression is changed while the identity features are kept intact. Experiments on the FF++ dataset suggest that even within the same environment and conditions (i.e., using a single video per subject), expression swapping methods severely underperform. The experimental results also suggest that the dataset used for training the face recognition model (i.e., MS1M-ArcFace or WebFace12M) has no impact on deepfake detection accuracy evaluated on the Celeb-DF and FaceForensics++ deepfake datasets.

Among the different loss functions used to train the face recognition model, Combined Margin and CosFace loss performed the best, while Triplet loss performed the worst on average. When trained on the MS1M-ArcFace dataset, CosFace obtained the best performance in detecting FaceSwap, FaceShifter, FaceSwap-Kowalski, and Deepfakes in FaceForensics++ as well as on the Celeb-DF dataset. Using SphereFace, the best performance was obtained in detecting Face2Face, and NeuralTextures was best detected using the Softmax loss.

Similarly, when trained on the WebFace12M dataset, Combined Margin obtained the best performance in detecting FaceSwap, FaceSwap-Kowalski, and Deepfakes in FaceForensics++ as well as on the Celeb-DF dataset. Face2Face and NeuralTextures were best detected using Softmax, and FaceShifter using the CosFace loss function. Loss functions based on a cosine margin penalty, such as ArcFace, CosFace, and Combined Margin, fared better on average than the others in deepfake detection, whereas Triplet loss obtained the lowest performance.

Figure 2 shows the t-SNE [33] visualization of the deep feature embeddings (two components) of real and fake images from the FaceSwap, NeuralTextures, and Deepfakes based fake creation techniques. There is a clear separation between facial features from genuine images and deepfakes for the FaceSwap approach. A similar observation was obtained for the other identity swapping methods as well. However, the real and fake features overlap when expression swapping approaches are used, as can be seen for the NeuralTextures method.
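
A plot like Figure 2 can be produced with scikit-learn's t-SNE, as in the sketch below. Random vectors stand in for the actual ResNet-50 embeddings, and the perplexity value is an assumption; the two components match the 2-D scatter in the figure.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Random vectors stand in for the 512-D ResNet-50 embeddings; with real
# data, `real` and `fake` would hold the deep features of genuine and
# manipulated frames for the selected identities.
rng = np.random.default_rng(0)
real = rng.normal(loc=0.0, size=(200, 512))
fake = rng.normal(loc=1.0, size=(200, 512))

features = np.vstack([real, fake])
points = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)

plt.scatter(points[:200, 0], points[:200, 1], s=8, label="genuine")
plt.scatter(points[200:, 0], points[200:, 1], s=8, label="deepfake")
plt.legend()
plt.title("t-SNE of deep face embeddings (sketch)")
plt.show()
```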

Another noteworthy finding from these visualizations is that for Deepfakes, which is based on autoencoders, the deep features are considerably more spread out than for the other approaches, indicating a lack of consistency across different frames of the same subject. Figure 3 shows histograms of the genuine and deepfake score distributions obtained by comparing the templates with real and fake images for the different deepfake generation techniques. Consistent with the t-SNE visualization, expression swapping methods show greater overlap between the genuine and deepfake distributions than identity swapping methods.

In summary, the experimental results demonstrate the effectiveness of face recognition technology in identifying deepfakes from different identity swapping-based generation methods. The Combined Margin and CosFace loss functions obtained the best deepfake detection rates, as they attain better intra-class compactness and maximize inter-class separability. The results obtained using face recognition technology significantly outperformed the existing biometric study using the ocular region [25] and the popular Face X-ray model [19] for deepfake detection (see Table I). Face recognition technology is not effective in detecting deepfakes generated using expression swapping methods, which change only the expression while keeping the identity features intact.

V Conclusion and Future Work

As most of the existing deepfake detection algorithms rely on visible structural artifacts or color inconsistencies, they do not perform well on high-quality datasets comprising next-generation deepfakes, such as those available in Celeb-DF and FaceForensics++. In this paper, we evaluated the effectiveness of deep face recognition in distinguishing high-quality deepfake images or videos from real ones of the same identity, using the notion of detecting corrupted facial features rather than image anomalies. Experimental results demonstrated the efficiency of face recognition technology in identifying identity swapping-based deepfakes, surpassing the results obtained by two-class CNNs on the same datasets. The Combined Margin and CosFace loss functions obtained the best deepfake detection rates. However, face recognition technology could not be used for detecting expression swaps in deepfakes. One limitation of using biometric technology for detecting deepfakes is the requirement of the subject’s identity for biometric facial feature matching. As part of future work, the bias of face recognition technology in identifying deepfakes across demographic variations will be evaluated. A thorough investigation will be done to understand the effectiveness and failure modes of face recognition technology across evolving deepfake generation techniques. The fusion of behavioral biometrics with facial features will be explored for enhanced deepfake detection performance.

VI Acknowledgment

This work is supported in part by grant no. #210716 from University Research/Creative Projects at Wichita State University. The research infrastructure is supported in part by grant no. 13106715 from the Defense University Research Instrumentation Program (DURIP) of the Air Force Office of Scientific Research.

References

  • [1] D. Afchar, V. Nozick, J. Yamagishi, and I. Echizen (2018-12) MesoNet: a compact facial video forgery detection network. 2018 IEEE International Workshop on Information Forensics and Security (WIFS). External Links: ISBN 9781538665367, Link, Document Cited by: §I, §I.
  • [2] S. Agarwal, H. Farid, T. El-Gaaly, and S. Lim (2020) Detecting deep-fake videos from appearance and behavior. In 2020 IEEE International Workshop on Information Forensics and Security (WIFS), Vol. , pp. 1–6. External Links: Document Cited by: §I, §IV.
  • [3] S. Agarwal, H. Farid, Y. Gu, M. He, K. Nagano, and H. Li (2019-06) Protecting world leaders against deep fakes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Long Beach, CA, pp. 8. Cited by: §I, §IV.
  • [4] I. Amerini, L. Galteri, R. Caldelli, and A. Del Bimbo (2019) Deepfake video detection through optical flow based cnn. In 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Vol. , pp. 1205–1207. External Links: Document Cited by: §I, §I.
  • [5] C. Burt (2020-08) Deepfakes declared top ai threat, biometrics and content attribution scheme proposed to detect them: biometric update. BiometricUpdate.com. External Links: Link Cited by: §I, §I.
  • [6] (2019-09) Chinese deepfake app zao sparks privacy row after going viral. Guardian News and Media. External Links: Link Cited by: §I.
  • [7] A. Chintha, A. Rao, S. Sohrawardi, K. Bhatt, M. Wright, and R. Ptucha (2020) Leveraging edges and optical flow on faces for deepfake detection. In 2020 IEEE International Joint Conference on Biometrics (IJCB), Vol. , pp. 1–10. External Links: Document Cited by: §I, §I.
  • [8] D. Cozzolino, A. Rössler, J. Thies, M. Nießner, and L. Verdoliva (2020) ID-reveal: identity-aware deepfake video detection. CoRR abs/2012.02512. External Links: Link, 2012.02512 Cited by: §I, §IV.
  • [9] (2019-09) Deepfakes: what are they and why would i make one?. BBC. External Links: Link Cited by: §I.
  • [10] J. Deng, J. Guo, N. Xue, and S. Zafeiriou (2019) ArcFace: additive angular margin loss for deep face recognition. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, pp. 4690–4699. External Links: Document Cited by: 1st item, §III-B, §III-B, Fig. 2, Fig. 3, TABLE I.
  • [11] X. Dong, J. Bao, D. Chen, W. Zhang, N. Yu, D. Chen, F. Wen, and B. Guo (2020) Identity-driven deepfake detection. CoRR abs/2012.03930. External Links: Link, 2012.03930 Cited by: §I, §IV.
  • [12] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, NIPS’14, pp. 2672–2680. Cited by: §I.
  • [13] Y. Guo, L. Zhang, Y. Hu, X. He, and J. Gao (2016) MS-celeb-1m: A dataset and benchmark for large-scale face recognition. In Computer Vision - ECCV 2016 - 14th European Conference, Vol. 9907, pp. 87–102. Cited by: §III-B.
  • [14] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 770–778. External Links: Document Cited by: §III-B.
  • [15] T. Hwang (2020-07) Deepfakes: a grounded threat assessment. Technical report Center for Security and Emerging Technology, Georgetown University. External Links: Link Cited by: §I.
  • [16] P. Korshunov and S. Marcel (2018) DeepFakes: a new threat to face recognition? assessment and detection. CoRR abs/1812.08685. External Links: Link, 1812.08685 Cited by: §I.
  • [17] P. Korshunov and S. Marcel (2018) DeepFakes: a new threat to face recognition? assessment and detection. CoRR abs/1812.08685. External Links: Link, 1812.08685 Cited by: §I.
  • [18] L. Li, J. Bao, H. Yang, D. Chen, and F. Wen (2019) FaceShifter: towards high fidelity and occlusion aware face swapping. CoRR abs/1912.13457. External Links: Link, 1912.13457 Cited by: §I, item 1b, item 1.
  • [19] L. Li, J. Bao, T. Zhang, H. Yang, D. Chen, F. Wen, and B. Guo (2020) Face x-ray for more general face forgery detection. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, June 13-19, 2020, pp. 5000–5009. External Links: Document Cited by: §I, §I, TABLE I, §IV, §IV.
  • [20] Y. Li and S. Lyu (2019) Exposing deepfake videos by detecting face warping artifacts. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2019, June 16-20, 2019, pp. 46–52. Cited by: §I, §I.
  • [21] Y. Li, P. Sun, H. Qi, and S. Lyu (2020) Celeb-DF: A Large-scale Challenging Dataset for DeepFake Forensics. In IEEE Conference on Computer Vision and Patten Recognition (CVPR), Seattle, WA, United States. Cited by: 3rd item, 1st item, §III-A, §III-B, TABLE I.
  • [22] Y. Liu, C. Chang, I. Chen, Y. Ku, and J. Chen (2021) An experimental evaluation of recent face recognition losses for deepfake detection. In 2020 25th International Conference on Pattern Recognition (ICPR), Vol. , pp. 9827–9834. External Links: Document Cited by: §I.
  • [23] (2020-09) Microsoft’s deepfake-spotting tech may have biometric applications. External Links: Link Cited by: §I.
  • [24] Y. Mirsky and W. Lee (2021-04) The creation and detection of deepfakes. ACM Computing Surveys 54 (1), pp. 1–41. External Links: ISSN 1557-7341, Link, Document Cited by: §II, footnote 2.
  • [25] H. M. Nguyen and R. Derakhshani (2020) Eyebrow recognition for identifying deepfake videos. In 2020 International Conference of the Biometrics Special Interest Group (BIOSIG), Vol. , pp. 1–5. External Links: Document Cited by: §I, §I, TABLE I, §IV, §IV.
  • [26] I. Perov, D. Gao, N. Chervoniy, K. Liu, S. Marangonda, C. Umé, Mr. Dpfks, C. S. Facenheim, L. RP, J. Jiang, S. Zhang, P. Wu, B. Zhou, and W. Zhang (2020) DeepFaceLab: A simple, flexible and extensible face swapping framework. CoRR abs/2005.05535. External Links: Link Cited by: item 1.
  • [27] C. Rathgeb, A. Botaljov, F. Stockhardt, S. Isadskiy, L. Debiasi, A. Uhl, and C. Busch (2020) PRNU-based detection of facial retouching. IET Biometrics 9 (4), pp. 154–164. External Links: Document, Link, https://ietresearch.onlinelibrary.wiley.com/doi/pdf/10.1049/iet-bmt.2019.0196 Cited by: §I.
  • [28] A. Rossler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner (2019) Faceforensics++: learning to detect manipulated facial images. In Proceedings of the IEEE ICCV, pp. 1–11. Cited by: 3rd item, 2nd item, §III-A, §III-B, TABLE I.
  • [29] J. Stehouwer, H. Dang, F. Liu, X. Liu, and A. Jain (2019) On the detection of digital face manipulation. arXiv preprint arXiv:1910.01717. Cited by: §I, §I.
  • [30] J. Thies, M. Zollhöfer, M. Stamminger, C. Theobalt, and M. Nießner (2016) Face2Face: real-time face capture and reenactment of rgb videos. In Proc. CVPR, Cited by: §I, item 2a.
  • [31] J. Thies, M. Zollhöfer, and M. Nießner (2019-04) Deferred Neural Rendering: Image Synthesis using Neural Textures. arXiv e-prints. External Links: 1904.12356 Cited by: §I, item 2b.
  • [32] R. Tolosana, R. Vera-Rodriguez, J. Fierrez, A. Morales, and J. Ortega-Garcia (2020) DeepFakes and beyond: a survey of face manipulation and fake detection. arXiv preprint arXiv:2001.00179. Cited by: §I, §II, footnote 2.
  • [33] L. van der Maaten and G. Hinton (2008) Visualizing data using t-SNE. Journal of Machine Learning Research 9 (86), pp. 2579–2605. External Links: Link Cited by: 2nd item, §IV.
  • [34] H. Wang, Y. Wang, Z. Zhou, X. Ji, D. Gong, J. Zhou, Z. Li, and W. Liu (2018) CosFace: large margin cosine loss for deep face recognition. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR June 18-22, pp. 5265–5274. External Links: Document Cited by: §III-B.
  • [35] K. Zhang, Z. Zhang, Z. Li, and Y. Qiao (2016-10) Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters 23 (10), pp. 1499–1503. External Links: ISSN 1558-2361, Link, Document Cited by: §III-B.
  • [36] Z. Zhu, G. Huang, J. Deng, Y. Ye, J. Huang, X. Chen, J. Zhu, T. Yang, J. Lu, D. Du, and J. Zhou (2021) WebFace260M: A benchmark unveiling the power of million-scale deep face recognition. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2021, virtual, June 19-25, 2021, pp. 10492–10502. Cited by: 1st item, §III-B, Fig. 2, Fig. 3, TABLE I.