The rapid development of deep learning and generative modeling has promoted the progress of face manipulation and forgery technologies (i.e., deepfakes), which largely lowers the bar for creating photo-realistic facial images/videos. At present, there exists a variety of AI-based methods that can be used for identity swapping, expression editing, attribute manipulation, etc. Face2Face , for instance, is a method that can edit the expression of a person to make it identical to that of another person. Face replacement methods like FaceSwap  are further capable of replacing the facial region of one person with that of another in a video.
The counterfeit products of AI-based facial manipulation and forgery make it difficult for human beings and conventional facial classification systems to distinguish between real and fake. Deepfake technologies are thus at high risk of malicious abuse, posing threats to face recognition applications, e.g., face-recognition-based payment and access control. Malicious facial manipulation and forgery may also infringe on individual privacy and reputation. Hence, in order to protect public safety and individual privacy, it is essential to develop effective methods for detecting deepfakes.
In this work, we focus on the problem of detecting facial forgeries, that is, automatically detecting whether an image (or video) of a human face has been manipulated or forged by AI-based technologies. We adopt the term “deepfake detection” for the task, following prior art.
A variety of methods have been developed for deepfake detection. Early work relies on handcrafted features, using, for example, noise variance analysis and steganalysis [10, 19], to discover differences between real and synthetic images/videos. Recently, learning-based deepfake detection methods have been proposed and discussed, which advocate using convolutional neural networks (CNNs) to achieve the goal. Although significant progress has been made over the last few years, existing methods normally fail to generalize to images/videos generated by unseen technologies or even unseen models. They can achieve a very high accuracy for a manipulation technology whose output images/videos have been seen during training of the detection model, yet fail to generalize to images/videos generated by unseen technologies. Furthermore, existing deepfake detection models are also very sensitive to image quality, as will be shown in our experiments; e.g., the model performance degrades significantly when the compression rate of the test images/videos differs from that of the training samples.
There are already a few methods trying to resolve this problem and improve the performance of deepfake detection, but most of them [28, 15] rely on modifying the architecture of the classification CNNs. In this paper, we attempt to address the problem from a different perspective, by innovating the training mechanism of deepfake detection models and adopting adversarial training. Adversarial training aims at discovering challenging samples that are not easily predicted by the current classification model. We believe training with these samples encourages the model to focus on more essential and generalizable features that could be used to distinguish the evolving fake images/videos from the real ones, thereby improving the model’s generalization ability to unseen forgeries.
We evaluate several different types of adversarial examples in adversarial training, including additive  and spatially-transformed adversarial examples . We further propose adversarially blurred examples, which are more suitable to the task, leading to improved generalization to both unseen forgery technologies and unseen image/video qualities in the test phase. In addition to the input-gradient-based strategy for crafting adversarial examples [22, 33], a generator-based strategy is also advocated, which not only controls the computational cost of obtaining adversarial examples but also improves their flexibility. Extensive experimental results verify the effectiveness of our method. In summary, our contributions are:
We introduce adversarial training into the training process of deepfake detection models. We show that it improves the generalization ability and robustness of the models notably.
A novel method of generating adversarial examples based on image blurring is proposed, and it is shown to be more suitable to the adversarial training framework of deepfake detection.
Since our method focuses on innovating training strategies, our proposed adversarial training framework can be used together with many existing methods that modify the network structure, to further improve the performance of deepfake detection models.
Extensive experiments show that the performance of several popular deepfake detection models can be improved by using our method, in the sense of better generalization ability on unseen forgery technologies and image/video qualities.
II Related Work
Facial Manipulation. Over the last decade, with the rapid development of computer graphics and computer vision, facial manipulation techniques have advanced remarkably. For instance, Dale et al.  managed to swap faces in videos by reconstructing 3D face models of different people. 3D-based methods were also used by Garrido et al.  and Thies et al. , sometimes in combination with neural rendering technologies . In addition to the graphics-based methods, there are also many vision-based methods. In particular, the recent upsurge of deep learning has made these methods (e.g., DeepFakes , FaceSwap , and ZAO ) popular for synthesizing photo-realistic facial images, and the term “deepfake” also comes from this trend. Generative adversarial nets (GANs)  can also be used for facial attribute editing [32, 8, 25], face swapping [34, 4, 35], etc. In addition, GANs  were also used for direct synthesis of whole facial images from noise.
Deepfake Forensics. Maliciously forged deepfake images/videos are apparently harmful to the individual privacy, and they can pose a grave threat to the society. It is of great importance to develop effective deepfake detection solutions. While early attempts [36, 19, 21, 18] focused on the internal statistics or hand-crafted features of images/videos to discover the difference between real and fake, most recent methods were designed based on deep learning features [53, 11, 38, 5, 29]
or end-to-end trained deep binary classifiers [1, 39, 24, 42]. As has been criticized, most of these methods suffer from overfitting to the training dataset and cannot be effectively applied in many practical scenarios. There are methods trying to cope with the generalization issue of deepfake detection models. For instance, Li et al.  proposed a more generalizable deep face representation for achieving the goal, by introducing an auxiliary task of predicting the face X-ray. Stehouwer et al.  modified the network architecture and introduced an attention module for the task. Auto-encoders were also considered . Unlike these methods with innovated network architectures, our work in this paper focuses solely on the training mechanism of deepfake detection models; it is thus orthogonal to these efforts and can be naturally combined with them to achieve even better results.
Adversarial Training. Adversarial training refers to the utilization of adversarial examples for augmenting the training set of models, and it constitutes the main basis of defenses against adversarial attacks [22, 27, 33, 48]. The development of adversarial training can be traced back to , in which the fast gradient sign method (FGSM) was proposed for improving adversarial robustness. Madry et al.  further proposed a multi-step scheme called projected gradient descent (PGD), enabling stronger robustness than that obtained with FGSM and many of its contemporary defense methods . Hussain et al.  revealed the vulnerability of existing deepfake detection models to adversarial examples. Ruiz et al.  adopted the method of generating adversarial examples to prevent photos from being used for generating deepfakes. In contrast to their methods, we advocate adversarial training for improving the performance of classification-based deepfake detection models. In Sec. III, we will discuss how adversarial training can help improve the performance of deepfake detection, and we will also introduce a new type of adversarial examples that is more suitable to the deepfake detection task.
III Proposed Approach
This section introduces our proposed framework (based on adversarial training) for deepfake detection. First, we explain why adversarial training is advocated, and we will also revisit the basic concept of adversarial training, mostly on the basis of commonly used additive adversarial examples in Sec. III-A. Then, in Sec. III-B, we introduce a new and dedicated method for performing adversarial attacks and adversarial training, which is based on pixel-level Gaussian blurring. Finally, we introduce how generator-based adversarial training can be performed in Sec. III-C.
III-A Adversarial Training Framework
Deepfake detection is normally cast as a binary classification task. Predictions can be made on the basis of a single image (as model input) or a sequence of images from a video. For simplicity, here we consider the case where only one image is fed to the model, and we note that our method naturally generalizes to models whose inputs are sequences of images. Assume we are given a training set $\mathcal{D}$ that includes a large number of images and their corresponding labels. The problem of many existing deepfake detection models is that normal training on $\mathcal{D}$ barely guarantees generalization to fake images generated by unseen technologies or compressed with different quality factors. A plausible solution to the problem is to introduce an “adversary” that keeps refining the training fake images and removing obvious artifacts that could easily be spotted by the deepfake detection model, such that the detection model learns to correctly classify more advanced fake images. This is in accordance with the spirit of adversarial learning.
Let us first revisit the traditional adversarial learning problem. Assume that a classification model (e.g., the deepfake detection model) attempts to minimize the prediction loss $L(x, y; \theta)$ for any given data (i.e., an image $x$ paired with its label $y$), in which $\theta$ collects all learnable parameters of the classification model, $x \in \mathbb{R}^{H \times W \times C}$ ($H$, $W$, and $C$ represent the height, width, and number of channels of $x$, respectively), and $y \in \{0, 1\}$. The goal of the normal training mechanism is to find an appropriate set of parameters to minimize the empirical risk $\mathbb{E}_{(x, y) \sim \mathcal{D}}[L(x, y; \theta)]$, while, targeting the adversarial vulnerability of deep models, adversarial training aims at strengthening the models by generating adversarial examples and injecting them into the training set.
Over the past few years, a variety of adversarial examples have been proposed. The most popular method of generating adversarial examples is to add pixel-level perturbations to clean images; e.g., FGSM  suggests obtaining each adversarial example $x'$ by adding a scaled input-gradient sign to each original image $x$. That is:

$$x' = x + \epsilon \cdot \mathrm{sign}(\nabla_x L(x, y; \theta)). \quad (1)$$

The above equation is derived from an optimization problem maximizing the prediction loss of an input obtained by adding a perturbation (whose $\ell_\infty$ norm is no greater than $\epsilon$) to a clean image $x$.
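To make Eq. (1) concrete, here is a minimal NumPy sketch that crafts an FGSM example against a logistic-regression “detector” whose input gradient is analytic; the toy model, parameter names, and the [0, 1] pixel clipping are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def fgsm(x, y, w, b, eps):
    """Craft an additive adversarial example x' = x + eps * sign(grad_x L),
    where L is the cross-entropy loss of a logistic 'detector' p = sigmoid(w.x + b)."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted probability of class 1 ("fake")
    grad_x = (p - y) * w                    # analytic gradient of L w.r.t. the input
    x_adv = x + eps * np.sign(grad_x)
    return np.clip(x_adv, 0.0, 1.0)         # keep the example in the valid pixel range
```

A multi-step (PGD-style) variant would repeat this update with a smaller step and re-project onto the $\epsilon$-ball after each step.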
Adversarial training plays a zero-sum game that includes an auxiliary process generating adversarial examples to maximize the classification loss. The generated adversarial examples can be used instead of the original benign examples or in combination with them for training. For the former, we have

$$\min_\theta \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \Big[ \max_{x' \in S(x)} L(x', y; \theta) \Big], \quad (2)$$

while for the latter, we can use the following optimization problem instead of (2):

$$\min_\theta \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \Big[ L(x, y; \theta) + \max_{x' \in S(x)} L(x', y; \theta) \Big]. \quad (3)$$
Note that we introduce a set $S(x)$ to constrain the allowable disturbance of each adversarial example from its corresponding “clean” image. For FGSM as introduced in Eq. (1), we have $S(x) = \{x' : \|x' - x\|_\infty \le \epsilon\}$. In general, the constraint $x' \in S(x)$ guarantees the visual similarity between the adversarial example and the original image.
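The combined objective (training on clean and adversarial examples together) can be sketched as a single step that crafts FGSM-style examples on the fly and descends on the sum of both losses; the logistic stand-in model, learning rate, and budget below are illustrative assumptions rather than the paper's settings:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adv_train_step(X, y, w, b, eps=0.05, lr=0.1):
    """One adversarial-training step on a logistic 'detector' (toy stand-in for the
    CNN): craft additive adversarial examples inside the step, then update the
    parameters on the clean and adversarial cross-entropy losses together."""
    p = sigmoid(X @ w + b)
    # FGSM-style examples for the whole batch (analytic input gradient)
    X_adv = np.clip(X + eps * np.sign((p - y)[:, None] * w), 0.0, 1.0)
    p_adv = sigmoid(X_adv @ w + b)
    # gradient of the summed losses w.r.t. (w, b)
    gw = X.T @ (p - y) / len(y) + X_adv.T @ (p_adv - y) / len(y)
    gb = float(np.mean(p - y) + np.mean(p_adv - y))
    return w - lr * gw, b - lr * gb
```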
Recently, some other kinds of adversarial examples have also been proposed. For instance, instead of imposing pixel-level additive perturbations, Xiao et al.  proposed to calculate an adversarial optical flow that spatially transforms each pixel of the clean image accordingly. Let us use $x_{ij}$ to represent the pixel on the $i$-th row and $j$-th column of the original image; an adversarial optical flow vector $f_{ij}$ is learned in the method to transform $x_{ij}$ to the corresponding position on the adversarial image $x'$. The magnitude of the adversarial optical flow is encouraged to be small while leading to a large prediction loss on the adversarial example. Training with this sort of examples will be called spatially-transformed adversarial training (SAT), and training with the result of Eq. (1) will be called additive adversarial training (AAT) in this paper. Besides the introduced ones, there exist other types of adversarial examples, e.g., those utilizing white balance .
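As a rough illustration of such flow-based warping (a sketch in the spirit of spatially-transformed attacks, not the authors' implementation; the bilinear sampling and border clipping are assumed details):

```python
import numpy as np

def spatial_transform(x, flow):
    """Warp a single-channel image x (H x W) by a per-pixel flow field (H x W x 2):
    each output pixel is bilinearly sampled from the position displaced by its
    flow vector, with sampling coordinates clipped to the image borders."""
    h, w = x.shape
    ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_i = np.clip(ii + flow[..., 0], 0, h - 1)   # sampling rows
    src_j = np.clip(jj + flow[..., 1], 0, w - 1)   # sampling columns
    i0, j0 = np.floor(src_i).astype(int), np.floor(src_j).astype(int)
    i1, j1 = np.minimum(i0 + 1, h - 1), np.minimum(j0 + 1, w - 1)
    di, dj = src_i - i0, src_j - j0
    return ((1 - di) * (1 - dj) * x[i0, j0] + (1 - di) * dj * x[i0, j1]
            + di * (1 - dj) * x[i1, j0] + di * dj * x[i1, j1])
```

An attack would then optimize the flow field to increase the detector's loss while penalizing its magnitude.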
III-B Blurring Adversarial Training
Although adversarial training with the aforementioned examples has achieved improved robustness under adversarial attacks, its benefit for the generalization ability of deepfake detection models is unclear. In fact, on natural image classification tasks (e.g., on ImageNet), it has been demonstrated that adversarial training barely contributes to the generalization ability on normal test data, on account of the distribution drift between these adversarial examples and normal test samples, and the same problem might also exist in the task of deepfake detection. Here we propose a new type of adversarial examples that is shown to be more effective in the adversarial training framework for deepfake detection.
We know from prior work  that some high-frequency components of fake images/videos are easily spotted by models yet are also very specific and difficult to generalize from. That is, introducing Gaussian blur and JPEG compression  augmentations probably improves the deep classification CNNs, and it might be even more effective to introduce a blurring-based adversarial training mechanism. Specifically, given an input image $x$ whose height, width, and number of channels are $H$, $W$, and $C$, respectively, we obtain $x'$ by performing pixel-wise Gaussian blur on $x$. We use $x_{ij}$ to represent the $(i, j)$-th pixel of $x$, and we attempt to learn a single-channel map $\sigma$ with a size of $H \times W$, each of whose entries (e.g., $\sigma_{ij}$) represents the standard deviation of a Gaussian kernel to be applied to the region centered at the corresponding pixel of the image. In detail, for obtaining the value of $x'_{ij}$, we first collect $\sigma_{ij}$, then use it to calculate the kernel for performing Gaussian blur around $x_{ij}$. Suppose that the kernel size is chosen as $k \times k$ with $k = 2r + 1$; we then calculate the inner product between the kernel and $N(x_{ij})$ (i.e., a region of pixels centered at $x_{ij}$ with a radius of $r$). That is:

$$x'_{ij} = \sum_{p=-r}^{r} \sum_{q=-r}^{r} w_{pq} \, x_{i+p,\, j+q}, \quad w_{pq} = \frac{1}{Z_{ij}} \exp\!\Big(\!-\frac{p^2 + q^2}{2\sigma_{ij}^2}\Big), \quad (5)$$

in which $p$ and $q$ represent the relative coordinates to the centre pixel in $N(x_{ij})$, and $Z_{ij}$ normalizes the kernel to sum to one. Such a pixel-wise Gaussian blur can be easily implemented as a vectorized operation and thus is computationally very efficient. The map $\sigma$ basically controls how much blurring is to be performed on the original training image. Larger entries of $\sigma$ lead to more blurry images and leave less obvious artifacts from the deepfake generator, while on the contrary, smaller entries of $\sigma$ leave more obvious artifacts for the classification model to learn. We have $x' \to x$ as all the entries of $\sigma$ approach zero.
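A direct, loop-based sketch of this pixel-wise blur may help; the kernel normalization and the small numerical floor on the standard deviation are our assumptions, and a practical implementation would vectorize the loops, as the text notes:

```python
import numpy as np

def pixelwise_gaussian_blur(x, sigma, k=5):
    """Blur each pixel of a single-channel image x (H x W) with its own Gaussian
    kernel, whose standard deviation is given by the per-pixel map sigma (H x W)."""
    r = k // 2
    h, w = x.shape
    xp = np.pad(x, r, mode="edge")                  # replicate borders for edge pixels
    p, q = np.meshgrid(np.arange(-r, r + 1), np.arange(-r, r + 1), indexing="ij")
    out = np.empty_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            s = max(sigma[i, j], 1e-8)              # floor to avoid division by zero
            ker = np.exp(-(p ** 2 + q ** 2) / (2.0 * s ** 2))
            ker /= ker.sum()                        # normalize the kernel to sum to 1
            out[i, j] = np.sum(ker * xp[i:i + k, j:j + k])
    return out
```

As the standard deviations approach zero the kernel collapses to a delta and the image is returned unchanged, matching the limiting behavior described above.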
We aim at learning a reasonable map $\sigma$ for each training image. Since the adversarial blurring is performed in a pixel-wise manner, we are able to blur more on image regions with less generalizable features. Similar to other adversarial examples, the blurring-based adversarial examples are encouraged to show limited distortion from the original images, and we introduce a simple one-step scheme to achieve the goal, just like FGSM (except for the sign function):

$$\sigma = \sigma_0 + \alpha \cdot \nabla_{\sigma_0} L(x', y; \theta), \quad (6)$$

in which $x'$ is obtained by Eq. (5) with $\sigma_0$, and $\sigma_0$ is an initialization of $\sigma$. In practice, we let $\sigma_0$ be a matrix whose entries share a common small value. One can further extend the simple one-step scheme in Eq. (6) to a multi-step scheme, just like from FGSM  to PGD . This requires iteratively performing Eq. (6), and it can indeed help achieve higher attack success rates; however, it also leads to longer training time in our adversarial training framework. Adversarial training can then be performed with the crafted examples, and we will call this method blurring adversarial training (BAT).
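The one-step update on the blur map can be sketched as follows; since we avoid automatic differentiation here, the gradient is approximated by finite differences, and the loss and blur functions are passed in as callables (all of this is an illustrative assumption, not the paper's backpropagation-based implementation):

```python
import numpy as np

def one_step_bat(x, y, loss_fn, blur_fn, sigma0, alpha, eps=1e-3):
    """One-step blurring attack in the spirit of Eq. (6): move the sigma map in
    the direction that increases the detection loss of the blurred image.
    The gradient w.r.t. sigma is estimated entry-by-entry by finite differences."""
    h, w = sigma0.shape
    grad = np.zeros_like(sigma0)
    base = loss_fn(blur_fn(x, sigma0), y)
    for i in range(h):
        for j in range(w):
            s = sigma0.copy()
            s[i, j] += eps
            grad[i, j] = (loss_fn(blur_fn(x, s), y) - base) / eps
    return np.clip(sigma0 + alpha * grad, 0.0, None)   # sigma must stay non-negative
```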
III-C Generator-based Methods
Most adversarial training methods craft adversarial examples using the input gradient of the loss function. As has been mentioned, powerful adversarial examples are normally crafted with a multi-step scheme, so the computational cost increases with the number of steps. We propose an alternative way of generating adversarial examples, by introducing a CNN-based generator to control the training complexity, somewhat similar to . We consider a CycleGAN  generator for generating the map $\sigma$. We emphasize that, unlike the work of Rusak et al. , for each original training image $x$ we generate a specific map, considering the fact that the most transferable features of different images can reside in different spatial regions. Denoting by $\theta$ and $\phi$ the sets of learnable parameters of the deepfake detection model and the generator, respectively, we opt to play the following min-max game:

$$\min_\theta \max_\phi \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \big[ L(x'(\phi), y; \theta) \big], \quad (7)$$

where $x'(\phi)$ denotes the image blurred using the map produced by the generator with parameters $\phi$.
The introduced generator can also be considered an enhancement model for the original deepfake generator(s). The optimization problem in Eq. (7) trains a generator whose goal is opposite to that of the deepfake detection model, e.g., to remove obvious artifacts and synthesize more realistic deepfakes that can invalidate the deepfake detection model. If the generator indeed learns to synthesize more realistic fake images, then the classification model can learn more about deepfakes and thus becomes more generalizable; the generator in turn learns to further improve its generation ability. More importantly, the generator-based method can be more flexible, and it suffices to learn to conceal more generalizable features in different images in combination with BAT, as will be shown in our experiments. In practice, our generator is used to “enhance” both fake and real training images, to balance training data from both classes.
Our generator-based BAT is also related to GANs , which likewise contain a generator-discriminator pair. What makes our method significantly different is that our goal is to improve the deepfake detection model (i.e., our discriminator), while a GAN aims to improve its generator. In our case, the generator is used to craft adversarial examples, while in a GAN, the goal of the generator is to capture the distribution of natural images. Moreover, our discriminator is used to distinguish fake from real images (all “enhanced” by the generator), while the GAN discriminator is used to distinguish whether an image is a synthesized one (produced by the GAN generator) or a natural one (directly collected from the training set).
Two Generators. Since the distribution of real images and that of fake images are different, we might need to introduce a very large generator to adapt to both classes. Furthermore, such a generator would have to first predict whether its input is real or fake and then process it to make it more like a fake or a real one, to achieve the aforementioned goal. On this point, we propose to use two generators, one for each class, to alleviate the problem. That is, we introduce $g_r$, which only processes real images, and $g_f$, which only processes fake images. With the two different generators, each is only responsible for images from a single class, and we can expect them to learn more specific adversarial strategies for the two classes. Experimental results in Sec. IV-A will show the empirical effectiveness of introducing the extra generator.
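Schematically, one training round with the two generators might look like the following sketch, where the per-class routing is the point being illustrated and the detector/generator update rules are abstract callables (all names and the overall shape are our assumptions):

```python
def adversarial_round(batch, labels, g_real, g_fake, detector_step, generator_step):
    """One round of the two-generator min-max game: each image is 'enhanced' by
    the generator of its own class (label 1 = fake, label 0 = real), the detector
    is updated to minimize its loss on the enhanced batch, and the generators are
    updated (in an alternating fashion) to maximize that loss."""
    enhanced = [g_fake(x) if y == 1 else g_real(x) for x, y in zip(batch, labels)]
    det_loss = detector_step(enhanced, labels)    # minimize w.r.t. detector params
    gen_loss = generator_step(enhanced, labels)   # maximize w.r.t. generator params
    return enhanced, det_loss, gen_loss
```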
IV Experiments
In this section, we first demonstrate the superiority of our method on the task of deepfake detection through extensive experiments, and then we illustrate the effectiveness of our Gaussian-blur-based adversarial attack through white-box attack experiments on ImageNet.
Dataset. FaceForensics++ (FF++)  is a recently released large-scale deepfake video detection dataset containing 1,000 real videos, of which 720 were used for training, 140 were reserved for validation, and 140 were used for testing. Each real video in the dataset was manipulated using four advanced methods, namely DeepFakes (DF) , Face2Face (F2F) , FaceSwap (FS) , and NeuralTextures (NT) , to generate four fake videos. We followed the official split of the training, validation, and test sets in our experiments. Each video in the dataset was processed to have three qualities, namely RAW, C23 (compressed from the raw data but with relatively high quality), and C40 (compressed to have relatively low quality). For each quality, there are 5,000 (real and fake) videos in total, and we extracted 270 frames from each video following the official implementation of face detection and alignment in . In order to evaluate the generalization ability of the models, we trained them on videos from one specific method and evaluated them on videos generated by a variety of manipulation methods and with various image/video qualities.
To make the evaluation more comprehensive, we introduced two more deepfake datasets: DFD  and Celeb-DF . DFD contains 3,068 deepfake videos, forged on the basis of 363 real videos. We used all the real and fake videos and randomly selected 10 frames from each of them for testing. Celeb-DF contains 590 real videos and 5,639 fake videos, and we used its official test set.
Implementation Details. We applied our adversarial training mechanism to existing deepfake detection models to verify its effectiveness. We first considered EfficientNet  (a common choice of many winning solutions to the Deepfake Detection Challenge, https://www.kaggle.com/c/deepfake-detection-challenge). EfficientNet was originally designed for image classification, and it was transferred to the deepfake detection task by replacing its topmost fully-connected layer with one that outputs two-dimensional logits. This layer was randomly initialized, and the other layers of the model were all pre-trained on ImageNet. We used the RAdam  optimizer to train the model and adopted weight decay. The learning rate was initialized and then decayed periodically during training; for models trained along with generators, a separate initial learning rate was used for the generators. For experiments involving Gaussian blur, the blur kernel size was fixed unless otherwise clarified. To ensure numerical stability, we optimize or generate a transformed version of $\sigma$ (rather than $\sigma$ directly) for BAT in practice. All experiments were performed in a PyTorch environment, running with an Intel Xeon Gold 6130 CPU and an Nvidia Tesla V100 GPU. Besides EfficientNet, we also considered an Xception  model following , and a recent model proposed by Stehouwer et al. . We used the prediction accuracy and the area under the receiver operating characteristic curve (AUC) as evaluation metrics. Following prior art, we report AUC scores in percentage for comparison throughout the paper.
| Metric | EfficientNet  | + Grad-AAT | + Grad-SAT | + Grad-BAT | + Gen-BAT | + Two-Gen-BAT | + Combined AT |
(Each cell reports AUC / Acc in %, with models trained on NT; accuracy on DFD is not reported due to class imbalance.)

| Method | NT | DF | F2F | FS | DFD | Celeb-DF | Avg |
|---|---|---|---|---|---|---|---|
| + Gaussian Noise | 98.66 / 95.19 | 83.78 / 60.49 | 61.19 / 51.76 | 43.68 / 48.85 | 74.44 / – | 58.06 / 57.96 | 69.97 / 62.85 |
| + Gaussian Blur | 98.78 / 95.95 | 83.84 / 60.48 | 61.17 / 51.81 | 43.64 / 48.86 | 74.45 / – | 58.43 / 58.24 | 70.05 / 63.07 |
| + JPEG Compression | 98.64 / 95.14 | 83.78 / 60.45 | 61.17 / 51.76 | 43.62 / 48.82 | 74.45 / – | 58.19 / 58.01 | 69.98 / 62.84 |
| + Combined Traditional | 98.85 / 95.98 | 83.86 / 60.48 | 61.26 / 51.83 | 43.67 / 48.89 | 74.56 / – | 58.55 / 58.49 | 70.13 / 63.13 |
| + Two-Gen-BAT (ours) | 98.72 / 95.26 | 87.51 / 69.40 | 74.65 / 56.19 | 48.99 / 50.43 | 76.60 / – | 66.84 / 67.91 | 75.55 / 67.84 |
| + Combined AT (ours) | 98.40 / 94.95 | 88.90 / 71.08 | 76.13 / 57.90 | 50.13 / 51.14 | 77.74 / – | 68.45 / 69.00 | 76.63 / 68.81 |
(Each cell reports AUC / Acc in %; columns denote training quality → test quality.)

| Method | RAW→RAW | RAW→C23 | RAW→C40 | C23→RAW | C23→C23 | C23→C40 | C40→RAW | C40→C23 | C40→C40 |
|---|---|---|---|---|---|---|---|---|---|
| + Gaussian Noise | 99.38 / 99.17 | 69.66 / 54.75 | 57.24 / 50.97 | 98.47 / 94.25 | 98.66 / 95.19 | 70.04 / 56.18 | 87.50 / 78.98 | 90.78 / 81.13 | 89.78 / 80.55 |
| + Gaussian Blur | 99.47 / 99.24 | 77.63 / 57.46 | 59.24 / 51.29 | 98.88 / 94.65 | 98.78 / 95.95 | 71.08 / 57.89 | 89.54 / 79.83 | 90.92 / 82.03 | 90.75 / 81.83 |
| + JPEG Compression | 99.39 / 99.17 | 95.07 / 80.19 | 67.95 / 56.18 | 98.25 / 94.30 | 98.64 / 95.14 | 73.19 / 61.21 | 89.33 / 80.46 | 91.19 / 83.20 | 91.95 / 82.66 |
| + Combined Traditional | 99.46 / 99.26 | 83.57 / 73.52 | 68.11 / 57.29 | 98.98 / 94.90 | 98.85 / 95.98 | 73.26 / 61.58 | 89.69 / 80.78 | 91.75 / 83.36 | 92.07 / 83.75 |
| + Two-Gen-BAT (ours) | 99.46 / 99.04 | 95.93 / 84.19 | 72.18 / 61.87 | 98.92 / 95.52 | 98.72 / 95.26 | 74.73 / 62.53 | 90.58 / 82.91 | 93.51 / 85.36 | 94.19 / 87.02 |
| + Combined AT (ours) | 99.41 / 98.95 | 95.80 / 82.99 | 71.02 / 59.82 | 98.73 / 94.80 | 98.40 / 94.95 | 75.43 / 62.93 | 89.99 / 82.00 | 92.80 / 84.95 | 92.00 / 85.84 |
IV-A Different Settings for Adversarial Training
Since several different ways of generating adversarial examples and performing adversarial training have been introduced, we compare them first, including: (i) Grad-AAT: input-gradient-based additive adversarial training, (ii) Grad-SAT: input-gradient-based spatially-transformed adversarial training, (iii) Grad-BAT: input-gradient-based blurring adversarial training, (iv) Gen-BAT: generator-based blurring adversarial training, (v) Two-Gen-BAT: BAT with two generators, and, partially inspired by a general image degeneration process, (vi) Combined AT: combining Two-Gen-BAT with Grad-AAT (see Figure 3 for an overview). All competing models were trained on the NT data only and were tested on the other fake data as well. Note that generator-based AAT and generator-based SAT did not show any improvement over the baseline solution in our experiments, thus their results are not shown. The numbers of real and fake videos in DFD  are severely imbalanced, so we do not report accuracy on the DFD dataset.
It can be seen from Table I that all the considered types of adversarial training improve the generalization ability of the obtained models to unseen forgery types (i.e., DF, F2F, FS, DFD, and Celeb-DF). Table II further shows that, equipped with the operation of pixel-wise blurring, our Grad-BAT, Gen-BAT, and Combined AT also improve the generalization performance of the obtained models across different image/video qualities, whereas the other adversarial training methods hardly contribute under such circumstances. The results also demonstrate that our BAT equipped with two generators is superior to that with only a single generator. Moreover, although adversarial training leads to slightly degraded performance when the training and test data come from the same deepfake forgery technology and share the same image/video quality (cf. the first column of Tables I and II), introducing two generators mitigates such performance degradation to some extent. We also considered training with data of all image/video qualities together and testing on all qualities. Somewhat surprisingly, we found that the performance degradation (observed in Table II) was well mitigated with our adversarial training (see Table III).
IV-B Comparison to Other Methods
Adversarial training vs. data augmentation. Adversarial training can be regarded as an advanced way of performing data augmentation. On this point, we further compare the proposed method to some traditional data augmentation strategies, including traditional Gaussian noise, traditional Gaussian blur, and JPEG compression . Tables IV and V compare adversarial training (i.e., Two-Gen-BAT and Combined AT) to these strategies and their combination (named “Combined Traditional” in the tables). It can be seen that these traditional data augmentation strategies and their combination hardly improve model generalization to unseen forgery technologies, despite some unsurprising improvement in generalization to unseen image/video qualities. In addition to the results on EfficientNet, see Tables X and XI in the appendices for similar results based on Xception.
Incorporating with other deepfake detection models. We would also like to see whether our adversarial training could improve more advanced models (than Xception  and EfficientNet ) similarly. We tested with the one proposed by Stehouwer et al. , and we tried applying our combined AT and Two-Gen-BAT on it. The whole FF++ training set (including DF, F2F, FS, and NT data of all qualities) was used for training to better suit the model setting in , and we tested the model performance on DFD, Celeb-DF, and the test set of FF++. It can be seen from Table VI that the adversarially trained models generalize considerably better on unseen forgery technologies, i.e., DFD and Celeb-DF.
(Each cell reports AUC / Acc in %; accuracy on DFD is not reported.)

| Method | FF++ | DFD | Celeb-DF |
|---|---|---|---|
| + Two-Gen-BAT (ours) | 97.45 / 93.25 | 86.49 / – | 69.28 / 63.17 |
| + Combined AT (ours) | 97.62 / 93.38 | 87.10 / – | 70.61 / 64.85 |
| + Two-Gen-BAT (ours) | 98.46 / 94.00 | 87.58 / – | 72.69 / 66.46 |
| + Combined AT (ours) | 98.62 / 94.12 | 87.95 / – | 73.46 / 67.52 |
| Stehouwer et al.  | 96.79 / 94.17 | 86.84 / – | 59.74 / 59.36 |
| + Two-Gen-BAT (ours) | 97.57 / 95.16 | 88.84 / – | 74.56 / 69.50 |
| + Combined AT (ours) | 97.59 / 95.11 | 89.33 / – | 76.03 / 70.45 |
IV-C Adversarial Blurring as an Attack
We also tested how adversarial blurring performs as an attack. Following prior work, we used ImageNet  data and models to test the attack. We selected 50,000 images from the official ImageNet validation set and crafted adversarial examples on the basis of these benign images. We adopted the adversarial accuracy (i.e., the prediction accuracy of the victim model on adversarial examples) as the evaluation metric, and we chose three ImageNet models for the experiment, namely Inception v3 (Inc-v3)  (top-1 accuracy: 77.21%), Inception v4 (Inc-v4)  (top-1 accuracy: 80.12%), and Inception-ResNet v2 (IncRes-v2)  (top-1 accuracy: 80.33%). We tested our (single-step) adversarial blurring and FGSM  in this experiment (to save space, their iterative versions for attack are provided in the appendix), and we specifically tested the prediction accuracy of each model on adversarial examples crafted using the other models, as an evaluation of adversarial transferability (or, say, generalization). Table VII summarizes these results, and it can be seen that the adversarially blurred examples transfer reasonably well across ImageNet models. For our adversarial blurring, we fixed the kernel size and set the step size $\alpha$ to 0.1, and all entries of the initialization $\sigma_0$ in Eq. (6) were set to the same value; for FGSM, $\epsilon$ was fixed accordingly. More results can be found in Appendix B.
V Conclusion
We have aimed at improving the generalization ability of deepfake detection models by introducing adversarial training. Towards this goal, we have developed a new type of adversarial attack on the basis of image blurring, whose examples can be effectively and efficiently crafted by introducing two generators when training the deepfake detection models. Our method encourages models to learn essential and generalizable features for distinguishing fake from real, rather than obvious artifacts that are too specific. The proposed adversarial blurring-based method can further be combined with other adversarial methods (e.g., FGSM) to achieve further improvement in generalization. Extensive experiments have shown that our adversarial training improves the generalization ability of deepfake detection models to unseen image/video qualities and deepfake technologies. The performance of the proposed adversarial examples in attacking ImageNet models has also been tested.
Appendix A Using Xception As A Baseline
Here we report the results of models that use Xception as the backbone.
Table VIII and Table IX compare a variety of plausible settings of adversarial training. Consistent with the results in our main paper, we compare: (i) Grad-AAT: input-gradient-based additive adversarial training, (ii) Grad-SAT: input-gradient-based spatially-transformed adversarial training, (iii) Grad-BAT: input-gradient-based blurring adversarial training, (iv) Gen-BAT: generator-based blurring adversarial training, (v) Two-Gen-BAT: our BAT with two generators, and, partially inspired by a general image degradation process, (vi) Combined AT: combining Two-Gen-BAT with Grad-AAT. Similar to the observations on EfficientNet, our BAT with two generators (i.e., Two-Gen-BAT) significantly outperforms the other settings, and, when combined with Grad-AAT, yields further improvement.
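The Combined AT setting above composes the two perturbation types in degradation order: first blend toward a blurred image, then add an FGSM-style additive step. The sketch below is a toy NumPy illustration of that composition only; the 3x3 mean blur, the fixed blend map, and the random stand-in gradient are all assumptions, since in the actual method the map and kernel come from trained generators and the gradient from the detection model.

```python
import numpy as np

rng = np.random.default_rng(0)

def blur3(img):
    """3x3 mean blur with edge padding -- a stand-in for the generator-predicted blur."""
    p = np.pad(img, 1, mode="edge")
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = p[i:i + 3, j:j + 3].mean()
    return out

def combined_example(x, m, g, eps):
    """Compose blur-based and additive perturbations in degradation order:
    blend toward the blurred image, then take an FGSM-style sign step."""
    x_blur = (1.0 - m) * x + m * blur3(x)
    return np.clip(x_blur + eps * np.sign(g), 0.0, 1.0)

x = rng.random((8, 8))
m = np.full_like(x, 0.2)            # blend map (would come from a generator)
g = rng.standard_normal(x.shape)    # stand-in for the input gradient of the loss
x_adv = combined_example(x, m, g, eps=0.03)
```

During adversarial training, such combined examples would simply replace (or accompany) the clean inputs in each minibatch.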
Appendix B More Results for Attack
In addition to the results reported in Section IV, we further report results of performing the attack with different settings of the blurring kernel. We also provide iterative versions of BAT and FGSM (i.e., PGD) for attack (see Table XIV).
[Flattened table rows for the Gaussian Noise, Gaussian Blur, JPEG Compression, Combined Traditional, Two-Gen-BAT (ours), and Combined AT (ours) settings; the column headers did not survive extraction. See Tables VIII and IX for the full results.]
Table XII summarizes how the kernel size of adversarial blurring affects the attack performance, again using the adversarial accuracy (i.e., the prediction accuracy of the victim model on adversarial examples) as the evaluation metric. Figure 4 shows adversarial examples generated with different kernel sizes. For Table XII and Figure 4, we let be and set the map to all ones. As expected, increasing the kernel size produces blurrier images and thus higher attack success rates.
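The trend in Table XII (larger kernels yield blurrier, more distorted images) can be reproduced in miniature: the distortion of a fully blurred image relative to the clean one grows with kernel size. The mean filter and the toy random image below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def mean_blur(img, ks):
    """ks x ks mean filter with edge padding on a single-channel image."""
    pad = ks // 2
    p = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = p[i:i + ks, j:j + ks].mean()
    return out

rng = np.random.default_rng(1)
x = rng.random((32, 32))
# L2 distortion of the fully blurred image grows with the kernel size.
dists = [float(np.linalg.norm(mean_blur(x, ks) - x)) for ks in (3, 5, 9)]
print(dists)
```

Larger kernels average over wider neighborhoods, removing more high-frequency content, which is why both the visible blur and the attack strength increase together.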
Table XIII summarizes the influence of the initialization of the map (i.e., setting different maps) on the adversarial blurring attack. For the results in Table XIII, we let be 0.1 and fixed the kernel size to .
Appendix C Multi-step Scheme in Adversarial Training
As mentioned in the main paper, multi-step adversarial examples can be introduced into the input-gradient-based adversarial training, though the training complexity increases drastically. Here we report experimental results for the multi-step scheme. We tested the performance of using multi-step FGSM (i.e., iterative FGSM) and multi-step (input-gradient-based) adversarial blurring in training deepfake detection models. We considered three steps in this experiment, performed with the EfficientNet backbone. Our results show that three-step FGSM in general did not lead to superior test-set AUCs (NT: 98.61%, DF: 85.12%, F2F: 68.05%, FS: 44.58%) compared with single-step FGSM (i.e., Grad-AAT; NT: 98.00%, DF: 85.91%, F2F: 71.12%, FS: 44.97%). For our adversarial blurring, the three-step scheme appears slightly more effective in generalizing to the DF and FS data (NT: 98.75%, DF: 86.48%, F2F: 69.67%, FS: 46.72%) than the single-step input-gradient-based scheme, yet further increasing the number of steps yielded similar or worse AUCs.
The training complexity of our two-generator-based BAT is similar to that of three-step input-gradient-based BAT, showing that the generator-based method trades off test-set accuracy against training complexity well.
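The multi-step scheme above follows the standard iterative-FGSM recipe: repeat the sign-gradient ascent step with a small step size, projecting each iterate back into the epsilon-ball around the original input. The toy quadratic loss, step size, and budget below are illustrative assumptions standing in for the detection model's loss.

```python
import numpy as np

def loss(x, target):
    """Toy quadratic loss standing in for the detection model's training loss."""
    return float(np.sum((x - target) ** 2))

def grad(x, target):
    """Gradient of the toy loss with respect to the input."""
    return 2.0 * (x - target)

def iterative_fgsm(x0, target, eps=0.1, alpha=0.04, steps=3):
    """Multi-step FGSM: ascend the loss by alpha * sign(grad) per step,
    projecting each iterate back into the L-infinity ball of radius eps around x0."""
    x = x0.copy()
    for _ in range(steps):
        x = x + alpha * np.sign(grad(x, target))
        x = np.clip(x, x0 - eps, x0 + eps)  # project onto the eps-ball
    return x

rng = np.random.default_rng(2)
x0 = rng.random(10)
target = rng.random(10)
x1 = iterative_fgsm(x0, target, steps=1)  # single-step baseline
x3 = iterative_fgsm(x0, target, steps=3)  # three-step variant
```

Each extra step requires one more forward-backward pass through the model, which is the source of the drastic training-cost increase noted above.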
Appendix D The Role of Kernel Size in Adversarial Training
To better explore the impact of kernel size on our approach, we conducted a set of experiments that differ only in the kernel size and compared the performance of the obtained models. As shown in Tables XV and XVI, the generalization performance of the model improves as the kernel size increases. Future work can consider even larger kernel sizes.
Appendix E Hyper-parameter Tuning
To compare the traditional data augmentation methods with our method fairly, we carefully tuned each hyper-parameter on the validation set and report test results using the values that obtained the best validation accuracies. For augmentation with traditional Gaussian noise, we tuned the following hyper-parameters: , i.e., the probability of adding noise, and the mean and variance of the noise, in the ranges of and , respectively. We observed the best validation performance when using , , and a uniformly random variance in the range of for each training image. For the traditional Gaussian blur augmentation, we set the kernel size to 9, and we tuned the variance of each Gaussian kernel together with . In this setting, the best validation performance was obtained when each training image took a uniformly random variance in the range of . For JPEG compression, we let the compression quality of each image be randomly sampled between a lower bound and an upper bound; empirical results on the validation set suggested setting the two bounds to and , respectively. For the combination of traditional augmentations, we used the same values for all these hyper-parameters.
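The tuned augmentations above can be sketched as random per-image transforms. Since several of the tuned values are elided in the text, the probability, noise-variance range, and blur-sigma range in this NumPy sketch are placeholders, not the paper's actual settings; only the 9x9 blur kernel size is taken from the text.

```python
import numpy as np

rng = np.random.default_rng(3)

def add_gaussian_noise(img, p=0.5, mean=0.0, var_range=(0.001, 0.01)):
    """With probability p, add zero-mean Gaussian noise whose variance is drawn
    uniformly per image (p, mean, and var_range are illustrative placeholders)."""
    if rng.random() < p:
        var = rng.uniform(*var_range)
        img = img + rng.normal(mean, np.sqrt(var), size=img.shape)
    return np.clip(img, 0.0, 1.0)

def gaussian_blur9(img, sigma):
    """Gaussian blur with a fixed 9x9 kernel, applied by direct 2-D convolution."""
    ax = np.arange(9) - 4.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    k /= k.sum()
    p = np.pad(img, 4, mode="edge")
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(p[i:i + 9, j:j + 9] * k)
    return out

x = rng.random((16, 16))
x_noisy = add_gaussian_noise(x)
x_blur = gaussian_blur9(x, sigma=rng.uniform(0.5, 2.0))  # sigma drawn per image
```

Drawing the noise variance and blur sigma per image, rather than fixing them, is what makes these baselines comparable to the randomized adversarial blurring.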
-  (2018) Mesonet: a compact facial video forgery detection network. In 2018 IEEE International Workshop on Information Forensics and Security (WIFS), pp. 1–7. Cited by: §II.
-  (2019) What else can fool deep learning? Addressing color constancy errors on deep neural network performance. In Proceedings of the IEEE International Conference on Computer Vision, pp. 243–252. Cited by: §III-A.
-  (2018) Obfuscated gradients give a false sense of security: circumventing defenses to adversarial examples. arXiv preprint arXiv:1802.00420. Cited by: §II.
-  (2018) Towards open-set identity preserving face synthesis. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 6713–6722. Cited by: §II.
-  (2016) A deep learning approach to universal image manipulation detection using a new convolutional layer. In Proceedings of the 4th ACM Workshop on Information Hiding and Multimedia Security, pp. 5–10. Cited by: §II.
-  (2008) Face swapping: automatically replacing faces in photographs. In ACM SIGGRAPH 2008 papers, pp. 1–8. Cited by: §II.
-  (1997) Video rewrite: driving visual speech with audio. In Proceedings of the 24th annual conference on Computer graphics and interactive techniques, pp. 353–360. Cited by: §II.
-  (2018) . In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 8789–8797. Cited by: §II.
-  (2017) Xception: deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1251–1258. Cited by: Appendix A, TABLE X, TABLE XI, TABLE VIII, TABLE IX, §IV-A, §IV-B, TABLE VI, §IV.
-  (2014) Image forgery localization through the fusion of camera-based, feature-based and pixel-based techniques. In 2014 IEEE International Conference on Image Processing (ICIP), pp. 5302–5306. Cited by: §I.
-  (2017) Recasting residual-based local descriptors as convolutional neural networks: an application to image forgery detection. In Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security, pp. 159–164. Cited by: §II.
-  (2011) Video face replacement. In Proceedings of the 2011 SIGGRAPH Asia Conference, pp. 1–10. Cited by: §II.
-  DeepFakes. Note: www.github.com/deepfakes/faceswap. Accessed: 2019-09-18. Cited by: §II, §IV.
-  (2009) Imagenet: a large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pp. 248–255. Cited by: TABLE XIII, §III-B, §IV-C, §IV, §IV.
-  (2019) Towards generalizable forgery detection with locality-aware autoencoder. arXiv preprint arXiv:1909.05999. Cited by: §I, §II.
-  (2019) Contributing data to deepfake detection research. Google AI Blog. Cited by: §IV-A, TABLE VI, §IV.
-  FaceSwap. Note: www.github.com/MarekKowalski/FaceSwap. Accessed: 2019-09-30. Cited by: §I, §II, §IV.
-  (2012) Image forgery localization via fine-grained analysis of cfa artifacts. IEEE Transactions on Information Forensics and Security 7 (5), pp. 1566–1577. Cited by: §II.
-  (2012) Rich models for steganalysis of digital images. IEEE Transactions on Information Forensics and Security 7 (3), pp. 868–882. Cited by: §I, §II.
-  (2014) Automatic face reenactment. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4217–4224. Cited by: §II.
-  (2015) CFA-aware features for steganalysis of color images. In Media Watermarking, Security, and Forensics 2015, Vol. 9409, pp. 94090V. Cited by: §II.
-  (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Cited by: §I, §II, Fig. 1, §III-A, §III-B, §IV-C.
-  (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680. Cited by: §II, §III-C.
-  (2018) Deepfake video detection using recurrent neural networks. In 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1–6. Cited by: §II.
-  (2019) Attgan: facial attribute editing by only changing what you want. IEEE Transactions on Image Processing 28 (11), pp. 5464–5478. Cited by: §II.
-  (2021) Adversarial deepfakes: evaluating vulnerability of deepfake detectors to adversarial examples. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 3348–3357. Cited by: §II.
-  (2016) Adversarial machine learning at scale. arXiv preprint arXiv:1611.01236. Cited by: Appendix C, §II.
-  (2020) Face x-ray for more general face forgery detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5001–5010. Cited by: §I, §II.
-  (2018) Exposing deepfake videos by detecting face warping artifacts. arXiv preprint arXiv:1811.00656 2. Cited by: §II.
-  (2020) Celeb-df: a large-scale challenging dataset for deepfake forensics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3207–3216. Cited by: TABLE VI, §IV.
-  (2019) On the variance of the adaptive learning rate and beyond. arXiv preprint arXiv:1908.03265. Cited by: §IV.
-  (2019) Stgan: a unified selective transfer network for arbitrary image attribute editing. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3673–3682. Cited by: §II.
-  (2017) Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083. Cited by: §I, §II, §III-B.
-  (2019) FSGAN: subject agnostic face swapping and reenactment. In Proceedings of the IEEE international conference on computer vision, pp. 7184–7193. Cited by: §II.
-  (2017) Realistic dynamic facial textures from a single image using gans. In Proceedings of the IEEE International Conference on Computer Vision, pp. 5429–5438. Cited by: §II.
-  (2012) Exposing image splicing with inconsistent local noise variances. In 2012 IEEE International Conference on Computational Photography (ICCP), pp. 1–10. Cited by: §I, §II.
-  (2019) PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, Vol. 32, pp. 8026–8037. Cited by: §IV.
-  (2017) Distinguishing computer graphics from natural images using convolution neural networks. In 2017 IEEE Workshop on Information Forensics and Security (WIFS), pp. 1–6. Cited by: §II.
-  (2019) Faceforensics++: learning to detect manipulated facial images. arXiv preprint arXiv:1901.08971. Cited by: TABLE XI, TABLE VIII, TABLE IX, §II, TABLE I, TABLE II, TABLE V, TABLE VI, §IV, §IV.
-  (2020) Disrupting deepfakes: adversarial attacks against conditional image translation networks and facial manipulation systems. In European Conference on Computer Vision, pp. 236–251. Cited by: §II.
-  (2020) A simple way to make neural networks robust against diverse image corruptions. arXiv preprint arXiv:2001.06057. Cited by: §III-C.
-  (2019) On the detection of digital face manipulation. arXiv preprint arXiv:1910.01717. Cited by: §II, §IV-B, TABLE VI, §IV.
-  (2016) Inception-v4, Inception-ResNet and the impact of residual connections on learning. arXiv preprint arXiv:1602.07261. Cited by: §IV-C.
-  (2016) Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818–2826. Cited by: §IV-C.
-  (2019) Efficientnet: rethinking model scaling for convolutional neural networks. arXiv preprint arXiv:1905.11946. Cited by: Appendix A, §IV-A, §IV-B, TABLE I, TABLE II, TABLE III, TABLE IV, TABLE V, TABLE VI, §IV.
-  (2019) Deferred neural rendering: image synthesis using neural textures. arXiv preprint arXiv:1904.12356. Cited by: §II, §IV.
-  (2016) Face2face: real-time face capture and reenactment of rgb videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2387–2395. Cited by: §I, §II, §IV.
-  (2019) Are labels required for improving adversarial robustness?. arXiv preprint arXiv:1905.13725. Cited by: §II.
-  (2020) CNN-generated images are surprisingly easy to spot… for now. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Vol. 7. Cited by: Appendix A, §III-B, §IV-B.
-  (2018) Spatially transformed adversarial examples. arXiv preprint arXiv:1801.02612. Cited by: §I, Fig. 1, §III-A.
-  (2019) Attributing fake images to gans: learning and analyzing gan fingerprints. In Proceedings of the IEEE International Conference on Computer Vision, pp. 7556–7566. Cited by: §I.
-  ZAO. Note: https://apps.apple.com/cn/app/zao/id1465199127. Accessed: 2019-10-13. Cited by: §II.
-  (2017) Two-stream neural networks for tampered face detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1831–1839. Cited by: §II.
-  (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pp. 2223–2232. Cited by: §III-C.