Self-supervised Transformer for Deepfake Detection

03/02/2022
by Hanqing Zhao, et al.

The rapid evolution and wide spread of deepfake techniques in real-world scenarios demand stronger generalization from face forgery detectors. Some works capture features that are unrelated to method-specific artifacts, such as blending-boundary clues and accumulated up-sampling traces, to strengthen generalization. However, the effectiveness of these methods is easily degraded by post-processing operations such as compression. Transfer learning suggests that neural networks pre-trained on other large-scale face-related tasks may provide useful features for deepfake detection. For example, lip movement has been shown to be a robust, high-level semantic feature that transfers well and can be learned from the lipreading task. However, the existing method pre-trains the lip feature extraction model in a supervised manner, which requires substantial human effort for data annotation and makes training data harder to obtain. In this paper, we propose a self-supervised, transformer-based audio-visual contrastive learning method. The proposed method learns mouth motion representations by encouraging paired video and audio representations to be close and unpaired ones to be far apart. After pre-training with our method, the model is partially fine-tuned for the deepfake detection task. Extensive experiments show that our self-supervised method performs comparably to, or even better than, its supervised pre-training counterpart.
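The contrastive objective described in the abstract (paired video and audio representations pulled together, unpaired ones pushed apart) can be illustrated with a minimal sketch. The abstract does not state the exact loss, so the symmetric InfoNCE-style formulation below is an assumption, as are the function names, the `temperature` value, and the toy embeddings. It shows the general idea, not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _cross_entropy(logits, target):
    """Negative log-softmax of the target entry, computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def av_contrastive_loss(video_embs, audio_embs, temperature=0.1):
    """Symmetric InfoNCE-style loss (an assumed formulation).

    video_embs[i] and audio_embs[i] are treated as a matching pair;
    every other combination in the batch acts as a negative.
    """
    v = [l2_normalize(e) for e in video_embs]
    a = [l2_normalize(e) for e in audio_embs]
    n = len(v)
    # sim[i][j]: scaled cosine similarity between video i and audio j.
    sim = [
        [sum(x * y for x, y in zip(v[i], a[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Video-to-audio direction: each row should peak on its diagonal entry.
    v2a = sum(_cross_entropy(sim[i], i) for i in range(n)) / n
    # Audio-to-video direction: each column should peak on its diagonal entry.
    a2v = sum(
        _cross_entropy([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (v2a + a2v)


# Toy embeddings (hypothetical): three clips with orthogonal representations.
video = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
audio_paired = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
audio_shuffled = audio_paired[1:] + audio_paired[:1]  # pairs deliberately broken

loss_paired = av_contrastive_loss(video, audio_paired)
loss_shuffled = av_contrastive_loss(video, audio_shuffled)
```

In this toy setting the loss is small when each video embedding lines up with its own audio embedding and large when the pairing is broken, which is the behavior the pre-training objective rewards. In a real system the embeddings would come from the video and audio encoders rather than hand-written lists.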

