M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection

04/20/2021
by Junke Wang, et al.

The widespread dissemination of forged images generated by Deepfake techniques poses a serious threat to the trustworthiness of digital information. This demands effective approaches that can detect perceptually convincing Deepfakes generated by advanced manipulation techniques. Most existing approaches combat Deepfakes with deep neural networks that map the input image to a binary prediction without capturing the consistency among different pixels. In this paper, we aim to capture subtle manipulation artifacts at different scales for Deepfake detection. We achieve this with transformer models, which have recently demonstrated superior performance in modeling dependencies between pixels for a variety of recognition tasks in computer vision. In particular, we introduce a Multi-modal Multi-scale TRansformer (M2TR), which uses a multi-scale transformer that operates on patches of different sizes to detect local inconsistencies at different spatial levels. To improve the detection results and enhance the robustness of our method to image compression, M2TR also takes frequency information as input, which is combined with RGB features using a cross-modality fusion module. Developing and evaluating Deepfake detection methods requires large-scale datasets. However, we observe that samples in existing benchmarks contain severe artifacts and lack diversity. This motivates us to introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 Deepfake videos generated by state-of-the-art face swapping and facial reenactment methods. We conduct extensive experiments on three Deepfake datasets to verify the effectiveness of the proposed method, which outperforms state-of-the-art Deepfake detection methods.
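
To make the idea of "patches of different sizes" concrete, the following is a minimal sketch of splitting one feature map into non-overlapping patches at several scales. It is only an illustration, not the M2TR implementation: the names `split_into_patches` and `multi_scale_patches`, the 8x8 toy map, and the patch sizes 2, 4, and 8 are all assumptions made for this example, and the transformer layers that would operate on the patches are omitted.

```python
from typing import Dict, List

# Hypothetical stand-in for a single-channel H x W feature map.
FeatureMap = List[List[float]]


def split_into_patches(feature_map: FeatureMap, patch_size: int) -> List[FeatureMap]:
    """Split an H x W map into non-overlapping patch_size x patch_size patches,
    ordered row by row (top-left patch first)."""
    height, width = len(feature_map), len(feature_map[0])
    if height % patch_size or width % patch_size:
        raise ValueError("patch_size must divide both height and width")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            rows = feature_map[top:top + patch_size]
            patches.append([row[left:left + patch_size] for row in rows])
    return patches


def multi_scale_patches(feature_map: FeatureMap, patch_sizes: List[int]) -> Dict[int, List[FeatureMap]]:
    """Return one patch list per patch size (assumed scales, chosen by the caller)."""
    return {size: split_into_patches(feature_map, size) for size in patch_sizes}


# Toy 8x8 map whose entry at (r, c) is r * 8 + c.
toy_map = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
scales = multi_scale_patches(toy_map, [2, 4, 8])
for size, patches in scales.items():
    print(size, len(patches))
```

Smaller patches give many fine-grained regions and larger patches give a few coarse ones, so comparing patches within each scale would expose inconsistencies at different spatial levels, which is the intuition the abstract describes.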


research
09/16/2021

M2RNet: Multi-modal and Multi-scale Refined Network for RGB-D Salient Object Detection

Salient object detection is a fundamental topic in computer vision. Prev...
research
12/01/2021

Transformer-based Network for RGB-D Saliency Detection

RGB-D saliency detection integrates information from both RGB images and...
research
11/30/2022

Two-branch Multi-scale Deep Neural Network for Generalized Document Recapture Attack Detection

The image recapture attack is an effective image manipulation method to ...
research
06/06/2022

CORE: Consistent Representation Learning for Face Forgery Detection

Face manipulation techniques develop rapidly and arouse widespread publi...
research
07/27/2023

IML-ViT: Image Manipulation Localization by Vision Transformer

Advanced image tampering techniques are increasingly challenging the tru...
research
06/22/2022

Behavior Transformers: Cloning k modes with one stone

While behavior learning has made impressive progress in recent times, it...
research
03/28/2022

ObjectFormer for Image Manipulation Detection and Localization

Recent advances in image editing techniques have posed serious challenge...
