Deepfake Video Detection with Spatiotemporal Dropout Transformer

07/14/2022
by Daichi Zhang, et al.

While the abuse of deepfake technology has recently caused serious concern, detecting deepfake videos remains a challenge because each frame is synthesized with high photo-realism. Existing image-level approaches often focus on a single frame and ignore the spatiotemporal cues hidden in deepfake videos, resulting in poor generalization and robustness. The key to a video-level detector is to fully exploit the spatiotemporal inconsistency distributed in local facial regions across different frames of a deepfake video. Inspired by this, the paper proposes a simple yet effective patch-level approach that facilitates deepfake video detection via a spatiotemporal dropout transformer. The approach reorganizes each input video into a bag of patches, which is then fed into a vision transformer to obtain a robust representation. Specifically, a spatiotemporal dropout operation is proposed to fully explore patch-level spatiotemporal cues and to serve as an effective data augmentation that further enhances the model's robustness and generalization ability. The operation is flexible and can be easily plugged into existing vision transformers. Extensive experiments demonstrate the effectiveness of the approach against 25 state-of-the-art methods, with impressive robustness, generalizability, and representation ability.
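
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: split a video into patches, then randomly drop frames and patches to form a "bag of patches". The function names (`video_to_patches`, `spatiotemporal_dropout`), the parameters (`keep_frames`, `keep_patches`), the use of non-overlapping square patches, and the uniform random sampling are all assumptions made for illustration; the paper's actual operation may differ.

```python
import numpy as np


def video_to_patches(video, patch):
    """Split a (T, H, W, C) video into non-overlapping square patches.

    Returns an array of shape (T, N, patch, patch, C), where
    N = (H // patch) * (W // patch). Hypothetical helper, not from the paper.
    """
    t, h, w, c = video.shape
    if h % patch or w % patch:
        raise ValueError("frame size must be divisible by the patch size")
    gh, gw = h // patch, w // patch
    x = video.reshape(t, gh, patch, gw, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (T, gh, gw, patch, patch, C)
    return x.reshape(t, gh * gw, patch, patch, c)


def spatiotemporal_dropout(patches, keep_frames, keep_patches, rng):
    """Assumed form of the dropout: keep a random subset of frames
    (temporal dropout), then a random subset of patches within each
    kept frame (spatial dropout), and return them as one bag.

    Output shape: (keep_frames * keep_patches, patch, patch, C).
    """
    t, n = patches.shape[:2]
    frame_idx = np.sort(rng.choice(t, size=keep_frames, replace=False))
    kept = []
    for f in frame_idx:
        patch_idx = np.sort(rng.choice(n, size=keep_patches, replace=False))
        kept.append(patches[f, patch_idx])
    return np.concatenate(kept, axis=0)


# Toy usage: 8 random frames of 32x32 RGB, 16x16 patches.
rng = np.random.default_rng(0)
video = rng.random((8, 32, 32, 3))
patches = video_to_patches(video, patch=16)
bag = spatiotemporal_dropout(patches, keep_frames=4, keep_patches=2, rng=rng)
```

Because a different subset is drawn on every call, the same video yields a different bag each time, which is one way such an operation could double as data augmentation. In a full pipeline, the bag would be flattened into tokens and passed to a vision transformer.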


research
03/14/2021

Towards Generalizable and Robust Face Manipulation Detection via Bag-of-local-feature

Over the past several years, in order to solve the problem of malicious ...
research
03/19/2021

Hopper: Multi-hop Transformer for Spatiotemporal Reasoning

This paper considers the problem of spatiotemporal object-centric reason...
research
03/25/2021

Frame-rate Up-conversion Detection Based on Convolutional Neural Network for Learning Spatiotemporal Features

With the advance in user-friendly and powerful video editing tools, anyo...
research
01/21/2021

SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation

In this paper we introduce a Transformer-based approach to video object ...
research
01/29/2019

Anomaly Locality in Video Surveillance

This paper strives for the detection of real-world anomalies such as bur...
research
10/23/2022

UIA-ViT: Unsupervised Inconsistency-Aware Method based on Vision Transformer for Face Forgery Detection

Intra-frame inconsistency has been proved to be effective for the genera...
research
04/13/2023

DNeRV: Modeling Inherent Dynamics via Difference Neural Representation for Videos

Existing implicit neural representation (INR) methods do not fully explo...
