Landmark Enhanced Multimodal Graph Learning for Deepfake Video Detection

09/12/2022
by Zhiyuan Yan, et al.

With the rapid development of face forgery technology, deepfake videos have attracted widespread attention in digital media. Perpetrators use these videos extensively to spread disinformation and make misleading statements. Most existing deepfake detection methods focus on texture features, which are easily affected by external fluctuations such as illumination and noise. Detection methods based on facial landmarks, by contrast, are more robust to such external variables but lack sufficient detail. How to effectively mine distinctive features in the spatial, temporal, and frequency domains and fuse them with facial landmarks for forged video detection therefore remains an open question. To this end, we propose a Landmark Enhanced Multimodal Graph Neural Network (LEM-GNN) that draws on information from multiple modalities and on the geometric features of facial landmarks. Specifically, at the frame level, we design a fusion mechanism that mines a joint representation of the spatial and frequency domains, and we introduce geometric facial features to enhance the robustness of the model. At the video level, we first regard each frame of a video as a node in a graph and encode temporal information into the graph's edges. Then, through the message passing mechanism of the graph neural network (GNN), the multimodal features are combined to obtain a comprehensive representation of the video forgery. Extensive experiments show that our method consistently outperforms state-of-the-art (SOTA) methods on widely used benchmarks.
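
The video-level idea described in the abstract (frames as graph nodes, temporal information on the edges, message passing to combine features) can be illustrated with a minimal pure-Python sketch. This is not the LEM-GNN architecture: the abstract does not specify the fusion mechanism, the edge encoding, or the GNN layers. Everything here is an assumption made for illustration, including the function names, the use of plain concatenation as a stand-in for frame-level fusion, the exponential decay edge weight, its `tau` parameter, and the mean-pool readout.

```python
import math


def fuse_frame_features(spatial, frequency, landmark):
    """Stand-in frame-level fusion (assumption): concatenate the three feature lists."""
    return list(spatial) + list(frequency) + list(landmark)


def temporal_edge_weight(i, j, tau=2.0):
    """Hypothetical edge weight that decays with the temporal gap between frames i and j."""
    return math.exp(-abs(i - j) / tau)


def build_temporal_graph(num_frames, tau=2.0):
    """Dense adjacency over frame nodes, with self-loops, each row normalized to sum to 1."""
    adj = []
    for i in range(num_frames):
        row = [temporal_edge_weight(i, j, tau) for j in range(num_frames)]
        total = sum(row)
        adj.append([w / total for w in row])
    return adj


def message_passing(features, adj, steps=1):
    """Each step replaces every node feature with the edge-weighted average of all node features."""
    n = len(features)
    dim = len(features[0])
    for _ in range(steps):
        features = [
            [sum(adj[i][j] * features[j][k] for j in range(n)) for k in range(dim)]
            for i in range(n)
        ]
    return features


def video_representation(features):
    """Mean-pool the node features into a single video-level vector (assumed readout)."""
    n = len(features)
    return [sum(f[k] for f in features) / n for k in range(len(features[0]))]


if __name__ == "__main__":
    # Toy per-frame features; the values are placeholders, not real data.
    frames = [
        fuse_frame_features([0.1 * t], [0.5], [1.0 - 0.1 * t])
        for t in range(5)
    ]
    adj = build_temporal_graph(len(frames))
    updated = message_passing(frames, adj, steps=2)
    print(video_representation(updated))
```

In this sketch, frames that are close in time exchange more information than distant ones because the edge weight shrinks with the frame gap. A learned model would replace the fixed weights and the averaging step with trainable layers.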


