Not made for each other: Audio-Visual Dissonance-based Deepfake Detection and Localization

05/29/2020
by Komal Chugh et al.

We propose detecting deepfake videos from the dissimilarity between the audio and visual modalities, a measure we term the Modality Dissonance Score (MDS). We hypothesize that manipulating either modality will produce disharmony between the two, e.g., loss of lip-sync or unnatural facial and lip movements. MDS is computed as an aggregate of the dissimilarity scores between corresponding audio and visual segments of a video. Discriminative features are learnt chunk-wise for the audio and visual channels, using a cross-entropy loss for each individual modality and a contrastive loss that models inter-modality similarity. Extensive experiments on the DFDC and DeepFake-TIMIT datasets show that our approach outperforms the state of the art by up to 7%. Our technique also identifies the manipulated video segments.
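The abstract describes MDS only as "an aggregate of dissimilarity scores" between audio and visual chunks, so the sketch below fills in the details with assumptions. It treats each chunk's audio and visual features as plain vectors (in the paper they come from learned networks). It also assumes Euclidean distance per chunk, mean aggregation, and the standard margin-based contrastive loss (pull real pairs together, push fake pairs at least `margin` apart). The threshold and `top_k` localization rule are hypothetical illustrations, not the authors' method, and the per-modality cross-entropy terms are omitted.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chunk_dissimilarities(audio: Sequence[Vector], visual: Sequence[Vector]) -> List[float]:
    """Per-chunk audio/visual distance (assumption: Euclidean)."""
    if len(audio) != len(visual) or not audio:
        raise ValueError("need the same, non-zero number of audio and visual chunks")
    return [euclidean(a, v) for a, v in zip(audio, visual)]


def modality_dissonance_score(audio: Sequence[Vector], visual: Sequence[Vector]) -> float:
    """Video-level score: mean of chunk dissimilarities (assumed aggregation)."""
    d = chunk_dissimilarities(audio, visual)
    return sum(d) / len(d)


def contrastive_loss(a: Vector, v: Vector, is_real: bool, margin: float = 1.0) -> float:
    """Standard margin-based contrastive loss for one audio/visual chunk pair.

    Real pairs are penalized by their squared distance; fake pairs only when
    closer than `margin` (hypothetical default of 1.0).
    """
    d = euclidean(a, v)
    if is_real:
        return d ** 2
    return max(margin - d, 0.0) ** 2


def detect_and_localize(
    audio: Sequence[Vector], visual: Sequence[Vector], threshold: float, top_k: int = 1
) -> Tuple[bool, List[int]]:
    """Hypothetical decision rule: flag the video if MDS exceeds `threshold`
    and report the indices of the `top_k` most dissonant chunks."""
    d = chunk_dissimilarities(audio, visual)
    mds = sum(d) / len(d)
    ranked = sorted(range(len(d)), key=lambda i: d[i], reverse=True)
    return mds > threshold, sorted(ranked[:top_k])


if __name__ == "__main__":
    audio = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    visual = [[0.0, 0.0], [1.0, 1.0], [5.0, 6.0]]  # last chunk is out of sync
    print(modality_dissonance_score(audio, visual))
    print(detect_and_localize(audio, visual, threshold=1.0))
```

In this toy input only the third chunk disagrees (distance 5), so the mean score is 5/3 and the rule flags the video while pointing at chunk index 2. Averaging keeps the video-level score comparable across clips of different lengths, while the per-chunk distances are what make localization possible.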

research
08/26/2021

Multi-Modulation Network for Audio-Visual Event Localization

We study the problem of localizing audio-visual events that are both aud...
research
05/02/2018

Investigating Audio, Visual, and Text Fusion Methods for End-to-End Automatic Personality Prediction

We propose a tri-modal architecture to predict Big Five personality trai...
research
06/12/2023

NPVForensics: Jointing Non-critical Phonemes and Visemes for Deepfake Detection

Deepfake technologies empowered by deep learning are rapidly evolving, c...
research
04/13/2022

Do You Really Mean That? Content Driven Audio-Visual Deepfake Dataset and Multimodal Method for Temporal Forgery Localization

Due to its high societal impact, deepfake detection is getting active at...
research
11/02/2022

Impact of annotation modality on label quality and model performance in the automatic assessment of laughter in-the-wild

Laughter is considered one of the most overt signals of joy. Laughter is...
research
09/15/2023

AV-MaskEnhancer: Enhancing Video Representations through Audio-Visual Masked Autoencoder

Learning high-quality video representation has shown significant applica...
research
05/27/2020

Modality Dropout for Improved Performance-driven Talking Faces

We describe our novel deep learning approach for driving animated faces ...