Modality Dropout for Improved Performance-driven Talking Faces

05/27/2020
by Ahmed Hussen Abdelaziz, et al.

We describe our novel deep learning approach for driving animated faces using both acoustic and visual information. In particular, speech-related facial movements are generated using audiovisual information, and non-speech facial movements are generated using only visual information. To ensure that our model exploits both modalities during training, batches are generated that contain audio-only, video-only, and audiovisual input features. The probability of dropping a modality allows control over the degree to which the model exploits audio and visual information during training. Our trained model runs in real time on resource-limited hardware (e.g., a smartphone), it is user agnostic, and it does not depend on a potentially error-prone transcription of the speech. We use subjective testing to demonstrate: 1) the improvement of audiovisual-driven animation over the equivalent video-only approach, and 2) the improvement in the animation of speech-related facial movements after introducing modality dropout. Before introducing dropout, viewers prefer audiovisual-driven animation in 51% of the test sequences, compared with only 18% for the video-only approach. After introducing dropout, preference for audiovisual-driven animation increases to 74%, while preference for video-only falls to 8%.
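The batching strategy described above lends itself to a short illustration. Below is a minimal PyTorch sketch of modality dropout, not the authors' implementation: the function name, tensor shapes, and dropout probabilities are illustrative assumptions.

```python
import torch

def modality_dropout(audio_feats, video_feats,
                     p_drop_audio=0.25, p_drop_video=0.25):
    """Zero out one modality per example so each training batch mixes
    audio-only, video-only, and audiovisual inputs.

    Both feature tensors are assumed to be shaped (batch, time, dim).
    The probabilities are illustrative; tuning them controls how much
    the model is pushed to rely on each modality during training.
    """
    batch = audio_feats.shape[0]
    r = torch.rand(batch)

    # Three disjoint cases per example:
    #   r < p_drop_audio                               -> video-only (audio zeroed)
    #   p_drop_audio <= r < p_drop_audio + p_drop_video -> audio-only (video zeroed)
    #   otherwise                                      -> full audiovisual input
    drop_audio = (r < p_drop_audio).float().view(-1, 1, 1)
    drop_video = ((r >= p_drop_audio) &
                  (r < p_drop_audio + p_drop_video)).float().view(-1, 1, 1)

    return audio_feats * (1.0 - drop_audio), video_feats * (1.0 - drop_video)

# Example: an 8-example batch of hypothetical 40-dim acoustic and
# 128-dim visual features over 100 frames.
audio = torch.randn(8, 100, 40)
video = torch.randn(8, 100, 128)
audio, video = modality_dropout(audio, video)
```

Applying the mask per example rather than per batch keeps every batch a mixture of the three input conditions, which is the property the abstract highlights.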

