
Modality Dropout for Improved Performance-driven Talking Faces

05/27/2020
by Ahmed Hussen Abdelaziz et al.
Apple Inc.

We describe our novel deep learning approach for driving animated faces using both acoustic and visual information. In particular, speech-related facial movements are generated using audiovisual information, and non-speech facial movements are generated using only visual information. To ensure that our model exploits both modalities during training, batches are generated that contain audio-only, video-only, and audiovisual input features. The probability of dropping a modality allows control over the degree to which the model exploits audio and visual information during training. Our trained model runs in real time on resource-limited hardware (e.g. a smartphone), it is user agnostic, and it is not dependent on a potentially error-prone transcription of the speech. We use subjective testing to demonstrate: 1) the improvement of audiovisual-driven animation over the equivalent video-only approach, and 2) the improvement in the animation of speech-related facial movements after introducing modality dropout. Before introducing dropout, viewers prefer audiovisual-driven animation in 51% of the test sequences, compared to only 18% for video-driven. After introducing dropout, viewer preference for audiovisual-driven animation increases to 74%, but decreases to 8% for video-only.
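To make the batch-generation idea in the abstract concrete, the sketch below shows one way modality dropout can be applied when assembling training batches. This is our own illustration, not the paper's implementation: the function name, drop probabilities, and feature shapes are all assumptions.

```python
# Minimal sketch of modality dropout at batch-generation time (illustrative only;
# names, shapes, and probabilities are assumptions, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

def modality_dropout_batch(audio_feats, video_feats,
                           p_drop_audio=0.25, p_drop_video=0.25):
    """Randomly zero out one modality per training example.

    audio_feats: (batch, T, D_a) acoustic features
    video_feats: (batch, T, D_v) visual features
    Each example stays audiovisual, becomes video-only (audio zeroed),
    or becomes audio-only (video zeroed). The two drop probabilities
    control how strongly the model is forced to rely on each modality.
    """
    audio = audio_feats.copy()
    video = video_feats.copy()
    for i in range(audio.shape[0]):
        u = rng.random()
        if u < p_drop_audio:
            audio[i] = 0.0            # video-only example
        elif u < p_drop_audio + p_drop_video:
            video[i] = 0.0            # audio-only example
        # otherwise the example remains fully audiovisual
    return audio, video

# Example usage: a batch of 8 sequences of 30 frames, with hypothetical
# 40-dim acoustic and 128-dim visual features.
a = rng.standard_normal((8, 30, 40)).astype(np.float32)
v = rng.standard_normal((8, 30, 128)).astype(np.float32)
a_out, v_out = modality_dropout_batch(a, v)
```

Zeroing features is just one way to realize the audio-only and video-only batches the abstract describes; masking flags or separately assembled single-modality batches would serve the same purpose, and tuning the two drop probabilities gives the control over modality reliance that the paper highlights.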

Speech inpainting: Context-based speech synthesis guided by video (06/01/2023)
Audio and visual modalities are inherently connected in speech signals: ...

Realistic Speech-Driven Facial Animation with GANs (06/14/2019)
Speech-driven facial animation is the process that automatically synthes...

Improving Speech Related Facial Action Unit Recognition by Audiovisual Information Fusion (06/29/2017)
It is challenging to recognize facial action unit (AU) from spontaneous ...

Not made for each other- Audio-Visual Dissonance-based Deepfake Detection and Localization (05/29/2020)
We propose detection of deepfake videos based on the dissimilarity betwe...

Talking Head Generation with Audio and Speech Related Facial Action Units (10/19/2021)
The task of talking head generation is to synthesize a lip synchronized ...

Audio2Face: Generating Speech/Face Animation from Single Audio with Attention-Based Bidirectional LSTM Networks (05/27/2019)
We propose an end to end deep learning approach for generating real-time...

Speech-driven facial animation using polynomial fusion of features (12/12/2019)
Speech-driven facial animation involves using a speech signal to generat...