
Talking Head Generation with Audio and Speech Related Facial Action Units

by Sen Chen, et al.

The task of talking head generation is to synthesize a lip-synchronized talking head video from an arbitrary face image and audio clips. Most existing methods ignore the local driving information of the mouth muscles. In this paper, we propose a novel recurrent generative network that uses both audio and speech-related facial action units (AUs) as driving information. AU information related to the mouth can guide mouth movement more accurately. Since speech is highly correlated with speech-related AUs, we propose an Audio-to-AU module that predicts speech-related AU information from speech. In addition, we use an AU classifier to ensure that the generated images contain correct AU information, and we construct a frame discriminator for adversarial training to improve the realism of the generated faces. We verify the effectiveness of our model on the GRID and TCD-TIMIT datasets, and conduct an ablation study to verify the contribution of each component. Quantitative and qualitative experiments demonstrate that our method outperforms existing methods in both image quality and lip-sync accuracy.
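The abstract describes a generator trained against three signals: image reconstruction, an AU classifier on the generated frames, and a frame discriminator. A minimal sketch of how such a combined objective might be composed is below; the function names, loss forms, and weights are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of a combined generator objective with the three
# terms named in the abstract: reconstruction, AU classification, and
# adversarial realism. All weights and loss forms are assumptions.

def reconstruction_loss(generated, target):
    # Mean absolute pixel error between generated and ground-truth frames.
    return sum(abs(g - t) for g, t in zip(generated, target)) / len(target)

def au_classification_loss(predicted_aus, target_aus):
    # Penalize incorrect speech-related AU activations in generated frames
    # (squared error over AU intensity labels, as one plausible choice).
    return sum((p - t) ** 2 for p, t in zip(predicted_aus, target_aus)) / len(target_aus)

def adversarial_loss(discriminator_score):
    # The generator wants the frame discriminator to score frames as
    # real (1.0); least-squares GAN form assumed here.
    return (1.0 - discriminator_score) ** 2

def total_generator_loss(generated, target, predicted_aus, target_aus,
                         disc_score, w_rec=1.0, w_au=0.5, w_adv=0.1):
    # Weighted sum of the three terms; the weights are placeholders,
    # not values reported by the paper.
    return (w_rec * reconstruction_loss(generated, target)
            + w_au * au_classification_loss(predicted_aus, target_aus)
            + w_adv * adversarial_loss(disc_score))
```

In this sketch, a perfectly reconstructed frame with correct AUs and a discriminator score of 1.0 yields zero total loss, which is the behavior any weighted-sum objective of this shape should have.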



