Talking Head Generation with Audio and Speech Related Facial Action Units

10/19/2021
by Sen Chen, et al.

Talking head generation aims to synthesize a lip-synchronized talking head video from an arbitrary face image and audio clips. Most existing methods ignore the local driving information of the mouth muscles. In this paper, we propose a novel recurrent generative network that uses both audio and speech-related facial action units (AUs) as driving information. Mouth-related AU information can guide the movement of the mouth more accurately. Since speech is highly correlated with speech-related AUs, we propose an Audio-to-AU module that predicts speech-related AU information from speech. In addition, we use an AU classifier to ensure that the generated images contain correct AU information. A frame discriminator is also constructed for adversarial training to improve the realism of the generated face. We verify the effectiveness of our model on the GRID and TCD-TIMIT datasets. We also conduct an ablation study to verify the contribution of each component of our model. Quantitative and qualitative experiments demonstrate that our method outperforms existing methods in both image quality and lip-sync accuracy.


research
04/27/2022

Talking Head Generation Driven by Speech-Related Facial Action Units and Audio- Based on Multimodal Representation Fusion

Talking head generation is to synthesize a lip-synchronized talking head...
research
04/13/2018

Talking Face Generation by Conditional Recurrent Adversarial Network

Given an arbitrary face image and an arbitrary speech clip, the proposed...
research
06/23/2017

Listen to Your Face: Inferring Facial Action Units from Audio Channel

Extensive efforts have been devoted to recognizing facial action units (...
research
11/17/2022

SPACEx: Speech-driven Portrait Animation with Controllable Expression

Animating portraits using speech has received growing attention in recen...
research
12/17/2018

High-Resolution Talking Face Generation via Mutual Information Approximation

Given an arbitrary speech clip and a facial image, talking face generati...
research
02/20/2020

Photorealistic Lip Sync with Adversarial Temporal Convolutional Networks

Lip sync has emerged as a promising technique to generate mouth movement...
research
06/20/2023

Audio-Driven 3D Facial Animation from In-the-Wild Videos

Given an arbitrary audio clip, audio-driven 3D facial animation aims to ...
