Talking Head Generation Driven by Speech-Related Facial Action Units and Audio Based on Multimodal Representation Fusion

04/27/2022
by Sen Chen, et al.

Talking head generation aims to synthesize a lip-synchronized talking head video from an arbitrary face image and corresponding audio clips. Existing methods ignore not only the interaction and relationship of cross-modal information, but also the local driving information of the mouth muscles. In this study, we propose a novel generative framework that contains a dilated non-causal temporal convolutional self-attention network as a multimodal fusion module to promote relationship learning across cross-modal features. In addition, our method uses both audio and speech-related facial action units (AUs) as driving information. Speech-related AU information can guide mouth movements more accurately. Because speech is highly correlated with speech-related AUs, we propose an audio-to-AU module to predict speech-related AU information from audio. We also utilize a pre-trained AU classifier to ensure that the generated images contain correct AU information. We verify the effectiveness of the proposed model on the GRID and TCD-TIMIT datasets, and an ablation study confirms the contribution of each component. Quantitative and qualitative results demonstrate that our method outperforms existing methods in terms of both image quality and lip-sync accuracy.
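As a rough illustration of the fusion design described in the abstract, the sketch below stacks dilated non-causal temporal convolutions over concatenated audio and AU feature sequences and then applies multi-head self-attention over time. All module names, feature dimensions, and hyperparameters here are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (PyTorch) of a dilated non-causal temporal convolutional
# self-attention fusion module. Dimensions and layer counts are assumptions.
import torch
import torch.nn as nn

class DilatedTemporalBlock(nn.Module):
    """Non-causal 1D convolution: symmetric padding lets each timestep
    see both past and future frames; dilation widens the receptive field."""
    def __init__(self, channels: int, dilation: int, kernel_size: int = 3):
        super().__init__()
        pad = (kernel_size - 1) // 2 * dilation  # symmetric -> non-causal
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=pad, dilation=dilation)
        self.norm = nn.BatchNorm1d(channels)
        self.act = nn.ReLU()

    def forward(self, x):  # x: (batch, channels, time)
        return self.act(self.norm(self.conv(x))) + x  # residual connection

class FusionModule(nn.Module):
    """Concatenate audio and speech-related AU feature sequences, refine
    them with dilated non-causal temporal convolutions, then model
    long-range cross-modal dependencies with self-attention."""
    def __init__(self, audio_dim=256, au_dim=64, hidden=256, n_heads=4):
        super().__init__()
        self.proj = nn.Conv1d(audio_dim + au_dim, hidden, kernel_size=1)
        self.tcn = nn.Sequential(*[DilatedTemporalBlock(hidden, d)
                                   for d in (1, 2, 4, 8)])
        self.attn = nn.MultiheadAttention(hidden, n_heads, batch_first=True)

    def forward(self, audio_feat, au_feat):
        # audio_feat: (B, T, audio_dim), au_feat: (B, T, au_dim),
        # e.g. au_feat predicted from audio by an audio-to-AU module
        x = torch.cat([audio_feat, au_feat], dim=-1).transpose(1, 2)  # (B, C, T)
        x = self.tcn(self.proj(x)).transpose(1, 2)                    # (B, T, H)
        fused, _ = self.attn(x, x, x)   # self-attention over the time axis
        return fused + x                # residual fused representation

# Example: fuse 25 frames of audio and AU features for a batch of 2 clips.
fused = FusionModule()(torch.randn(2, 25, 256), torch.randn(2, 25, 64))
```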


Related research

10/19/2021
Talking Head Generation with Audio and Speech Related Facial Action Units
The task of talking head generation is to synthesize a lip synchronized ...

11/03/2021
A cross-modal fusion network based on self-attention and residual structure for multimodal emotion recognition
The audio-video based multimodal emotion recognition has attracted a lot...

12/10/2021
FaceFormer: Speech-Driven 3D Facial Animation with Transformers
Speech-driven 3D facial animation is challenging due to the complex geom...

10/02/2020
Stuttering Speech Disfluency Prediction using Explainable Attribution Vectors of Facial Muscle Movements
Speech disorders such as stuttering disrupt the normal fluency of speech...

06/29/2017
Improving Speech Related Facial Action Unit Recognition by Audiovisual Information Fusion
It is challenging to recognize facial action unit (AU) from spontaneous ...

07/10/2021
Speech2Video: Cross-Modal Distillation for Speech to Video Generation
This paper investigates a novel task of talking face video generation so...

06/23/2017
Listen to Your Face: Inferring Facial Action Units from Audio Channel
Extensive efforts have been devoted to recognizing facial action units (...
