High-Resolution Talking Face Generation via Mutual Information Approximation

12/17/2018
by Aihua Zheng, et al.

Given an arbitrary speech clip and a facial image, talking face generation aims to synthesize a talking face video with precise lip synchronization and smooth facial motion over the entire video. Most existing methods focus on either disentangling the information in a single image or learning temporal information between frames. However, speech audio and video exhibit a cross-modality coherence that has not been well addressed during synthesis. This paper therefore proposes a novel high-resolution talking face generation model for arbitrary persons that discovers this cross-modality coherence via Mutual Information Approximation (MIA). Assuming that the modality difference between audio and video is larger than the difference between real video and generated video, we estimate the mutual information between real audio and video, and then use a discriminator to enforce the generated video distribution to approach the real video distribution. Furthermore, we introduce a dynamic attention technique on the mouth region to enhance robustness during training. Experimental results on the benchmark dataset LRW surpass state-of-the-art methods on prevalent metrics, with robustness to gender and pose variations and support for high-resolution synthesis.
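The abstract does not spell out how the mutual information between audio and video is approximated. Below is a minimal sketch of one common choice, a MINE-style (Donsker-Varadhan) neural estimator over paired audio and video embeddings, written in PyTorch. The embedding dimensions, the statistics network, and the batch-shuffling strategy for marginal samples are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a MINE-style mutual information estimator between audio
# and video embeddings. All architectural details here are assumptions made
# for illustration; they are not taken from the paper.
import math
import torch
import torch.nn as nn


class MIEstimator(nn.Module):
    """Statistics network T(a, v) for the Donsker-Varadhan MI lower bound."""

    def __init__(self, audio_dim: int = 256, video_dim: int = 256, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim + video_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, audio_emb: torch.Tensor, video_emb: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([audio_emb, video_emb], dim=-1))


def mi_lower_bound(estimator: MIEstimator,
                   audio_emb: torch.Tensor,
                   video_emb: torch.Tensor) -> torch.Tensor:
    """Donsker-Varadhan estimate: E_joint[T] - log E_marginal[exp(T)].

    Joint samples pair each audio clip with its own video embedding;
    marginal samples pair it with a shuffled (mismatched) video embedding.
    """
    joint = estimator(audio_emb, video_emb).mean()
    shuffled = video_emb[torch.randperm(video_emb.size(0))]   # break the pairing
    marginal = estimator(audio_emb, shuffled)
    # log E[exp(T)] computed as logsumexp(T) - log(N) for numerical stability
    return joint - (torch.logsumexp(marginal.flatten(), dim=0)
                    - math.log(marginal.size(0)))


if __name__ == "__main__":
    est = MIEstimator()
    audio = torch.randn(32, 256)   # per-clip audio embeddings (assumed)
    video = torch.randn(32, 256)   # per-clip video embeddings (assumed)
    print(mi_lower_bound(est, audio, video))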


research
04/13/2018

Talking Face Generation by Conditional Recurrent Adversarial Network

Given an arbitrary face image and an arbitrary speech clip, the proposed...
research
05/09/2019

Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss

We devise a cascade GAN approach to generate talking face video, which i...
research
04/16/2021

MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement

This paper presents a generic method for generating full facial 3D anima...
research
10/19/2021

Talking Head Generation with Audio and Speech Related Facial Action Units

The task of talking head generation is to synthesize a lip synchronized ...
research
03/08/2022

StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pretrained StyleGAN

One-shot talking face generation aims at synthesizing a high-quality tal...
research
11/01/2018

Towards Highly Accurate and Stable Face Alignment for High-Resolution Videos

In recent years, heatmap regression based models have shown their effect...
research
03/28/2018

Lip Movements Generation at a Glance

Cross-modality generation is an emerging topic that aims to synthesize d...
