Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose

02/24/2020
by Ran Yi, et al.

Real-world talking faces are often accompanied by natural head movement. However, most existing talking face video generation methods only consider facial animation with a fixed head pose. In this paper, we address this problem by proposing a deep neural network model that takes as input an audio signal A of a source person and a very short video V of a target person, and outputs a synthesized high-quality talking face video with personalized head pose (making use of the visual information in V), expression and lip synchronization (by considering both A and V). The most challenging issue in our work is that natural poses often cause in-plane and out-of-plane head rotations, which make the synthesized talking face video far from realistic. To address this challenge, we reconstruct 3D face animation and re-render it into synthesized frames. To fine-tune these frames into realistic ones with smooth background transitions, we propose a novel memory-augmented GAN module. By first training a general mapping on a publicly available dataset and then fine-tuning the mapping using the input short video of the target person, we develop an effective strategy that requires only a small number of frames (about 300) to learn personalized talking behavior, including head pose. Extensive experiments and two user studies show that our method can generate high-quality talking face videos (i.e., with personalized head movements, expressions and good lip synchronization) that look natural and show more distinctive head movement effects than those produced by state-of-the-art methods.
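The abstract does not spell out the rendering details, but the distinction between in-plane and out-of-plane rotation can be illustrated with a small geometric sketch. It assumes a pitch/yaw/roll (Euler angle) head-pose parameterization and a weak-perspective camera; both are our assumptions, not details given in the abstract, and the function names below are hypothetical. Roll rotates the face about the camera's viewing axis, so its 2D projection is only rotated. Yaw and pitch change the projected shape itself (foreshortening), which a 2D image rotation cannot reproduce. This is one way to see why reconstructing and re-rendering a 3D face helps.

```python
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_from_euler(pitch, yaw, roll):
    """Hypothetical helper: R = Rz(roll) @ Ry(yaw) @ Rx(pitch), angles in radians.

    Assumed convention: x to the right, y up, z along the camera's viewing axis,
    so roll (about z) is the in-plane rotation; pitch and yaw are out-of-plane.
    """
    cx, sx = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    cz, sz = math.cos(roll), math.sin(roll)
    rx = [[1, 0, 0], [0, cx, -sx], [0, sx, cx]]
    ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    rz = [[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]]
    return matmul3(rz, matmul3(ry, rx))


def project_weak_perspective(points, pitch, yaw, roll, scale=1.0, tx=0.0, ty=0.0):
    """Hypothetical helper: rotate 3D face vertices by a head pose, then drop depth.

    Weak perspective = orthographic projection followed by a uniform scale
    and a 2D translation (an assumed camera model, used only for illustration).
    """
    r = rotation_from_euler(pitch, yaw, roll)
    projected = []
    for x, y, z in points:
        xr = r[0][0] * x + r[0][1] * y + r[0][2] * z
        yr = r[1][0] * x + r[1][1] * y + r[1][2] * z
        projected.append((scale * xr + tx, scale * yr + ty))  # the rotated z is discarded
    return projected


if __name__ == "__main__":
    pts = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]  # two vertices 2 units apart
    # In-plane (roll): the projected segment rotates but keeps its length of 2.
    print(project_weak_perspective(pts, 0.0, 0.0, math.radians(45)))
    # Out-of-plane (yaw): the projected segment shrinks (foreshortening).
    print(project_weak_perspective(pts, 0.0, math.radians(60), 0.0))
```

With a 60-degree yaw the projected distance between the two vertices drops from 2 to 2·cos(60°) = 1, whereas any pure roll leaves it at 2.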

02/24/2020

Audio-driven Talking Face Video Generation with Natural Head Pose

Real-world talking faces are often accompanied by natural head movement. How...
01/03/2022

DFA-NeRF: Personalized Talking Head Generation via Disentangled Face Attributes Neural Rendering

While recent advances in deep neural networks have made it possible to r...
06/08/2021

LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization

In this paper, we present a video-based learning framework for animating...
04/13/2018

Talking Face Generation by Conditional Recurrent Adversarial Network

Given an arbitrary face image and an arbitrary speech clip, the proposed...
04/22/2021

Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation

While accurate lip synchronization has been achieved for arbitrary-subje...
01/16/2022

Audio-Driven Talking Face Video Generation with Dynamic Convolution Kernels

In this paper, we present a dynamic convolution kernel (DCK) strategy fo...
08/11/2021

FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset

While significant advancements have been made in the generation of deepf...
