Audio-driven Talking Face Video Generation with Natural Head Pose

02/24/2020
by   Ran Yi, et al.
0

Real-world talking faces often accompany with natural head movement. However, most existing talking face video generation methods only consider facial animation with fixed head pose. In this paper, we address this problem by proposing a deep neural network model that takes an audio signal A of a source person and a very short video V of a target person as input, and outputs a synthesized high-quality talking face video with natural head pose (making use of the visual information in V), expression and lip synchronization (by considering both A and V). The most challenging issue in our work is that natural poses often cause in-plane and out-of-plane head rotations, which makes synthesized talking face video far from realistic. To address this challenge, we reconstruct 3D face animation and re-render it into synthesized frames. To fine tune these frames into realistic ones with smooth background transition, we propose a novel memory-augmented GAN module. Extensive experiments and three user studies show that our method can generate high-quality (i.e., natural head movements, expressions and good lip synchronization) personalized talking face videos, outperforming the state-of-the-art methods.

READ FULL TEXT

page 3

page 5

page 7

page 8

page 12

research
02/24/2020

Audio-driven Talking Face Video Generation with Learning-based Personalized Head Pose

Real-world talking faces often accompany with natural head movement. How...
research
11/27/2022

VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

We present VideoReTalking, a new system to edit the faces of a real-worl...
research
09/14/2023

HDTR-Net: A Real-Time High-Definition Teeth Restoration Network for Arbitrary Talking Face Generation Methods

Talking Face Generation (TFG) aims to reconstruct facial movements to ac...
research
07/18/2023

Plug the Leaks: Advancing Audio-driven Talking Face Generation by Preventing Unintended Information Flow

Audio-driven talking face generation is the task of creating a lip-synch...
research
08/29/2022

StableFace: Analyzing and Improving Motion Stability for Talking Face Generation

While previous speech-driven talking face generation methods have made s...
research
02/06/2022

Portrait Segmentation Using Deep Learning

A portrait is a painting, drawing, photograph, or engraving of a person,...
research
11/22/2022

SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

Generating talking head videos through a face image and a piece of speec...

Please sign up or login with your details

Forgot password? Click here to reset