Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation

09/22/2021
by Yuanxun Lu, et al.

To the best of our knowledge, we present the first live system that generates personalized photorealistic talking-head animation driven only by audio signals at over 30 fps. Our system contains three stages. The first stage is a deep neural network that extracts deep audio features, followed by a manifold projection that maps the features into the target person's speech space. In the second stage, we learn facial dynamics and motions from the projected audio features. The predicted motions include head poses and upper body motions: head poses are generated by an autoregressive probabilistic model of the target person's head pose distribution, and upper body motions are deduced from the head poses. In the final stage, we generate conditional feature maps from the previous predictions and send them, together with a candidate image set, to an image-to-image translation network to synthesize photorealistic renderings. Our method generalizes well to in-the-wild audio and successfully synthesizes high-fidelity personalized facial details such as wrinkles and teeth. Our method also allows explicit control of head poses. Extensive qualitative and quantitative evaluations, along with user studies, demonstrate the superiority of our method over state-of-the-art techniques.
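The abstract does not say how the manifold projection in the first stage is computed. As a rough illustration only, the sketch below assumes one simple possibility: an incoming audio feature is replaced by an inverse-distance-weighted average of its k nearest neighbors in a database of features extracted from the target person's speech. The function name `project_to_speech_space`, the parameter `k`, and the weighting scheme are all hypothetical and are not taken from the paper.

```python
import math


def project_to_speech_space(feature, target_features, k=3):
    """Approximate `feature` using its k nearest target-speaker features.

    Assumption (not from the abstract): the projection is an
    inverse-distance-weighted average of the k nearest neighbors.
    """
    if not target_features:
        raise ValueError("target_features must not be empty")

    # Pair each stored feature's Euclidean distance with its index,
    # then keep the k closest entries.
    nearest = sorted(
        (math.dist(feature, t), i) for i, t in enumerate(target_features)
    )[:k]

    # An exact match already lies in the target feature set; return a copy.
    if nearest[0][0] == 0.0:
        return list(target_features[nearest[0][1]])

    # Closer neighbors receive larger weights.
    weights = [1.0 / d for d, _ in nearest]
    total = sum(weights)

    # Weighted average, computed one dimension at a time.
    return [
        sum(w * target_features[i][j] for w, (_, i) in zip(weights, nearest)) / total
        for j in range(len(feature))
    ]


# Toy 2-D "features" standing in for real audio embeddings.
database = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
print(project_to_speech_space([0.5, 0.0], database, k=2))  # [0.5, 0.0]
```

The point of a projection like this is that features from an unseen speaker are pulled toward regions the later stages saw during training on the target person, which is one plausible reading of why the system is reported to generalize to in-the-wild audio.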

research
06/20/2023

Audio-Driven 3D Facial Animation from In-the-Wild Videos

Given an arbitrary audio clip, audio-driven 3D facial animation aims to ...
research
08/18/2021

FACIAL: Synthesizing Dynamic Talking Face with Implicit Attribute Learning

In this paper, we propose a talking face generation method that takes an...
research
02/20/2020

Photorealistic Lip Sync with Adversarial Temporal Convolutional Networks

Lip sync has emerged as a promising technique to generate mouth movement...
research
11/17/2022

SPACEx: Speech-driven Portrait Animation with Controllable Expression

Animating portraits using speech has received growing attention in recen...
research
09/09/2023

Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video

Synthesizing realistic videos according to a given speech is still an op...
research
07/18/2023

Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis

This paper presents ER-NeRF, a novel conditional Neural Radiance Fields ...
research
12/07/2022

Talking Head Generation with Probabilistic Audio-to-Visual Diffusion Priors

In this paper, we introduce a simple and novel framework for one-shot au...
