Parametric Implicit Face Representation for Audio-Driven Facial Reenactment

06/13/2023
by Ricong Huang et al.

Audio-driven facial reenactment is a crucial technique with applications in film-making, virtual avatars, and video conferencing. Existing works employ either explicit intermediate face representations (e.g., 2D facial landmarks or 3D face models) or implicit ones (e.g., Neural Radiance Fields), and thus suffer from a trade-off between interpretability and expressive power, and hence between controllability and result quality. In this work, we break this trade-off with a novel parametric implicit face representation and propose an audio-driven facial reenactment framework that is both controllable and capable of generating high-quality talking heads. Specifically, our parametric implicit representation parameterizes the implicit representation with the interpretable parameters of 3D face models, thereby combining the strengths of both explicit and implicit methods. In addition, we propose several new techniques to improve the three components of our framework: i) incorporating contextual information into the audio-to-expression parameter encoding; ii) using conditional image synthesis to parameterize the implicit representation, implemented with an innovative tri-plane structure for efficient learning; and iii) formulating facial reenactment as a conditional image inpainting problem, together with a novel data augmentation technique to improve model generalizability. Extensive experiments demonstrate that our method generates more realistic results than previous methods, with greater fidelity to the identities and talking styles of speakers.
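The tri-plane structure mentioned in component ii) can be illustrated with a minimal sketch. The details below (plane names, feature aggregation by summation, bilinear sampling) are generic assumptions about how a tri-plane representation is typically queried, not the paper's actual implementation: a 3D point is projected onto three axis-aligned feature planes, each plane is sampled bilinearly, and the three resulting features are combined before being decoded.

```python
# Hypothetical tri-plane feature query: project a 3D point onto the
# XY, XZ, and YZ feature planes, bilinearly sample each, and sum.
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample a (H, W, C) feature plane at continuous
    coordinates (u, v) in [0, 1] x [0, 1]."""
    H, W, _ = plane.shape
    x, y = u * (W - 1), v * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * plane[y0, x0]
            + wx * (1 - wy) * plane[y0, x1]
            + (1 - wx) * wy * plane[y1, x0]
            + wx * wy * plane[y1, x1])

def triplane_query(planes, point):
    """Query tri-plane features for a 3D point in [0, 1]^3.
    `planes` maps 'xy', 'xz', 'yz' to (H, W, C) feature maps."""
    x, y, z = point
    f_xy = bilinear_sample(planes['xy'], x, y)
    f_xz = bilinear_sample(planes['xz'], x, z)
    f_yz = bilinear_sample(planes['yz'], y, z)
    return f_xy + f_xz + f_yz  # aggregate by summation

rng = np.random.default_rng(0)
planes = {k: rng.standard_normal((32, 32, 8)) for k in ('xy', 'xz', 'yz')}
feat = triplane_query(planes, (0.5, 0.25, 0.75))
print(feat.shape)  # (8,)
```

The appeal of this factorization is efficiency: three 2D feature maps replace a dense 3D voxel grid, so memory grows quadratically rather than cubically with resolution while still conditioning an implicit decoder on 3D position.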

