LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization

06/08/2021
by Avisek Lahiri, et al.

In this paper, we present a video-based learning framework for animating personalized 3D talking faces from audio. We introduce two training-time data normalizations that significantly improve data sample efficiency. First, we isolate and represent faces in a normalized space that decouples 3D geometry, head pose, and texture. This decomposes the prediction problem into regressions over the 3D face shape and the corresponding 2D texture atlas. Second, we leverage facial symmetry and approximate albedo constancy of skin to isolate and remove spatio-temporal lighting variations. Together, these normalizations allow simple networks to generate high fidelity lip-sync videos under novel ambient illumination while training with just a single speaker-specific video. Further, to stabilize temporal dynamics, we introduce an auto-regressive approach that conditions the model on its previous visual state. Human ratings and objective metrics demonstrate that our method outperforms contemporary state-of-the-art audio-driven video reenactment benchmarks in terms of realism, lip-sync and visual quality scores. We illustrate several applications enabled by our framework.
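The lighting normalization described above rests on two assumptions the abstract names: facial symmetry and approximate albedo constancy of skin. A minimal sketch of the symmetry half of that idea is below; the atlas layout, function name, and the simple mirror-averaging scheme are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np

def symmetry_normalize(atlas):
    """Illustrative removal of left/right-asymmetric lighting from a
    face texture atlas.

    Sketch only: assumes `atlas` is an (H, W, 3) float array whose layout
    mirrors the face about the vertical midline, and that the underlying
    skin albedo is itself roughly symmetric (approximate albedo constancy).
    """
    mirrored = atlas[:, ::-1, :]  # reflect across the vertical midline
    # Averaging each pixel with its mirror cancels lighting components
    # that differ between the two halves of the face, leaving an
    # approximately lighting-free albedo estimate.
    return 0.5 * (atlas + mirrored)
```

In the full method, a correction like this would be estimated per frame of the normalized texture atlas, so that spatio-temporal lighting variation is factored out before the regression networks are trained.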

