Neural Voice Puppetry: Audio-driven Facial Reenactment

12/11/2019
by Justus Thies, et al.

We present Neural Voice Puppetry, a novel approach for audio-driven facial video synthesis. Given an audio sequence of a source person or digital assistant, we generate a photo-realistic output video of a target person that is in sync with the source audio. This audio-driven facial reenactment is driven by a deep neural network that employs a latent 3D face model space. Through the underlying 3D representation, the model inherently learns temporal stability, while we leverage neural rendering to generate photo-realistic output frames. Our approach generalizes across different people, allowing us to synthesize videos of a target actor with the voice of any unknown source actor, or even with synthetic voices generated by standard text-to-speech approaches. Neural Voice Puppetry has a variety of use cases, including audio-driven video avatars, video dubbing, and text-driven video synthesis of a talking head. We demonstrate the capabilities of our method in a series of audio- and text-based puppetry examples. Our method is not only more general than existing work, since it is not tied to a specific input person, but also shows superior visual and lip-sync quality compared to photo-realistic audio- and video-driven reenactment techniques.

research
11/02/2020

Facial Keypoint Sequence Generation from Audio

Whenever we speak, our voice is accompanied by facial movements and expr...
research
01/15/2020

Everybody's Talkin': Let Me Talk as You Want

We present a method to edit a target portrait footage by taking a sequen...
research
06/06/2023

Ada-TTA: Towards Adaptive High-Quality Text-to-Talking Avatar Synthesis

We are interested in a novel task, namely low-resource text-to-talking a...
research
04/29/2021

Text2Video: Text-driven Talking-head Video Synthesis with Phonetic Dictionary

With the advance of deep learning technology, automatic video generation...
research
06/16/2023

Unsupervised Learning of Style-Aware Facial Animation from Real Acting Performances

This paper presents a novel approach for text/speech-driven animation of...
research
09/11/2018

Neural Animation and Reenactment of Human Actor Videos

We propose a method for generating (near) video-realistic animations of ...
research
02/18/2021

AudioVisual Speech Synthesis: A brief literature review

This brief literature review studies the problem of audiovisual speech s...
