End-to-End Speech-Driven Facial Animation with Temporal GANs

05/23/2018
by Konstantinos Vougioukas, et al.

Speech-driven facial animation is the process that uses speech signals to automatically synthesize a talking character. The majority of work in this domain creates a mapping from audio features to visual features. This often requires post-processing with computer graphics techniques to produce realistic, albeit subject-dependent, results. We present a system for generating videos of a talking head from a still image of a person and an audio clip containing speech, without relying on any handcrafted intermediate features. To the best of our knowledge, this is the first method capable of generating subject-independent realistic videos directly from raw audio. Our method can generate videos that have (a) lip movements in sync with the audio and (b) natural facial expressions such as blinks and eyebrow movements. We achieve this by using a temporal GAN with two discriminators, which are capable of capturing different aspects of the video. The effect of each component in our system is quantified through an ablation study. The generated videos are evaluated based on their sharpness, reconstruction quality, and lip-reading accuracy. Finally, a user study is conducted, confirming that temporal GANs lead to more natural sequences than a static GAN-based approach.
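
As a rough illustration of how a generator can be trained against two discriminators at once, the sketch below sums two adversarial terms into one generator objective. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract says only that the two discriminators capture different aspects of the video, so the split into per-frame scores and a whole-sequence score, the binary cross-entropy form, the weights `w_frame` and `w_seq`, and all function names are hypothetical.

```python
import numpy as np


def bce(probs, target):
    """Mean binary cross-entropy between probabilities and a constant 0/1 target."""
    # Clip to avoid log(0) when a discriminator is fully confident.
    p = np.clip(np.asarray(probs, dtype=float), 1e-7, 1.0 - 1e-7)
    return float(np.mean(-(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))))


def generator_adversarial_loss(frame_scores, sequence_scores, w_frame=1.0, w_seq=1.0):
    """Combine two adversarial terms (assumed roles: per-frame and per-sequence).

    Each argument holds the probability of "real" that one discriminator
    assigns to generated samples. The generator is rewarded when both
    discriminators label its output as real (target = 1).
    """
    return w_frame * bce(frame_scores, 1.0) + w_seq * bce(sequence_scores, 1.0)


def discriminator_loss(real_scores, fake_scores):
    """Loss for one discriminator: real samples toward 1, generated toward 0."""
    return bce(real_scores, 1.0) + bce(fake_scores, 0.0)


if __name__ == "__main__":
    # Hypothetical scores for a 4-frame generated clip.
    frame_scores = [0.8, 0.7, 0.9, 0.6]   # one score per frame
    sequence_scores = [0.4]               # one score for the whole clip
    print(generator_adversarial_loss(frame_scores, sequence_scores))
```

Keeping the two terms separate lets each discriminator penalize a different failure mode, and the weights make it possible to trade one off against the other.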

research
06/14/2019

Realistic Speech-Driven Facial Animation with GANs

Speech-driven facial animation is the process that automatically synthes...
research
02/20/2020

Photorealistic Lip Sync with Adversarial Temporal Convolutional Networks

Lip sync has emerged as a promising technique to generate mouth movement...
research
06/14/2019

Video-Driven Speech Reconstruction using Generative Adversarial Networks

Speech is a means of communication which relies on both audio and visual...
research
12/12/2019

Speech-driven facial animation using polynomial fusion of features

Speech-driven facial animation involves using a speech signal to generat...
research
01/15/2023

Learning Audio-Driven Viseme Dynamics for 3D Face Animation

We present a novel audio-driven facial animation approach that can gener...
research
02/12/2019

Puppet Dubbing

Dubbing puppet videos to make the characters (e.g. Kermit the Frog) conv...
research
09/29/2022

Facial Landmark Predictions with Applications to Metaverse

This research aims to make metaverse characters more realistic by adding...