A Comprehensive Multi-scale Approach for Speech and Dynamics Synchrony in Talking Head Generation

07/04/2023
by Louis Airale, et al.

Animating still face images with deep generative models using a speech input signal is an active research topic and has seen important recent progress. However, much of the effort has been put into lip syncing and rendering quality while the generation of natural head motion, let alone the audio-visual correlation between head motion and speech, has often been neglected. In this work, we propose a multi-scale audio-visual synchrony loss and a multi-scale autoregressive GAN to better handle short and long-term correlation between speech and the dynamics of the head and lips. In particular, we train a stack of syncer models on multimodal input pyramids and use these models as guidance in a multi-scale generator network to produce audio-aligned motion unfolding over diverse time scales. Our generator operates in the facial landmark domain, which is a standard low-dimensional head representation. The experiments show significant improvements over the state of the art in head motion dynamics quality and in multi-scale audio-visual synchrony both in the landmark domain and in the image domain.
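
To make the idea of a multimodal input pyramid concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the abstract does not say how the pyramids are built, so the factor-of-two average pooling over time, the feature sizes, and the names `temporal_pyramid`, `multi_scale_score`, and `score_fn` are all assumptions made for illustration. `score_fn` is a stand-in for a trained per-scale syncer model.

```python
import numpy as np


def temporal_pyramid(seq, num_levels=3):
    """Return a list of progressively coarser copies of a (T, D) sequence.

    Assumption: each level halves the frame rate by averaging
    consecutive frame pairs; the paper may build its pyramids differently.
    """
    levels = [np.asarray(seq, dtype=np.float64)]
    for _ in range(num_levels - 1):
        prev = levels[-1]
        usable = (prev.shape[0] // 2) * 2  # drop a trailing odd frame
        if usable == 0:
            break  # sequence too short to downsample further
        pooled = prev[:usable].reshape(usable // 2, 2, prev.shape[1]).mean(axis=1)
        levels.append(pooled)
    return levels


def multi_scale_score(audio, motion, score_fn, num_levels=3):
    """Average a per-scale synchrony score over matching pyramid levels.

    score_fn(audio_level, motion_level) -> float is a hypothetical
    placeholder for a syncer model evaluated at one time scale.
    """
    audio_levels = temporal_pyramid(audio, num_levels)
    motion_levels = temporal_pyramid(motion, num_levels)
    scores = [score_fn(a, m) for a, m in zip(audio_levels, motion_levels)]
    return float(np.mean(scores))


# Illustrative shapes only: 40 frames, 26 audio features per frame,
# 68 two-dimensional landmarks flattened to 136 values per frame.
rng = np.random.default_rng(0)
audio = rng.normal(size=(40, 26))
landmarks = rng.normal(size=(40, 136))
landmark_pyramid = temporal_pyramid(landmarks, num_levels=3)
```

Under these assumptions, the coarse levels summarize slow movements such as head motion, while the finest level keeps frame-level detail such as lip motion, so a score averaged over levels reflects synchrony at several time scales at once.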

research
11/02/2022

Autoregressive GAN for Semantic Unconditional Head Motion Generation

We address the task of unconditional head motion generation to animate s...
research
01/06/2023

Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation

Talking face generation has historically struggled to produce head movem...
research
11/22/2022

SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

Generating talking head videos through a face image and a piece of speec...
research
02/05/2020

Prediction of head motion from speech waveforms with a canonical-correlation-constrained autoencoder

This study investigates the direct use of speech waveforms to predict he...
research
10/06/2022

Audio-Visual Face Reenactment

This work proposes a novel method to generate realistic talking head vid...
research
07/22/2022

Visual Speech-Aware Perceptual 3D Facial Expression Reconstruction from Videos

The recent state of the art on monocular 3D face reconstruction from ima...
research
02/22/2022

Thinking the Fusion Strategy of Multi-reference Face Reenactment

In recent advances of deep generative models, face reenactment -manipula...
