
HeadGAN: Video-and-Audio-Driven Talking Head Synthesis

by Michail Christos Doukas, et al.

Recent attempts to solve the problem of talking head synthesis using a single reference image have shown promising results. However, most of them fail to preserve the identity of the source, or perform poorly in terms of photo-realism, especially under extreme head poses. We propose HeadGAN, a novel reenactment approach that conditions synthesis on 3D face representations, which can be extracted from any driving video and adapted to the facial geometry of any source. We improve the plausibility of mouth movements by utilising audio features as a complementary input to the generator. Quantitative and qualitative experiments demonstrate the merits of our approach.
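The conditioning scheme the abstract describes can be sketched in code. The following is a minimal, hypothetical illustration (not the authors' actual implementation): the generator receives (a) a source reference image, (b) a 3D face representation whose expression comes from the driving video but whose shape is adapted to the source identity, and (c) audio features as a complementary input. All function and variable names here are illustrative assumptions.

```python
import numpy as np

def adapt_geometry(driving_expression, source_shape):
    """Illustrative stand-in for geometry adaptation: keep the source's
    shape coefficients, take expression/pose from the driving frame."""
    return np.concatenate([source_shape, driving_expression])

def generator_inputs(reference_img, face_3d, audio_feat):
    """Build the generator's conditioning tensor by stacking the reference
    image with spatially broadcast 3D-face and audio feature maps."""
    h, w, _ = reference_img.shape
    face_map = np.broadcast_to(face_3d[None, None, :], (h, w, face_3d.size))
    audio_map = np.broadcast_to(audio_feat[None, None, :], (h, w, audio_feat.size))
    return np.concatenate([reference_img, face_map, audio_map], axis=-1)

# Toy shapes: 4x4 RGB reference, 1 shape + 2 expression coefficients,
# 2-dim audio feature vector.
ref = np.zeros((4, 4, 3))
code = adapt_geometry(np.array([0.1, 0.2]), np.array([0.5]))
cond = generator_inputs(ref, code, np.array([0.3, 0.7]))
print(cond.shape)  # (4, 4, 8): 3 image + 3 face + 2 audio channels
```

In the real model the 3D face representation is a dense rendered map rather than a broadcast coefficient vector, but the sketch shows the key idea: synthesis is driven jointly by identity-adapted 3D geometry and audio features.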



