DiffTalker: Co-driven audio-image diffusion for talking faces via intermediate landmarks

09/14/2023
by Zipeng Qi, et al.

Generating realistic talking faces is a complex and widely discussed task with numerous applications. In this paper, we present DiffTalker, a novel model designed to generate lifelike talking faces through audio and landmark co-driving. DiffTalker addresses the challenges of directly applying diffusion models, which are traditionally trained on text-image pairs, to audio control. DiffTalker consists of two agent networks: a transformer-based landmarks completion network for geometric accuracy and a diffusion-based face generation network for texture details. Landmarks play a pivotal role in connecting the audio and image domains, which facilitates the incorporation of knowledge from pre-trained diffusion models. This approach efficiently produces articulate talking faces. Experimental results show that DiffTalker produces clear and geometrically accurate talking faces without the need for additional alignment between audio and image features.
