Multimodal-driven Talking Face Generation, Face Swapping, Diffusion Model

05/04/2023
by   Chao Xu, et al.

Multimodal-driven talking face generation refers to animating a portrait with a given pose, expression, and gaze that are transferred from a driving image or video, or estimated from text and audio. However, existing methods ignore the potential of the text modality, and their generators mainly follow a source-oriented feature-rearrangement paradigm coupled with unstable GAN frameworks. In this work, we first represent emotion with a text prompt, which inherits rich semantics from CLIP and allows flexible, generalized emotion control. We further reorganize these tasks as target-oriented texture transfer and adopt diffusion models. More specifically, given a textured face as the source and a face rendered from the desired 3DMM coefficients as the target, our proposed Texture-Geometry-aware Diffusion Model (TGDM) decomposes the complex transfer problem into a multi-conditional denoising process, in which a Texture Attention-based module accurately models the correspondences between the appearance and geometry cues contained in the source and target conditions and incorporates extra implicit information for high-fidelity talking face generation. Additionally, TGDM can be gracefully tailored to face swapping: we derive a novel paradigm free of unstable seesaw-style optimization, resulting in simple, stable, and effective training and inference schemes. Extensive experiments demonstrate the superiority of our method.
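The multi-conditional denoising process described above can be sketched at a high level: at each reverse-diffusion step, the network predicts noise conditioned on both the source texture and the target geometry (the rendered 3DMM face), and a standard DDPM update produces the next, less noisy sample. The sketch below is illustrative only; the `denoiser` stand-in, the linear mixing weights, and the tiny 8x8 resolution are all assumptions for demonstration, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 8  # tiny resolution, for illustration only

def denoiser(x_t, texture, geometry, t):
    """Stand-in for TGDM's noise-prediction network. In the paper this is a
    network whose Texture Attention-based module aligns appearance cues from
    the source texture with geometry cues from the rendered 3DMM target;
    here we simply mix the inputs linearly as a placeholder."""
    return 0.5 * x_t + 0.3 * texture + 0.2 * geometry

def ddpm_reverse_step(x_t, eps_pred, alpha_t, alpha_bar_t):
    """One standard (deterministic part of a) DDPM reverse update:
    x_{t-1} mean = (x_t - (1 - alpha_t)/sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)."""
    coef = (1.0 - alpha_t) / np.sqrt(1.0 - alpha_bar_t)
    return (x_t - coef * eps_pred) / np.sqrt(alpha_t)

texture = rng.standard_normal((H, W))   # source condition: textured face
geometry = rng.standard_normal((H, W))  # target condition: rendered 3DMM face
x_t = rng.standard_normal((H, W))       # current noisy sample

eps = denoiser(x_t, texture, geometry, t=10)
x_prev = ddpm_reverse_step(x_t, eps, alpha_t=0.98, alpha_bar_t=0.5)
print(x_prev.shape)
```

The key design point this illustrates is that both conditions enter every denoising step, so appearance (source) and geometry (target) jointly steer the sample throughout the reverse process rather than being fused once up front.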


