High-fidelity Generalized Emotional Talking Face Generation with Multi-modal Emotion Space Learning

05/04/2023
by Chao Xu, et al.

Emotional talking face generation has recently received considerable attention. However, existing methods adopt only one-hot coding, an image, or audio as the emotion condition, and therefore lack flexible control in practical applications and fail to handle unseen emotion styles due to limited semantics. Moreover, they neglect either the one-shot setting or the quality of the generated faces. In this paper, we propose a more flexible and generalized framework. Specifically, we supplement the emotion style with text prompts and use an Aligned Multi-modal Emotion encoder to embed the text, image, and audio emotion modalities into a unified space that inherits rich semantic priors from CLIP. This effective multi-modal emotion space learning allows our method to accept an arbitrary emotion modality at test time and to generalize to unseen emotion styles. In addition, an Emotion-aware Audio-to-3DMM Convertor is proposed to map the emotion condition and the audio sequence to a structural representation. A subsequent style-based High-fidelity Emotional Face generator then synthesizes arbitrary high-resolution, realistic identities; its texture generator hierarchically learns flow fields and animated faces in a residual manner. Extensive experiments demonstrate the flexibility and generalization of our method in emotion control and the effectiveness of its high-quality face synthesis.
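The core idea of the aligned multi-modal emotion encoder can be illustrated with a minimal sketch (not the authors' code): each modality is projected into one shared embedding space, and a cosine-similarity loss pulls matching pairs together so that any single modality can serve as the emotion condition at test time. All dimensions and names (`AUDIO_DIM`, `AlignedEmotionEncoder`, etc.) are illustrative assumptions, not the paper's actual architecture.

```python
# Hypothetical sketch of a multi-modal emotion encoder with a shared space.
# Dimensions are assumed for illustration; CLIP-style text/image features
# would normally supply TEXT_DIM/IMAGE_DIM inputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

AUDIO_DIM, IMAGE_DIM, TEXT_DIM, EMB_DIM = 128, 512, 512, 256  # assumed sizes

class AlignedEmotionEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # one projection head per modality into the shared emotion space
        self.audio_proj = nn.Linear(AUDIO_DIM, EMB_DIM)
        self.image_proj = nn.Linear(IMAGE_DIM, EMB_DIM)
        self.text_proj = nn.Linear(TEXT_DIM, EMB_DIM)

    def forward(self, audio=None, image=None, text=None):
        # any subset of modalities may be provided, so a single one
        # suffices to condition generation at test time
        feats = []
        if audio is not None:
            feats.append(self.audio_proj(audio))
        if image is not None:
            feats.append(self.image_proj(image))
        if text is not None:
            feats.append(self.text_proj(text))
        # L2-normalize so all modalities lie on the same unit hypersphere
        return [F.normalize(f, dim=-1) for f in feats]

def alignment_loss(embs):
    # pull paired embeddings from different modalities together
    loss = 0.0
    for i in range(len(embs)):
        for j in range(i + 1, len(embs)):
            loss = loss + (1 - F.cosine_similarity(embs[i], embs[j], dim=-1)).mean()
    return loss

enc = AlignedEmotionEncoder()
audio = torch.randn(4, AUDIO_DIM)   # a batch of 4 audio emotion features
text = torch.randn(4, TEXT_DIM)     # matching text-prompt features
embs = enc(audio=audio, text=text)
loss = alignment_loss(embs)
```

Training such projections against frozen CLIP text/image features is one plausible way to inherit CLIP's semantic prior, which is what lets the unified space handle emotion styles never seen during training.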


