Imitating Arbitrary Talking Style for Realistic Audio-Driven Talking Face Synthesis

10/30/2021
by   Haozhe Wu, et al.

People talk with diversified styles. For the same piece of speech, different talking styles exhibit significant differences in facial and head pose movements. For example, a speaker in the "excited" style usually talks with the mouth wide open, while the "solemn" style is more standardized and seldom exhibits exaggerated motions. Because of such large differences between styles, it is necessary to incorporate the talking style into the audio-driven talking face synthesis framework. In this paper, we propose to inject style into the talking face synthesis framework by imitating the arbitrary talking style of a particular reference video. Specifically, we systematically investigate talking styles with our collected Ted-HD dataset and construct style codes as several statistics of 3D morphable model (3DMM) parameters. Afterwards, we devise a latent-style-fusion (LSF) model to synthesize stylized talking faces by imitating talking styles from the style codes. We emphasize the following novel characteristics of our framework: (1) It does not require any annotation of the style; the talking style is learned in an unsupervised manner from talking videos in the wild. (2) It can imitate arbitrary styles from arbitrary videos, and the style codes can also be interpolated to generate new styles. Extensive experiments demonstrate that the proposed framework synthesizes more natural and expressive talking styles than baseline methods.
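As a rough illustration of the style-code idea described in the abstract, the sketch below builds a code from per-frame 3DMM parameter vectors and blends two codes. The abstract only says the codes are "several statistics" of 3DMM parameters; the choice of per-dimension mean and standard deviation here, the linear interpolation, and the names `style_code` and `interpolate` are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import fmean, pstdev


def style_code(frames):
    """Summarise a clip as a flat style code.

    frames: list of per-frame 3DMM parameter vectors of equal length.
    Assumption: the code is the per-dimension mean followed by the
    per-dimension (population) standard deviation.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dims = list(zip(*frames))  # transpose: one tuple per parameter dimension
    means = [fmean(d) for d in dims]
    stds = [pstdev(d) for d in dims]
    return means + stds


def interpolate(code_a, code_b, alpha):
    """Linear blend of two style codes (assumed interpolation scheme).

    alpha = 0 returns code_a, alpha = 1 returns code_b.
    """
    if len(code_a) != len(code_b):
        raise ValueError("style codes must have the same length")
    return [(1 - alpha) * a + alpha * b for a, b in zip(code_a, code_b)]


# Toy clips with two parameters per frame (hypothetical values).
calm = style_code([[0.0, 1.0], [0.0, 1.0]])      # no motion at all
excited = style_code([[0.0, 1.0], [4.0, 3.0]])   # large frame-to-frame change
blend = interpolate(calm, excited, 0.5)
```

In this toy setup the "calm" clip has zero standard deviation in every dimension, so the blended code sits halfway between a static style and the more animated one.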

research
10/22/2021

BlendGAN: Implicitly GAN Blending for Arbitrary Stylized Face Generation

Generative Adversarial Networks (GANs) have made a dramatic leap in high...
research
07/18/2023

FACTS: Facial Animation Creation using the Transfer of Styles

The ability to accurately capture and express emotions is a critical asp...
research
04/30/2021

Dance Generation with Style Embedding: Learning and Transferring Latent Representations of Dance Styles

Choreography refers to creation of dance steps and motions for dances ac...
research
05/08/2019

Capture, Learning, and Synthesis of 3D Speaking Styles

Audio-driven 3D facial animation has been widely explored, but achieving...
research
05/30/2023

AlteredAvatar: Stylizing Dynamic 3D Avatars with Fast Style Adaptation

This paper presents a method that can quickly adapt dynamic 3D avatars t...
research
06/20/2023

Audio-Driven 3D Facial Animation from In-the-Wild Videos

Given an arbitrary audio clip, audio-driven 3D facial animation aims to ...
research
07/02/2021

MSN: Multi-Style Network for Trajectory Prediction

It is essential but challenging to predict future trajectories of variou...