Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation

03/16/2023
by Lingting Zhu, et al.

Animating virtual avatars to make co-speech gestures facilitates various applications in human-machine interaction. Existing methods mainly rely on generative adversarial networks (GANs), which typically suffer from notorious mode collapse and unstable training, making it difficult to learn accurate audio-gesture joint distributions. In this work, we propose a novel diffusion-based framework, named Diffusion Co-Speech Gesture (DiffGesture), to effectively capture the cross-modal audio-to-gesture associations and preserve temporal coherence for high-fidelity audio-driven co-speech gesture generation. Specifically, we first establish the diffusion-conditional generation process on clips of skeleton sequences and audio to enable the whole framework. Then, a novel Diffusion Audio-Gesture Transformer is devised to better attend to the information from multiple modalities and model the long-term temporal dependency. Moreover, to eliminate temporal inconsistency, we propose an effective Diffusion Gesture Stabilizer with an annealed noise sampling strategy. Benefiting from the architectural advantages of diffusion models, we further incorporate implicit classifier-free guidance to trade off between diversity and gesture quality. Extensive experiments demonstrate that DiffGesture achieves state-of-the-art performance, rendering coherent gestures with better mode coverage and stronger audio correlations. Code is available at https://github.com/Advocate99/DiffGesture.
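
The abstract does not include pseudocode, but the implicit classifier-free guidance it mentions follows a standard recipe: train a single noise-prediction network with the condition randomly dropped, then at sampling time extrapolate from the unconditional prediction toward the audio-conditioned one. The PyTorch sketch below is a minimal illustration of that guidance step only; ToyDenoiser, guided_noise, and all tensor shapes are hypothetical stand-ins rather than the DiffGesture architecture (see the linked repository for the authors' implementation).

```python
import torch

# Hypothetical stand-in for the paper's Diffusion Audio-Gesture Transformer:
# any network that predicts noise from (noisy gesture clip, timestep, audio).
class ToyDenoiser(torch.nn.Module):
    def __init__(self, pose_dim: int, audio_dim: int, hidden: int = 128):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(pose_dim + audio_dim + 1, hidden),
            torch.nn.SiLU(),
            torch.nn.Linear(hidden, pose_dim),
        )

    def forward(self, x_t, t, audio):
        # Broadcast the scalar timestep across the sequence, then fuse inputs.
        t_feat = t.float().view(-1, 1, 1).expand(-1, x_t.shape[1], 1)
        return self.net(torch.cat([x_t, audio, t_feat], dim=-1))

@torch.no_grad()
def guided_noise(model, x_t, t, audio, w: float = 1.0):
    """Classifier-free guidance: extrapolate from the unconditional noise
    prediction toward the audio-conditioned one. w = 0 recovers plain
    conditional sampling; larger w strengthens the audio conditioning."""
    eps_cond = model(x_t, t, audio)
    eps_uncond = model(x_t, t, torch.zeros_like(audio))  # "null" condition
    return (1.0 + w) * eps_cond - w * eps_uncond

# Toy usage: one guided noise prediction for a batch of gesture clips.
B, T, pose_dim, audio_dim = 4, 34, 42, 32      # assumed clip shapes
model = ToyDenoiser(pose_dim, audio_dim)
x_t = torch.randn(B, T, pose_dim)              # noisy gesture clip at step t
audio = torch.randn(B, T, audio_dim)           # frame-aligned audio features
t = torch.full((B,), 500)
eps_hat = guided_noise(model, x_t, t, audio, w=1.5)  # -> (B, T, pose_dim)
```

Increasing w in this sketch corresponds to the diversity-versus-quality trade-off described above: larger guidance weights push samples toward gestures more strongly correlated with the audio, at some cost in mode coverage.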

Related research

05/08/2023 · DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models
The art of communication beyond speech there are gestures. The automatic...

08/29/2023 · C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model
Co-speech gesture generation is crucial for automatic digital avatar ani...

11/17/2022 · Listen, denoise, action! Audio-driven motion synthesis with diffusion models
Diffusion models have experienced a surge of interest as highly expressi...

09/13/2023 · UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons
The automatic co-speech gesture generation draws much attention in compu...

06/20/2023 · EMoG: Synthesizing Emotive Co-speech 3D Gesture with Diffusion Model
Although previous co-speech gesture generation methods are able to synth...

08/26/2023 · The DiffuseStyleGesture+ entry to the GENEA Challenge 2023
In this paper, we introduce the DiffuseStyleGesture+, our solution for t...

08/25/2022 · The ReprGesture entry to the GENEA Challenge 2022
This paper describes the ReprGesture entry to the Generation and Evaluat...
