The DiffuseStyleGesture+ entry to the GENEA Challenge 2023

08/26/2023
by Sicheng Yang, et al.

In this paper, we introduce DiffuseStyleGesture+, our solution for the Generation and Evaluation of Non-verbal Behavior for Embodied Agents (GENEA) Challenge 2023, which aims to foster the development of realistic, automated systems for generating conversational gestures. Participants are provided with a pre-processed dataset, and their systems are evaluated through crowdsourced scoring. Our proposed model, DiffuseStyleGesture+, leverages a diffusion model to generate gestures automatically. It incorporates a variety of modalities, including audio, text, speaker ID, and seed gestures. These modalities are mapped to a hidden space and processed by a modified diffusion model to produce the gesture corresponding to a given speech input. In the evaluation, DiffuseStyleGesture+ performed on par with the top-tier models in the challenge: it showed no significant differences from those models in human-likeness or appropriateness for the interlocutor, and achieved competitive performance with the best model on appropriateness for agent speech. This indicates that our model is competitive and effective at generating realistic and appropriate gestures for a given speech input. The code, pre-trained models, and demos are available at https://github.com/YoungSeng/DiffuseStyleGesture/tree/DiffuseStyleGesturePlus/BEAT-TWH-main.
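The conditioning scheme described in the abstract (audio, text, speaker ID, and seed gestures each mapped into a shared hidden space that conditions a diffusion denoiser) can be illustrated with a minimal sketch. The code below is an illustrative assumption in PyTorch, not the authors' implementation: module names, feature dimensions, and the plain transformer backbone are placeholders, and the actual architecture lives in the linked repository.

# Minimal sketch of multimodal conditioning for a gesture diffusion model.
# All names and dimensions here are hypothetical; see the repository above
# for the real DiffuseStyleGesture+ code.
import torch
import torch.nn as nn

class MultimodalGestureDenoiser(nn.Module):
    def __init__(self, d_model=256, n_speakers=32,
                 audio_dim=1024, text_dim=768, gesture_dim=135):
        super().__init__()
        # Each modality is projected into the same hidden space.
        self.audio_proj = nn.Linear(audio_dim, d_model)
        self.text_proj = nn.Linear(text_dim, d_model)
        self.speaker_emb = nn.Embedding(n_speakers, d_model)
        self.seed_proj = nn.Linear(gesture_dim, d_model)
        self.step_emb = nn.Embedding(1000, d_model)   # diffusion timestep
        self.noisy_proj = nn.Linear(gesture_dim, d_model)
        # A generic transformer stands in for the modified diffusion
        # backbone; it predicts the clean gesture from the noisy input
        # plus all conditions.
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.out = nn.Linear(d_model, gesture_dim)

    def forward(self, noisy_gesture, t, audio, text, speaker_id, seed_gesture):
        # noisy_gesture: (B, T, gesture_dim); audio: (B, T, audio_dim)
        # text: (B, T, text_dim); speaker_id: (B,); seed_gesture: (B, S, gesture_dim)
        cond = (self.audio_proj(audio) + self.text_proj(text)
                + self.speaker_emb(speaker_id)[:, None, :]
                + self.step_emb(t)[:, None, :])
        h = self.noisy_proj(noisy_gesture) + cond
        # Seed gesture frames are prepended as extra context tokens.
        h = torch.cat([self.seed_proj(seed_gesture), h], dim=1)
        h = self.backbone(h)
        return self.out(h[:, seed_gesture.size(1):])  # predicted clean gesture

model = MultimodalGestureDenoiser()
x_t = torch.randn(2, 80, 135)        # noisy gesture frames
t = torch.randint(0, 1000, (2,))     # diffusion timesteps
audio = torch.randn(2, 80, 1024)     # e.g. frame-aligned speech features
text = torch.randn(2, 80, 768)       # frame-aligned word embeddings
spk = torch.randint(0, 32, (2,))     # speaker IDs
seed = torch.randn(2, 10, 135)       # seed gesture frames
x0_hat = model(x_t, t, audio, text, spk, seed)
print(x0_hat.shape)                  # torch.Size([2, 80, 135])

Summing the projected conditions into one hidden vector per frame is the simplest fusion choice for a sketch; a production model would typically use cross-attention or classifier-free guidance over the same inputs.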


Related research

05/08/2023
DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models
The art of communication goes beyond speech to include gestures. The automatic...

08/25/2022
The ReprGesture entry to the GENEA Challenge 2022
This paper describes the ReprGesture entry to the Generation and Evaluat...

05/25/2023
MPE4G: Multimodal Pretrained Encoder for Co-Speech Gesture Generation
When virtual agents interact with humans, gestures are crucial to delive...

05/15/2023
Laughing Matters: Introducing Laughing-Face Generation using Diffusion Models
Speech-driven animation has gained significant traction in recent years,...

03/16/2023
Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation
Animating virtual avatars to make co-speech gestures facilitates various...

08/11/2023
Audio is all in one: speech-driven gesture synthetics using WavLM pre-trained model
The generation of co-speech gestures for digital humans is an emerging a...
