Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings

10/04/2022
by Tenglong Ao, et al.

Automatic synthesis of realistic co-speech gestures is an increasingly important yet challenging task in creating artificial embodied agents. Previous systems mainly generate gestures end to end, which makes it difficult to extract clear rhythm and semantics because of the complex yet subtle harmony between speech and gestures. We present a novel co-speech gesture synthesis method that achieves convincing results in both rhythm and semantics. For rhythm, our system contains a robust rhythm-based segmentation pipeline that explicitly ensures temporal coherence between vocalization and gestures. For gesture semantics, we devise a mechanism, grounded in linguistic theory, that effectively disentangles low- and high-level neural embeddings of both speech and motion. The high-level embedding corresponds to semantics, while the low-level embedding captures subtle variations. Lastly, we build correspondences between the hierarchical embeddings of speech and motion, resulting in rhythm- and semantics-aware gesture synthesis. Evaluations with existing objective metrics, a newly proposed rhythmic metric, and human feedback show that our method outperforms state-of-the-art systems by a clear margin.
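
The abstract does not spell out how the rhythm-based segmentation works. The following is only a generic sketch of the idea of cutting a speech clip into blocks at rhythmic events, not the authors' pipeline. It assumes that onset times (in seconds) have already been detected from the audio, and that blocks must fall within a duration range given by the hypothetical parameters min_len and max_len. The function name segment_by_onsets and the default values are likewise illustrative assumptions.

```python
import math
from typing import List, Tuple


def segment_by_onsets(onsets: List[float], duration: float,
                      min_len: float = 0.5,
                      max_len: float = 1.5) -> List[Tuple[float, float]]:
    """Split the clip [0, duration] into contiguous blocks bounded by onsets.

    Hypothetical sketch, not the paper's pipeline. Assumes max_len >= 2 * min_len
    so that subdividing a long gap never yields a block shorter than min_len.
    """
    boundaries = [0.0]
    for t in sorted(onsets):
        if t <= 0.0 or t >= duration:
            continue  # ignore onsets outside the clip
        if t - boundaries[-1] >= min_len:
            boundaries.append(t)  # onset is far enough from the last cut: cut here
    # If the tail after the last cut is too short, fold it into the previous block.
    if len(boundaries) > 1 and duration - boundaries[-1] < min_len:
        boundaries.pop()
    boundaries.append(duration)

    segments: List[Tuple[float, float]] = []
    for start, end in zip(boundaries, boundaries[1:]):
        # Evenly subdivide any gap longer than max_len.
        n = max(1, math.ceil((end - start) / max_len))
        step = (end - start) / n
        edges = [start + i * step for i in range(n)] + [end]
        segments.extend(zip(edges, edges[1:]))
    return segments
```

For example, segment_by_onsets([0.3, 0.9, 1.2, 2.0, 4.5, 4.8], duration=5.0) returns five contiguous blocks: (0.0, 0.9), (0.9, 2.0), (2.0, 3.25), (3.25, 4.5) and (4.5, 5.0). The onsets at 0.3, 1.2 and 4.8 s are skipped because they fall within min_len of the previous cut. The 2.5 s gap between 2.0 and 4.5 s exceeds max_len, so it is split into two 1.25 s blocks.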


