AQ-GT: a Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis

05/02/2023
by Hendric Voß, et al.

The generation of realistic and contextually relevant co-speech gestures is a challenging yet increasingly important task in the creation of multimodal artificial agents. Prior methods focused on learning a direct correspondence between co-speech gesture representations and produced motions, yielding gestures that appear natural but are often judged unconvincing in human assessments. We present an approach that pre-trains partial gesture sequences using a generative adversarial network with a quantization pipeline. The resulting codebook vectors serve as both input and output in our framework, forming the basis for the generation and reconstruction of gestures. By learning a mapping in latent space rather than mapping directly to a vector representation, this framework facilitates the generation of highly realistic and expressive gestures that closely replicate human movement and behavior, while simultaneously avoiding artifacts in the generation process. We evaluate our approach by comparing it with established methods for generating co-speech gestures as well as with existing datasets of human behavior, and we perform an ablation study to assess our findings. The results show that our approach outperforms the current state of the art by a clear margin and is partially indistinguishable from human gesturing. We make our data pipeline and the generation framework publicly available.
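The abstract describes mapping gestures through a learned codebook, so that generation operates on discrete codebook vectors rather than raw pose vectors. The sketch below illustrates the core quantization step of such a pipeline: snapping a continuous latent feature to its nearest codebook entry. All names and sizes here are illustrative assumptions, not the paper's actual implementation (which trains the quantizer adversarially).

```python
# Minimal sketch of nearest-neighbor codebook quantization, as used in
# vector-quantized gesture pipelines. Codebook size (64) and latent
# dimensionality (8) are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)

# Assumed: a learned codebook of 64 entries, each an 8-dim latent vector.
codebook = rng.normal(size=(64, 8))

def quantize(latents: np.ndarray, codebook: np.ndarray):
    """Map each latent vector to the index and value of its nearest codebook entry."""
    # Pairwise squared distances between latents (N, D) and codebook (K, D),
    # via broadcasting to shape (N, K, D) and summing over D.
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)          # (N,) discrete codes
    return indices, codebook[indices]       # (N,), (N, D) quantized vectors

latents = rng.normal(size=(10, 8))
indices, quantized = quantize(latents, codebook)
```

In a full framework these discrete indices would be what the sequence model predicts, with the codebook lookup reconstructing continuous motion from them.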

Related research

07/13/2023
Augmented Co-Speech Gesture Generation: Including Form and Meaning Features to Guide Learning-Based Gesture Synthesis
Due to their significance in human communication, the automatic generati...

10/04/2022
Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings
Automatic synthesis of realistic co-speech gestures is an increasingly i...

09/04/2019
Learning to gesticulate by observation using a deep generative approach
The goal of the system presented in this paper is to develop a natural t...

03/24/2022
Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
Generating speech-consistent body and gesture movements is a long-standi...

07/31/2021
Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning
We present a generative adversarial network to synthesize 3D pose sequen...

08/21/2023
Co-Speech Gesture Detection through Multi-phase Sequence Labeling
Gestures are integral components of face-to-face communication. They unf...

10/22/2020
Quantitative analysis of robot gesticulation behavior
Social robot capabilities, such as talking gestures, are best produced u...
