Moving fast and slow: Analysis of representations and post-processing in speech-driven automatic gesture generation

07/16/2020
by Taras Kucherenko et al.

This paper presents a novel framework for speech-driven gesture production, applicable to virtual agents to enhance human-computer interaction. Specifically, we extend recent deep-learning-based, data-driven methods for speech-driven gesture generation by incorporating representation learning. Our model takes speech as input and produces gestures as output, in the form of a sequence of 3D coordinates. We analyse different representations for the input (speech) and the output (motion) of the network through both objective and subjective evaluations. We also analyse the importance of smoothing the produced motion. Our results indicate that the proposed method improves on our baseline in terms of objective measures: for example, it better captures the motion dynamics and better matches the motion-speed distribution. Moreover, we performed user studies on two different datasets. The studies confirmed that our proposed method is perceived as more natural than the baseline, although the difference between the studies was eliminated by appropriate post-processing: hip-centering and smoothing. We conclude that it is important to take feature representation, model architecture, and post-processing into account when designing an automatic gesture-production method.
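The two post-processing steps highlighted in the abstract, hip-centering and smoothing, can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the pose array layout `(T, J, 3)`, the hip-joint index, and the moving-average window size are all assumptions made for the example.

```python
import numpy as np


def hip_center(poses, hip_joint=0):
    """Translate every frame so the hip (root) joint sits at the origin.

    poses: array of shape (T, J, 3) -- T frames, J joints, 3D coordinates.
    hip_joint: index of the hip/root joint (assumed to be 0 here).
    """
    return poses - poses[:, hip_joint:hip_joint + 1, :]


def smooth(poses, window=5):
    """Moving-average smoothing along the time axis (one simple choice
    of smoothing filter; the paper does not prescribe this particular one)."""
    kernel = np.ones(window) / window
    # Pad with edge values so the output has the same number of frames.
    pad_before = window // 2
    pad_after = window - 1 - pad_before
    padded = np.pad(poses, ((pad_before, pad_after), (0, 0), (0, 0)), mode="edge")
    out = np.empty(poses.shape, dtype=float)
    T, J, D = poses.shape
    for j in range(J):
        for d in range(D):
            out[:, j, d] = np.convolve(padded[:, j, d], kernel, mode="valid")
    return out
```

Applying `smooth(hip_center(poses))` to a raw `(T, J, 3)` output sequence reproduces the kind of post-processing pipeline the paper evaluates; window size trades responsiveness against jitter.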


Related research

- 03/08/2019: Analyzing Input and Output Representations for Speech-Driven Gesture Generation
  This paper presents a novel framework for automatic speech-driven gestur...
- 01/25/2020: Gesticulator: A framework for semantically-aware speech-driven gesture generation
  During speech, people spontaneously gesticulate, which plays a key role ...
- 03/04/2021: It's A Match! Gesture Generation Using Expressive Parameter Matching
  Automatic gesture generation from speech generally relies on implicit mo...
- 08/22/2022: The GENEA Challenge 2022: A large evaluation of data-driven co-speech gesture generation
  This paper reports on the second GENEA Challenge to benchmark data-drive...
- 07/24/2019: A neural network based post-filter for speech-driven head motion synthesis
  Despite the fact that neural networks are widely used for speech-driven ...
- 10/02/2020: Understanding the Predictability of Gesture Parameters from Speech and their Perceptual Importance
  Gesture behavior is a natural part of human conversation. Much work has ...
- 10/29/2022: Articulatory Representation Learning Via Joint Factor Analysis and Neural Matrix Factorization
  Articulatory representation learning is the fundamental research in mode...
