Generating coherent spontaneous speech and gesture from text

01/14/2021
by Simon Alexanderson, et al.
Embodied human communication encompasses both verbal (speech) and non-verbal information (e.g., gesture and head movements). Recent advances in machine learning have substantially improved the technologies for generating synthetic versions of both of these types of data. On the speech side, text-to-speech systems are now able to generate highly convincing, spontaneous-sounding speech using unscripted speech audio as the source material. On the motion side, probabilistic motion-generation methods can now synthesise vivid and lifelike speech-driven 3D gesticulation. In this paper, we put these two state-of-the-art technologies together in a coherent fashion for the first time. Concretely, we demonstrate a proof-of-concept system, trained on a single-speaker audio and motion-capture dataset, that is able to generate both speech and full-body gestures together from text input. In contrast to previous approaches for joint speech-and-gesture generation, we generate full-body gestures from speech synthesis trained on recordings of spontaneous speech from the same person as the motion-capture data. We illustrate our results by visualising gesture spaces and text-speech-gesture alignments, and through a demonstration video at https://simonalexanderson.github.io/IVA2020.
