Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity

09/04/2020
by Youngwoo Yoon, et al.

For human-like agents, including virtual avatars and social robots, making proper gestures while speaking is crucial in human–agent interaction. Co-speech gestures enhance interaction experiences and make the agents look alive. However, it is difficult to generate human-like gestures due to the lack of understanding of how people gesture. Data-driven approaches attempt to learn gesticulation skills from human demonstrations, but the ambiguous and individual nature of gestures hinders learning. In this paper, we present an automatic gesture generation model that uses the multimodal context of speech text, audio, and speaker identity to reliably generate gestures. By incorporating a multimodal context and an adversarial training scheme, the proposed model outputs gestures that are human-like and that match the speech content and rhythm. We also introduce a new quantitative evaluation metric for gesture generation models. Experiments with the introduced metric and subjective human evaluation showed that the proposed gesture generation model is better than existing end-to-end generation models. We further confirm that our model is able to work with synthesized audio in a scenario where contexts are constrained, and show that different gesture styles can be generated for the same speech by specifying different speaker identities in the style embedding space that is learned from videos of various speakers. All the code and data are available at https://github.com/ai4r/Gesture-Generation-from-Trimodal-Context.
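The core idea in the abstract is that per-frame features from the speech text, the raw audio, and a learned speaker style embedding are fused and decoded into a pose sequence. The PyTorch sketch below illustrates that trimodal fusion in a minimal form; all module names, layer sizes, and the class name TrimodalGestureGenerator are assumptions for illustration, not the authors' implementation from the linked repository, and the adversarial discriminator and training loop are omitted.

```python
# Minimal sketch of a trimodal co-speech gesture generator (text + audio +
# speaker identity fused into a recurrent pose decoder). Illustrative only;
# layer sizes and names are assumptions, not the paper's reference code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrimodalGestureGenerator(nn.Module):
    def __init__(self, vocab_size=20000, n_speakers=1000,
                 text_dim=128, audio_dim=128, style_dim=16,
                 hidden_dim=256, pose_dim=27, n_frames=34):
        super().__init__()
        self.n_frames = n_frames
        # Text context: word embeddings aligned to the output pose frames.
        self.word_emb = nn.Embedding(vocab_size, text_dim)
        # Audio context: 1D convolutions over a raw-audio snippet.
        self.audio_enc = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=15, stride=5), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=15, stride=5), nn.ReLU(),
            nn.Conv1d(64, audio_dim, kernel_size=15, stride=5), nn.ReLU(),
        )
        # Speaker identity: a learned style-embedding space; different speaker
        # identities yield different gesture styles for the same speech.
        self.style_emb = nn.Embedding(n_speakers, style_dim)
        # Pose decoder: recurrent network over the fused per-frame context.
        self.decoder = nn.GRU(text_dim + audio_dim + style_dim, hidden_dim,
                              num_layers=2, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, pose_dim)

    def forward(self, word_ids, audio, speaker_ids):
        # word_ids:    (B, n_frames) word index per output frame (pre-aligned)
        # audio:       (B, samples)  raw waveform covering the same frames
        # speaker_ids: (B,)          integer speaker identity
        text_feat = self.word_emb(word_ids)                       # (B, T, text_dim)
        audio_feat = self.audio_enc(audio.unsqueeze(1))            # (B, audio_dim, T')
        audio_feat = F.interpolate(audio_feat, size=self.n_frames,
                                   mode="linear",
                                   align_corners=False).transpose(1, 2)
        style = self.style_emb(speaker_ids)                        # (B, style_dim)
        style = style.unsqueeze(1).expand(-1, self.n_frames, -1)   # (B, T, style_dim)
        context = torch.cat([text_feat, audio_feat, style], dim=-1)
        h, _ = self.decoder(context)
        return self.out(h)                                         # (B, T, pose_dim)
```

In the paper, a generator of this kind is trained with both a regression objective and an adversarial loss so that the output poses look natural while still following the speech; the sketch above only covers the forward pass that fuses the three context modalities.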


Related research

- Real-time Gesture Animation Generation from Speech for Virtual Human Interaction (08/05/2022)
- Robots Learn Social Skills: End-to-End Learning of Co-Speech Gesture Generation for Humanoid Robots (10/30/2018)
- SGToolkit: An Interactive Gesture Authoring Toolkit for Embodied Conversational Agents (08/10/2021)
- A Formal Analysis of Multimodal Referring Strategies Under Common Ground (03/16/2020)
- Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach (07/24/2020)
- The ReprGesture entry to the GENEA Challenge 2022 (08/25/2022)
- A Framework for Integrating Gesture Generation Models into Interactive Conversational Agents (02/24/2021)
