Freeform Body Motion Generation from Speech

03/04/2022
by Jing Xu, et al.

People naturally make spontaneous body motions to enhance their speech while giving talks. Generating body motion from speech is inherently difficult because the mapping from speech to body motion is non-deterministic. Most existing works map speech to motion deterministically by conditioning on certain styles, leading to sub-optimal results. Motivated by studies in linguistics, we decompose co-speech motion into two complementary parts: pose modes and rhythmic dynamics. Accordingly, we introduce a novel freeform motion generation model (FreeMo) with a two-stream architecture, i.e., a pose mode branch for primary posture generation and a rhythmic motion branch for rhythmic dynamics synthesis. On the one hand, diverse pose modes are generated by conditional sampling in a latent space, guided by speech semantics. On the other hand, rhythmic dynamics are synchronized with the speech prosody. Extensive experiments demonstrate superior performance over several baselines in terms of motion diversity, quality, and synchronization with speech. Code and pre-trained models will be publicly available at https://github.com/TheTempAccount/Co-Speech-Motion-Generation.
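The two-stream decomposition described in the abstract can be illustrated with a toy sketch. This is not the FreeMo implementation: the function names, dimensions, fixed projections, and the scalar prosody-energy input are all hypothetical stand-ins chosen only to show the idea that a pose mode is sampled from a latent space conditioned on speech semantics, while frame-level dynamics follow the prosody.

```python
import math
import random

POSE_DIM = 4    # hypothetical number of pose parameters per frame
LATENT_DIM = 3  # hypothetical latent size


def sample_pose_mode(semantic_feat, rng):
    """Toy pose mode branch: sample a latent code conditioned on speech semantics."""
    # Assumed conditional prior: the mean is a fixed toy projection of the
    # semantic feature; the variance is fixed at 1.
    mean = [sum(semantic_feat) * (k + 1) / LATENT_DIM for k in range(LATENT_DIM)]
    z = [m + rng.gauss(0.0, 1.0) for m in mean]
    # Toy decoder: a fixed nonlinear map from the latent code to a primary posture.
    return [
        math.tanh(sum(z[k] * math.sin(k + j + 1) for k in range(LATENT_DIM)))
        for j in range(POSE_DIM)
    ]


def rhythmic_dynamics(prosody_energy, scale=0.1):
    """Toy rhythmic branch: per-frame offsets that follow the prosody energy."""
    avg = sum(prosody_energy) / len(prosody_energy)
    return [[scale * (e - avg)] * POSE_DIM for e in prosody_energy]


def generate_motion(semantic_feat, prosody_energy, seed):
    """Combine a sampled pose mode with prosody-driven offsets, frame by frame."""
    rng = random.Random(seed)
    pose = sample_pose_mode(semantic_feat, rng)
    offsets = rhythmic_dynamics(prosody_energy)
    return [[p + o for p, o in zip(pose, frame)] for frame in offsets]


semantic = [0.2, -0.1, 0.4]          # hypothetical semantic feature
prosody = [0.0, 1.0, 0.5, 0.0, 1.0]  # hypothetical per-frame prosody energy
motion_a = generate_motion(semantic, prosody, seed=0)
motion_b = generate_motion(semantic, prosody, seed=1)
```

In this sketch, changing the seed changes the sampled posture (the non-deterministic part), while the frame-to-frame offsets stay tied to the same prosody signal, so both samples move in time with the speech.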

research
12/08/2022

Generating Holistic 3D Human Motion from Speech

This work addresses the problem of generating 3D holistic body motions f...
research
05/18/2023

QPGesture: Quantization-Based and Phase-Guided Motion Matching for Natural Speech-Driven Gesture Generation

Speech-driven gesture generation is highly challenging due to the random...
research
07/15/2022

Diverse Human Motion Prediction via Gumbel-Softmax Sampling from an Auxiliary Space

Diverse human motion prediction aims at predicting multiple possible fut...
research
06/13/2023

Pose-aware Attention Network for Flexible Motion Retargeting by Body Part

Motion retargeting is a fundamental problem in computer graphics and com...
research
08/11/2023

Semantics2Hands: Transferring Hand Motion Semantics between Avatars

Human hands, the primary means of non-verbal communication, convey intri...
research
08/15/2021

Audio2Gestures: Generating Diverse Gestures from Speech Audio with Conditional Variational Autoencoders

Generating conversational gestures from speech audio is challenging due ...
research
08/29/2023

Let There Be Sound: Reconstructing High Quality Speech from Silent Videos

The goal of this work is to reconstruct high quality speech from lip mot...
