Understanding Text-driven Motion Synthesis with Keyframe Collaboration via Diffusion Models

05/23/2023
by   Dong Wei, et al.

The emergence of text-driven motion synthesis techniques offers animators great potential to create efficiently. However, textual expressions usually contain only general, qualitative motion descriptions and lack fine depiction and sufficient intensity, so the synthesized motions are either (a) semantically compliant but uncontrollable at the level of specific pose details, or (b) even deviate from the provided descriptions, yielding undesired results for animators. In this paper, we propose DiffKFC, a conditional diffusion model for text-driven motion synthesis with keyframe collaboration. Unlike plain text-driven designs, full interaction among text, keyframes, and the remaining diffused frames is conducted during training, enabling realistic generation under efficient, collaborative dual-level control: coarse guidance at the semantic level, plus only a few keyframes for direct, fine-grained depiction down to the body-posture level, satisfying animator requirements without tedious labor. Specifically, we customize efficient Dilated Mask Attention modules, in which only partially valid tokens, indicated by the dilated keyframe mask, participate in local-to-global attention. For user flexibility, DiffKFC supports adjusting the importance of fine-grained keyframe control. Experimental results show that our model achieves state-of-the-art performance on the text-to-motion datasets HumanML3D and KIT.
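To make the masking idea concrete, here is a minimal sketch of keyframe-dilated masked attention. This is not the paper's implementation: the function names (`dilated_keyframe_mask`, `masked_attention`) and the single-dilation, single-head formulation are illustrative assumptions; the actual Dilated Mask Attention module operates local-to-global across multiple layers.

```python
import numpy as np

def dilated_keyframe_mask(seq_len, keyframe_idx, dilation):
    """Mark frames within `dilation` steps of any keyframe as valid.

    Illustrative assumption: the paper's dilated keyframe mask grows
    the valid region outward from the keyframes; here one radius suffices.
    """
    mask = np.zeros(seq_len, dtype=bool)
    for k in keyframe_idx:
        lo, hi = max(0, k - dilation), min(seq_len, k + dilation + 1)
        mask[lo:hi] = True
    return mask

def masked_attention(q, k, v, valid):
    """Scaled dot-product attention in which only `valid` key tokens
    participate; invalid keys receive effectively zero weight."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # Block invalid key positions with a large negative score.
    scores = np.where(valid[None, :], scores, -1e9)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Usage: 8 frames, keyframes at both ends, dilation radius 1.
valid = dilated_keyframe_mask(8, [0, 7], 1)
q = np.zeros((2, 4))            # uniform queries -> equal weight on valid keys
kmat = np.zeros((8, 4))
v = np.arange(8, dtype=float).reshape(8, 1)
out = masked_attention(q, kmat, v, valid)   # averages only frames 0,1,6,7
```

Widening `dilation` over successive layers would let information flow from keyframes outward to all diffused frames, which is the local-to-global behavior the abstract describes.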

Related research

AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism (09/02/2023)
Generating 3D human motion based on textual descriptions has been a rese...

DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion (09/04/2023)
We present DiverseMotion, a new approach for synthesizing high-quality h...

Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model (09/12/2023)
Text-driven human motion generation in computer vision is both significa...

GMD: Controllable Human Motion Synthesis via Guided Diffusion Models (05/21/2023)
Denoising diffusion models have shown great promise in human motion synt...

HumanDiffusion: a Coarse-to-Fine Alignment Diffusion Framework for Controllable Text-Driven Person Image Generation (11/11/2022)
Text-driven person image generation is an emerging and challenging task ...

Make-An-Animation: Large-Scale Text-conditional 3D Human Motion Generation (05/16/2023)
Text-guided human motion generation has drawn significant interest becau...

Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance (06/01/2023)
Creating a vivid video from the event or scenario in our imagination is ...
