AttT2M: Text-Driven Human Motion Generation with Multi-Perspective Attention Mechanism

09/02/2023
by   Chongyang Zhong, et al.

Generating 3D human motion from textual descriptions has been a research focus in recent years. The generated motion must be diverse, natural, and faithful to the text. Because of the complex spatio-temporal nature of human motion and the difficulty of learning the cross-modal relationship between text and motion, text-driven motion generation remains a challenging problem. To address these issues, we propose AttT2M, a two-stage method with a multi-perspective attention mechanism: body-part attention and global-local motion-text attention. The former addresses the motion-embedding perspective, introducing a body-part spatio-temporal encoder into a VQ-VAE to learn a more expressive discrete latent space. The latter addresses the cross-modal perspective and learns sentence-level and word-level motion-text relationships. The text-driven motion is finally generated with a generative transformer. Extensive experiments on HumanML3D and KIT-ML demonstrate that our method outperforms current state-of-the-art works in both qualitative and quantitative evaluation, and achieves fine-grained synthesis and action2motion. Our code is available at https://github.com/ZcyMonkey/AttT2M
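The discrete latent space mentioned above rests on the standard VQ-VAE quantization step: each continuous encoder output is snapped to its nearest entry in a learned codebook, so a motion becomes a sequence of code indices that the generative transformer can model like tokens. The sketch below (a minimal NumPy illustration, not the paper's implementation; the toy codebook and latents are made up) shows only that nearest-neighbor lookup:

```python
import numpy as np

def vector_quantize(latents, codebook):
    """Map each continuous latent vector to its nearest codebook entry.

    latents:  (T, D) array of encoder outputs, one per time step
    codebook: (K, D) array of learned discrete codes
    Returns code indices of shape (T,) and quantized vectors of shape (T, D).
    """
    # Squared Euclidean distance between every latent and every code
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = dists.argmin(axis=1)  # nearest code per time step
    return idx, codebook[idx]

# Toy example: 4 time steps, 2-D latents, a 3-entry codebook
codes = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 0.0]])
z = np.array([[0.1, -0.1], [0.9, 1.2], [-0.8, 0.1], [0.0, 0.2]])
idx, z_q = vector_quantize(z, codes)  # idx is the discrete motion token sequence
```

In the full model the codebook is learned jointly with the encoder and decoder, and AttT2M's contribution is feeding the quantizer from a body-part spatio-temporal encoder rather than a monolithic one; the lookup itself is unchanged.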


Related research

- ARMANI: Part-level Garment-Text Alignment for Unified Cross-Modal Fashion Design (08/11/2022). Cross-modal fashion image synthesis has emerged as one of the most promi...
- Understanding Text-driven Motion Synthesis with Keyframe Collaboration via Diffusion Models (05/23/2023). The emergence of text-driven motion synthesis technique provides animato...
- TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis (05/02/2023). In this paper, we present TMR, a simple yet effective approach for text ...
- Text-to-Motion Retrieval: Towards Joint Understanding of Human Motion Data and Natural Language (05/25/2023). Due to recent advances in pose-estimation methods, human motion can be e...
- Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model (09/12/2023). Text-driven human motion generation in computer vision is both significa...
- Text-guided 3D Human Generation from 2D Collections (05/23/2023). 3D human modeling has been widely used for engaging interaction in gamin...
- Music-driven Dance Regeneration with Controllable Key Pose Constraints (07/08/2022). In this paper, we propose a novel framework for music-driven dance motio...
