UDE: A Unified Driving Engine for Human Motion Generation

11/29/2022
by Zixiang Zhou, et al.

Generating controllable and editable human motion sequences is a key challenge in 3D avatar generation. Generating and animating human motion has long been labor-intensive; only recently have learning-based approaches been developed and applied. However, these approaches are still task-specific or modality-specific<cit.><cit.><cit.><cit.>. In this paper, we propose "UDE", the first unified driving engine that enables generating human motion sequences from either natural language or audio sequences (see Fig. <ref>). Specifically, UDE consists of the following key components: 1) a motion quantization module based on VQ-VAE that represents continuous motion sequences as discrete latent codes<cit.>, 2) a modality-agnostic transformer encoder<cit.> that learns to map modality-aware driving signals to a joint space, 3) a unified token transformer (GPT-like<cit.>) network that predicts the quantized latent code indices in an auto-regressive manner, and 4) a diffusion motion decoder that takes the motion tokens as input and decodes them into motion sequences with high diversity. We evaluate our method on the HumanML3D<cit.> and AIST++<cit.> benchmarks, and the experimental results demonstrate that our method achieves state-of-the-art performance. Project website: <https://github.com/zixiangzhou916/UDE/>
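To make the four-stage pipeline concrete, below is a minimal PyTorch sketch of stages 1) and 3). Every class name, layer choice, and dimension here (e.g. a 263-dim pose vector, a 1024-entry codebook) is an illustrative assumption, not the authors' implementation; the modality-agnostic encoder 2) is stood in for by a precomputed conditioning tensor, and the diffusion decoder 4) is omitted.

import torch
import torch.nn as nn

class MotionVQVAE(nn.Module):
    # Stage 1 (sketch): quantize continuous motion frames into discrete codes.
    def __init__(self, motion_dim=263, latent_dim=512, codebook_size=1024):
        super().__init__()
        self.encoder = nn.Linear(motion_dim, latent_dim)  # stand-in for a conv encoder
        self.codebook = nn.Embedding(codebook_size, latent_dim)
        self.decoder = nn.Linear(latent_dim, motion_dim)  # stand-in for a conv decoder

    def quantize(self, motion):
        # motion: (B, T, motion_dim) -> discrete code indices (B, T)
        z = self.encoder(motion)
        # Nearest-neighbor lookup against the codebook entries.
        book = self.codebook.weight.unsqueeze(0).expand(z.size(0), -1, -1)
        return torch.cdist(z, book).argmin(dim=-1)

class UnifiedTokenTransformer(nn.Module):
    # Stage 3 (sketch): GPT-like autoregressive predictor over code indices,
    # conditioned on the modality-agnostic encoding of the driving signal.
    def __init__(self, codebook_size=1024, d_model=512, n_layers=6, n_heads=8):
        super().__init__()
        self.token_emb = nn.Embedding(codebook_size + 1, d_model)  # +1 for a BOS token
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, n_layers)
        self.head = nn.Linear(d_model, codebook_size)

    def forward(self, tokens, cond):
        # tokens: (B, T) code indices; cond: (B, S, d_model) driving-signal encoding
        x = self.token_emb(tokens)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        return self.head(self.decoder(x, cond, tgt_mask=mask))  # next-token logits

vq, gpt = MotionVQVAE(), UnifiedTokenTransformer()
motion = torch.randn(2, 16, 263)   # dummy motion clip
codes = vq.quantize(motion)        # (2, 16) discrete code indices
cond = torch.randn(2, 10, 512)     # stand-in for the modality encoder output
logits = gpt(codes, cond)          # (2, 16, 1024)

At inference time, code indices would be sampled autoregressively from these logits and the resulting token sequence handed to the diffusion decoder; the sketch above shows only a single teacher-forced forward pass.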
