MotionGPT: Finetuned LLMs are General-Purpose Motion Generators

06/19/2023
by Yaqi Zhang, et al.

Generating realistic human motion from given action descriptions has advanced significantly, driven by the emerging demand for digital humans. While recent works have achieved impressive results in generating motion directly from textual action descriptions, they often support only a single modality of control signal, which limits their application in the real digital human industry. This paper presents a Motion General-Purpose generaTor (MotionGPT) that can use multimodal control signals, e.g., text and single-frame poses, to generate consecutive human motions by treating the multimodal signals as special input tokens in large language models (LLMs). Specifically, we first quantize the multimodal control signals into discrete codes and then formulate them into a unified prompt instruction that asks the LLM to generate the motion answer. Our MotionGPT demonstrates a unified human motion generation model with multimodal control signals by tuning a mere 0.4% of the LLM's parameters. To the best of our knowledge, MotionGPT is the first method to generate human motion from multimodal control signals, and we hope it can shed light on this new direction. Code will be released upon acceptance.
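
The pipeline sketched in the abstract (quantize control signals into discrete codes, pack them into a unified prompt instruction, and let the LLM emit motion tokens) can be illustrated with a minimal, hedged sketch. Everything below is an assumption for illustration only: the quantize_pose stub, the <pose_k> token format, and the instruction template are hypothetical and are not taken from the paper's released code.

```python
# Hypothetical sketch (not the authors' implementation): quantizing a
# single-frame pose into discrete codes and packing it, together with a
# text description, into one instruction prompt for an LLM.

from typing import List


def quantize_pose(pose_frame: List[float], codebook_size: int = 512) -> List[int]:
    """Stand-in for a learned quantizer (e.g., a VQ-VAE encoder).

    A real implementation would encode the pose and return nearest-codebook
    indices; here we simply hash coordinates into the codebook range so the
    sketch stays self-contained and runnable.
    """
    return [int(abs(v) * 1000) % codebook_size for v in pose_frame[:4]]


def build_prompt(text: str, pose_codes: List[int]) -> str:
    """Formats the text description and quantized pose codes as one prompt.

    The discrete codes are rendered as special tokens (e.g., <pose_17>) that
    are assumed to have been added to the LLM's vocabulary.
    """
    pose_tokens = " ".join(f"<pose_{c}>" for c in pose_codes)
    return (
        "Below is an instruction describing a human motion, paired with an "
        "initial pose given as discrete pose tokens.\n"
        f"### Instruction: {text}\n"
        f"### Initial pose: {pose_tokens}\n"
        "### Response (motion tokens):"
    )


if __name__ == "__main__":
    pose = [0.12, -0.43, 0.88, 0.05, 0.31]   # toy single-frame pose
    codes = quantize_pose(pose)              # discrete control codes
    print(build_prompt("a person walks forward and waves", codes))
```

In practice the pose quantizer would be a trained encoder and the LLM would stay frozen, with only lightweight adapters (e.g., LoRA) being tuned, which is consistent with the abstract's claim of updating roughly 0.4% of the parameters.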


Related research

08/28/2023 · MagicAvatar: Multimodal Avatar Generation and Animation
This report presents MagicAvatar, a framework for multimodal video gener...

05/16/2023 · AMD: Autoregressive Motion Diffusion
Human motion generation aims to produce plausible human motion sequences...

06/26/2023 · MotionGPT: Human Motion as a Foreign Language
Though the advancement of pre-trained large language models unfolds, the...

09/18/2023 · Multimodal Foundation Models: From Specialists to General-Purpose Assistants
This paper presents a comprehensive survey of the taxonomy and evolution...

08/28/2023 · Priority-Centric Human Motion Generation in Discrete Latent Space
Text-to-motion generation is a formidable task, aiming to produce human...

11/27/2022 · Unified Discrete Diffusion for Simultaneous Vision-Language Generation
The recently developed discrete diffusion models perform extraordinarily...

12/06/2022 · Pretrained Diffusion Models for Unified Human Motion Synthesis
Generative modeling of human motion has broad applications in computer a...
