Action-GPT: Leveraging Large-scale Language Models for Improved and Generalized Action Generation

11/28/2022
by Sai Shashank Kalakonda, et al.

We introduce Action-GPT, a plug-and-play framework for incorporating Large Language Models (LLMs) into text-based action generation models. Action phrases in current motion capture datasets are terse and carry minimal information. By carefully crafting prompts for LLMs, we generate richer, more fine-grained descriptions of each action. We show that using these detailed descriptions in place of the original action phrases leads to better alignment of the text and motion spaces. The approach is generic: it is compatible with both stochastic (e.g., VAE-based) and deterministic (e.g., MotionCLIP) text-to-motion models, and it allows multiple text descriptions to be used. Our experiments show (i) noticeable qualitative and quantitative improvement in the quality of synthesized motions, (ii) the benefits of using multiple LLM-generated descriptions, (iii) the suitability of the prompt function, and (iv) the zero-shot generation capabilities of the proposed approach. Project page: https://actiongpt.github.io
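
The pipeline described in the abstract (a prompt function that expands a short action phrase, several LLM-generated descriptions, and a single text representation passed to the motion model) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the prompt wording in `build_prompt`, the helper names `generate_descriptions` and `aggregate_embeddings`, and the choice of element-wise averaging as the way to combine descriptions. The LLM and the text encoder are replaced by stand-in stubs so the example runs without external services.

```python
from typing import Callable, List, Sequence


def build_prompt(action_phrase: str) -> str:
    """Hypothetical prompt function: wrap a terse action phrase in a request for detail.

    The wording below is an assumption, not the prompt used by the authors.
    """
    phrase = action_phrase.strip()
    return f"Describe in detail the body movements of a person performing the action: {phrase}"


def generate_descriptions(
    action_phrase: str, llm: Callable[[str], str], k: int = 3
) -> List[str]:
    """Query the supplied LLM callable k times with the same prompt."""
    if k < 1:
        raise ValueError("k must be at least 1")
    prompt = build_prompt(action_phrase)
    return [llm(prompt) for _ in range(k)]


def aggregate_embeddings(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Combine several description embeddings by element-wise averaging.

    Averaging is one simple option chosen for this sketch (an assumption).
    """
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    return [sum(column) / len(embeddings) for column in zip(*embeddings)]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a placeholder string."""
    return f"(stub description for) {prompt}"


def toy_encoder(text: str) -> List[float]:
    """Stand-in for a real text encoder; maps text to a 2-dimensional vector."""
    return [float(len(text)), float(text.count(" "))]


if __name__ == "__main__":
    descriptions = generate_descriptions("jump", fake_llm, k=3)
    text_embedding = aggregate_embeddings([toy_encoder(d) for d in descriptions])
    # text_embedding would replace the embedding of the raw action phrase
    # as the conditioning input of a text-to-motion model.
    print(len(descriptions), text_embedding)
```

Because the LLM and encoder are passed in as plain callables, the same sketch applies whether the downstream text-to-motion model is stochastic or deterministic; only the consumer of the aggregated embedding changes.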


