MagicAvatar: Multimodal Avatar Generation and Animation

08/28/2023
by Jianfeng Zhang, et al.

This report presents MagicAvatar, a framework for multimodal video generation and animation of human avatars. Unlike most existing methods, which generate avatar-centric videos directly from multimodal inputs (e.g., text prompts), MagicAvatar explicitly disentangles avatar video generation into two stages: (1) multimodal-to-motion and (2) motion-to-video generation. The first stage translates the multimodal inputs into motion/control signals (e.g., human pose, depth, DensePose), while the second stage generates avatar-centric video guided by these motion signals. Additionally, MagicAvatar supports avatar animation from just a few images of the target person: the provided human identity is animated according to the motion derived in the first stage. We demonstrate the flexibility of MagicAvatar through various applications, including text-guided and video-guided avatar generation, as well as multimodal avatar animation.
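The two-stage decomposition described above can be illustrated with a minimal interface sketch. This is not the authors' implementation: every name here (`MotionSignal`, `AvatarVideo`, `multimodal_to_motion`, `motion_to_video`, `generate_avatar_video`) is hypothetical, and the function bodies are string placeholders standing in for the actual generative models, which the abstract does not specify. The sketch only shows how the stages might hand data to each other.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class MotionSignal:
    """Hypothetical container for stage-1 output (e.g. pose, depth, DensePose)."""
    kind: str
    frames: List[str]  # placeholder per-frame payloads, not real tensors


@dataclass
class AvatarVideo:
    """Hypothetical container for stage-2 output."""
    frames: List[str]
    identity: Optional[str] = None


def multimodal_to_motion(prompt: str, num_frames: int = 4,
                         kind: str = "pose") -> MotionSignal:
    """Stage 1 (placeholder): turn a multimodal input into a motion sequence."""
    frames = [f"{kind}[{i}] for '{prompt}'" for i in range(num_frames)]
    return MotionSignal(kind=kind, frames=frames)


def motion_to_video(motion: MotionSignal,
                    identity_images: Sequence[str] = ()) -> AvatarVideo:
    """Stage 2 (placeholder): render one video frame per motion frame.

    If identity images are given, the result is tagged with that identity,
    mirroring the avatar-animation use case.
    """
    identity = None
    if identity_images:
        identity = f"identity from {len(identity_images)} image(s)"
    frames = [f"frame {i} <- {m}" for i, m in enumerate(motion.frames)]
    return AvatarVideo(frames=frames, identity=identity)


def generate_avatar_video(prompt: str,
                          identity_images: Sequence[str] = (),
                          num_frames: int = 4) -> AvatarVideo:
    """Chain the two stages: input -> motion signal -> video."""
    motion = multimodal_to_motion(prompt, num_frames=num_frames)
    return motion_to_video(motion, identity_images)


if __name__ == "__main__":
    video = generate_avatar_video("a person dancing", num_frames=3)
    print(len(video.frames), video.identity)
```

Because the second stage consumes only the motion signal, the same stage-2 function in this sketch serves both plain generation (no identity images) and animation of a specific person (a few identity images supplied).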

research
07/30/2018

Pose Guided Human Video Generation

Due to the emergence of Generative Adversarial Networks, video synthesis...
research
06/19/2023

MotionGPT: Finetuned LLMs are General-Purpose Motion Generators

Generating realistic human motion from given action descriptions has exp...
research
04/27/2023

ChatVideo: A Tracklet-centric Multimodal and Versatile Video Understanding System

Existing deep video models are limited by specific tasks, fixed input-ou...
research
03/04/2022

Show Me What and Tell Me How: Video Synthesis via Multimodal Conditioning

Most methods for conditional video synthesis use a single modality as th...
research
01/17/2021

Narration Generation for Cartoon Videos

Research on text generation from multimodal inputs has largely focused o...
research
11/23/2017

Deep Video Generation, Prediction and Completion of Human Action Sequences

Current deep learning results on video generation are limited while ther...
research
11/23/2022

Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation

Generating a video given the first several static frames is challenging ...