DisCo: Disentangled Control for Referring Human Dance Generation in Real World

06/30/2023
by Tan Wang, et al.

Generative AI has made significant strides in computer vision, particularly in image and video synthesis conditioned on text descriptions. Despite these advances, generating human-centric content such as dance remains challenging. Existing dance synthesis methods struggle to close the gap between synthesized content and real-world dance scenarios. In this paper, we define a new problem setting, Referring Human Dance Generation, which focuses on real-world dance scenarios with three important properties: (i) Faithfulness: the synthesis should retain the appearance of both the human subject (foreground) and the background from the reference image, and precisely follow the target pose; (ii) Generalizability: the model should generalize to unseen human subjects, backgrounds, and poses; (iii) Compositionality: it should allow for composition of seen/unseen subjects, backgrounds, and poses from different sources. To address these challenges, we introduce DisCo, an approach that combines a novel model architecture with disentangled control, which improves the faithfulness and compositionality of dance synthesis, and an effective human attribute pre-training, which improves generalizability to unseen humans. Extensive qualitative and quantitative results demonstrate that DisCo can generate high-quality human dance images and videos with diverse appearances and flexible motions. Code, demo, video and visualization are available at: https://disco-dance.github.io/.

research
04/20/2018

Synthesizing Images of Humans in Unseen Poses

We address the computational problem of novel human pose synthesis. Give...
research
04/17/2023

Text2Performer: Text-Driven Human Video Generation

Text-driven content creation has evolved to be a transformative techniqu...
research
10/27/2021

Image Comes Dancing with Collaborative Parsing-Flow Video Synthesis

Transferring human motion from a source to a target person poses great p...
research
07/24/2022

TIPS: Text-Induced Pose Synthesis

In computer vision, human pose synthesis and transfer deal with probabil...
research
10/28/2019

Few-shot Video-to-Video Synthesis

Video-to-video synthesis (vid2vid) aims at converting an input semantic ...
research
04/07/2020

Human Motion Transfer from Poses in the Wild

In this paper, we tackle the problem of human motion transfer, where we ...
research
03/08/2021

Behavior-Driven Synthesis of Human Dynamics

Generating and representing human behavior are of major importance for v...
