DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory

08/16/2023
by Shengming Yin, et al.

Controllable video generation has gained significant attention in recent years. However, two main limitations persist. First, most existing works focus on text-, image-, or trajectory-based control alone, and therefore cannot achieve fine-grained control over videos. Second, trajectory control research is still in its early stages, with most experiments conducted on simple datasets such as Human3.6M; this limits the models' ability to process open-domain images and to handle complex curved trajectories. In this paper, we propose DragNUWA, an open-domain diffusion-based video generation model. To address the insufficient control granularity of existing works, we introduce text, image, and trajectory information simultaneously, providing fine-grained control over video content from semantic, spatial, and temporal perspectives. To address the limited open-domain trajectory control in current research, we propose trajectory modeling with three components: a Trajectory Sampler (TS) to enable open-domain control of arbitrary trajectories, Multiscale Fusion (MF) to control trajectories at different granularities, and an Adaptive Training (AT) strategy to generate consistent videos that follow trajectories. Our experiments validate the effectiveness of DragNUWA, demonstrating its superior performance in fine-grained control of video generation. The project homepage is <https://www.microsoft.com/en-us/research/project/dragnuwa/>
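
The abstract names three trajectory-modeling components (TS, MF, AT) but gives no implementation detail, so the sketch below is only a plausible reading of the Multiscale Fusion idea: a rasterized drag-trajectory map is encoded into per-scale features, which are then added into diffusion UNet features at the matching resolution. Every module name, channel width, and the additive fusion rule here are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical sketch (not the authors' released code): encode a per-frame
# trajectory map and fuse it with UNet features at several scales, loosely
# mirroring the Multiscale Fusion (MF) described in the abstract.
import torch
import torch.nn as nn


class TrajectoryEncoder(nn.Module):
    """Encode a trajectory map (2 channels: x/y displacement per pixel)
    into one feature map per UNet resolution."""

    def __init__(self, in_ch: int = 2, widths=(64, 128, 256)):
        super().__init__()
        stages, ch = [], in_ch
        for w in widths:
            stages.append(nn.Sequential(
                nn.Conv2d(ch, w, kernel_size=3, stride=2, padding=1),
                nn.SiLU(),
            ))
            ch = w
        self.stages = nn.ModuleList(stages)

    def forward(self, traj_map: torch.Tensor) -> list[torch.Tensor]:
        feats, h = [], traj_map
        for stage in self.stages:
            h = stage(h)
            feats.append(h)  # progressively downsampled trajectory features
        return feats


class FusionBlock(nn.Module):
    """Inject trajectory features into UNet features at a matching scale
    via a 1x1 projection plus addition (a deliberately simple fusion rule)."""

    def __init__(self, unet_ch: int, traj_ch: int):
        super().__init__()
        self.proj = nn.Conv2d(traj_ch, unet_ch, kernel_size=1)

    def forward(self, unet_feat: torch.Tensor,
                traj_feat: torch.Tensor) -> torch.Tensor:
        return unet_feat + self.proj(traj_feat)


# Toy usage: a 64x64 drag-vector map fused into a 320-channel UNet feature map.
traj_map = torch.randn(1, 2, 64, 64)
traj_feats = TrajectoryEncoder()(traj_map)   # scales: 32x32, 16x16, 8x8
fused = FusionBlock(unet_ch=320, traj_ch=64)(
    torch.randn(1, 320, 32, 32), traj_feats[0])
print(fused.shape)  # torch.Size([1, 320, 32, 32])
```

In a full system, text conditioning would typically enter through cross-attention and the reference image through concatenation or a separate encoder; the per-scale trajectory injection above is what would give spatial and temporal control at multiple granularities.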


