Structure and Content-Guided Video Synthesis with Diffusion Models

02/06/2023
by Patrick Esser, et al.

Text-guided generative diffusion models unlock powerful image creation and editing tools. While these have been extended to video generation, current approaches that edit the content of existing footage while retaining structure either require expensive re-training for every input or rely on error-prone propagation of image edits across frames. In this work, we present a structure- and content-guided video diffusion model that edits videos based on visual or textual descriptions of the desired output. Conflicts between user-provided content edits and structure representations arise from insufficient disentanglement between the two aspects. As a solution, we show that training on monocular depth estimates with varying levels of detail provides control over structure and content fidelity. Our model is trained jointly on images and videos, which also exposes explicit control of temporal consistency through a novel guidance method. Our experiments demonstrate a wide variety of successes: fine-grained control over output characteristics, customization based on a few reference images, and a strong user preference for the results of our model.
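The "explicit control of temporal consistency" mentioned above can be pictured as an extension of classifier-free guidance with a separate temporal term. The sketch below is an illustration only, not the authors' implementation: `model`, its `temporal` flag, and both guidance weights are hypothetical names chosen here, assuming a denoiser that can run either frame-wise (image mode) or with its temporal layers active (video mode).

```python
import numpy as np

def guided_noise_estimate(model, z_t, t, depth, text_emb,
                          w_text=7.5, w_temporal=1.0):
    """Hedged sketch: classifier-free guidance extended with a temporal term.

    `model(latents, timestep, depth, text, temporal)` is a hypothetical
    depth-conditioned denoiser; `temporal=False` runs it frame-wise,
    `temporal=True` enables its temporal layers.
    """
    # Unconditional pass: no text, temporal layers disabled.
    e_uncond = model(z_t, t, depth, None, False)
    # Text-conditioned pass, still frame-wise (image mode).
    e_text = model(z_t, t, depth, text_emb, False)
    # Text-conditioned pass with temporal layers active (video mode).
    e_video = model(z_t, t, depth, text_emb, True)

    # Standard text guidance, plus an explicit knob (w_temporal) that
    # scales the pull toward the temporally consistent estimate.
    return (e_uncond
            + w_text * (e_text - e_uncond)
            + w_temporal * (e_video - e_text))
```

Setting `w_temporal` to zero would reduce this to per-frame classifier-free guidance, while larger values trade per-frame fidelity for smoother motion; that knob is the intuition behind joint image/video training exposing temporal control.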


Related research

- Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts (05/15/2023)
  The text-driven image and video diffusion models have achieved unprecede...

- Edit Temporal-Consistent Videos with Image Diffusion Model (08/17/2023)
  Large-scale text-to-image (T2I) diffusion models have been extended for ...

- ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing (05/26/2023)
  In this paper, we present ControlVideo, a novel method for text-driven v...

- DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory (08/16/2023)
  Controllable video generation has gained significant attention in recent...

- DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion (04/12/2023)
  We present DreamPose, a diffusion-based method for generating animated f...

- Dual-Stream Diffusion Net for Text-to-Video Generation (08/16/2023)
  With the emerging diffusion models, recently, text-to-video generation h...

- Edit As You Wish: Video Description Editing with Multi-grained Commands (05/15/2023)
  Automatically narrating a video with natural language can assist people ...
