ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing

05/26/2023
by Min Zhao, et al.

In this paper, we present ControlVideo, a novel method for text-driven video editing. Building on text-to-image diffusion models and ControlNet, ControlVideo aims to improve the fidelity and temporal consistency of edited videos that align with a given text while preserving the structure of the source video. It does so by incorporating additional conditions such as edge maps and by fine-tuning the key-frame and temporal attention on the source video-text pair with carefully designed strategies. We explore ControlVideo's design in depth to inform future research on one-shot tuning of video diffusion models. Quantitatively, ControlVideo outperforms a range of competitive baselines in faithfulness and consistency while still aligning with the textual prompt. It also delivers videos with high visual realism and fidelity with respect to the source content, shows flexibility in using controls that carry varying degrees of source video information, and shows potential for combining multiple controls. The project page is available at https://ml.cs.tsinghua.edu.cn/controlvideo/.
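
The abstract names key-frame attention without spelling it out, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that every frame's tokens attend to the tokens of one chosen key frame (here the first frame, via the hypothetical parameter `key_index`), and it uses the raw tokens as queries, keys, and values instead of learned projections. The function names and the toy data are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    out = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        # Each output token is a weighted average of the value vectors.
        out.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return out


def key_frame_attention(frames, key_index=0):
    """Let every frame attend to the tokens of a single key frame.

    Assumption: no learned projections, so the key frame's tokens serve
    as both keys and values. A real model would apply trained weights.
    """
    key_tokens = frames[key_index]
    return [attend(frame, key_tokens, key_tokens) for frame in frames]


# Toy video: 3 frames, 2 tokens per frame, 2 features per token.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]
edited = key_frame_attention(frames)
```

Because every output token is a weighted average of the same key-frame tokens, all frames draw their content from one shared reference, which is one plausible way such a layer could encourage consistent appearance across frames.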

research
07/19/2023

TokenFlow: Consistent Diffusion Features for Consistent Video Editing

The generative AI revolution has recently expanded to videos. Neverthele...
research
06/14/2023

VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing

Recently, diffusion-based generative models have achieved remarkable suc...
research
12/22/2022

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

To reproduce the success of text-to-image (T2I) generation, recent works...
research
02/06/2023

Structure and Content-Guided Video Synthesis with Diffusion Models

Text-guided generative diffusion models unlock powerful image creation a...
research
08/21/2023

EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints

Motivated by the superior performance of image diffusion models, more an...
research
02/02/2023

Dreamix: Video Diffusion Models are General Video Editors

Text-driven image and video diffusion models have recently achieved unpr...
research
05/02/2023

Key-Locked Rank One Editing for Text-to-Image Personalization

Text-to-image models (T2I) offer a new level of flexibility by allowing ...
