StableVideo: Text-driven Consistency-aware Diffusion Video Editing

08/18/2023
by Wenhao Chai et al.

Diffusion-based methods can generate realistic images and videos, but they struggle to edit existing objects in a video while preserving their appearance over time. This prevents diffusion models from being applied to natural video editing in practical scenarios. In this paper, we tackle this problem by introducing temporal dependency into existing text-driven diffusion models, allowing them to generate a consistent appearance for the edited objects. Specifically, we develop a novel inter-frame propagation mechanism for diffusion video editing, which leverages the concept of layered representations to propagate appearance information from one frame to the next. We then build a text-driven video editing framework on this mechanism, namely StableVideo, which achieves consistency-aware video editing. Extensive experiments demonstrate the strong editing capability of our approach. Compared with state-of-the-art video editing methods, our approach shows superior qualitative and quantitative results. Our code is available at https://github.com/rese1f/StableVideo.
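The consistency argument in the abstract can be illustrated with a toy layered-representation sketch: if every frame renders its foreground by sampling a single shared atlas, then an appearance edit applied once in atlas space is automatically propagated to all frames. The function names, the nearest-neighbor UV lookup, and the trivial recoloring "edit" below are all illustrative assumptions, not the paper's actual implementation.

```python
# Toy sketch of consistency via a shared layered (atlas) representation.
# Assumption: each frame stores integer UV coordinates into one shared atlas;
# the "edit" here is a simple recolor standing in for a diffusion-based edit.
import numpy as np

def render_frame(atlas, uv):
    # uv: (H, W, 2) integer atlas coordinates; nearest-neighbor lookup
    return atlas[uv[..., 0], uv[..., 1]]

# Shared 8x8 RGB atlas for the foreground layer (initially black).
atlas = np.zeros((8, 8, 3), dtype=np.float32)

# Two frames view overlapping atlas regions (simulating object motion):
# frame 0 samples atlas[0:4, 0:4], frame 1 samples atlas[2:6, 2:6].
uv0 = np.stack(np.meshgrid(np.arange(4), np.arange(4), indexing="ij"), axis=-1)
uv1 = uv0 + 2

# Edit ONCE in atlas space; every rendered frame inherits the edit.
edited = atlas.copy()
edited[...] = [1.0, 0.0, 0.0]  # stand-in for a text-driven appearance edit

f0 = render_frame(edited, uv0)
f1 = render_frame(edited, uv1)

# Atlas texels shared by both frames render identically, so the edited
# object's appearance stays consistent across frames by construction.
assert np.allclose(f0[2:4, 2:4], f1[0:2, 0:2])
```

The design point is that temporal consistency falls out of the representation rather than being enforced frame by frame: the edit lives in a view-independent space, and per-frame rendering only re-maps it.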


Related research

01/30/2023  Shape-aware Text-driven Layered Video Editing
Temporal consistency is essential for video editing applications. Existi...

07/19/2023  TokenFlow: Consistent Diffusion Features for Consistent Video Editing
The generative AI revolution has recently expanded to videos. Neverthele...

09/02/2023  MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation
This paper addresses the issue of modifying the visual appearance of vid...

05/27/2023  Towards Consistent Video Editing with Text-to-Image Diffusion Models
Existing works have advanced Text-to-Image (TTI) diffusion models for vi...

02/02/2023  Dreamix: Video Diffusion Models are General Video Editors
Text-driven image and video diffusion models have recently achieved unpr...

01/10/2023  Speech Driven Video Editing via an Audio-Conditioned Diffusion Model
In this paper we propose a method for end-to-end speech driven video edi...

12/06/2022  Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding
Inspired by the impressive performance of recent face image editing meth...
