Edit Temporal-Consistent Videos with Image Diffusion Model

08/17/2023
by Yuanzhi Wang, et al.

Large-scale text-to-image (T2I) diffusion models have been extended to text-guided video editing, yielding impressive zero-shot editing performance. Nonetheless, the generated videos often exhibit spatial irregularities and temporal inconsistencies, because the temporal characteristics of video have not been faithfully modeled. In this paper, we propose an elegant yet effective Temporal-Consistent Video Editing (TCVE) method that mitigates the temporal-inconsistency challenge for robust text-guided video editing. In addition to using a pretrained 2D Unet for spatial content manipulation, we establish a dedicated temporal Unet to faithfully capture the temporal coherence of the input video sequence. Furthermore, to establish coherence and interrelation between the spatial-focused and temporal-focused components, we formulate a cohesive joint spatial-temporal modeling unit that interconnects the temporal Unet with the pretrained 2D Unet, enhancing the temporal consistency of the generated video while preserving the capacity for content manipulation. Quantitative and qualitative results demonstrate that TCVE achieves state-of-the-art performance in both temporal consistency and video editing capability, surpassing prior methods.
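
The spatial/temporal split described above lends itself to a compact illustration. Below is a minimal PyTorch sketch of how a joint spatial-temporal unit could couple a per-frame spatial (2D) pathway, standing in for the pretrained 2D Unet, with a per-location temporal (1D) pathway, standing in for the temporal Unet. All module names, tensor shapes, and the gated-fusion rule are assumptions for illustration, not the paper's actual TCVE implementation.

```python
# Minimal sketch of a joint spatial-temporal unit. Everything here
# (names, shapes, gated fusion) is an assumption for illustration;
# the paper's actual TCVE architecture may differ.
import torch
import torch.nn as nn


class SpatialBlock(nn.Module):
    """Stand-in for one block of the pretrained 2D Unet: it sees each
    frame independently, so it manipulates spatial content only."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.norm = nn.GroupNorm(8, channels)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch * frames, channels, height, width)
        return x + self.act(self.norm(self.conv(x)))


class TemporalBlock(nn.Module):
    """Stand-in for one block of the dedicated temporal Unet: it
    convolves along the frame axis only, modeling temporal coherence."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.norm = nn.GroupNorm(8, channels)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch * height * width, channels, frames)
        return x + self.act(self.norm(self.conv(x)))


class JointSpatialTemporalUnit(nn.Module):
    """Hypothetical joint unit: run the spatial path per frame, the
    temporal path per pixel location, and blend the two with a learned
    gate initialized to zero."""

    def __init__(self, channels: int):
        super().__init__()
        self.spatial = SpatialBlock(channels)
        self.temporal = TemporalBlock(channels)
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (batch, channels, frames, height, width)
        b, c, f, h, w = video.shape

        # Spatial path: fold frames into the batch dimension.
        xs = video.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
        xs = self.spatial(xs).reshape(b, f, c, h, w).permute(0, 2, 1, 3, 4)

        # Temporal path: fold spatial positions into the batch dimension.
        xt = video.permute(0, 3, 4, 1, 2).reshape(b * h * w, c, f)
        xt = self.temporal(xt).reshape(b, h, w, c, f).permute(0, 3, 4, 1, 2)

        # Gated fusion: at initialization the gate is zero, so the unit
        # reproduces the (pretrained) spatial output exactly.
        return xs + torch.tanh(self.gate) * xt


if __name__ == "__main__":
    unit = JointSpatialTemporalUnit(channels=32)
    latents = torch.randn(1, 32, 8, 16, 16)  # (batch, C, frames, H, W)
    print(unit(latents).shape)  # torch.Size([1, 32, 8, 16, 16])
```

Initializing the gate at zero means the unit initially passes through the pretrained spatial output untouched, a common trick for adding temporal layers without disturbing pretrained weights; whether TCVE uses this particular fusion is an assumption here.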


