VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing

06/14/2023
by Paul Couairon, et al.

Recently, diffusion-based generative models have achieved remarkable success in image generation and editing. However, their use for video editing still faces significant limitations. This paper introduces VidEdit, a novel method for zero-shot, text-based video editing that ensures strong temporal and spatial consistency. First, we propose to combine atlas-based and pre-trained text-to-image diffusion models to provide a training-free and efficient editing method, which guarantees temporal smoothness by design. Second, we leverage off-the-shelf panoptic segmenters along with edge detectors and adapt their use for conditioned diffusion-based atlas editing. This ensures fine-grained spatial control over targeted regions while strictly preserving the structure of the original video. Quantitative and qualitative experiments show that VidEdit outperforms state-of-the-art methods on the DAVIS dataset with respect to semantic faithfulness, image preservation, and temporal consistency metrics. With this framework, processing a single video takes only about one minute, and multiple compatible edits can be generated from a single text prompt. Project web page: https://videdit.github.io
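The temporal-consistency argument in the abstract can be illustrated with a toy NumPy sketch (not the paper's implementation): if every frame is rendered by looking up per-pixel UV coordinates in a shared atlas, then editing the atlas once, restricted to a segmentation mask for spatial control, propagates the same edit to all frames. The `sample_atlas` helper, the mask, and the `+ 100.0` "edit" are all hypothetical stand-ins for the actual diffusion-based atlas edit.

```python
import numpy as np

def sample_atlas(atlas, uv):
    """Nearest-neighbour lookup of per-pixel (u, v) coordinates in the atlas."""
    ys = np.clip((uv[..., 1] * (atlas.shape[0] - 1)).round().astype(int),
                 0, atlas.shape[0] - 1)
    xs = np.clip((uv[..., 0] * (atlas.shape[1] - 1)).round().astype(int),
                 0, atlas.shape[1] - 1)
    return atlas[ys, xs]

# Toy 4x4 grayscale atlas and a UV map shared by every frame.
atlas = np.arange(16, dtype=float).reshape(4, 4)
u, v = np.meshgrid(np.linspace(0, 1, 4), np.linspace(0, 1, 4))
uv = np.stack([u, v], axis=-1)

# Spatial control: only pixels inside the target-region mask are edited,
# keeping the rest of the atlas (and hence the video) untouched.
mask = np.zeros_like(atlas)
mask[1:3, 1:3] = 1.0
diffusion_edit = atlas + 100.0  # stand-in for one conditioned diffusion edit
edited_atlas = mask * diffusion_edit + (1.0 - mask) * atlas

# Every frame samples the same edited atlas, so the edit is
# identical across frames (temporal consistency by construction).
frame_a = sample_atlas(edited_atlas, uv)
frame_b = sample_atlas(edited_atlas, uv)
```

Because the edit happens once in atlas space rather than per frame, no cross-frame smoothing is needed; the per-frame UV mapping alone carries the edit through the video.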


Related research

- Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models (03/30/2023)
- EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints (08/21/2023)
- ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing (05/26/2023)
- Dreamix: Video Diffusion Models are General Video Editors (02/02/2023)
- Zero-shot Text-driven Physically Interpretable Face Editing (08/11/2023)
- Edit Temporal-Consistent Videos with Image Diffusion Model (08/17/2023)
- Interpolating between Images with Diffusion Models (07/24/2023)
