Pix2Video: Video Editing using Image Diffusion

03/22/2023
by Duygu Ceylan, et al.

Image diffusion models, trained on massive image collections, have emerged as the most versatile image generators in terms of quality and diversity. They support inverting real images and conditional (e.g., text) generation, making them attractive for high-quality image editing applications. We investigate how to use such pre-trained image models for text-guided video editing. The critical challenge is to achieve the target edits while still preserving the content of the source video. Our method works in two simple steps: first, we use a pre-trained structure-guided (e.g., depth-conditioned) image diffusion model to perform text-guided edits on an anchor frame; then, in the key step, we progressively propagate the changes to future frames via self-attention feature injection, adapting the core denoising step of the diffusion model. We then consolidate the changes by adjusting the latent code for the frame before continuing the process. Our approach is training-free and generalizes to a wide range of edits. We demonstrate the effectiveness of the approach through extensive experimentation and compare it against four different prior and parallel efforts (on arXiv). We show that realistic text-guided video edits are possible without any compute-intensive preprocessing or video-specific finetuning.
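The propagation step described above can be illustrated with a small sketch. In a standard diffusion U-Net, self-attention computes queries, keys, and values from the same frame's features; the injection idea is to keep the current frame's queries but draw keys and values from the anchor (and previous) frame's features, so the current frame attends to their appearance and stays consistent with them. The function names, weight matrices, and shapes below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d)) V
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v

def injected_self_attention(feat_cur, feat_anchor, feat_prev, Wq, Wk, Wv):
    # Queries come from the current frame's features, but keys and values
    # come from the anchor and previous frames (concatenated), so the
    # current frame's denoising step "looks at" their appearance.
    # feat_*: (num_tokens, dim) feature maps; W*: (dim, dim) projections.
    q = feat_cur @ Wq
    kv_source = np.concatenate([feat_anchor, feat_prev], axis=0)
    k = kv_source @ Wk
    v = kv_source @ Wv
    return attention(q, k, v)
```

A sanity check on the design: if the anchor and previous features are identical to the current frame's, the duplicated keys/values cancel out in the softmax and the result reduces to ordinary self-attention, so the injection only changes behavior when frames actually differ.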



Related research

- TokenFlow: Consistent Diffusion Features for Consistent Video Editing (07/19/2023)
  The generative AI revolution has recently expanded to videos. Neverthele...
- PRedItOR: Text Guided Image Editing with Diffusion Prior (02/15/2023)
  Diffusion models have shown remarkable capabilities in generating high q...
- Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators (03/23/2023)
  Recent text-to-video generation approaches rely on computationally heavy...
- Boundary Guided Mixing Trajectory for Semantic Control with Diffusion Models (02/16/2023)
  Applying powerful generative denoising diffusion models (DDMs) for downs...
- Directed Diffusion: Direct Control of Object Placement through Attention Guidance (02/25/2023)
  Text-guided diffusion models such as DALLE-2, IMAGEN, and Stable Diffusi...
- Vox-E: Text-guided Voxel Editing of 3D Objects (03/21/2023)
  Large scale text-guided diffusion models have garnered significant atten...
- AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning (07/10/2023)
  With the advance of text-to-image models (e.g., Stable Diffusion) and co...
