Edit-A-Video: Single Video Editing with Object-Aware Consistency

03/14/2023
by Chaehun Shin, et al.

Although text-to-video (TTV) models have recently achieved remarkable success, few approaches extend them to video editing. Motivated by TTV approaches that adapt diffusion-based text-to-image (TTI) models, we propose a video editing framework that requires only a pretrained TTI model and a single <text, video> pair, which we term Edit-A-Video. The framework consists of two stages: (1) inflating the 2D model into a 3D model by appending temporal modules and tuning it on the source video, and (2) inverting the source video into noise and editing it with the target text prompt and attention map injection. Together, these stages enable temporal modeling and preserve the semantic attributes of the source video. A key challenge in video editing is background inconsistency, where regions excluded from the edit suffer undesirable and temporally inconsistent alterations. To mitigate this issue, we also introduce a novel mask blending method, termed sparse-causal blending (SC Blending). It improves on previous mask blending methods by incorporating temporal consistency, so that the edited region transitions smoothly while the unedited regions retain spatio-temporal consistency. We present extensive experimental results over various types of text and videos, and demonstrate the superiority of the proposed method over baselines in terms of background consistency, text alignment, and video editing quality.
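To make the mask blending idea concrete, below is a minimal, hypothetical sketch of how a sparse-causal blend of per-frame latents might look. The function name, tensor shapes, and the choice of a simple union over the first and previous frames' masks are illustrative assumptions based on the abstract, not the authors' implementation.

import torch

def sparse_causal_blend(edited_latents, source_latents, masks):
    """Illustrative sketch of sparse-causal mask blending (assumed, not official).

    edited_latents, source_latents: (F, C, H, W) per-frame latents.
    masks: (F, 1, H, W) soft edit masks in [0, 1], e.g. derived from
           cross-attention maps for the edited object.
    Each frame's mask is fused with the masks of the first and the
    previous frame (the "sparse-causal" frames) before blending, so the
    unedited background stays consistent across time.
    """
    num_frames = edited_latents.shape[0]
    blended = torch.empty_like(edited_latents)
    for i in range(num_frames):
        # Union of the current, first, and previous-frame masks.
        m = torch.maximum(masks[i], masks[0])
        if i > 0:
            m = torch.maximum(m, masks[i - 1])
        # Keep edited content inside the mask, source content outside.
        blended[i] = m * edited_latents[i] + (1.0 - m) * source_latents[i]
    return blended

In this reading, copying the source latent back outside the fused mask is what keeps the background static across frames, while fusing each frame's mask with its sparse-causal neighbors reduces flicker when per-frame masks disagree.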


Related research

08/17/2023 · Edit Temporal-Consistent Videos with Image Diffusion Model
Large-scale text-to-image (T2I) diffusion models have been extended for ...

11/21/2020 · Iterative Text-based Editing of Talking-heads Using Neural Retargeting
We present a text-based tool for editing talking-head video that enables...

05/27/2023 · Towards Consistent Video Editing with Text-to-Image Diffusion Models
Existing works have advanced Text-to-Image (TTI) diffusion models for vi...

09/02/2023 · MagicProp: Diffusion-based Video Editing via Motion-aware Appearance Propagation
This paper addresses the issue of modifying the visual appearance of vid...

08/23/2023 · Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields
Text-driven localized editing of 3D objects is particularly difficult as...

06/27/2018 · Watch to Edit: Video Retargeting using Gaze
We present a novel approach to optimally retarget videos for varied disp...

02/02/2023 · Dreamix: Video Diffusion Models are General Video Editors
Text-driven image and video diffusion models have recently achieved unpr...
