Text2LIVE: Text-Driven Layered Image and Video Editing

04/05/2022
by   Omer Bar-Tal, et al.

We present a method for zero-shot, text-driven appearance manipulation in natural images and videos. Given an input image or video and a target text prompt, our goal is to edit the appearance of existing objects (e.g., an object's texture) or augment the scene with visual effects (e.g., smoke, fire) in a semantically meaningful manner. We train a generator using an internal dataset of training examples, extracted from a single input (image or video and target text prompt), while leveraging an external pre-trained CLIP model to establish our losses. Rather than directly generating the edited output, our key idea is to generate an edit layer (color+opacity) that is composited over the original input. This allows us to constrain the generation process and maintain high fidelity to the original input via novel text-driven losses that are applied directly to the edit layer. Our method neither relies on a pre-trained generator nor requires user-provided edit masks. We demonstrate localized, semantic edits on high-resolution natural images and videos across a variety of objects and scenes.
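
To make the layered formulation concrete, the following is a minimal sketch of compositing a color+opacity edit layer over an input image with standard alpha blending. It is an illustration only, not the authors' implementation: the function name `composite_edit_layer`, the array shapes, and the [0, 1] value range are all assumptions, and the abstract does not specify the exact compositing rule used.

```python
import numpy as np


def composite_edit_layer(image, edit_rgb, edit_alpha):
    """Blend an edit layer over the original image (hypothetical helper).

    image, edit_rgb: float arrays of shape (H, W, 3), values assumed in [0, 1].
    edit_alpha:      float array of shape (H, W), opacity assumed in [0, 1].
    """
    image = np.asarray(image, dtype=np.float64)
    edit_rgb = np.asarray(edit_rgb, dtype=np.float64)
    # Clamp opacity and add a channel axis so it broadcasts over RGB.
    alpha = np.clip(np.asarray(edit_alpha, dtype=np.float64), 0.0, 1.0)[..., None]
    # Standard "over" blend: where alpha is 0 the original pixel is untouched.
    return alpha * edit_rgb + (1.0 - alpha) * image


# Toy example: a black image, a white edit layer, opacity set on one pixel only.
image = np.zeros((2, 2, 3))
edit_rgb = np.ones((2, 2, 3))
edit_alpha = np.zeros((2, 2))
edit_alpha[0, 0] = 0.5
result = composite_edit_layer(image, edit_rgb, edit_alpha)
```

Under this blend, pixels with zero opacity pass through unchanged, which gives one way to see why generating a layer rather than a full image helps preserve fidelity to the input and keeps edits localized.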

research
10/17/2022

Imagic: Text-Based Real Image Editing with Diffusion Models

Text-conditioned image editing has recently attracted considerable inter...
research
01/02/2022

Splicing ViT Features for Semantic Appearance Transfer

We present a method for semantically transferring the visual appearance ...
research
10/17/2022

UniTune: Text-Driven Image Editing by Fine Tuning an Image Generation Model on a Single Image

We present UniTune, a simple and novel method for general text-driven im...
research
08/23/2023

Blending-NeRF: Text-Driven Localized Editing in Neural Radiance Fields

Text-driven localized editing of 3D objects is particularly difficult as...
research
06/11/2022

An Evaluation of OCR on Egocentric Data

In this paper, we evaluate state-of-the-art OCR methods on Egocentric da...
research
04/14/2022

Deformable Sprites for Unsupervised Video Decomposition

We describe a method to extract persistent elements of a dynamic scene f...
research
09/23/2021

Layered Neural Atlases for Consistent Video Editing

We present a method that decomposes, or "unwraps", an input video into a...
