Text-Driven Stylization of Video Objects

06/24/2022
by Sebastian Loeschcke et al.

We tackle the task of stylizing video objects in an intuitive and semantic manner following a user-specified text prompt. This is a challenging task because the resulting video must satisfy multiple properties: (1) it has to be temporally consistent and avoid jittering or similar artifacts, (2) the stylization must preserve both the global semantics of the object and its fine-grained details, and (3) it must adhere to the user-specified text prompt. To this end, our method stylizes an object in a video according to two target texts: the first target text prompt describes the global semantics, and the second describes the local semantics. To modify the style of an object, we harness the representational power of CLIP to obtain similarity scores between (1) the local target text and a set of stylized local views, and (2) the global target text and a set of stylized global views. We use a pretrained atlas decomposition network to propagate the edits in a temporally consistent manner. We demonstrate that our method generates style changes that are consistent over time for a variety of objects and videos and that adhere to the target texts. We also show how varying the specificity of the target texts and augmenting them with a set of prefixes yields stylizations with different levels of detail. Full results are given on our project webpage: https://sloeschcke.github.io/Text-Driven-Stylization-of-Video-Objects/
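The abstract describes scoring stylized views against a local and a global target text using CLIP similarity. The sketch below shows one way such a two-term objective could be assembled. It assumes the CLIP text and image embeddings have already been computed. The function names, the choice of "1 minus mean cosine similarity" as each term, and the `global_weight` parameter are all hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_similarity(text_emb: Vector, view_embs: Sequence[Vector]) -> float:
    """Average similarity between one text embedding and a set of view embeddings."""
    if not view_embs:
        raise ValueError("need at least one view embedding")
    return sum(cosine_similarity(text_emb, v) for v in view_embs) / len(view_embs)


def stylization_loss(
    local_text_emb: Vector,
    local_view_embs: Sequence[Vector],
    global_text_emb: Vector,
    global_view_embs: Sequence[Vector],
    global_weight: float = 1.0,  # hypothetical weighting between the two terms
) -> float:
    """Hypothetical objective: lower when the views match their target texts.

    Each term is 1 - mean cosine similarity, so a perfect match contributes 0.
    """
    local_term = 1.0 - mean_similarity(local_text_emb, local_view_embs)
    global_term = 1.0 - mean_similarity(global_text_emb, global_view_embs)
    return local_term + global_weight * global_term
```

For example, `stylization_loss([1.0, 0.0], [[2.0, 0.0]], [0.0, 1.0], [[0.0, 3.0]])` returns `0.0`, because every view points in the same direction as its target text. In a real pipeline the embeddings would come from a CLIP model and the loss would be minimized with respect to the parameters that produce the stylized views.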
