Splicing ViT Features for Semantic Appearance Transfer

01/02/2022
by Narek Tumanyan, et al.

We present a method for semantically transferring the visual appearance of one natural image to another. Specifically, our goal is to generate an image in which objects in a source structure image are "painted" with the visual appearance of their semantically related objects in a target appearance image. Our method works by training a generator given only a single structure/appearance image pair as input. To integrate semantic information into our framework, a pivotal component in tackling this task, our key idea is to leverage a pre-trained and fixed Vision Transformer (ViT) model, which serves as an external semantic prior. Specifically, we derive novel representations of structure and appearance extracted from deep ViT features, untwisting them from the learned self-attention modules. We then establish an objective function that splices the desired structure and appearance representations, interweaving them together in the space of ViT features. Our framework, which we term "Splice", does not involve adversarial training, does not require any additional input information such as semantic segmentation or correspondences, and can generate high-resolution results (e.g., HD). We demonstrate high-quality results on a variety of in-the-wild image pairs, under significant variations in the number of objects, their pose, and their appearance.
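The abstract does not spell out the objective, so the following is only a minimal sketch of what "splicing" two representations in a feature space could look like. It assumes, for illustration, that appearance is summarized by a single global feature vector and that structure is summarized by the pairwise cosine similarity between spatial tokens. The function names, the weight `alpha`, and the random arrays standing in for ViT features are all hypothetical; no actual ViT or generator is involved.

```python
import numpy as np

def cosine_self_similarity(tokens):
    """Pairwise cosine similarity between rows of an (n_tokens, dim) array."""
    norms = np.linalg.norm(tokens, axis=1, keepdims=True)
    unit = tokens / np.clip(norms, 1e-8, None)  # guard against zero-length rows
    return unit @ unit.T

def splice_objective(out_global, out_tokens, app_global, struct_tokens, alpha=0.1):
    """Hypothetical combined objective (assumed form, not taken from the abstract).

    appearance term: distance between the global vector of the output
                     and that of the appearance image
    structure term:  Frobenius distance between the token self-similarity
                     of the output and that of the structure image
    """
    appearance_term = np.linalg.norm(out_global - app_global)
    structure_term = np.linalg.norm(
        cosine_self_similarity(out_tokens) - cosine_self_similarity(struct_tokens)
    )
    return appearance_term + alpha * structure_term

# Random stand-ins for features that a fixed, pre-trained ViT would produce.
rng = np.random.default_rng(0)
struct_tokens = rng.normal(size=(16, 8))   # tokens of the structure image
app_global = rng.normal(size=8)            # global vector of the appearance image

# An output that matches both targets exactly incurs zero loss.
perfect = splice_objective(app_global, struct_tokens, app_global, struct_tokens)

# An unrelated output incurs a positive loss.
worse = splice_objective(rng.normal(size=8), rng.normal(size=(16, 8)),
                         app_global, struct_tokens)
```

In a setup like the one the abstract describes, such a quantity would be minimized with respect to the generator's parameters while the ViT stays fixed; here the two calls only show that the loss vanishes when both targets are met and grows otherwise.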


