Neural Congealing: Aligning Images to a Joint Semantic Atlas

02/08/2023
by Dolev Ofri-Amar, et al.

We present Neural Congealing – a zero-shot self-supervised framework for detecting and jointly aligning semantically-common content across a given set of images. Our approach harnesses the power of pre-trained DINO-ViT features to learn: (i) a joint semantic atlas – a 2D grid that captures the mode of DINO-ViT features in the input set, and (ii) dense mappings from the unified atlas to each of the input images. We derive a new robust self-supervised framework that optimizes the atlas representation and mappings per image set, requiring only a few real-world images as input without any additional input information (e.g., segmentation masks). Notably, we design our losses and training paradigm to account only for the shared content under severe variations in appearance, pose, background clutter or other distracting objects. We demonstrate results on a plethora of challenging image sets including sets of mixed domains (e.g., aligning images depicting sculpture and artwork of cats), sets depicting related yet different object categories (e.g., dogs and tigers), or domains for which large-scale training data is scarce (e.g., coffee mugs). We thoroughly evaluate our method and show that our test-time optimization approach performs favorably compared to a state-of-the-art method that requires extensive training on large-scale datasets.
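To make the described setup concrete, below is a minimal sketch (not the authors' released code) of the per-set, test-time optimization the abstract outlines: a learnable shared atlas of DINO-ViT-like features plus one small mapping per image, jointly optimized so features sampled from each image through its mapping agree with the atlas. The tensor sizes, the residual-flow mapper, and the plain L2 matching loss are illustrative assumptions; the actual method uses pre-trained DINO-ViT features and additional robustness and regularization terms.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N_IMAGES, FEAT_DIM, GRID = 6, 384, 32  # assumed sizes (ViT-S-like feature dim)

# Stand-in for frozen, pre-trained DINO-ViT feature maps of each input image.
image_feats = torch.randn(N_IMAGES, FEAT_DIM, GRID, GRID)

# (i) Joint semantic atlas: a learnable 2D feature grid shared by the whole set.
atlas = nn.Parameter(torch.randn(1, FEAT_DIM, GRID, GRID) * 0.01)

# (ii) One tiny mapping per image: a dense flow from atlas coordinates to
# image coordinates (a stand-in for the paper's per-image mapping networks).
class Mapper(nn.Module):
    def __init__(self):
        super().__init__()
        self.flow = nn.Parameter(torch.zeros(1, 2, GRID, GRID))  # residual flow

    def forward(self):
        # Identity sampling grid in [-1, 1], plus the learned residual flow.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, GRID), torch.linspace(-1, 1, GRID), indexing="ij")
        identity = torch.stack([xs, ys], dim=-1).unsqueeze(0)  # 1 x H x W x 2
        return identity + self.flow.permute(0, 2, 3, 1)

mappers = nn.ModuleList([Mapper() for _ in range(N_IMAGES)])
opt = torch.optim.Adam([atlas, *mappers.parameters()], lr=1e-3)

for step in range(200):
    loss = 0.0
    for i in range(N_IMAGES):
        grid = mappers[i]()  # atlas -> image coordinates
        sampled = F.grid_sample(image_feats[i:i + 1], grid, align_corners=True)
        # Assumed loss: pull features sampled from every image toward the
        # shared atlas so it captures the set's common semantic content.
        loss = loss + F.mse_loss(sampled, atlas)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After optimization, warping each image (rather than its features) through its mapper's grid would bring the shared content into rough alignment; the atlas then serves as the common coordinate frame across the set.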

