ViCE: Self-Supervised Visual Concept Embeddings as Contextual and Pixel Appearance Invariant Semantic Representations

11/24/2021
by Robin Karlsson, et al.
This work presents a self-supervised method for learning dense, semantically rich visual concept embeddings for images, inspired by methods for learning word embeddings in NLP. Our method improves on prior work by generating more expressive embeddings and by being applicable to high-resolution images. Viewing the generation of natural images as a stochastic process in which a set of latent visual concepts gives rise to observable pixel appearances, our method is formulated to learn the inverse mapping from pixels to concepts. Our method greatly improves the effectiveness of self-supervised learning for dense embedding maps by introducing superpixelization as a natural hierarchical step up from pixels to a small set of visually coherent regions. Additional contributions are regional contextual masking with nonuniform shapes matching visually coherent patches and complexity-based view sampling inspired by masked language models. The enhanced expressiveness of our dense embeddings is demonstrated by significantly improving on the state of the art in representation quality benchmarks on COCO (+12.94 mIoU, +87.6%) and Cityscapes (+16.52 mIoU, +134.2%). Results show favorable scaling and domain generalization properties not demonstrated by prior work.
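
To make the superpixelization step more concrete, the following is a minimal sketch of how per-pixel embeddings could be reduced to a small set of region embeddings by averaging inside each region. It is an illustration under stated assumptions, not the authors' implementation: the function names `grid_superpixels` and `pool_by_region` are hypothetical, a square grid stands in for a real superpixel algorithm, and mean pooling is only one plausible way to aggregate embeddings over a region.

```python
import numpy as np


def grid_superpixels(height, width, cell):
    """Toy stand-in for a superpixel algorithm (assumption): label each
    pixel by the square grid cell it falls in. A real pipeline would use
    visually coherent regions instead of a fixed grid."""
    rows = np.arange(height) // cell
    cols = np.arange(width) // cell
    cells_per_row = -(-width // cell)  # ceiling division
    return rows[:, None] * cells_per_row + cols[None, :]


def pool_by_region(embeddings, labels):
    """Average per-pixel embeddings of shape (H, W, D) inside each
    labelled region, returning an array of shape (num_regions, D).
    Assumes labels are 0..num_regions-1 and every region is non-empty."""
    depth = embeddings.shape[-1]
    flat_emb = embeddings.reshape(-1, depth)
    flat_lab = labels.reshape(-1)
    num_regions = int(flat_lab.max()) + 1

    # Accumulate embedding sums per region, then divide by pixel counts.
    sums = np.zeros((num_regions, depth))
    np.add.at(sums, flat_lab, flat_emb)
    counts = np.bincount(flat_lab, minlength=num_regions)
    return sums / counts[:, None]


# Random values stand in for the output of a dense embedding network.
rng = np.random.default_rng(0)
pixel_embeddings = rng.normal(size=(8, 8, 4))
labels = grid_superpixels(8, 8, cell=4)
region_embeddings = pool_by_region(pixel_embeddings, labels)
```

In this toy setup, 64 pixel embeddings collapse into 4 region embeddings, which illustrates the step from pixels to a small set of regions described in the abstract.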

research
02/27/2020

Learning Representations by Predicting Bags of Visual Words

Self-supervised representation learning targets to learn convnet-based i...
research
08/22/2023

Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations

Spatially dense self-supervised learning is a rapidly growing problem do...
research
05/19/2023

S-JEA: Stacked Joint Embedding Architectures for Self-Supervised Visual Representation Learning

The recent emergence of Self-Supervised Learning (SSL) as a fundamental ...
research
02/22/2022

Hierarchical Perceiver

General perception systems such as Perceivers can process arbitrary moda...
research
09/06/2023

ViewMix: Augmentation for Robust Representation in Self-Supervised Learning

Joint Embedding Architecture-based self-supervised learning methods have...
research
03/28/2022

Learning Where to Learn in Cross-View Self-Supervised Learning

Self-supervised learning (SSL) has made enormous progress and largely na...
research
06/06/2022

Mapping Visual Themes among Authentic and Coordinated Memes

What distinguishes authentic memes from those created by state actors? I...
