Intra-Instance VICReg: Bag of Self-Supervised Image Patch Embedding

06/17/2022
by Yubei Chen, et al.

Self-supervised learning (SSL) has recently achieved tremendous empirical advances in learning image representations. However, our understanding of these representations remains limited. This work shows that the success of state-of-the-art (SOTA) siamese-network-based SSL approaches rests primarily on learning a representation of image patches. In particular, we show that when we learn a representation only for fixed-scale image patches and linearly aggregate the patch representations of an image (instance), the result is on par with or even better than the baseline methods on several benchmarks. Further, we show that patch representation aggregation can also improve various SOTA baseline methods by a large margin. We also establish a formal connection between the SSL objective and the modeling of image patch co-occurrence statistics, which supplements the prevailing invariance perspective. By visualizing the nearest neighbors of different image patches in the embedding space and the projection space, we show that while the projection space exhibits more invariance, the embedding space tends to preserve more equivariance and locality. Finally, we propose a hypothesis for future directions based on the findings of this work.
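To make the aggregation step concrete, here is a minimal sketch of the idea of embedding fixed-scale patches and combining them linearly into one image representation. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy encoder (which simply returns the mean and maximum pixel value of a patch), and the choice of a plain average as the linear aggregation are all hypothetical. A real pipeline would replace the toy encoder with a trained SSL network.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image stored as rows of pixel values
Vector = List[float]


def extract_patches(image: Image, size: int, stride: int) -> List[Image]:
    """Slide a fixed-size square window over the image and collect the patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches


def toy_patch_encoder(patch: Image) -> Vector:
    """Hypothetical stand-in for a trained patch encoder: returns [mean, max]."""
    pixels = [value for row in patch for value in row]
    return [sum(pixels) / len(pixels), max(pixels)]


def aggregate_patch_embeddings(
    image: Image,
    encoder: Callable[[Image], Vector],
    size: int,
    stride: int,
) -> Vector:
    """Assumed linear aggregation: average the embeddings of all patches."""
    embeddings = [encoder(patch) for patch in extract_patches(image, size, stride)]
    if not embeddings:
        raise ValueError("image is smaller than the patch size")
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


# A 4x4 image with pixel values 0..15, split into four non-overlapping 2x2 patches.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
image_embedding = aggregate_patch_embeddings(image, toy_patch_encoder, size=2, stride=2)
print(image_embedding)  # [7.5, 10.0]
```

Because the aggregation is a plain average, the image-level vector has the same dimensionality as a single patch embedding regardless of how many patches the image yields.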
