Masked Siamese Networks for Label-Efficient Learning

04/14/2022
by Mahmoud Assran, et al.

We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations. Our approach matches the representation of an image view containing randomly masked patches to the representation of the original unmasked image. This self-supervised pre-training strategy is particularly scalable when applied to Vision Transformers, since only the unmasked patches are processed by the network. As a result, MSNs improve the scalability of joint-embedding architectures while producing representations of a high semantic level that perform competitively on low-shot image classification. For instance, on ImageNet-1K, with only 5,000 annotated images, our base MSN model achieves 72.4% top-1 accuracy, and with 1% of ImageNet-1K labels, we achieve 75.7% top-1 accuracy, setting a new state-of-the-art for self-supervised learning on this benchmark. Our code is publicly available.
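The scalability claim rests on a simple mechanism: the image is split into patches, a random subset is dropped, and only the surviving patches are fed to the encoder. Below is a minimal sketch of that masking step in NumPy (the function names `patchify` and `random_mask` and all shapes are my own illustration, not the authors' code); the encoder and prototype-matching loss from the paper are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(image, patch_size):
    # Split an (H, W, C) image into a sequence of flattened patches,
    # as a Vision Transformer does before embedding.
    H, W, C = image.shape
    ph, pw = H // patch_size, W // patch_size
    patches = image.reshape(ph, patch_size, pw, patch_size, C)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(ph * pw, -1)
    return patches

def random_mask(patches, mask_ratio):
    # Keep a random subset of patches; the masked ones are simply
    # dropped, so the encoder processes a shorter sequence.
    n = patches.shape[0]
    keep = rng.permutation(n)[: int(n * (1 - mask_ratio))]
    return patches[keep]

image = rng.standard_normal((32, 32, 3))
patches = patchify(image, 8)          # 16 patches, each 8*8*3 = 192 dims
visible = random_mask(patches, 0.5)   # anchor view sees only half of them
```

Because the masked tokens are removed rather than replaced by a learned mask token, the transformer's cost shrinks with the mask ratio, which is what makes the pre-training "particularly scalable".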


research
04/20/2022

Self-supervised Learning for Sonar Image Classification

Self-supervised learning has proved to be a powerful approach to learn i...
research
01/19/2023

Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture

This paper demonstrates an approach for learning highly semantic image r...
research
11/25/2022

Ladder Siamese Network: a Method and Insights for Multi-level Self-Supervised Learning

Siamese-network-based self-supervised learning (SSL) suffers from slow c...
research
06/17/2022

Intra-Instance VICReg: Bag of Self-Supervised Image Patch Embedding

Recently, self-supervised learning (SSL) has achieved tremendous empiric...
research
10/20/2022

MixMask: Revisiting Masked Siamese Self-supervised Learning in Asymmetric Distance

Recent advances in self-supervised learning integrate Masked Modeling an...
research
05/19/2023

S-JEA: Stacked Joint Embedding Architectures for Self-Supervised Visual Representation Learning

The recent emergence of Self-Supervised Learning (SSL) as a fundamental ...
research
12/14/2022

Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language

Current self-supervised learning algorithms are often modality-specific ...
