Emerging Properties in Self-Supervised Vision Transformers

04/29/2021
by   Mathilde Caron, et al.

In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets). Beyond the fact that adapting self-supervised methods to this architecture works particularly well, we make the following observations: first, self-supervised ViT features contain explicit information about the semantic segmentation of an image, which does not emerge as clearly with supervised ViTs, nor with convnets. Second, these features are also excellent k-NN classifiers, reaching 78.3% top-1 on ImageNet with a small ViT. Our study also underlines the importance of momentum encoder, multi-crop training, and the use of small patches with ViTs. We implement our findings into a simple self-supervised method, called DINO, which we interpret as a form of self-distillation with no labels. We show the synergy between DINO and ViTs by achieving 80.1% top-1 on ImageNet in linear evaluation with ViT-Base.
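The abstract describes DINO as self-distillation with no labels: a student network is trained to match the output of a momentum-encoder teacher, whose targets are centered and sharpened. The sketch below illustrates that core loss and the teacher's exponential-moving-average update in plain NumPy; function names, temperatures, and the momentum value are illustrative defaults, not the paper's exact implementation.

```python
import numpy as np

def softmax(x, temp):
    # Temperature-scaled softmax over the last axis.
    z = (x - x.max(axis=-1, keepdims=True)) / temp
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dino_loss(student_out, teacher_out, center, tps=0.1, tpt=0.04):
    # Cross-entropy between the centered, sharpened teacher targets
    # and the student's log-probabilities (teacher gets no gradient).
    targets = softmax(teacher_out - center, tpt)   # centering + sharpening
    log_probs = np.log(softmax(student_out, tps))  # student distribution
    return -(targets * log_probs).sum(axis=-1).mean()

def ema_update(teacher_w, student_w, momentum=0.996):
    # Momentum encoder: teacher weights track an exponential
    # moving average of the student weights.
    return momentum * teacher_w + (1 - momentum) * student_w

# Illustrative usage with random logits for a batch of 4 views.
rng = np.random.default_rng(0)
student_logits = rng.normal(size=(4, 8))
teacher_logits = rng.normal(size=(4, 8))
loss = dino_loss(student_logits, teacher_logits, center=np.zeros(8))
```

In the paper's formulation the center itself is also updated as a running mean of teacher outputs, which together with sharpening prevents the trivial collapse where both networks emit a constant output.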



04/05/2021

An Empirical Study of Training Self-Supervised Vision Transformers

This paper does not describe a novel method. Instead, it studies a strai...
03/29/2022

Self-Supervised Leaf Segmentation under Complex Lighting Conditions

As an essential prerequisite task in image-based plant phenotyping, leaf...
03/31/2021

On the Origin of Species of Self-Supervised Learning

In the quiet backwaters of cs.CV, cs.LG and stat.ML, a cornucopia of new...
10/13/2020

Audio-Visual Self-Supervised Terrain Type Discovery for Mobile Platforms

The ability to both recognize and discover terrain characteristics is an...
10/25/2019

SPICE: Self-supervised Pitch Estimation

We propose a model to estimate the fundamental frequency in monophonic a...
10/11/2021

Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning

Studies on self-supervised visual representation learning (SSL) improve ...
01/13/2021

Self-Supervised Vessel Enhancement Using Flow-Based Consistencies

Vessel segmenting is an essential task in many clinical applications. Al...

Code Repositories

dino

PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO


steam-DINO

Retrieve Steam games with similar store banners, with Facebook's DINO.


Dino-and-Paws

Replicating and improving Facebook AI's self-supervised DINO and semi-supervised PAWS


Vision-2-Transformers

Collection of Transformers research in Computer Vision
