
How Well Do Self-Supervised Models Transfer?

by Linus Ericsson, et al.

Self-supervised visual representation learning has seen huge progress in recent months. However, no large-scale evaluation has compared the many pre-trained models that are now available. In this paper, we evaluate the transfer performance of 13 top self-supervised models on 25 downstream tasks, including many-shot classification, few-shot classification, object detection and dense prediction. We compare their performance to a supervised baseline and conclude that on most datasets, the best self-supervised models outperform supervision, confirming the recently observed trend in the literature. We find ImageNet Top-1 accuracy to be highly correlated with transfer to many-shot recognition, but increasingly less so for few-shot classification, object detection and dense prediction, as well as for transfer to unstructured data. There is no single self-supervised method that dominates overall, but notably DeepCluster-v2 comes out on top in recognition and SimCLR-v2 in detection and dense prediction. Our analysis of feature properties suggests that top self-supervised learners struggle to preserve colour information as well as supervised models do (likely due to their use of augmentation), but exhibit better calibration for recognition and suffer less from attentive overfitting than supervised learners.
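The correlation the abstract reports between ImageNet Top-1 accuracy and downstream transfer accuracy can be computed with a plain Pearson coefficient over per-model accuracy pairs. Below is a minimal sketch of that computation; the accuracy arrays are invented placeholder values for illustration, not the paper's results.

```python
import numpy as np

# Placeholder per-model accuracies (percent), one entry per pretrained model.
# These numbers are illustrative only -- they are NOT taken from the paper.
imagenet_top1 = np.array([71.3, 73.2, 75.6, 76.5, 77.1])
transfer_acc = np.array([80.1, 81.0, 83.4, 84.2, 84.0])

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson correlation between two 1-D accuracy arrays."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

r = pearson(imagenet_top1, transfer_acc)
print(f"Pearson r between ImageNet Top-1 and transfer accuracy: {r:.3f}")
```

A high positive `r` for many-shot recognition, with lower values on few-shot, detection and dense-prediction tasks, is the pattern the abstract describes; rank-based correlations (e.g. Spearman) would follow the same template.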

