How Well Do Self-Supervised Models Transfer?
Self-supervised visual representation learning has seen huge progress in recent months. However, no large scale evaluation has compared the many pre-trained models that are now available. In this paper, we evaluate the transfer performance of 13 top self-supervised models on 25 downstream tasks, including many-shot classification, few-shot classification, object detection and dense prediction. We compare their performance to a supervised baseline and conclude that on most datasets, the best self-supervised models outperform supervision, confirming the recently observed trend in the literature. We find ImageNet Top-1 accuracy to be highly correlated with transfer to many-shot recognition, but increasingly less so for few-shot, object detection and dense prediction, as well as to unstructured data. There is no single self-supervised method which dominates overall, but notably DeepCluster-v2 comes out on top in recognition and SimCLR-v2 in detection and dense prediction. Our analysis of feature properties suggests that top self-supervised learners struggle to preserve colour information as well as supervised (likely due to use of augmentation), but exhibit better calibration for recognition and suffer from less attentive overfitting than supervised learners.
READ FULL TEXT