Dense Contrastive Learning for Self-Supervised Visual Pre-Training

by Xinlong Wang, et al.

To date, most existing self-supervised learning methods are designed and optimized for image classification. These pre-trained models can be sub-optimal for dense prediction tasks because of the discrepancy between image-level prediction and pixel-level prediction. To fill this gap, we aim to design an effective, dense self-supervised learning method that works directly at the level of pixels (or local features) by taking into account the correspondence between local features. We present dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images. Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (less than 1%). When transferred to downstream dense prediction tasks, including object detection, semantic segmentation and instance segmentation, it outperforms the state-of-the-art methods by a large margin. Specifically, over the strong MoCo-v2 baseline, our method achieves significant improvements of 2.0 on PASCAL VOC object detection, 1.1 on instance segmentation and 3.0 mIoU on Cityscapes semantic segmentation. Code is available at:
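The abstract describes the objective only at a high level: a pairwise contrastive loss between local features of two augmented views, where positive pairs follow the correspondence between local features. Below is a minimal, dependency-free sketch of such a pixel-level InfoNCE loss. It is not the authors' implementation, and several choices are assumptions that do not come from the text above:

- correspondence is taken as the argmax of cosine similarity between local features;
- negatives come from a separate, hypothetical pool of vectors;
- the temperature `tau` is set to 0.2.

All function names (`l2_normalize`, `match_correspondence`, `dense_contrastive_loss`) are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a feature vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))


def match_correspondence(feats_a, feats_b):
    """For each local feature in view A, return the index of the most
    similar local feature in view B (argmax of similarity).
    Assumption: inputs are already L2-normalized."""
    return [
        max(range(len(feats_b)), key=lambda j: dot(fa, feats_b[j]))
        for fa in feats_a
    ]


def dense_contrastive_loss(feats_a, feats_b, negatives, tau=0.2):
    """Average pixel-level InfoNCE loss (illustrative sketch).

    feats_a, feats_b: one feature vector per spatial location of two views.
    negatives: hypothetical pool of negative local features.
    tau: temperature (assumed value, not taken from the abstract).
    """
    feats_a = [l2_normalize(v) for v in feats_a]
    feats_b = [l2_normalize(v) for v in feats_b]
    negatives = [l2_normalize(v) for v in negatives]
    matches = match_correspondence(feats_a, feats_b)
    total = 0.0
    for fa, j in zip(feats_a, matches):
        # Positive: the corresponding (best-matching) location in the other view.
        pos = math.exp(dot(fa, feats_b[j]) / tau)
        # Negatives: every vector in the negative pool.
        neg = sum(math.exp(dot(fa, n) / tau) for n in negatives)
        total += -math.log(pos / (pos + neg))
    return total / len(feats_a)


if __name__ == "__main__":
    view_a = [[1.0, 0.0], [0.0, 1.0]]
    view_b = [[0.1, 1.0], [1.0, 0.1]]  # similar content, locations swapped
    print(match_correspondence([l2_normalize(v) for v in view_a],
                               [l2_normalize(v) for v in view_b]))  # → [1, 0]
    print(dense_contrastive_loss(view_a, view_b, negatives=[[-1.0, -1.0]]))
```

The correspondence step is what makes the loss "dense". Two augmented views of the same image generally do not align location by location, so each local feature is first paired with its best match in the other view before the contrastive term is computed. In a real pipeline the feature lists would be tensors of shape (H·W, C) from a backbone and projection head; plain lists are used here only to keep the sketch self-contained.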


Self-supervised Learning with Local Contrastive Loss for Detection and Semantic Segmentation

We present a self-supervised learning (SSL) method suitable for semi-glo...

Dense Semantic Contrast for Self-Supervised Visual Representation Learning

Self-supervised representation learning for visual pre-training has achi...

HoughCL: Finding Better Positive Pairs in Dense Self-supervised Learning

Recently, self-supervised methods show remarkable achievements in image-...

Self-Supervised Pre-training of Vision Transformers for Dense Prediction Tasks

We present a new self-supervised pre-training of Vision Transformers for...

Pixel-level Correspondence for Self-Supervised Learning from Video

While self-supervised learning has enabled effective representation lear...

Exploring Set Similarity for Dense Self-supervised Representation Learning

By considering the spatial correspondence, dense self-supervised represe...

Propagate Yourself: Exploring Pixel-Level Consistency for Unsupervised Visual Representation Learning

Contrastive learning methods for unsupervised visual representation lear...