DeepAI AI Chat
Log In Sign Up

Dense Semantic Contrast for Self-Supervised Visual Representation Learning

by   Xiaoni Li, et al.

Self-supervised representation learning for visual pre-training has achieved remarkable success with sample (instance or pixel) discrimination and semantics discovery of instance, whereas there still exists a non-negligible gap between pre-trained model and downstream dense prediction tasks. Concretely, these downstream tasks require more accurate representation, in other words, the pixels from the same object must belong to a shared semantic category, which is lacking in the previous methods. In this work, we present Dense Semantic Contrast (DSC) for modeling semantic category decision boundaries at a dense level to meet the requirement of these tasks. Furthermore, we propose a dense cross-image semantic contrastive learning framework for multi-granularity representation learning. Specially, we explicitly explore the semantic structure of the dataset by mining relations among pixels from different perspectives. For intra-image relation modeling, we discover pixel neighbors from multiple views. And for inter-image relations, we enforce pixel representation from the same semantic class to be more similar than the representation from different classes in one mini-batch. Experimental results show that our DSC model outperforms state-of-the-art methods when transferring to downstream dense prediction tasks, including object detection, semantic segmentation, and instance segmentation. Code will be made available.


page 1

page 3

page 8


Dense Contrastive Learning for Self-Supervised Visual Pre-Training

To date, most existing self-supervised learning methods are designed and...

Exploring Set Similarity for Dense Self-supervised Representation Learning

By considering the spatial correspondence, dense self-supervised represe...

iBOT: Image BERT Pre-Training with Online Tokenizer

The success of language Transformers is primarily attributed to the pret...

Self-Supervised Visual Representation Learning with Semantic Grouping

In this paper, we tackle the problem of learning visual representations ...

Self-Supervised Image Representation Learning with Geometric Set Consistency

We propose a method for self-supervised image representation learning un...

Deep Clustering by Semantic Contrastive Learning

Whilst contrastive learning has achieved remarkable success in self-supe...

Semantic Prediction: Which One Should Come First, Recognition or Prediction?

The ultimate goal of video prediction is not forecasting future pixel-va...