Unsupervised Object-Level Representation Learning from Scene Images

by Jiahao Xie, et al.

Contrastive self-supervised learning has largely narrowed the gap to supervised pre-training on ImageNet. However, its success relies heavily on the object-centric priors of ImageNet, i.e., different augmented views of the same image correspond to the same object. Such a heavily curated constraint immediately becomes infeasible when pre-training on more complex scene images containing many objects. To overcome this limitation, we introduce Object-level Representation Learning (ORL), a new self-supervised learning framework for scene images. Our key insight is to leverage image-level self-supervised pre-training as a prior for discovering object-level semantic correspondence, thus realizing object-level representation learning from scene images. Extensive experiments on COCO show that ORL significantly improves the performance of self-supervised learning on scene images, even surpassing supervised ImageNet pre-training on several downstream tasks. Furthermore, ORL improves downstream performance as more unlabeled scene images become available, demonstrating its great potential for harnessing unlabeled data in the wild. We hope our approach can motivate future research on more general-purpose unsupervised representation learning from scene data. Project page: https://www.mmlab-ntu.com/project/orl/.
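The core idea above can be illustrated with a minimal contrastive-loss sketch: instead of treating two augmented views of a whole image as a positive pair, each positive pair is two embeddings of (putatively) corresponding object regions. The function and variable names below are illustrative assumptions, not taken from the ORL codebase.

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss where row i of z1 and row i of z2 embed the same
    object region under two views (positives); all other rows in the
    batch serve as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)  # L2-normalize
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature                 # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))               # positives on diagonal

# Toy data: 8 "object crops", two slightly perturbed views of each.
rng = np.random.default_rng(0)
obj = rng.normal(size=(8, 128))
z1 = obj + 0.01 * rng.normal(size=obj.shape)  # view 1 of each object region
z2 = obj + 0.01 * rng.normal(size=obj.shape)  # view 2 of the same regions
loss = info_nce(z1, z2)
print(float(loss))
```

In ORL's setting, the pairs fed to such a loss come from object-level correspondences discovered with an image-level pre-trained model, rather than from the object-centric assumption that any two crops of one image share an object.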






