Refine and Represent: Region-to-Object Representation Learning

08/25/2022
by Akash Gokul, et al.

Recent works in self-supervised learning have demonstrated strong performance on scene-level dense prediction tasks by pretraining with object-centric or region-based correspondence objectives. In this paper, we present Region-to-Object Representation Learning (R2O), which unifies region-based and object-centric pretraining. R2O trains an encoder to dynamically refine region-based segments into object-centric masks while jointly learning representations of the contents within each mask. R2O uses a "region refinement module" that clusters region-level features to group small image regions, generated using a region-level prior, into larger regions that tend to correspond to objects. As pretraining progresses, R2O follows a region-to-object curriculum that encourages learning region-level features early on and gradually shifts to training object-centric representations. Representations learned using R2O lead to state-of-the-art performance in semantic segmentation on PASCAL VOC (+0.7 mIoU) and Cityscapes (+0.4 mIoU) and in instance segmentation on MS COCO (+0.3 mask AP). Further, after pretraining on ImageNet, R2O pretrained models surpass the existing state of the art in unsupervised object segmentation on the Caltech-UCSD Birds 200-2011 dataset (+2.9 mIoU) without any further training. We provide the code and models from this work at https://github.com/KKallidromitis/r2o.
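To make the refinement step concrete, the following is a minimal NumPy sketch of the general idea described in the abstract: pool features inside each prior region, cluster the pooled features, and merge regions that fall in the same cluster. It is not the authors' implementation. The function names (`pool_region_features`, `kmeans`, `refine_regions`, `num_clusters`), the use of plain k-means, and the linear schedule that shrinks the number of clusters over training are all assumptions made for illustration; the abstract states only that region-level features are clustered and that training moves from region-level to object-centric representations.

```python
import numpy as np


def pool_region_features(features, regions):
    """Average the feature vectors of all pixels inside each prior region.

    features: (H, W, C) float array; regions: (H, W) integer region ids.
    Returns (num_regions, C) pooled features and, per pixel, the region index.
    """
    ids, inverse = np.unique(regions.reshape(-1), return_inverse=True)
    flat = features.reshape(-1, features.shape[-1]).astype(float)
    sums = np.zeros((len(ids), flat.shape[1]))
    np.add.at(sums, inverse, flat)  # unbuffered scatter-add per region
    counts = np.bincount(inverse, minlength=len(ids))[:, None]
    return sums / counts, inverse


def kmeans(x, k, iters=10, seed=0):
    """Plain k-means (an illustrative choice); returns a label per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = x[labels == j].mean(0)
    return labels


def refine_regions(features, regions, k):
    """Merge small prior regions into at most k larger segments."""
    region_feats, inverse = pool_region_features(features, regions)
    k = min(k, len(region_feats))
    cluster_of_region = kmeans(region_feats, k)
    # Every pixel inherits the cluster assigned to its prior region.
    return cluster_of_region[inverse].reshape(regions.shape)


def num_clusters(step, total_steps, k_start=8, k_end=2):
    """Hypothetical curriculum: many small segments early, few large ones late."""
    frac = step / max(total_steps - 1, 1)
    return int(round(k_start + frac * (k_end - k_start)))


# Toy input: four 2x2 prior regions; the left half and right half of the
# image carry different features, so k=2 should merge the regions by half.
regions = np.array([[0, 0, 1, 1],
                    [0, 0, 1, 1],
                    [2, 2, 3, 3],
                    [2, 2, 3, 3]])
features = np.zeros((4, 4, 2))
features[:, 2:, :] = 10.0
refined = refine_regions(features, regions, k=num_clusters(9, 10))
```

In this toy example the two left-hand prior regions end up in one segment and the two right-hand regions in another. In an actual pretraining loop the features would come from the encoder being trained, so the refined masks would change as the representation improves.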


