R-MAE: Regions Meet Masked Autoencoders

06/08/2023
by Duy-Kien Nguyen, et al.

Vision-specific concepts such as "region" have played a key role in extending general machine learning frameworks to tasks like object detection. Given the success of region-based detectors for supervised learning and the progress of intra-image methods for contrastive learning, we explore the use of regions for reconstructive pre-training. Starting from Masked Autoencoding (MAE) both as a baseline and an inspiration, we propose a parallel pre-text task tailored to address the one-to-many mapping between images and regions. Since such regions can be generated in an unsupervised way, our approach (R-MAE) inherits the wide applicability from MAE, while being more "region-aware". We conduct thorough analyses during the development of R-MAE, and converge on a variant that is both effective and efficient (1.3% overhead over MAE). Moreover, it shows consistent quantitative improvements when generalized to various pre-training data and downstream detection and segmentation benchmarks. Finally, we provide extensive qualitative visualizations to enhance the understanding of R-MAE's behaviour and potential. Code will be made available at https://github.com/facebookresearch/r-mae.


