DeepAI AI Chat
Log In Sign Up

Region Proposal Network Pre-Training Helps Label-Efficient Object Detection

by   Linus Ericsson, et al.

Self-supervised pre-training, based on the pretext task of instance discrimination, has fueled the recent advance in label-efficient object detection. However, existing studies focus on pre-training only a feature extractor network to learn transferable representations for downstream detection tasks. This leads to the necessity of training multiple detection-specific modules from scratch in the fine-tuning phase. We argue that the region proposal network (RPN), a common detection-specific module, can additionally be pre-trained towards reducing the localization error of multi-stage detectors. In this work, we propose a simple pretext task that provides an effective pre-training for the RPN, towards efficiently improving downstream object detection performance. We evaluate the efficacy of our approach on benchmark object detection tasks and additional downstream tasks, including instance segmentation and few-shot detection. In comparison with multi-stage detectors without RPN pre-training, our approach is able to consistently improve downstream task performance, with largest gains found in label-scarce settings.


page 2

page 4

page 7

page 10

page 11


Benchmarking Self-Supervised Learning on Diverse Pathology Datasets

Computational pathology can lead to saving human lives, but models are a...

Point-Level Region Contrast for Object Detection Pre-Training

In this work we present point-level region contrast, a self-supervised p...

Rethinking Few-Shot Object Detection on a Multi-Domain Benchmark

Most existing works on few-shot object detection (FSOD) focus on a setti...

Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving

Aiming towards a holistic understanding of multiple downstream tasks sim...

GridCLIP: One-Stage Object Detection by Grid-Level CLIP Representation Learning

A vision-language foundation model pretrained on very large-scale image-...

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR...

Self-Supervised Representation Learning from Temporal Ordering of Automated Driving Sequences

Self-supervised feature learning enables perception systems to benefit f...