A Study on Self-Supervised Object Detection Pretraining

07/09/2022
by Trung Dang, et al.

In this work, we study different approaches to self-supervised pretraining of object detection models. We first design a general framework that learns a spatially consistent dense representation from an image by randomly sampling boxes, projecting them to each augmented view, and maximizing the similarity between corresponding box features. Within this framework, we study design choices from the literature, such as box generation and feature extraction strategies, as well as the use of multiple views, inspired by its success in instance-level image representation learning. Our results suggest that the method is robust to different choices of hyperparameters, and that using multiple views is not as effective as it has been shown to be for instance-level image representation learning. We also design two auxiliary tasks that predict boxes in one view from their features in the other view: (1) predicting boxes from the sampled set using a contrastive loss, and (2) predicting box coordinates using a transformer. Both tasks could potentially benefit downstream object detection. We find, however, that they do not lead to better object detection performance when the pretrained model is finetuned on labeled data.
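The box-level objective described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that each view is produced by a crop, a resize to a square of side `out_size`, and an optional horizontal flip, and that box features are already available as vectors. The names `sample_boxes`, `project_boxes`, and `box_similarity_loss` are hypothetical and chosen for illustration only.

```python
import numpy as np


def sample_boxes(rng, n, width, height, min_size=16):
    """Sample n random boxes (x1, y1, x2, y2) inside a width x height image."""
    x1 = rng.uniform(0, width - min_size, size=n)
    y1 = rng.uniform(0, height - min_size, size=n)
    x2 = rng.uniform(x1 + min_size, width)
    y2 = rng.uniform(y1 + min_size, height)
    return np.stack([x1, y1, x2, y2], axis=1)


def project_boxes(boxes, crop, out_size, flip=False):
    """Map image-space boxes into the coordinates of an augmented view.

    Assumption: the view is the region `crop` = (cx1, cy1, cx2, cy2),
    resized to out_size x out_size, then optionally flipped horizontally.
    """
    cx1, cy1, cx2, cy2 = crop
    sx = out_size / (cx2 - cx1)
    sy = out_size / (cy2 - cy1)
    out = np.empty_like(boxes, dtype=float)
    out[:, [0, 2]] = (boxes[:, [0, 2]] - cx1) * sx
    out[:, [1, 3]] = (boxes[:, [1, 3]] - cy1) * sy
    if flip:
        # A horizontal flip mirrors x and swaps the left and right edges.
        new_x1 = out_size - out[:, 2]
        new_x2 = out_size - out[:, 0]
        out[:, 0], out[:, 2] = new_x1, new_x2
    return out


def box_similarity_loss(feats_a, feats_b):
    """Negative mean cosine similarity between corresponding box features."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    return -np.mean(np.sum(a * b, axis=1))


rng = np.random.default_rng(0)
boxes = sample_boxes(rng, n=8, width=256, height=256)
# Two hypothetical views of the same image: different crops, one flipped.
view_a = project_boxes(boxes, crop=(0, 0, 200, 200), out_size=224)
view_b = project_boxes(boxes, crop=(40, 40, 256, 256), out_size=224, flip=True)
# Stand-in features; in practice these would be pooled from a backbone
# feature map at the projected box locations in each view.
feats_a = rng.normal(size=(8, 32))
feats_b = feats_a + 0.1 * rng.normal(size=(8, 32))
loss = box_similarity_loss(feats_a, feats_b)
```

Minimizing this loss pulls together the features of the same box across the two views. A practical pipeline would also discard boxes that fall outside either crop, which this sketch omits.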

research
03/10/2021

Spatially Consistent Representation Learning

Self-supervised learning has been widely used to obtain transferrable re...
research
12/09/2022

Contrastive View Design Strategies to Enhance Robustness to Domain Shifts in Downstream Object Detection

Contrastive learning has emerged as a competitive pretraining method for...
research
06/04/2021

Aligning Pretraining for Detection via Object-Level Contrastive Learning

Image-level contrastive representation learning has proven to be highly ...
research
07/19/2021

Exploring Set Similarity for Dense Self-supervised Representation Learning

By considering the spatial correspondence, dense self-supervised represe...
research
06/02/2022

Siamese Image Modeling for Self-Supervised Vision Representation Learning

Self-supervised learning (SSL) has delivered superior performance on a v...
research
03/08/2021

Unsupervised Pretraining for Object Detection by Patch Reidentification

Unsupervised representation learning achieves promising performances in ...
research
06/14/2021

Latent Correlation-Based Multiview Learning and Self-Supervision: A Unifying Perspective

Multiple views of data, both naturally acquired (e.g., image and audio) ...
