A Simple and Effective Use of Object-Centric Images for Long-Tailed Object Detection

by Cheng Zhang, et al.

Object frequencies in daily scenes follow a long-tailed distribution. Many objects do not appear frequently enough in scene-centric images (e.g., sightseeing, street views) for us to train accurate object detectors. In contrast, these objects are captured at a higher frequency in object-centric images, which are intended to picture the objects of interest. Motivated by this phenomenon, we propose to take advantage of the object-centric images to improve object detection in scene-centric images. We present a simple yet surprisingly effective framework to do so. On the one hand, our approach turns an object-centric image into a useful training example for object detection in scene-centric images by mitigating the domain gap between the two image sources in both the input and label space. On the other hand, our approach employs a multi-stage procedure to train the object detector, such that the detector learns the diverse object appearances from object-centric images while being tied to the application domain of scene-centric images. On the LVIS dataset, our approach can improve the object detection (and instance segmentation) accuracy of rare objects by 50% without sacrificing the performance of other classes.
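One way to picture a multi-stage procedure that learns from object-centric images while staying tied to the scene-centric domain is a sequence of stages, each drawing batches from a different mix of the two sources. The sketch below is hypothetical: the `Stage` fields, the two-stage mix (a mixed stage followed by a scene-only stage), and the `train_step` callback are illustrative assumptions, not the paper's actual schedule or hyperparameters.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Stage:
    """One hypothetical training stage (illustrative, not the paper's config)."""
    name: str
    iterations: int
    object_centric_ratio: float  # probability of drawing an object-centric batch


def run_stages(
    stages: List[Stage],
    scene_batches: Sequence,
    object_batches: Sequence,
    train_step: Callable[[object], None],
    seed: int = 0,
) -> Dict[str, int]:
    """Run the stages in order; return how many object-centric batches each used."""
    rng = random.Random(seed)
    object_counts: Dict[str, int] = {}
    for stage in stages:
        used = 0
        for _ in range(stage.iterations):
            # rng.random() lies in [0, 1), so a ratio of 0.0 never picks
            # object-centric data and a ratio of 1.0 always does.
            if rng.random() < stage.object_centric_ratio:
                batch = rng.choice(object_batches)
                used += 1
            else:
                batch = rng.choice(scene_batches)
            train_step(batch)  # placeholder for one detector update
        object_counts[stage.name] = used
    return object_counts


if __name__ == "__main__":
    log = []
    schedule = [Stage("mixed", 100, 0.5), Stage("scene_only", 50, 0.0)]
    print(run_stages(schedule, ["scene"], ["object"], log.append))
```

Ending on a scene-only stage is one simple way to keep the final detector anchored to the target domain; the actual number of stages and data mixtures would have to come from the paper itself.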


Image-Level or Object-Level? A Tale of Two Resampling Strategies for Long-Tailed Detection

Training on datasets with long-tailed distributions has been challenging...

3D Video Object Detection with Learnable Object-Centric Global Optimization

We explore long-term temporal visual correspondence-based optimization f...

Seeing What Is Not There: Learning Context to Determine Where Objects Are Missing

Most of computer vision focuses on what is in an image. We propose to tr...

Stitcher: Feedback-driven Data Provider for Object Detection

Object detectors commonly vary quality according to scales, where the pe...

Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study

Most approaches to cross-modal retrieval (CMR) focus either on object-ce...

Context Forest for efficient object detection with large mixture models

We present Context Forest (ConF), a technique for predicting properties ...

Code Repositories


Mosaic of Object-centric Images as Scene-centric Images (MosaicOS) for long-tailed object detection and instance segmentation.
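The repository name suggests that several object-centric images are stitched into a mosaic so that the input looks more like a cluttered scene-centric image. The following is a minimal sketch under stated assumptions, not the repository's implementation: it assumes four equally sized images, a fixed 2x2 layout, and one box per tile spanning the whole tile as a stand-in pseudo-label. The function name `make_mosaic` and the `(x0, y0, x1, y1)` box format are hypothetical.

```python
import numpy as np


def make_mosaic(images, labels):
    """Stitch four equally sized object-centric images into a 2x2 mosaic.

    Each image is an (H, W, C) array. Returns the mosaic and a list of
    (label, (x0, y0, x1, y1)) boxes, one per tile, assuming (hypothetically)
    that each image's object fills its whole tile.
    """
    if len(images) != 4 or len(labels) != 4:
        raise ValueError("expected exactly four images and four labels")
    h, w, c = images[0].shape
    if any(img.shape != (h, w, c) for img in images):
        raise ValueError("all images must share the same shape")

    mosaic = np.zeros((2 * h, 2 * w, c), dtype=images[0].dtype)
    boxes = []
    for idx, (img, label) in enumerate(zip(images, labels)):
        row, col = divmod(idx, 2)  # tile order: top-left, top-right, bottom-left, bottom-right
        y0, x0 = row * h, col * w
        mosaic[y0:y0 + h, x0:x0 + w] = img
        boxes.append((label, (x0, y0, x0 + w, y0 + h)))
    return mosaic, boxes
```

For example, four 32x48 RGB images produce a 64x96 mosaic whose last box is `("d", (48, 32, 96, 64))` when the labels are `["a", "b", "c", "d"]`. A real pipeline would also need resizing and better pseudo-boxes than whole-tile rectangles.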
