
DetNet: A Backbone network for Object Detection

by Zeming Li, et al.

Recent CNN-based object detectors, whether one-stage methods such as YOLO, SSD, and RetinaNet or two-stage detectors such as Faster R-CNN, R-FCN, and FPN, usually fine-tune directly from ImageNet pre-trained models designed for image classification. There has been little work on backbone feature extractors designed specifically for object detection. More importantly, image classification and object detection differ in several ways. 1. Recent object detectors such as FPN and RetinaNet usually add extra stages, compared with image classification networks, to handle objects at various scales. 2. Object detection must not only recognize the category of each object instance but also locate it spatially. A large downsampling factor yields a large valid receptive field, which helps image classification but compromises localization ability. Because of this gap between image classification and object detection, we propose DetNet, a novel backbone network designed specifically for object detection. DetNet includes the extra stages that traditional classification backbones lack, while maintaining high spatial resolution in deeper layers. Without any bells and whistles, we obtain state-of-the-art results for both object detection and instance segmentation on the MSCOCO benchmark with our DetNet (4.8G FLOPs) backbone. The code will be released for reproduction.
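The trade-off between downsampling and localization described above can be made concrete with receptive-field arithmetic. This is a toy sketch, not the DetNet architecture: each stage is collapsed into a single 3×3 convolution, the stage layouts `CLASSIFICATION_STYLE` and `DETNET_STYLE` are hypothetical, and using dilation 2 in the stride-16 stages is an assumption about one way a backbone could keep resolution while still enlarging the receptive field (the abstract does not name the mechanism). The helpers `Conv`, `receptive_field`, and `feature_map_sizes` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conv:
    """One convolution standing in for a whole (hypothetical) stage."""
    kernel: int = 3
    stride: int = 1
    dilation: int = 1


def receptive_field(layers):
    """Return (receptive_field, cumulative_stride) after a chain of convolutions."""
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for layer in layers:
        k_eff = layer.dilation * (layer.kernel - 1) + 1  # extent of the dilated kernel
        rf += (k_eff - 1) * jump
        jump *= layer.stride
    return rf, jump


def feature_map_sizes(input_size, layers):
    """Spatial size after each layer, assuming 'same' padding (output = ceil(n / stride))."""
    sizes, n = [], input_size
    for layer in layers:
        n = -(-n // layer.stride)  # ceiling division
        sizes.append(n)
    return sizes


# Hypothetical six-stage toy backbones, one conv per stage.
# Classification-style: every stage halves the resolution, ending at stride 64.
CLASSIFICATION_STYLE = [Conv(stride=2) for _ in range(6)]
# DetNet-style (assumed): stages 5 and 6 stay at stride 16 and use dilation 2
# instead of downsampling, so the receptive field keeps growing.
DETNET_STYLE = [Conv(stride=2) for _ in range(4)] + [Conv(dilation=2) for _ in range(2)]


if __name__ == "__main__":
    for name, net in [("classification-style", CLASSIFICATION_STYLE),
                      ("DetNet-style", DETNET_STYLE)]:
        rf, stride = receptive_field(net)
        sizes = feature_map_sizes(800, net)
        print(f"{name}: final stride {stride}, receptive field {rf}, "
              f"map sizes for 800px input {sizes}")
```

Run as a script, the classification-style chain ends at stride 64 with a 13×13 final map, while the DetNet-style chain stays at stride 16 with a 50×50 final map. In this toy, the dilated stages still enlarge the receptive field (159 px versus 127 px), which illustrates how resolution and receptive field can be decoupled. Real backbones stack many layers per stage, so their absolute receptive fields are far larger than these numbers.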




Efficient Scale-Permuted Backbone with Learned Resource Distribution

Recently, SpineNet has demonstrated promising results on object detectio...

Proper Reuse of Image Classification Features Improves Object Detection

A common practice in transfer learning is to initialize the downstream m...

GiraffeDet: A Heavy-Neck Paradigm for Object Detection

In conventional object detection frameworks, a backbone body inherited f...

Localizing Grouped Instances for Efficient Detection in Low-Resource Scenarios

State-of-the-art detection systems are generally evaluated on their abil...

CBNetV2: A Composite Backbone Network Architecture for Object Detection

Consistent performance gains through exploring more effective network st...

GaTector: A Unified Framework for Gaze Object Prediction

Gaze object prediction (GOP) is a newly proposed task that aims to disco...