Light-Head R-CNN: In Defense of Two-Stage Object Detector

by Zeming Li, et al.

In this paper, we first investigate why typical two-stage methods are not as fast as single-stage, fast detectors like YOLO and SSD. We find that Faster R-CNN and R-FCN perform intensive computation after or before RoI warping. Faster R-CNN involves two fully connected layers for RoI recognition, while R-FCN produces large score maps. Thus, these networks are slow because of the heavy-head design in their architectures. Even if we significantly reduce the base model, the computation cost cannot be reduced by a comparable amount. We propose a new two-stage detector, Light-Head R-CNN, to address this shortcoming of current two-stage approaches. In our design, we make the head of the network as light as possible by using a thin feature map and a cheap R-CNN subnet (pooling and a single fully connected layer). Our ResNet-101-based Light-Head R-CNN outperforms state-of-the-art object detectors on COCO while remaining time-efficient. More importantly, by simply replacing the backbone with a tiny network (e.g., Xception), our Light-Head R-CNN achieves 30.7 mmAP at 102 FPS on COCO, significantly outperforming single-stage, fast detectors like YOLO and SSD in both speed and accuracy. Code will be made publicly available.
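To make the "heavy head" versus "thin feature map" contrast concrete, here is a back-of-envelope sketch in plain Python. It is not the authors' code, and every number in it is an assumption chosen for illustration: a 7x7 pooling grid, 81 COCO classes (80 plus background), an R-FCN-style score map with p*p maps per class, a thin map with ALPHA = 10 maps per pooling bin, and a 50x80 backbone feature map with 2048 channels. For simplicity, the layer that produces each map is modeled as a single 1x1 convolution, which is a simplifying assumption rather than the paper's exact layer.

```python
"""Back-of-envelope head-cost comparison (a sketch; all sizes are assumptions)."""

# Hypothetical settings chosen for illustration; they are not stated in the abstract.
POOL_SIZE = 7            # p: each RoI is pooled to a p x p grid
NUM_CLASSES = 81         # COCO: 80 object classes + 1 background
ALPHA = 10               # assumed channels per pooling bin in the thin feature map
FEAT_H, FEAT_W = 50, 80  # assumed backbone feature map size (about 800x1280 input, stride 16)
IN_CHANNELS = 2048       # assumed backbone output channels (ResNet-101-like)


def rfcn_score_map_channels(p: int, num_classes: int) -> int:
    """R-FCN-style position-sensitive score maps: p*p maps for every class."""
    return p * p * num_classes


def thin_map_channels(p: int, alpha: int) -> int:
    """Class-independent thin feature map: alpha maps per pooling bin."""
    return alpha * p * p


def conv1x1_macs(h: int, w: int, c_in: int, c_out: int) -> int:
    """Multiply-accumulates of a 1x1 convolution over an h x w feature map."""
    return h * w * c_in * c_out


def head_report() -> dict:
    """Compare channel counts and map-producing cost of the two head styles."""
    heavy = rfcn_score_map_channels(POOL_SIZE, NUM_CLASSES)
    light = thin_map_channels(POOL_SIZE, ALPHA)
    return {
        "heavy_channels": heavy,
        "light_channels": light,
        "heavy_macs": conv1x1_macs(FEAT_H, FEAT_W, IN_CHANNELS, heavy),
        "light_macs": conv1x1_macs(FEAT_H, FEAT_W, IN_CHANNELS, light),
        "channel_ratio": heavy / light,
    }


if __name__ == "__main__":
    for key, value in head_report().items():
        print(f"{key}: {value}")
```

Under these assumptions the class-specific score map has about 8x as many channels as the thin map. That channel count depends on the number of classes and the pooling grid, not on the backbone, which is one way to read the abstract's point that shrinking the base model alone does not remove the head's cost.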


