OneNet: Towards End-to-End One-Stage Object Detection

by Peize Sun, et al.

End-to-end one-stage object detection has trailed thus far. This paper finds that the lack of a classification cost between sample and ground truth in label assignment is the main obstacle preventing one-stage detectors from removing Non-Maximum Suppression (NMS) and becoming end-to-end. Existing one-stage object detectors assign labels by location cost only, e.g. box IoU or point distance. Without a classification cost, location cost alone leads to redundant boxes with high confidence scores at inference, making NMS a necessary post-processing step. To design an end-to-end one-stage object detector, we propose Minimum Cost Assignment. The cost is the sum of the classification cost and the location cost between sample and ground truth. For each ground-truth object, only the one sample with minimum cost is assigned as the positive sample; all others are negative samples. To evaluate the effectiveness of our method, we design an extremely simple one-stage detector named OneNet. Our results show that when trained with Minimum Cost Assignment, OneNet avoids producing duplicate boxes and becomes an end-to-end detector. On the COCO dataset, OneNet achieves 35.0 AP at 80 FPS and 37.7 AP at 50 FPS with an image size of 512 pixels. We hope OneNet can serve as an effective baseline for end-to-end one-stage object detection. The code is available at: <>.
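
The assignment rule described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `min_cost_assign`, the choice of negative predicted probability as the classification cost, and the choice of L1 box distance as the location cost are all assumptions made for illustration. The abstract only states that the total cost is the sum of a classification cost and a location cost, and that the minimum-cost sample per ground truth is the single positive.

```python
import numpy as np

def min_cost_assign(cls_prob, pred_boxes, gt_labels, gt_boxes):
    """Hypothetical sketch of minimum-cost label assignment.

    cls_prob:   (N, C) predicted class probabilities for N samples
    pred_boxes: (N, 4) predicted boxes
    gt_labels:  (M,)   class index of each ground-truth object
    gt_boxes:   (M, 4) ground-truth boxes
    Returns the positive sample index per ground truth, shape (M,),
    and a boolean mask over samples, shape (N,).
    """
    # Assumed classification cost: higher probability for the
    # ground-truth class means lower cost. Shape (N, M).
    cls_cost = -cls_prob[:, gt_labels]
    # Assumed location cost: L1 distance between box coordinates. Shape (N, M).
    loc_cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(axis=-1)
    # Total cost is the sum of the two terms, as stated in the abstract.
    cost = cls_cost + loc_cost
    # For each ground truth, exactly one sample (the minimum) is positive.
    pos_idx = cost.argmin(axis=0)
    is_positive = np.zeros(cls_prob.shape[0], dtype=bool)
    is_positive[pos_idx] = True
    return pos_idx, is_positive

# Toy data: samples 0 and 1 predict the same box, so a location-only
# cost cannot separate them; the classification term breaks the tie.
cls_prob = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
pred_boxes = np.array([[0.0, 0.0, 1.0, 1.0],
                       [0.0, 0.0, 1.0, 1.0],
                       [0.5, 0.5, 1.0, 1.0]])
gt_labels = np.array([0])
gt_boxes = np.array([[0.0, 0.0, 1.0, 1.0]])

pos_idx, is_positive = min_cost_assign(cls_prob, pred_boxes, gt_labels, gt_boxes)
print(pos_idx, is_positive)
```

In this toy case sample 0 is the only positive and the other two samples are negatives, including sample 1, whose box is just as accurate. Training the duplicate as a negative is what suppresses redundant high-confidence boxes. The sketch does not handle two ground truths selecting the same sample.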




Object Detection Made Simpler by Eliminating Heuristic NMS

We show a simple NMS-free, end-to-end object detection framework, of whi...

Improvement of Classification in One-Stage Detector

RetinaNet proposed Focal Loss for classification task and improved one-s...

End-to-End Object Detection with Fully Convolutional Network

Mainstream object detectors based on the fully convolutional network has...

SWA Object Detection

Do you want to improve 1.0 AP for your object detector without any infer...

Hashing-based Non-Maximum Suppression for Crowded Object Detection

In this paper, we propose an algorithm, named hashing-based non-maximum ...

Group DETR: Fast Training Convergence with Decoupled One-to-Many Label Assignment

Detection Transformer (DETR) relies on One-to-One label assignment, i.e....

Seeing without Looking: Contextual Rescoring of Object Detections for AP Maximization

The majority of current object detectors lack context: class predictions...