DeepAI AI Chat
Log In Sign Up

FQDet: Fast-converging Query-based Detector

by   Cédric Picron, et al.

Recently, two-stage Deformable DETR introduced the query-based two-stage head, a new type of two-stage head different from the region-based two-stage heads of classical detectors as Faster R-CNN. In query-based two-stage heads, the second stage selects one feature per detection, called the query, as opposed to pooling a rectangular grid of features as in region-based detectors. In this work, we further improve the query-based head from Deformable DETR, significantly speeding up the convergence while increasing its performance. This is achieved by incorporating classical techniques such as anchor generation within the query-based paradigm. By combining the best of both the classical and the query-based worlds, our FQDet head peaks at 45.4 AP on the 2017 COCO validation set when using a ResNet-50+TPN backbone, only after training for 12 epochs using the 1x schedule. We outperform other high-performing two-stage heads such as e.g. Cascade R-CNN, while using the same backbone and while often being computationally cheaper. Additionally, when using the large ResNeXt-101-DCN+TPN backbone and multi-scale testing, our FQDet head achieves 52.9 AP on the 2017 COCO test-dev set after only 12 epochs of training. Code will be released.


Light-Head R-CNN: In Defense of Two-Stage Object Detector

In this paper, we first investigate why typical two-stage methods are no...

StageInteractor: Query-based Object Detector with Cross-stage Interaction

Previous object detectors make predictions based on dense grid points or...

Enhanced Training of Query-Based Object Detection via Selective Query Recollection

This paper investigates a phenomenon where query-based object detectors ...

AdaMixer: A Fast-Converging Query-Based Object Detector

Traditional object detectors employ the dense paradigm of scanning over ...

Augmenting Proposals by the Detector Itself

Lacking enough high quality proposals for RoI box head has impeded two-s...

Pixel-Semantic Revise of Position Learning A One-Stage Object Detector with A Shared Encoder-Decoder

We analyze that different methods based channel or position attention me...

You Should Look at All Objects

Feature pyramid network (FPN) is one of the key components for object de...