Evaluating Large-Vocabulary Object Detectors: The Devil is in the Details

02/01/2021
by Achal Dave, et al.

By design, average precision (AP) for object detection aims to treat all classes independently: AP is computed independently per category and then averaged. On the one hand, this is desirable, as it treats all classes, rare to frequent, equally. On the other hand, it ignores cross-category confidence calibration, a key property in real-world use cases. Unfortunately, we find that on imbalanced, large-vocabulary datasets, the default implementation of AP is not category independent, nor does it directly reward properly calibrated detectors. In fact, we show that the default implementation produces a gameable metric, where a simple, nonsensical re-ranking policy can improve AP by a large margin. To address these limitations, we introduce two complementary metrics. First, we present a simple fix to the default AP implementation, ensuring that it is truly independent across categories, as originally intended. We benchmark recent advances in large-vocabulary detection and find that many reported gains do not translate to improvements under our new per-class independent evaluation, suggesting that recent improvements may arise from difficult-to-interpret changes to cross-category rankings. Given the importance of reliably benchmarking cross-category rankings, we consider a pooled version of AP (AP-pool) that rewards properly calibrated detectors by directly comparing cross-category rankings. Finally, we revisit classical approaches for calibration and find that explicitly calibrating detectors improves the state of the art on AP-pool by 1.7 points.
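The contrast between per-category AP and a pooled AP can be illustrated with a small sketch. This is not the paper's evaluation code: it assumes detections have already been matched to ground truth, uses a plain uninterpolated AP, and every name in it (`average_precision`, `per_class_ap`, `pooled_ap`, `rescale`, the toy classes) is hypothetical. It only shows that rescaling one category's confidences leaves a per-category average unchanged while changing a pooled ranking.

```python
from typing import Dict, List, Tuple

# (confidence score, whether the detection matched a ground-truth box)
Detection = Tuple[float, bool]


def average_precision(dets: List[Detection], num_gt: int) -> float:
    """Uninterpolated AP: sum of precision at each true positive, divided by num_gt."""
    if num_gt == 0:
        return 0.0
    ranked = sorted(dets, key=lambda d: d[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / num_gt


def per_class_ap(dets: Dict[str, List[Detection]], gts: Dict[str, int]) -> float:
    """Rank each category on its own, then average the per-category APs."""
    aps = [average_precision(dets.get(c, []), n) for c, n in gts.items()]
    return sum(aps) / len(aps)


def pooled_ap(dets: Dict[str, List[Detection]], gts: Dict[str, int]) -> float:
    """Rank all detections from all categories together and compute one AP."""
    pooled = [d for class_dets in dets.values() for d in class_dets]
    return average_precision(pooled, sum(gts.values()))


def rescale(dets: Dict[str, List[Detection]], cls: str, factor: float):
    """Multiply one category's scores by a constant (its internal order is kept)."""
    out = dict(dets)
    out[cls] = [(score * factor, is_tp) for score, is_tp in dets[cls]]
    return out


# Toy data (assumed): one frequent and one rare category.
toy_dets = {
    "frequent": [(0.9, True), (0.8, True)],
    "rare": [(0.95, False), (0.3, True)],
}
toy_gts = {"frequent": 2, "rare": 1}
toy_rescaled = rescale(toy_dets, "rare", 0.5)

if __name__ == "__main__":
    print(per_class_ap(toy_dets, toy_gts), per_class_ap(toy_rescaled, toy_gts))
    print(pooled_ap(toy_dets, toy_gts), pooled_ap(toy_rescaled, toy_gts))
```

In this toy setting, halving the rare category's scores pushes its confident false positive below the frequent category's true positives. The per-category average does not move, because each category's internal ordering is unchanged, while the pooled score rises, which is the sense in which a pooled metric is sensitive to cross-category calibration.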


