ASAP-NMS: Accelerating Non-Maximum Suppression Using Spatially Aware Priors

by Rohun Tripathi, et al.

The widely adopted sequential variant of Non-Maximum Suppression (Greedy-NMS) is a crucial module in object-detection pipelines. Unfortunately, in the region proposal stage of two-stage and multi-stage detectors, NMS is becoming a latency bottleneck because of its sequential nature. In this article, we carefully profile Greedy-NMS iterations and find that a major share of the computation is wasted on comparing proposals that are far apart and therefore have little chance of suppressing each other. We address this issue by comparing only those proposals that are generated from nearby anchors. The translation-invariant property of the anchor lattice allows us to generate a lookup table that provides efficient access to nearby proposals during NMS. This leads to an accelerated NMS algorithm that leverages Spatially Aware Priors, or ASAP-NMS, which reduces the latency of the NMS step from 13.6 ms to 1.2 ms on a CPU without sacrificing the accuracy of a state-of-the-art two-stage detector on the COCO and VOC datasets. Importantly, ASAP-NMS is agnostic to image resolution and can be used as a simple drop-in module during inference. Using ASAP-NMS at run time only, we obtain an mAP of 44.2% at 25 Hz on the COCO dataset with a V100 GPU.
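
To make the idea concrete, the following is a minimal pure-Python sketch of greedy NMS restricted to proposals from nearby anchors. It is an illustration under simplifying assumptions, not the authors' implementation: each proposal is assumed to carry the (row, column) cell of the anchor that produced it, "nearby" is assumed to mean a square window of `radius` cells, and anchor scales and aspect ratios are ignored. The names `build_offset_table` and `neighbor_restricted_nms` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Cell = Tuple[int, int]                   # (row, col) of the source anchor


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def build_offset_table(radius: int) -> List[Cell]:
    """Relative cell offsets treated as 'nearby' (assumed square window).

    Because the anchor lattice is translation invariant, the same table
    can be reused at every anchor location.
    """
    span = range(-radius, radius + 1)
    return [(dy, dx) for dy in span for dx in span]


def neighbor_restricted_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    cells: Sequence[Cell],
    offsets: Sequence[Cell],
    iou_thresh: float = 0.7,
) -> List[int]:
    """Greedy NMS that only compares a proposal with kept proposals
    whose anchors lie in nearby cells. Returns kept indices, best first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept_by_cell: Dict[Cell, List[int]] = {}
    keep: List[int] = []
    for i in order:
        cy, cx = cells[i]
        # Look up only the kept proposals from neighbouring anchor cells.
        suppressed = any(
            iou(boxes[i], boxes[j]) > iou_thresh
            for dy, dx in offsets
            for j in kept_by_cell.get((cy + dy, cx + dx), [])
        )
        if not suppressed:
            keep.append(i)
            kept_by_cell.setdefault((cy, cx), []).append(i)
    return keep
```

In this sketch, two heavily overlapping proposals from the same cell are compared and the lower-scoring one is suppressed, while proposals whose anchors fall outside the window are never compared at all. The skipped comparisons are where the speed-up comes from, so the window has to be large enough to cover every pair that could plausibly overlap above the threshold.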

