IoU-Enhanced Attention for End-to-End Task Specific Object Detection

09/21/2022
by Jing Zhao, et al.

Without densely tiled anchor boxes or grid points in the image, Sparse R-CNN achieves promising results through a set of object queries and proposal boxes that are updated in a cascaded manner during training. However, because of its sparse nature and the one-to-one relation between each query and its attending region, it depends heavily on self attention, which is usually inaccurate in the early training stage. Moreover, in a scene with dense objects, an object query interacts with many irrelevant queries, which reduces its uniqueness and harms performance. This paper proposes to use the IoU between different boxes as a prior for value routing in self attention. The original attention matrix is multiplied by a matrix of the same size computed from the IoU of the proposal boxes, and together they determine the routing scheme so that irrelevant features are suppressed. Furthermore, to extract accurate features for both classification and regression, we add two lightweight projection heads that provide dynamic channel masks based on the object query; these masks are multiplied with the output of the dynamic convs, making the results suitable for the two different tasks. We validate the proposed scheme on different datasets, including MS-COCO and CrowdHuman, showing that it significantly improves performance and speeds up model convergence.
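The IoU prior described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names `pairwise_iou` and `iou_weighted_self_attention` are hypothetical, the attention is single-head with no learned projections (queries act as keys and values), and both the element-wise product and the row renormalization are assumptions made for illustration.

```python
import numpy as np


def pairwise_iou(boxes):
    """IoU between every pair of boxes in (x1, y1, x2, y2) format; returns (N, N)."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    area = (x2 - x1) * (y2 - y1)
    # Corners of the intersection rectangle for each pair (i, j).
    ix1 = np.maximum(x1[:, None], x1[None, :])
    iy1 = np.maximum(y1[:, None], y1[None, :])
    ix2 = np.minimum(x2[:, None], x2[None, :])
    iy2 = np.minimum(y2[:, None], y2[None, :])
    inter = np.clip(ix2 - ix1, 0, None) * np.clip(iy2 - iy1, 0, None)
    union = area[:, None] + area[None, :] - inter
    return inter / np.maximum(union, 1e-9)


def iou_weighted_self_attention(queries, boxes):
    """Single-head self attention whose weights are gated by box IoU.

    queries: (N, d) object query features; boxes: (N, 4) proposal boxes.
    Simplification: queries also serve as keys and values (no projections).
    """
    d = queries.shape[1]
    scores = queries @ queries.T / np.sqrt(d)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerically stable softmax
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)
    # Assumption: the IoU matrix gates the attention matrix element-wise,
    # so queries whose boxes do not overlap exchange no features.
    routed = attn * pairwise_iou(boxes)
    # Assumption: renormalize each row so the routing weights sum to 1.
    routed = routed / np.maximum(routed.sum(axis=1, keepdims=True), 1e-9)
    return routed @ queries, routed


if __name__ == "__main__":
    boxes = np.array([[0, 0, 10, 10], [5, 0, 15, 10], [100, 100, 110, 110]], dtype=float)
    queries = np.random.default_rng(0).normal(size=(3, 4))
    _, routed = iou_weighted_self_attention(queries, boxes)
    print(np.round(routed, 3))
```

In this toy input the third box overlaps neither of the others, so its query attends only to itself, while the first two queries share features in proportion to both their attention score and their IoU of 1/3.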

research
11/16/2018

DeRPN: Taking a further step toward more general object detection

Most current detection methods have adopted anchor boxes as regression r...
research
05/04/2022

Dynamic Sparse R-CNN

Sparse R-CNN is a recent strong object detection baseline by set predict...
research
07/19/2018

Deep Adaptive Proposal Network for Object Detection in Optical Remote Sensing Images

Object detection is a fundamental and challenging problem in aerial and ...
research
11/25/2021

BoxeR: Box-Attention for 2D and 3D Transformers

In this paper, we propose a simple attention mechanism, we call Box-Atte...
research
08/15/2023

Learning Image Deraining Transformer Network with Dynamic Dual Self-Attention

Recently, Transformer-based architecture has been introduced into single...
research
12/24/2019

Dense RepPoints: Representing Visual Objects with Dense Point Sets

We present an object representation, called Dense RepPoints, for flexibl...
research
02/01/2017

Evolving Boxes for Fast Vehicle Detection

We perform fast vehicle detection from traffic surveillance cameras. A n...
