RelationNet++: Bridging Visual Representations for Object Detection via Transformer Decoder

10/29/2020
by Cheng Chi, et al.

Existing object detection frameworks are usually built on a single format of object/part representation, i.e., anchor/proposal rectangle boxes in RetinaNet and Faster R-CNN, center points in FCOS and RepPoints, and corner points in CornerNet. While these different representations usually drive the frameworks to perform well in different aspects, e.g., better classification or finer localization, it is in general difficult to combine them in a single framework to make good use of each strength, because different representations extract features in heterogeneous or non-grid ways. This paper presents an attention-based decoder module, similar to that in the Transformer <cit.>, which bridges other representations into a typical object detector built on a single representation format, in an end-to-end fashion. The other representations act as a set of key instances that strengthen the main query representation features of the vanilla detector. Novel techniques are proposed for efficient computation of the decoder module, including a key sampling approach and a shared location embedding approach. The proposed module, named bridging visual representations (BVR), can be applied in-place, and we demonstrate its broad effectiveness in bridging other representations into prevalent object detection frameworks, including RetinaNet, Faster R-CNN, FCOS and ATSS, where improvements of about 1.5∼3.0 AP are achieved. In particular, we improve a state-of-the-art framework with a strong backbone by about 2.0 AP, reaching 52.7 AP on COCO test-dev. The resulting network is named RelationNet++. The code will be available at https://github.com/microsoft/RelationNet2.
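The core idea described above is a cross-attention step in which features of the detector's main representation act as queries and a sampled set of auxiliary-representation features (e.g., corner or center points) act as keys, with attention weights that combine an appearance term and a relative-location term. The sketch below is a minimal, single-head PyTorch illustration of this pattern, not the released implementation; the class name, tensor shapes, and the small MLP standing in for the shared location embedding are assumptions, and the paper's key sampling and multi-head details are omitted.

```python
# Minimal sketch of a BVR-style attention block (illustrative names/shapes,
# not the authors' API). Assumes PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BVRAttention(nn.Module):
    """Strengthens query features (e.g., anchor/center features of the base
    detector) with a sampled set of key features (e.g., corner/center points),
    combining an appearance term with a relative-location term."""

    def __init__(self, dim=256, loc_dim=64):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)
        # Small MLP mapping relative (dx, dy) offsets to a geometry bias,
        # standing in for the paper's shared location embedding.
        self.loc_mlp = nn.Sequential(
            nn.Linear(2, loc_dim), nn.ReLU(inplace=True), nn.Linear(loc_dim, 1)
        )
        self.scale = dim ** -0.5

    def forward(self, q_feat, q_pos, k_feat, k_pos):
        # q_feat: (B, Nq, C), q_pos: (B, Nq, 2)
        # k_feat: (B, Nk, C), k_pos: (B, Nk, 2); Nk is a sampled key subset.
        q = self.q_proj(q_feat)
        k = self.k_proj(k_feat)
        v = self.v_proj(k_feat)

        # Appearance similarity between queries and keys.
        app = torch.einsum("bqc,bkc->bqk", q, k) * self.scale

        # Geometry term from relative query-key offsets.
        rel = q_pos.unsqueeze(2) - k_pos.unsqueeze(1)   # (B, Nq, Nk, 2)
        geo = self.loc_mlp(rel).squeeze(-1)             # (B, Nq, Nk)

        attn = F.softmax(app + geo, dim=-1)
        out = torch.einsum("bqk,bkc->bqc", attn, v)
        # Residual connection: output has the same shape as the input queries.
        return q_feat + self.out_proj(out)


# Example: 100 query points attending to 50 sampled key points.
# bvr = BVRAttention(dim=256)
# out = bvr(torch.randn(2, 100, 256), torch.rand(2, 100, 2),
#           torch.randn(2, 50, 256), torch.rand(2, 50, 2))
```

In this reading, the module is "in-place" because the strengthened query features keep the shape of the input and can be fed to the detector's original classification and regression heads unchanged.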

Related research:

- Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection (04/06/2022)
- End-to-End Object Detection with Adaptive Clustering Transformer (11/18/2020)
- Corner Proposal Network for Anchor-free, Two-stage Object Detection (07/27/2020)
- Dynamic Head: Unifying Object Detection Heads with Attentions (06/15/2021)
- HoughNet: Integrating Near and Long-Range Evidence for Bottom-Up Object Detection (07/05/2020)
- Detection in Crowded Scenes: One Proposal, Multiple Predictions (03/20/2020)
- Grid R-CNN Plus: Faster and Better (06/13/2019)
