Guiding Query Position and Performing Similar Attention for Transformer-Based Detection Heads

08/22/2021
by Xiaohu Jiang, et al.

Since DETR was proposed, this transformer-based detection paradigm, which performs several rounds of cross-attention between object queries and feature maps to make predictions, has spawned a series of transformer-based detection heads. These models update the object queries after each cross-attention layer. However, they do not renew the query position, which encodes the queries' positional information, so the model requires extra learning to infer the latest regions that the query position should represent and attend to. To address this issue, we propose Guided Query Position (GQPos), which iteratively embeds the latest location information of the object queries into the query position. Another problem with such transformer-based detection heads is the high complexity of performing attention over multi-scale feature maps, which prevents them from improving detection performance at all scales. We therefore propose a novel fusion scheme named Similar Attention (SiA): in addition to fusing the feature maps, SiA also fuses the attention weight maps, using the well-learned low-resolution attention weight map to accelerate the learning of the high-resolution one. Our experiments show that GQPos improves the performance of a series of models, including DETR, SMCA, YOLOS, and HOI-Transformer, and that SiA consistently improves multi-scale transformer-based detection heads such as DETR and HOI-Transformer.
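The GQPos idea described above can be sketched as follows: after each decoder layer, the current box predictions are used to recompute a sinusoidal position embedding, which replaces the stale query position. This is a minimal illustrative sketch in numpy, not the paper's implementation; `layers` and `predict_box` are hypothetical stand-ins for the decoder layers and the box-prediction head.

```python
import numpy as np

def sine_pos_embed(xy, dim=64, temperature=10000.0):
    """Sinusoidal embedding of normalized (x, y) centers -> (num_queries, 2*dim)."""
    scale = 2 * np.pi
    freqs = temperature ** (2 * np.arange(dim // 2) / dim)     # (dim/2,)
    out = []
    for coord in (xy[:, 0], xy[:, 1]):
        angles = coord[:, None] * scale / freqs[None, :]       # (N, dim/2)
        out.append(np.concatenate([np.sin(angles), np.cos(angles)], axis=1))
    return np.concatenate(out, axis=1)                         # (N, 2*dim)

def decoder_with_gqpos(queries, query_pos, layers, predict_box):
    """Run decoder layers, renewing query_pos from each layer's predictions (GQPos).

    queries:     (N, D) object query embeddings
    query_pos:   (N, D) query position embeddings
    layers:      list of callables (queries, query_pos) -> queries
    predict_box: callable queries -> (N, 4) normalized boxes (cx, cy, w, h)
    """
    for layer in layers:
        queries = layer(queries, query_pos)      # cross-attention update
        centers = predict_box(queries)[:, :2]    # latest normalized (cx, cy)
        query_pos = sine_pos_embed(centers)      # GQPos: refresh the position embedding
    return queries, query_pos
```

The key contrast with vanilla DETR is the last line of the loop: instead of reusing one fixed, learned query position across all layers, the embedding is re-derived from each layer's freshest location estimates.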

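The SiA fusion of attention weight maps can likewise be sketched: the well-learned low-resolution attention map is upsampled and mixed into the high-resolution attention as a prior. The convex-combination fusion and the `alpha` weight below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def upsample2x(attn):
    """Nearest-neighbor 2x upsampling of per-query attention maps (N, H, W)."""
    return attn.repeat(2, axis=1).repeat(2, axis=2)

def similar_attention(attn_low, logits_high, alpha=0.5):
    """Fuse a low-resolution attention map into the high-resolution attention (SiA sketch).

    attn_low:    (N, H, W) softmax-normalized low-resolution attention weights
    logits_high: (N, 2H, 2W) raw high-resolution attention logits
    Returns (N, 2H, 2W) fused attention weights summing to 1 per query.
    """
    # Softmax over the spatial positions of the high-res logits.
    flat = logits_high.reshape(logits_high.shape[0], -1)
    flat = np.exp(flat - flat.max(axis=1, keepdims=True))
    attn_high = (flat / flat.sum(axis=1, keepdims=True)).reshape(logits_high.shape)

    # Upsampled low-res weights act as a prior (assumed mixing scheme);
    # divide by 4 so each map still sums to 1 after 2x2 duplication.
    prior = upsample2x(attn_low) / 4.0
    fused = alpha * prior + (1 - alpha) * attn_high
    return fused / fused.sum(axis=(1, 2), keepdims=True)
```

The intent is that early in training, when the high-resolution logits are still poorly learned, the upsampled low-resolution map already points attention at the right regions, accelerating convergence at the finer scale.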
