Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale Feature Fusion

07/28/2022
by   Gongjie Zhang, et al.

The recently proposed DEtection TRansformer (DETR) has established a fully end-to-end paradigm for object detection. However, DETR suffers from slow training convergence, which hinders its applicability to various detection tasks. We observe that DETR's slow convergence is largely attributed to the difficulty of matching object queries to relevant regions, caused by the unaligned semantics between object queries and encoded image features. Based on this observation, we design Semantic-Aligned-Matching DETR++ (SAM-DETR++) to accelerate DETR's convergence and improve detection performance. The core of SAM-DETR++ is a plug-and-play module that projects object queries and encoded image features into the same feature embedding space, where each object query can be easily matched to relevant regions with similar semantics. In addition, SAM-DETR++ searches for multiple representative keypoints and exploits their features for semantic-aligned matching with enhanced representation capacity. Furthermore, SAM-DETR++ can effectively fuse multi-scale features in a coarse-to-fine manner on the basis of the designed semantic-aligned matching. Extensive experiments show that the proposed SAM-DETR++ achieves superior convergence speed and competitive detection accuracy. Additionally, as a plug-and-play method, SAM-DETR++ can complement existing DETR convergence solutions with even better performance, achieving 44.8% AP with merely 12 training epochs and 49.1% AP with 50 training epochs on COCO val2017 with ResNet-50. Code is available at https://github.com/ZhangGongjie/SAM-DETR .
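
The matching idea described in the abstract can be illustrated with a toy sketch. It assumes, purely for illustration, that object queries and image features have already been projected into a shared embedding space, and it uses plain dot-product similarity followed by a softmax as the matching score. The function names, vectors, and dimensions are hypothetical and are not taken from the SAM-DETR++ code; keypoint search and multi-scale fusion are omitted.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def match_queries_to_regions(queries, features):
    """Return, for each query, attention weights over all regions.

    Assumption: queries and features already live in the same embedding
    space, so a large dot product means similar semantics.
    """
    return [softmax([dot(q, f) for f in features]) for q in queries]


# Hypothetical toy data: three image regions and two object queries,
# all expressed in one shared 3-dimensional embedding space.
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
queries = [[0.0, 4.0, 0.0], [4.0, 0.0, 0.0]]

weights = match_queries_to_regions(queries, features)
# Index of the region each query attends to most strongly.
best = [max(range(len(w)), key=w.__getitem__) for w in weights]
```

In this sketch each query concentrates its weight on the region whose embedding points in the same direction. When queries and features are not aligned in one space, the same dot product carries no such meaning, which is the matching difficulty the abstract attributes DETR's slow convergence to.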

research
03/14/2022

Accelerating DETR Convergence via Semantic-Aligned Matching

The recently developed DEtection TRansformer (DETR) establishes a new ob...
research
08/24/2022

Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors

Multi-scale features have been proven highly effective for object detect...
research
02/13/2023

CFNet: Cascade Fusion Network for Dense Prediction

Multi-scale features are essential for dense prediction tasks, including...
research
03/02/2023

FeatAug-DETR: Enriching One-to-Many Matching for DETRs with Feature Augmentation

One-to-one matching is a crucial design in DETR-like object detection fr...
research
05/08/2022

Unsupervised Homography Estimation with Coplanarity-Aware GAN

Estimating homography from an image pair is a fundamental problem in ima...
research
03/17/2022

Semantic-aligned Fusion Transformer for One-shot Object Detection

One-shot object detection aims at detecting novel objects according to m...
research
08/05/2023

Landmark Detection using Transformer Toward Robot-assisted Nasal Airway Intubation

Robot-assisted airway intubation application needs high accuracy in loca...
