SSD-MonoDETR: Supervised Scale-constrained Deformable Transformer for Monocular 3D Object Detection

05/12/2023
by Xuan He, et al.

Transformer-based methods have recently demonstrated superior performance in monocular 3D object detection, the task of predicting 3D attributes from a single 2D image. Most existing transformer-based methods leverage visual and depth representations to explore valuable query points on objects, and the quality of the learned queries has a great impact on detection accuracy. Unfortunately, the unsupervised attention mechanisms in existing transformers are prone to generating low-quality query features because of inaccurate receptive fields, especially on hard objects. To tackle this problem, this paper proposes a novel “Supervised Scale-constrained Deformable Attention” (SSDA) for monocular 3D object detection. Specifically, SSDA presets several masks with different scales and uses depth and visual features to predict the local feature for each query. By imposing this scale constraint, SSDA can predict an accurate receptive field for each query, which supports robust query feature generation. Moreover, SSDA is paired with a Weighted Scale Matching (WSM) loss that supervises scale prediction and yields more confident results than unsupervised attention mechanisms. Extensive experiments on the KITTI benchmark demonstrate that SSDA significantly improves detection accuracy, especially on moderate and hard objects, yielding state-of-the-art performance compared with existing approaches. Code will be publicly available at https://github.com/mikasa3lili/SSD-MonoDETR.
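The idea of preset multi-scale masks with a supervised scale prediction can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the mask shapes, how scale scores are derived from depth and visual features, or the form of the WSM loss. The sketch below assumes square masks on a single-channel feature map, takes the per-scale scores as given inputs, and uses plain cross-entropy as a stand-in for the supervision term. All function names are hypothetical.

```python
import math

def square_mask(height, width, center, scale):
    """Binary mask covering a (2*scale+1)-wide square around center (assumed mask shape)."""
    cy, cx = center
    return [[1.0 if abs(y - cy) <= scale and abs(x - cx) <= scale else 0.0
             for x in range(width)] for y in range(height)]

def masked_mean(feature_map, mask):
    """Average of the feature values selected by the mask."""
    total = sum(f * m for f_row, m_row in zip(feature_map, mask)
                for f, m in zip(f_row, m_row))
    count = sum(m for row in mask for m in row)
    return total / count

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]

def scale_constrained_feature(feature_map, center, scales, scale_scores):
    """Blend per-scale local features using predicted scale weights.

    scale_scores stands in for scores a network would predict from depth
    and visual features; here they are simply passed in (assumption).
    """
    height, width = len(feature_map), len(feature_map[0])
    weights = softmax(scale_scores)
    local = [masked_mean(feature_map, square_mask(height, width, center, s))
             for s in scales]
    return sum(w * f for w, f in zip(weights, local)), weights

def scale_matching_loss(weights, target_index):
    """Cross-entropy against a target scale: a stand-in, not the paper's WSM loss."""
    return -math.log(weights[target_index])

# Usage: a 5x5 map with a bright centre pixel and three candidate scales.
feature_map = [[0.0] * 5 for _ in range(5)]
feature_map[2][2] = 9.0
feature, weights = scale_constrained_feature(
    feature_map, center=(2, 2), scales=[0, 1, 2], scale_scores=[2.0, 0.5, 0.1])
loss = scale_matching_loss(weights, target_index=0)
```

In this toy setting, a higher score for the smallest scale pulls the blended feature toward the tightly masked region, and the loss decreases as the predicted weight on the target scale grows. That is the intuition behind supervising the receptive field rather than leaving it unsupervised.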


research
09/02/2023

S^3-MonoDETR: Supervised Shape Scale-perceptive Deformable Transformer for Monocular 3D Object Detection

Recently, transformer-based methods have shown exceptional performance i...
research
03/24/2022

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection

Monocular 3D object detection has long been a challenging task in autono...
research
09/12/2022

CenterFormer: Center-based Transformer for 3D Object Detection

Query-based transformer has shown great potential in constructing long-r...
research
03/24/2021

M3DSSD: Monocular 3D Single Stage Object Detector

In this paper, we propose a Monocular 3D Single Stage object Detector (M...
research
03/23/2023

MoGDE: Boosting Mobile Monocular 3D Object Detection with Ground Depth Estimation

Monocular 3D object detection (Mono3D) in mobile settings (e.g., on a ve...
research
09/27/2022

CrossDTR: Cross-view and Depth-guided Transformers for 3D Object Detection

To achieve accurate 3D object detection at a low cost for autonomous dri...
research
03/14/2022

Accelerating DETR Convergence via Semantic-Aligned Matching

The recently developed DEtection TRansformer (DETR) establishes a new ob...
