S^3-MonoDETR: Supervised Shape Scale-perceptive Deformable Transformer for Monocular 3D Object Detection

09/02/2023
by Xuan He, et al.

Transformer-based methods have recently shown exceptional performance in monocular 3D object detection, the task of predicting 3D attributes from a single 2D image. These methods typically use visual and depth representations to generate query points on objects, and the quality of those query points plays a decisive role in detection accuracy. However, the unsupervised attention mechanisms in current transformers lack any geometry appearance awareness and are therefore prone to producing noisy features for query points. This severely limits network performance and leaves the model poorly equipped to detect objects of multiple categories in a single training process. To tackle this problem, this paper proposes a novel "Supervised Shape Scale-perceptive Deformable Attention" (S^3-DA) module for monocular 3D object detection. Concretely, S^3-DA uses visual and depth features to generate diverse local features with various shapes and scales and simultaneously predicts the corresponding matching distribution, giving each query a useful perception of shape and scale. As a result, S^3-DA effectively estimates receptive fields for query points belonging to any category, enabling them to generate robust query features. In addition, we propose a Multi-classification-based Shape&Scale Matching (MSM) loss to supervise this process. Extensive experiments on the KITTI and Waymo Open datasets demonstrate that S^3-DA significantly improves detection accuracy, yielding state-of-the-art performance compared with existing approaches for both single-category and multi-category 3D object detection in a single training process. The source code will be made publicly available at https://github.com/mikasa3lili/S3-MonoDETR.
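
The mechanism described in the abstract can be illustrated with a small, standard-library-only sketch. It is not the authors' implementation. The candidate windows in CANDIDATES, the average pooling, the linear scoring vector w_match, and the cross-entropy form of the matching loss are all assumptions chosen for illustration. The sketch pools one local feature per candidate shape and scale around a query point, predicts a matching distribution over the candidates, and uses that distribution to weight the local features into a single query feature.

```python
import math

# Hypothetical (height, width) candidate windows. The shapes and scales
# actually used by S^3-DA are not stated in the abstract.
CANDIDATES = [(1, 1), (3, 3), (5, 5), (3, 5), (5, 3)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_features(feat, cy, cx, candidates=CANDIDATES):
    """Average-pool feat[y][x] (a list of C floats) over each candidate
    window centred on (cy, cx), clipping windows at the map border."""
    height, width, channels = len(feat), len(feat[0]), len(feat[0][0])
    pooled = []
    for h, w in candidates:
        ys = range(max(cy - h // 2, 0), min(cy + h // 2 + 1, height))
        xs = range(max(cx - w // 2, 0), min(cx + w // 2 + 1, width))
        cells = [feat[y][x] for y in ys for x in xs]
        pooled.append(
            [sum(c[k] for c in cells) / len(cells) for k in range(channels)]
        )
    return pooled  # K vectors, each of length C


def query_feature(feat, cy, cx, w_match):
    """Return (query vector, matching distribution) for one query point.
    w_match is an assumed linear scoring vector of length C."""
    locs = local_features(feat, cy, cx)
    logits = [sum(v * w for v, w in zip(loc, w_match)) for loc in locs]
    probs = softmax(logits)
    channels = len(locs[0])
    query = [
        sum(p * loc[k] for p, loc in zip(probs, locs)) for k in range(channels)
    ]
    return query, probs


def matching_loss(probs, target_idx):
    """Cross-entropy against one target candidate index. This is an assumed
    stand-in for the MSM loss, whose exact form the abstract does not give."""
    return -math.log(probs[target_idx])
```

For example, on a constant feature map every candidate window pools to the same vector, so the predicted matching distribution is uniform; supervising it toward one candidate index is what would push the query to prefer a particular shape and scale.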

research
05/12/2023

SSD-MonoDTR: Supervised Scale-constrained Deformable Transformer for Monocular 3D Object Detection

Transformer-based methods have demonstrated superior performance for mon...
research
03/24/2022

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection

Monocular 3D object detection has long been a challenging task in autono...
research
03/03/2023

BSH-Det3D: Improving 3D Object Detection with BEV Shape Heatmap

The progress of LiDAR-based 3D object detection has significantly enhanc...
research
12/03/2022

IDMS: Instance Depth for Multi-scale Monocular 3D Object Detection

Due to the lack of depth information of images and poor detection accura...
research
09/12/2022

CenterFormer: Center-based Transformer for 3D Object Detection

Query-based transformer has shown great potential in constructing long-r...
research
04/11/2022

Category-Aware Transformer Network for Better Human-Object Interaction Detection

Human-Object Interactions (HOI) detection, which aims to localize a huma...
research
07/21/2022

DEVIANT: Depth EquiVarIAnt NeTwork for Monocular 3D Object Detection

Modern neural networks use building blocks such as convolutions that are...