MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection

03/24/2022
by   Renrui Zhang, et al.

Monocular 3D object detection has long been a challenging task in autonomous driving, as it requires decoding 3D predictions solely from a single 2D image. Most existing methods follow conventional 2D object detectors: they first localize objects by their centers and then predict 3D attributes from center-neighboring local features. However, such a center-based pipeline treats 3D prediction as a subordinate task and lacks inter-object depth interactions with global spatial clues. In this paper, we introduce a simple framework for Monocular DEtection with a depth-aware TRansformer, named MonoDETR. We make the vanilla transformer depth-aware and let depth guide the whole detection process. Specifically, we represent 3D object candidates as a set of queries and produce non-local depth embeddings of the input image with a lightweight depth predictor and an attention-based depth encoder. Then, we propose a depth-aware decoder to conduct both inter-query and query-scene depth feature communication. In this way, each object estimates its 3D attributes adaptively from the depth-informative regions of the image rather than being limited to features around its center. With minimal handcrafted designs, MonoDETR is an end-to-end framework that requires no additional data, anchors, or NMS, and it achieves competitive performance on the KITTI benchmark among state-of-the-art center-based networks. Extensive ablation studies demonstrate the effectiveness of our approach and its potential to serve as a transformer baseline for future monocular research. Code is available at https://github.com/ZrrSkywalker/MonoDETR.git.
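
The two kinds of communication named in the abstract, query-scene depth attention and inter-query attention, can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names (`attend`, `depth_aware_decoder_layer`), the single attention head, the missing learned projections, normalization and feed-forward layers, the order of the two steps, and the random toy inputs are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Single-head scaled dot-product attention on lists of vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        out = [sum(wi * v[j] for wi, v in zip(w, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


def add(xs, ys):
    """Element-wise residual addition of two lists of vectors."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def depth_aware_decoder_layer(queries, depth_embeddings):
    """Hypothetical simplified layer: depth cross-attention, then self-attention."""
    # Query-scene step: every object query gathers depth features from all
    # image positions, not only from positions near an object center.
    depth_context, depth_weights = attend(queries, depth_embeddings, depth_embeddings)
    queries = add(queries, depth_context)
    # Inter-query step: object queries exchange information with each other.
    query_context, _ = attend(queries, queries, queries)
    queries = add(queries, query_context)
    return queries, depth_weights


# Toy inputs (assumed sizes): 4 object queries, 12 image positions, 8 channels.
rng = random.Random(0)
dim, num_queries, num_positions = 8, 4, 12
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_queries)]
depth_embeddings = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_positions)]

updated_queries, depth_weights = depth_aware_decoder_layer(queries, depth_embeddings)
```

Each row of `depth_weights` is a distribution over all image positions, which is what allows a query to draw on depth-informative regions anywhere in the image. In a full model, the updated queries would then be passed to prediction heads for the 3D attributes.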


research
03/21/2022

MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer

Monocular 3D object detection is an important yet challenging task in au...
research
04/22/2021

FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection

Monocular 3D object detection is an important task for autonomous drivin...
research
05/12/2023

SSD-MonoDTR: Supervised Scale-constrained Deformable Transformer for Monocular 3D Object Detection

Transformer-based methods have demonstrated superior performance for mon...
research
03/30/2021

Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection

The objective of this paper is to learn context- and depth-aware feature...
research
03/24/2021

M3DSSD: Monocular 3D Single Stage Object Detector

In this paper, we propose a Monocular 3D Single Stage object Detector (M...
research
09/02/2023

S^3-MonoDETR: Supervised Shape Scale-perceptive Deformable Transformer for Monocular 3D Object Detection

Recently, transformer-based methods have shown exceptional performance i...
research
07/21/2022

DEVIANT: Depth EquiVarIAnt NeTwork for Monocular 3D Object Detection

Modern neural networks use building blocks such as convolutions that are...
