M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers

by Tianrui Guan, et al.

We present M3DeTR, a novel architecture for 3D object detection that combines different point cloud representations (raw, voxels, bird's-eye view) with different feature scales based on multi-scale feature pyramids. M3DeTR is the first approach that simultaneously unifies multiple point cloud representations and feature scales and models mutual relationships between point clouds using transformers. We perform extensive ablation experiments that highlight the benefits of fusing representations and scales and of modeling the relationships. Our method achieves state-of-the-art performance on the KITTI 3D object detection dataset and the Waymo Open Dataset. Results show that M3DeTR improves the baseline significantly, by 1.48 on the Waymo Open Dataset. In particular, our approach ranks 1st on the well-known KITTI 3D Detection Benchmark for both car and cyclist classes, and ranks 1st on the Waymo Open Dataset with single-frame point cloud input.
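
To make the idea of fusing representations with a transformer more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it assumes each point already has one feature vector per representation (raw, voxel, bird's-eye view), treats those three vectors as tokens, and mixes them with a single-head scaled dot-product self-attention layer. The function names (`self_attention`, `fuse_representations`), the random weights, and the feature sizes are all hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    tokens: (..., num_tokens, dim); w_q, w_k, w_v: (dim, dim).
    Returns the attended tokens and the attention weights.
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def fuse_representations(raw_feat, voxel_feat, bev_feat, params):
    """Fuse per-point features from three representations.

    Each input has shape (num_points, dim). Hypothetical helper for
    illustration only; it is not the layer used in the paper.
    """
    # Stack to (num_points, 3, dim): three tokens per point.
    tokens = np.stack([raw_feat, voxel_feat, bev_feat], axis=1)
    attended, weights = self_attention(tokens, *params)
    # Concatenate the three attended tokens into one vector per point.
    fused = attended.reshape(attended.shape[0], -1)
    return fused, weights


# Toy inputs: random features stand in for real backbone outputs.
rng = np.random.default_rng(0)
num_points, dim = 5, 8
raw, voxel, bev = (rng.normal(size=(num_points, dim)) for _ in range(3))
params = tuple(rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))

fused, weights = fuse_representations(raw, voxel, bev, params)
print(fused.shape)    # (5, 24)
print(weights.shape)  # (5, 3, 3)
```

The same `self_attention` function could, under the same assumptions, be applied with points as the tokens (an input of shape `(num_points, dim)`) to sketch how relationships between points might be modeled.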






Code Repositories


Code base for M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers
