M^2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation

04/11/2022
by   Enze Xie, et al.

In this paper, we propose M^2BEV, a unified framework that jointly performs 3D object detection and map segmentation in the Bird's-Eye View (BEV) space with multi-camera image inputs. Unlike the majority of previous works, which process detection and segmentation separately, M^2BEV infers both tasks with a unified model and improves efficiency. M^2BEV efficiently transforms multi-view 2D image features into a 3D BEV feature in ego-car coordinates. Such a BEV representation is important because it enables different tasks to share a single encoder. Our framework further contains four important designs that benefit both accuracy and efficiency: (1) an efficient BEV encoder design that reduces the spatial dimension of a voxel feature map; (2) a dynamic box assignment strategy that uses learning-to-match to assign ground-truth 3D boxes to anchors; (3) a BEV centerness re-weighting that assigns larger weights to more distant predictions; and (4) large-scale 2D detection pre-training and auxiliary supervision. We show that these designs significantly benefit the ill-posed camera-based 3D perception tasks, where depth information is missing. M^2BEV is memory efficient, allowing significantly higher-resolution images as input, with faster inference speed. Experiments on nuScenes show that M^2BEV achieves state-of-the-art results in both 3D object detection and BEV segmentation, with the best single model achieving 42.5 mAP and 57.0 mIoU in these two tasks, respectively.
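
Design (3) can be illustrated with a minimal sketch. The abstract only says that more distant predictions receive larger weights; the specific formula below (a weight that grows from 1 at the ego position to 2 at the far corner of the BEV grid) and the names `bev_centerness` and `reweighted_loss` are assumptions made for illustration, not a statement of the authors' implementation.

```python
import math

def bev_centerness(x, y, x_range, y_range, center=(0.0, 0.0)):
    """Hypothetical distance-based weight in [1, 2] for a BEV location.

    Assumed form: 1 + sqrt(d^2 / d_max^2), where d is the distance from the
    ego centre and d_max is the distance to the farthest grid corner.
    """
    xc, yc = center
    # Largest squared offsets from the ego centre inside the BEV grid.
    max_dx2 = max((x_range[0] - xc) ** 2, (x_range[1] - xc) ** 2)
    max_dy2 = max((y_range[0] - yc) ** 2, (y_range[1] - yc) ** 2)
    d2 = (x - xc) ** 2 + (y - yc) ** 2
    return 1.0 + math.sqrt(d2 / (max_dx2 + max_dy2))

def reweighted_loss(losses, positions, x_range, y_range):
    """Average per-location losses after scaling each by its BEV weight."""
    weights = [bev_centerness(x, y, x_range, y_range) for x, y in positions]
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)

# Example on an assumed 100 m x 100 m grid centred on the ego car.
grid = (-50.0, 50.0)
near = bev_centerness(0.0, 0.0, grid, grid)    # weight at the ego position
far = bev_centerness(50.0, 50.0, grid, grid)   # weight at the far corner
```

With this assumed form, a loss term at the far corner of the grid counts twice as much as one at the ego position, which is one simple way to emphasise distant regions where camera-only depth is least reliable.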


