M^2-3DLaneNet: Multi-Modal 3D Lane Detection

09/13/2022
by   Yueru Luo, et al.

Estimating accurate lane lines in 3D space remains challenging because lanes are sparse and slim. In this work, we propose M^2-3DLaneNet, a Multi-Modal framework for effective 3D lane detection. To integrate complementary information from multiple sensors, M^2-3DLaneNet first extracts multi-modal features with modality-specific backbones, then fuses them in a unified Bird's-Eye View (BEV) space. Specifically, our method consists of two core components. 1) To achieve accurate 2D-3D mapping, we propose top-down BEV generation. Within it, a Line-Restricted Deform-Attention (LRDA) module enhances image features in a top-down manner, capturing the slender structure of lanes. The method then casts the 2D pyramidal features into 3D space using depth-aware lifting and generates BEV features through pillarization. 2) We further propose bottom-up BEV fusion, which aggregates multi-modal features through multi-scale cascaded attention, integrating complementary information from camera and LiDAR sensors. Extensive experiments demonstrate the effectiveness of M^2-3DLaneNet, which outperforms previous state-of-the-art methods by a large margin, i.e., an improvement of 12.1 on the OpenLane dataset.
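
To make the lifting and pillarization step more concrete, here is a minimal Python sketch of the general idea, not the authors' implementation. It assumes a pinhole camera with one depth value per pixel and mean pooling inside each BEV cell. The function names `lift_pixel` and `pillarize`, the intrinsics, and the grid ranges are all hypothetical choices made for illustration. The abstract does not specify how depth is obtained or how features are pooled.

```python
from collections import defaultdict


def lift_pixel(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with a depth value to camera-frame (x, y, z).

    Assumes a pinhole model: x points right, y points down, z points forward.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return x, y, depth


def pillarize(points, feats, x_range, z_range, cell):
    """Average the features of all points that fall into the same BEV cell.

    points: list of (x, y, z) tuples; feats: list of equal-length feature lists.
    The height axis (y) is collapsed, so each cell acts as a vertical pillar.
    Returns {(row, col): mean feature list}, with row along z and col along x.
    """
    sums = {}
    counts = defaultdict(int)
    for (x, _y, z), f in zip(points, feats):
        # Skip points outside the BEV grid.
        if not (x_range[0] <= x < x_range[1] and z_range[0] <= z < z_range[1]):
            continue
        key = (int((z - z_range[0]) // cell), int((x - x_range[0]) // cell))
        if key not in sums:
            sums[key] = [0.0] * len(f)
        sums[key] = [s + v for s, v in zip(sums[key], f)]
        counts[key] += 1
    return {k: [s / counts[k] for s in v] for k, v in sums.items()}


# Two nearby pixels with made-up 2-channel features land in the same pillar.
pts = [
    lift_pixel(320, 240, 10.0, 500, 500, 320, 240),
    lift_pixel(330, 300, 10.2, 500, 500, 320, 240),
]
bev = pillarize(pts, [[1.0, 0.0], [3.0, 2.0]], (-10, 10), (0, 20), 1.0)
print(bev)  # {(10, 10): [2.0, 1.0]}
```

Both points fall into the cell at row 10, column 10, so their features are averaged into one BEV feature. A LiDAR branch could be pillarized onto the same grid, which is what makes fusion in a shared BEV space straightforward.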


research
09/11/2023

FusionFormer: A Multi-sensory Fusion in Bird's-Eye-View and Temporal Consistent Transformer for 3D Objection

Multi-sensor modal fusion has demonstrated strong advantages in 3D objec...
research
09/03/2022

MMKGR: Multi-hop Multi-modal Knowledge Graph Reasoning

Multi-modal knowledge graphs (MKGs) include not only the relation triple...
research
09/16/2021

M2RNet: Multi-modal and Multi-scale Refined Network for RGB-D Salient Object Detection

Salient object detection is a fundamental topic in computer vision. Prev...
research
11/22/2022

LiCamGait: Gait Recognition in the Wild by Using LiDAR and Camera Multi-modal Visual Sensors

LiDAR can capture accurate depth information in large-scale scenarios wi...
research
08/08/2023

LATR: 3D Lane Detection from Monocular Images with Transformer

3D lane detection from monocular images is a fundamental yet challenging...
research
01/06/2023

Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection

Monocular 3D lane detection is a challenging task due to its lack of dep...
research
08/23/2023

NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos

Non-photorealistic videos are in demand with the wave of the metaverse, ...
