UniM^2AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving

08/21/2023
by   Jian Zou, et al.

Masked Autoencoders (MAE) play a pivotal role in learning potent representations and deliver strong results across the 3D perception tasks essential for autonomous driving. In real-world driving scenarios, multiple sensors are commonly deployed for comprehensive environment perception. While integrating multi-modal features from these sensors can produce rich and powerful representations, few MAE methods address this integration. This work studies multi-modal Masked Autoencoders built on a unified representation space for autonomous driving, aiming at a more effective fusion of the two distinct modalities. To combine the semantics inherent in images with the geometric detail of LiDAR point clouds, UniM^2AE is proposed: a simple yet powerful multi-modal self-supervised pre-training framework with two key designs. First, it projects the features of both modalities into a cohesive 3D volume space, expanded from the bird's eye view (BEV) to include the height dimension. This extension makes it possible to back-project the fused features into their native modalities and reconstruct the masked inputs of each. Second, a Multi-modal 3D Interactive Module (MMIM) enables efficient inter-modal interaction within the unified volume. Extensive experiments on the nuScenes dataset attest to the efficacy of UniM^2AE, with gains of 1.2% NDS in 3D object detection and 6.5% mIoU in BEV map segmentation. Code is available at https://github.com/hollow-503/UniM2AE.
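Below is a minimal PyTorch sketch of the two designs described in the abstract, written against simplified assumptions: MMIM here is a generic 3D-convolutional fusion block and scatter_points_to_volume is a crude voxel scatter, both illustrative stand-ins rather than the authors' actual modules; masking, the learned image view transform, and the reconstruction decoders are omitted.

```python
import torch
import torch.nn as nn


class MMIM(nn.Module):
    """Sketch of a Multi-modal 3D Interactive Module: 3D convolutions over
    the shared (Z, Y, X) volume mediate interaction between modalities.
    (Illustrative stand-in, not the paper's actual architecture.)"""

    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv3d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, img_vol: torch.Tensor, pts_vol: torch.Tensor) -> torch.Tensor:
        # Concatenate per-voxel features from both modalities, then fuse.
        return self.fuse(torch.cat([img_vol, pts_vol], dim=1))


def scatter_points_to_volume(points, feats, grid, pc_range):
    """Scatter per-point features into a dense (C, Z, Y, X) volume by voxel
    index: a crude stand-in for the LiDAR branch's view transform."""
    C = feats.shape[1]
    Z, Y, X = grid
    vol = feats.new_zeros(C, Z * Y * X)
    mins = points.new_tensor(pc_range[:3])
    maxs = points.new_tensor(pc_range[3:])
    # Normalize (x, y, z) coordinates into integer voxel indices.
    idx = ((points - mins) / (maxs - mins) * points.new_tensor([X, Y, Z])).long()
    idx[:, 0].clamp_(0, X - 1)
    idx[:, 1].clamp_(0, Y - 1)
    idx[:, 2].clamp_(0, Z - 1)
    flat = idx[:, 2] * (Y * X) + idx[:, 1] * X + idx[:, 0]
    vol.index_add_(1, flat, feats.t())  # accumulate features per voxel
    return vol.view(C, Z, Y, X)


if __name__ == "__main__":
    C, grid = 32, (8, 64, 64)                # (Z, Y, X): BEV plus height
    pts = torch.rand(1000, 3) * 100 - 50     # random points in [-50, 50)^3
    pts_vol = scatter_points_to_volume(
        pts, torch.randn(1000, C), grid, [-50, -50, -50, 50, 50, 50]
    )
    img_vol = torch.randn(C, *grid)          # pretend lifted camera features
    fused = MMIM(C)(img_vol[None], pts_vol[None])
    print(fused.shape)                       # torch.Size([1, 32, 8, 64, 64])
```

With both modalities expressed in one (Z, Y, X) volume, the fused features can then be back-projected (sampled along camera rays for the image branch, gathered at voxel indices for the LiDAR branch) to supervise reconstruction of the masked inputs in each modality.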


Related research

UniTR: A Unified and Efficient Multi-Modal Transformer for Bird's-Eye-View Representation (08/15/2023)
Jointly processing information from multiple sensors is crucial to achie...

Multi-Modal 3D Object Detection by Box Matching (05/12/2023)
Multi-modal 3D object detection has received growing attention as the in...

Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer (07/28/2022)
Large-scale deployment of autonomous vehicles has been continually delay...

MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation (04/19/2023)
Perception systems in modern autonomous driving vehicles typically take ...

UniG3D: A Unified 3D Object Generation Dataset (06/19/2023)
The field of generative AI has a transformative impact on various areas,...

MOSAIC: Learning Unified Multi-Sensory Object Property Representations for Robot Perception (09/15/2023)
A holistic understanding of object properties across diverse sensory mod...

Multi-modal Experts Network for Autonomous Driving (09/18/2020)
End-to-end learning from sensory data has shown promising results in aut...
