Occ-BEV: Multi-Camera Unified Pre-training via 3D Scene Reconstruction

05/30/2023
by Chen Min, et al.

Multi-camera 3D perception has emerged as a prominent research field in autonomous driving, offering a viable and cost-effective alternative to LiDAR-based solutions. However, existing multi-camera algorithms rely primarily on monocular image pre-training, which overlooks the spatial and temporal correlations among camera views. To address this limitation, we propose Occ-BEV, the first multi-camera unified pre-training framework, which first reconstructs the 3D scene as a foundational stage and then fine-tunes the model on downstream tasks. Specifically, a 3D decoder uses Bird's Eye View (BEV) features from multi-view images to predict 3D geometric occupancy, enabling the model to capture a more comprehensive understanding of the 3D environment. A significant benefit of Occ-BEV is that it can use large volumes of unlabeled image-LiDAR pairs for pre-training. The proposed framework demonstrates promising results on key tasks such as multi-camera 3D object detection and surrounding semantic scene completion. When compared to monocular pre-training methods on the nuScenes dataset, Occ-BEV shows a significant improvement of about 2.0 and 2.0 in mIoU for surrounding semantic scene completion. Code is publicly available at https://github.com/chaytonmin/Occ-BEV.
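
The abstract does not spell out how occupancy targets are derived from the unlabeled image-LiDAR pairs. One plausible reading is that each LiDAR sweep is voxelized into a binary occupancy grid that the 3D decoder learns to predict. The following is a minimal sketch under that assumption; the function name `voxelize_occupancy`, the range bounds, and the voxel size are hypothetical and are not taken from the Occ-BEV code.

```python
from math import floor


def voxelize_occupancy(points, range_min, range_max, voxel_size):
    """Mark every voxel that contains at least one LiDAR point as occupied.

    points     : iterable of (x, y, z) coordinates in metres
    range_min  : (x, y, z) lower corner of the region of interest
    range_max  : (x, y, z) upper corner of the region of interest
    voxel_size : edge length of a cubic voxel in metres

    Returns (dims, occupied): the grid shape and a set of (ix, iy, iz) indices.
    NOTE: illustrative assumption only, not the authors' label pipeline.
    """
    # Number of voxels along each axis.
    dims = tuple(
        int(round((hi - lo) / voxel_size)) for lo, hi in zip(range_min, range_max)
    )
    occupied = set()
    for point in points:
        # Convert metric coordinates to integer voxel indices.
        idx = tuple(floor((c - lo) / voxel_size) for c, lo in zip(point, range_min))
        # Drop points that fall outside the region of interest.
        if all(0 <= i < d for i, d in zip(idx, dims)):
            occupied.add(idx)
    return dims, occupied


if __name__ == "__main__":
    sweep = [(0.2, 0.2, 0.2), (0.3, 0.1, 0.4), (1.6, -0.4, 0.0), (50.0, 0.0, 0.0)]
    dims, occupied = voxelize_occupancy(
        sweep, (-2.0, -2.0, -1.0), (2.0, 2.0, 1.0), 0.5
    )
    print(dims, sorted(occupied))
```

In this sketch the first two points fall into the same voxel and the last point lies outside the region, so only two voxels end up occupied. Such labels need no human annotation, which is consistent with the abstract's point about using unlabeled image-LiDAR pairs.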

