Geometric-aware Pretraining for Vision-centric 3D Object Detection

04/06/2023
by   Linyan Huang, et al.
Multi-camera 3D object detection for autonomous driving is a challenging problem that has garnered notable attention from both academia and industry. An obstacle encountered in vision-based techniques is the precise extraction of geometry-aware features from RGB images. Recent approaches have utilized geometric-aware image backbones pretrained on depth-relevant tasks to acquire spatial information. However, these approaches overlook the critical aspect of view transformation, resulting in inadequate performance due to the misalignment of spatial knowledge between the image backbone and the view transformation. To address this issue, we propose a novel geometric-aware pretraining framework called GAPretrain. Our approach imparts spatial and structural cues to camera networks by employing the geometry-rich LiDAR modality as guidance during the pretraining phase. Transferring modality-specific attributes across modalities is non-trivial, but we bridge this gap by using a unified bird's-eye-view (BEV) representation and structural hints derived from LiDAR point clouds to facilitate the pretraining process. GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors. Our experiments demonstrate the effectiveness and generalization ability of the proposed method. We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set with the BEVFormer method, a gain of 2.7 and 2.1 points, respectively. We also conduct experiments on various image backbones and view transformations to validate the efficacy of our approach. Code will be released at https://github.com/OpenDriveLab/BEVPerception-Survey-Recipe.
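The core idea of guiding a camera branch with a geometry-rich modality in a unified BEV space can be illustrated with a feature-imitation loss. The sketch below is a minimal, hypothetical rendering of that idea (not GAPretrain's exact formulation, which the abstract does not specify): camera BEV features are regressed toward frozen LiDAR BEV features, optionally weighted by a foreground mask derived from LiDAR occupancy. The function name `bev_distill_loss` and the mask weighting are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def bev_distill_loss(cam_bev, lidar_bev, fg_mask=None):
    """Hypothetical BEV feature-imitation loss (a sketch, not the paper's
    exact objective): pull camera BEV features toward the detached LiDAR
    BEV features so spatial/structural cues transfer across modalities.
    cam_bev, lidar_bev: (B, C, H, W); fg_mask: optional (B, H, W)."""
    target = lidar_bev.detach()  # LiDAR teacher provides guidance only
    # Per-cell squared error, averaged over the channel dimension -> (B, H, W)
    loss = F.mse_loss(cam_bev, target, reduction="none").mean(dim=1)
    if fg_mask is not None:
        # Emphasize BEV cells occupied by LiDAR points (structural hint)
        loss = loss * fg_mask
        return loss.sum() / fg_mask.sum().clamp(min=1.0)
    return loss.mean()

# Toy usage: only the camera branch receives gradients
cam = torch.randn(2, 64, 32, 32, requires_grad=True)
lidar = torch.randn(2, 64, 32, 32)
mask = (torch.rand(2, 32, 32) > 0.5).float()
l = bev_distill_loss(cam, lidar, mask)
l.backward()
```

Detaching the LiDAR features is the key design choice: the point-cloud branch acts purely as a teacher during pretraining, so the plug-and-play camera detector can later be fine-tuned without any LiDAR input.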
