AD-PT: Autonomous Driving Pre-Training with Large-scale Point Cloud Dataset

by Jiakang Yuan, et al.

It is a long-term vision of the Autonomous Driving (AD) community that perception models can learn from a large-scale point cloud dataset to obtain unified representations that achieve promising results across different tasks and benchmarks. Previous works mainly focus on the self-supervised pre-training pipeline, in which pre-training and fine-tuning are performed on the same benchmark, making it difficult for the pre-trained checkpoint to scale in performance or to transfer across datasets. In this paper, for the first time, we build a large-scale pre-training point-cloud dataset with a diverse data distribution and learn generalizable representations from it. We formulate point-cloud pre-training as a semi-supervised problem that leverages few-shot labeled and massive unlabeled point-cloud data to generate unified backbone representations that can be directly applied to many baseline models and benchmarks, decoupling the AD-related pre-training process from the downstream fine-tuning task. During backbone pre-training, by enhancing scene- and instance-level distribution diversity and exploiting the backbone's ability to learn from unknown instances, we achieve significant performance gains on a series of downstream perception benchmarks, including Waymo, nuScenes, and KITTI, under different baseline models such as PV-RCNN++, SECOND, and CenterPoint.


SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving

Annotating 3D LiDAR point clouds for perception tasks including 3D objec...

PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding

Arguably one of the top success stories of deep learning is transfer lea...

Self-Supervised Pre-training of 3D Point Cloud Networks with Image Data

Reducing the quantity of annotations required for supervised training is...

UniWorld: Autonomous Driving Pre-training via World Models

In this paper, we draw inspiration from Alberto Elfes' pioneering work i...

3D Object Detection with a Self-supervised Lidar Scene Flow Backbone

State-of-the-art 3D detection methods rely on supervised learning and la...

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR...

Occ-BEV: Multi-Camera Unified Pre-training via 3D Scene Reconstruction

Multi-camera 3D perception has emerged as a prominent research field in ...
