BEV-MAE: Bird's Eye View Masked Autoencoders for Outdoor Point Cloud Pre-training

12/12/2022
by Zhiwei Lin, et al.

Current outdoor LiDAR-based 3D object detection methods mainly adopt the training-from-scratch paradigm. Unfortunately, this paradigm relies heavily on large-scale labeled data, whose collection is expensive and time-consuming. Self-supervised pre-training is an effective and desirable way to alleviate this dependence on extensive annotated data. Recently, masked modeling has become a successful self-supervised learning approach for point clouds. However, existing works mainly focus on synthetic or indoor datasets and fail to yield satisfactory results when applied to large-scale, sparse outdoor point clouds. In this work, we present BEV-MAE, a simple masked autoencoder pre-training framework for 3D object detection on outdoor point clouds. Specifically, we first propose a bird's eye view (BEV) guided masking strategy that guides the 3D encoder to learn feature representations from a BEV perspective while avoiding complex decoder designs during pre-training. In addition, we introduce a learnable point token so that, on masked point cloud inputs, the 3D encoder keeps a receptive field size consistent with that used during fine-tuning. Finally, exploiting a property of outdoor point clouds, namely that points on distant objects are sparser, we propose point density prediction to enable the 3D encoder to learn location information, which is essential for object detection. Experimental results show that BEV-MAE achieves new state-of-the-art self-supervised results on both Waymo and nuScenes with diverse 3D object detectors. Furthermore, with only 20% pre-training, BEV-MAE achieves performance comparable to the state-of-the-art method ProposalContrast. The source code and pre-trained models will be made publicly available.
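The abstract does not include implementation details, but the BEV-guided masking and point density prediction ideas can be sketched roughly as follows. The snippet below is a minimal, hypothetical PyTorch sketch, not the authors' code: it partitions the BEV plane into a grid, masks a random subset of non-empty cells, and derives a per-cell point-count target. The function name bev_guided_mask, the mask_ratio and cell_size parameters, and the area normalization are illustrative assumptions, and the learnable point token, 3D encoder, and decoder are omitted.

import torch

def bev_guided_mask(points, pc_range, cell_size, mask_ratio=0.7, generator=None):
    """Mask points by randomly dropping non-empty BEV grid cells.

    points:    (N, 3+) tensor of LiDAR points (x, y, z, ...)
    pc_range:  (xmin, ymin, xmax, ymax) extent of the BEV plane in meters
    cell_size: BEV cell edge length in meters
    Returns visible points, masked points, the masked cell indices, and a
    per-masked-cell point count normalized by cell area (a density target).
    """
    xmin, ymin, xmax, ymax = pc_range
    nx = int((xmax - xmin) / cell_size)
    ny = int((ymax - ymin) / cell_size)

    # Assign each point to a BEV cell on the x-y plane.
    ix = ((points[:, 0] - xmin) / cell_size).long().clamp_(0, nx - 1)
    iy = ((points[:, 1] - ymin) / cell_size).long().clamp_(0, ny - 1)
    cell_idx = iy * nx + ix

    # Randomly select a fraction of the *non-empty* cells to mask.
    nonempty = torch.unique(cell_idx)
    n_mask = int(mask_ratio * nonempty.numel())
    perm = torch.randperm(nonempty.numel(), generator=generator)
    masked_cells = nonempty[perm[:n_mask]]

    # Split points into visible and masked sets by cell membership.
    is_masked = torch.isin(cell_idx, masked_cells)
    visible_pts, masked_pts = points[~is_masked], points[is_masked]

    # Density target: point count per masked cell, normalized by cell area
    # (the normalization is an illustrative choice, not taken from the paper).
    counts = torch.bincount(cell_idx[is_masked], minlength=nx * ny)
    density_target = counts[masked_cells].float() / (cell_size ** 2)
    return visible_pts, masked_pts, masked_cells, density_target

if __name__ == "__main__":
    # Toy example: 10k random points in a 100 m x 100 m scene, 2 m BEV cells.
    pts = torch.rand(10000, 4) * torch.tensor([100.0, 100.0, 4.0, 1.0])
    vis, msk, cells, density = bev_guided_mask(
        pts, pc_range=(0.0, 0.0, 100.0, 100.0), cell_size=2.0)
    print(vis.shape, msk.shape, cells.shape, density.shape)

Masking whole BEV cells rather than individual points or 3D voxels is, per the abstract, what lets the encoder learn representations in the same BEV perspective used by downstream detectors; in the paper's full framework, a learnable point token stands in for the masked regions and a lightweight head predicts the density (and coordinates) of the removed points.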
