POS-BERT: Point Cloud One-Stage BERT Pre-Training

04/03/2022
by   Kexue Fu, et al.
0

Recently, the pre-training paradigm combining Transformer and masked language modeling has achieved tremendous success in NLP, images, and point clouds, such as BERT. However, directly extending BERT from NLP to point clouds requires training a fixed discrete Variational AutoEncoder (dVAE) before pre-training, which results in a complex two-stage method called Point-BERT. Inspired by BERT and MoCo, we propose POS-BERT, a one-stage BERT pre-training method for point clouds. Specifically, we use the mask patch modeling (MPM) task to perform point cloud pre-training, which aims to recover masked patches information under the supervision of the corresponding tokenizer output. Unlike Point-BERT, its tokenizer is extra-trained and frozen. We propose to use the dynamically updated momentum encoder as the tokenizer, which is updated and outputs the dynamic supervision signal along with the training process. Further, in order to learn high-level semantic representation, we combine contrastive learning to maximize the class token consistency between different transformation point clouds. Extensive experiments have demonstrated that POS-BERT can extract high-quality pre-training features and promote downstream tasks to improve performance. Using the pre-training model without any fine-tuning to extract features and train linear SVM on ModelNet40, POS-BERT achieves the state-of-the-art classification accuracy, which exceeds Point-BERT by 3.5%. In addition, our approach has significantly improved many downstream tasks, such as fine-tuned classification, few-shot classification, part segmentation. The code and trained-models will be available at: <https://github.com/fukexue/POS-BERT>.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/29/2021

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling

We present Point-BERT, a new paradigm for learning Transformers to gener...
research
05/19/2023

PointGPT: Auto-regressively Generative Pre-training from Point Clouds

Large language models (LLMs) based on the generative pre-training transf...
research
01/09/2023

Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling

We identify and overcome two key obstacles in extending the success of B...
research
02/10/2023

BEST: BERT Pre-Training for Sign Language Recognition with Coupling Tokenization

In this work, we are dedicated to leveraging the BERT pre-training succe...
research
03/29/2022

mc-BEiT: Multi-choice Discretization for Image BERT Pre-training

Image BERT pre-training with masked image modeling (MIM) becomes a popul...
research
08/11/2022

PointTree: Transformation-Robust Point Cloud Encoder with Relaxed K-D Trees

Being able to learn an effective semantic representation directly on raw...
research
07/28/2023

VPP: Efficient Conditional 3D Generation via Voxel-Point Progressive Representation

Conditional 3D generation is undergoing a significant advancement, enabl...

Please sign up or login with your details

Forgot password? Click here to reset