P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting

by Ziyi Wang, et al.

Nowadays, pre-training big models on large-scale datasets has become a crucial topic in deep learning. Pre-trained models with high representation ability and transferability have achieved great success and dominate many downstream tasks in natural language processing and 2D vision. However, it is non-trivial to extend such a pre-training-tuning paradigm to 3D vision, given that 3D training data are limited and relatively inconvenient to collect. In this paper, we provide a new perspective on leveraging pre-trained 2D knowledge in the 3D domain to tackle this problem: tuning pre-trained image models with the novel Point-to-Pixel Prompting for point cloud analysis at a minor parameter cost. Following the principle of prompt engineering, we transform point clouds into colorful images with geometry-preserved projection and geometry-aware coloring to adapt them to pre-trained image models, whose weights are kept frozen during the end-to-end optimization of point cloud analysis tasks. We conduct extensive experiments to demonstrate that, combined with our proposed Point-to-Pixel Prompting, a better pre-trained image model consistently leads to better performance in 3D vision. Benefiting from the prosperous development of the image pre-training field, our method attains 89.3% accuracy on the hardest setting of ScanObjectNN, surpassing conventional point cloud models with far fewer trainable parameters. Our framework also exhibits very competitive performance on ModelNet classification and ShapeNet Part Segmentation. Code is available at https://github.com/wangzy22/P2P
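To make the point-to-image idea concrete, here is a minimal, hand-crafted sketch in Python. It is not the P2P implementation: the function names `project_to_image` and `depth_to_rgb`, the orthographic view along the z-axis, the image size, and the fixed blue-to-red depth ramp are all hypothetical stand-ins for the paper's geometry-preserved projection and geometry-aware coloring. A z-buffer keeps only the nearest point per pixel, so the resulting image reflects the visible surface geometry rather than every point.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
RGB = Tuple[int, int, int]


def depth_to_rgb(t: float) -> RGB:
    """Hypothetical coloring: map normalized depth t in [0, 1] to a blue-to-red ramp.

    This fixed ramp is an assumption, not the paper's geometry-aware coloring.
    """
    t = min(max(t, 0.0), 1.0)
    return (int(round(255 * t)), 0, int(round(255 * (1.0 - t))))


def _normalize(v: float, lo: float, hi: float) -> float:
    """Rescale v from [lo, hi] to [0, 1]; a degenerate range maps to 0."""
    return 0.0 if hi == lo else (v - lo) / (hi - lo)


def project_to_image(points: List[Point], size: int = 8) -> List[List[RGB]]:
    """Orthographically project points along z onto a size x size RGB image.

    Each pixel keeps the point nearest to the viewer (smallest z), so occluded
    points never overwrite visible ones. Pixels with no point stay white.
    """
    if not points:
        raise ValueError("empty point cloud")
    xs, ys, zs = zip(*points)
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    z_lo, z_hi = min(zs), max(zs)

    white: RGB = (255, 255, 255)
    image = [[white] * size for _ in range(size)]
    zbuf = [[float("inf")] * size for _ in range(size)]

    for x, y, z in points:
        # Column grows with x; row 0 is the top of the image (largest y).
        col = min(int(_normalize(x, x_lo, x_hi) * size), size - 1)
        row = min(int((1.0 - _normalize(y, y_lo, y_hi)) * size), size - 1)
        if z < zbuf[row][col]:  # z-buffer test: keep the nearest point
            zbuf[row][col] = z
            image[row][col] = depth_to_rgb(_normalize(z, z_lo, z_hi))
    return image


if __name__ == "__main__":
    cloud = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (0.0, 0.0, 0.5)]
    for r in project_to_image(cloud, size=4):
        print(r)
```

In the method the abstract describes, an image produced this way would be fed to a pre-trained image model whose weights stay frozen, so only a small number of additional parameters are optimized for the point cloud task.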


Self-Supervised Learning with Multi-View Rendering for 3D Point Cloud Analysis

Recently, great progress has been made in 3D deep learning with the emer...

3D Point Cloud Pre-training with Knowledge Distillation from 2D Images

The recent success of pre-trained 2D vision models is mostly attributabl...

SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations

Pre-training has become a standard paradigm in many computer vision task...

Pre-Training by Completing Point Clouds

There has recently been a flurry of exciting advances in deep learning m...

RegionBLIP: A Unified Multi-modal Pre-training Framework for Holistic and Regional Comprehension

In this work, we investigate extending the comprehension of Multi-modal ...

PCR-CG: Point Cloud Registration via Deep Color and Geometry

In this paper, we introduce PCR-CG: a novel 3D point cloud registration ...

Point Clouds Are Specialized Images: A Knowledge Transfer Approach for 3D Understanding

Self-supervised representation learning (SSRL) has gained increasing att...
