Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models

07/27/2023
by   Ziyi Wang, et al.

With the overwhelming trend of masked image modeling led by MAE, generative pre-training has shown remarkable potential to boost the performance of foundation models in 2D vision. However, in 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training. In this paper, we propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model. Our scheme generates view images from different instructed poses via a cross-attention mechanism. Generating view images provides more precise supervision than generating the point cloud itself, helping 3D backbones gain a finer comprehension of the geometrical structure and stereoscopic relations of the point cloud. Experimental results demonstrate the superiority of our 3D-to-2D generative pre-training over previous pre-training methods. Our method is also effective in boosting the performance of architecture-oriented approaches, achieving state-of-the-art results when fine-tuned on ScanObjectNN classification and ShapeNetPart segmentation. Code is available at https://github.com/wangzy22/TAP.
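The pre-training scheme described above — pose-instructed queries cross-attending to point cloud features to predict a view image — can be sketched as follows. This is a minimal, hedged illustration only: the projection matrices are random stand-ins for learned weights, the pose is a plain vector, and the function name and shapes are assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_view_generation(point_feats, pose, hw=4, d=16, rng=None):
    """Toy sketch of 3D-to-2D generation: each pixel of the target view
    forms a query from its 2D coordinate plus the camera pose, attends
    over point cloud features, and a linear head predicts the pixel value.
    All weight matrices below are random placeholders for learned ones."""
    rng = np.random.default_rng(0) if rng is None else rng
    n_pix = hw * hw
    # hypothetical learnable projections (random here, shapes only)
    Wq = rng.standard_normal((pose.size + 2, d)) / np.sqrt(d)
    Wk = rng.standard_normal((point_feats.shape[1], d)) / np.sqrt(d)
    Wv = rng.standard_normal((point_feats.shape[1], d)) / np.sqrt(d)
    Wo = rng.standard_normal((d, 1))
    # queries: each pixel's normalized 2D coordinate, concatenated with pose
    ys, xs = np.divmod(np.arange(n_pix), hw)
    coords = np.stack([xs, ys], axis=1) / hw
    q_in = np.concatenate([coords, np.tile(pose, (n_pix, 1))], axis=1)
    Q = q_in @ Wq                          # (n_pix, d)
    K = point_feats @ Wk                   # (n_points, d)
    V = point_feats @ Wv                   # (n_points, d)
    attn = softmax(Q @ K.T / np.sqrt(d))   # (n_pix, n_points)
    pixels = (attn @ V) @ Wo               # (n_pix, 1)
    return pixels.reshape(hw, hw)

# usage: 128 points with 32-dim features, a 3-dim pose vector
img = cross_attention_view_generation(
    np.random.default_rng(1).standard_normal((128, 32)),
    pose=np.array([0.1, 0.5, -0.3]))
print(img.shape)  # (4, 4)
```

In training, the predicted view image would be compared against a rendered ground-truth image, giving the 3D backbone (which produced `point_feats`) dense 2D supervision; only the query construction, not the backbone, depends on the instructed pose.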


