Frozen CLIP Model is An Efficient Point Cloud Backbone

12/08/2022
by Xiaoshui Huang, et al.

The pretraining-finetuning paradigm has demonstrated great success in NLP and 2D image fields because of the high-quality representation ability and transferability of their pretrained models. However, pretraining such a strong model is difficult in the 3D point cloud field since the training data is limited and point cloud collection is expensive. This paper introduces Efficient Point Cloud Learning (EPCL), an effective and efficient point cloud learner for directly training high-quality point cloud models with a frozen CLIP model. Our EPCL connects the 2D and 3D modalities by semantically aligning the 2D features and point cloud features without paired 2D-3D data. Specifically, the input point cloud is divided into a sequence of tokens and directly fed into the frozen CLIP model to learn point cloud representation. Furthermore, we design a task token to narrow the gap between 2D images and 3D point clouds. Comprehensive experiments on 3D detection, semantic segmentation, classification and few-shot learning demonstrate that the 2D CLIP model can be an efficient point cloud backbone and our method achieves state-of-the-art accuracy on both real-world and synthetic downstream tasks. Code will be available.
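
As a rough illustration of the step where the input point cloud is divided into a sequence of tokens, the sketch below groups points into local patches and prepends a task token. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how patches are formed, so farthest point sampling plus nearest-neighbour grouping is assumed here, and the names `farthest_point_sampling`, `group_into_patches`, and `build_token_sequence`, the patch sizes, and the flattening "embedding" are all hypothetical. The frozen CLIP transformer that would consume the sequence is omitted.

```python
import math
import random


def farthest_point_sampling(points, num_centers):
    """Return indices of num_centers points spread out over the cloud."""
    chosen = [0]  # arbitrary starting point (assumption)
    # Distance from every point to its closest already-chosen center.
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < num_centers:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


def group_into_patches(points, num_patches, patch_size):
    """Build one local patch per sampled center from its nearest neighbours."""
    patches = []
    for c in farthest_point_sampling(points, num_patches):
        cx, cy, cz = points[c]
        nearest = sorted(points, key=lambda p: math.dist(p, points[c]))
        # Express each neighbour relative to its patch center.
        patches.append(
            [(x - cx, y - cy, z - cz) for x, y, z in nearest[:patch_size]]
        )
    return patches


def build_token_sequence(points, num_patches=8, patch_size=16):
    """Turn a point cloud into [task_token, patch_token_1, ...]."""
    patches = group_into_patches(points, num_patches, patch_size)
    # Hypothetical embedding: flatten each patch into one vector. A real
    # model would use a small learned network to embed each patch.
    patch_tokens = [[v for pt in patch for v in pt] for patch in patches]
    # Placeholder for a learnable task token (zeros here, for illustration).
    task_token = [0.0] * (patch_size * 3)
    return [task_token] + patch_tokens


random.seed(0)
cloud = [(random.random(), random.random(), random.random()) for _ in range(256)]
tokens = build_token_sequence(cloud)
print(len(tokens), len(tokens[0]))
```

In a full pipeline, a sequence like `tokens` would be passed through the frozen CLIP transformer, presumably with only the lightweight pieces around it (the patch embedding, the task token, and a task-specific head) being trained.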


