PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models

12/03/2022
by   Minghua Liu, et al.

Generalizable 3D part segmentation is important but challenging in vision and robotics. Training deep models via conventional supervised methods requires large-scale 3D datasets with fine-grained part annotations, which are costly to collect. This paper explores an alternative approach to low-shot part segmentation of 3D point clouds that leverages a pretrained image-language model, GLIP, which achieves superior performance on open-vocabulary 2D detection. We transfer the rich knowledge from 2D to 3D through GLIP-based part detection on point cloud renderings and a novel 2D-to-3D label lifting algorithm. We also utilize multi-view 3D priors and few-shot prompt tuning to boost performance significantly. Extensive evaluation on the PartNet and PartNet-Mobility datasets shows that our method enables excellent zero-shot 3D part segmentation. Our few-shot version not only outperforms existing few-shot approaches by a large margin but also achieves highly competitive results compared to the fully supervised counterpart. Furthermore, we demonstrate that our method can be directly applied to iPhone-scanned point clouds without significant domain gaps.
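
The abstract names a 2D-to-3D label lifting step but does not describe it. As a rough illustration only, the sketch below shows one simple way part labels from 2D detection boxes could be transferred to 3D points: majority voting across rendered views. This is not the paper's algorithm. The function name `lift_labels`, the box format, and the assumption that per-view pixel projections and visibility are already computed are all hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical formats, chosen for illustration only.
Box = Tuple[float, float, float, float, str]  # (x_min, y_min, x_max, y_max, part_label)
Pixel = Optional[Tuple[float, float]]         # None when the point is not visible in a view


def lift_labels(
    projections: List[List[Pixel]],
    boxes_per_view: List[List[Box]],
) -> List[Optional[str]]:
    """Assign each 3D point the part label that most 2D boxes vote for.

    projections[v][i] is the pixel location of point i in view v,
    or None if the point is occluded in that view.
    """
    num_points = len(projections[0]) if projections else 0
    votes = [Counter() for _ in range(num_points)]

    for view_pixels, boxes in zip(projections, boxes_per_view):
        for i, pixel in enumerate(view_pixels):
            if pixel is None:
                continue  # occluded points receive no vote from this view
            x, y = pixel
            for x0, y0, x1, y1, label in boxes:
                if x0 <= x <= x1 and y0 <= y <= y1:
                    votes[i][label] += 1

    # Points that no box ever covered stay unlabeled (None).
    return [c.most_common(1)[0][0] if c else None for c in votes]


# Toy input: three points seen from two views.
projections = [
    [(10.0, 10.0), (50.0, 50.0), None],
    [(12.0, 11.0), (52.0, 48.0), (90.0, 90.0)],
]
boxes_per_view = [
    [(0.0, 0.0, 20.0, 20.0, "handle")],
    [(0.0, 0.0, 20.0, 20.0, "handle"), (40.0, 40.0, 60.0, 60.0, "lid")],
]
labels = lift_labels(projections, boxes_per_view)
```

With this toy input, the first point falls inside a "handle" box in both views, the second is covered only by the "lid" box in the second view, and the third is never covered, so it stays unlabeled. A plain vote like this ignores detection confidence and box overlap, which a practical method would likely need to handle.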

research
10/14/2022

LESS: Label-Efficient Semantic Segmentation for LiDAR Point Clouds

Semantic segmentation of LiDAR point clouds is an important task in auto...
research
03/04/2023

Open-Vocabulary Affordance Detection in 3D Point Clouds

Affordance detection is a challenging problem with a wide variety of rob...
research
06/25/2021

"Zero Shot" Point Cloud Upsampling

Point cloud upsampling using deep learning has been paid various efforts...
research
11/21/2022

PointCLIP V2: Adapting CLIP for Powerful 3D Open-world Learning

Contrastive Language-Image Pre-training (CLIP) has shown promising open-...
research
07/20/2023

See More and Know More: Zero-shot Point Cloud Segmentation via Multi-modal Visual Data

Zero-shot point cloud segmentation aims to make deep models capable of r...
research
05/18/2023

OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding

We introduce OpenShape, a method for learning multi-modal joint represen...
research
08/01/2023

Detecting Cloud Presence in Satellite Images Using the RGB-based CLIP Vision-Language Model

This work explores capabilities of the pre-trained CLIP vision-language ...
