OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation

09/01/2023
by   Zhening Huang, et al.
0

Current 3D open-vocabulary scene understanding methods mostly utilize well-aligned 2D images as the bridge to learn 3D features with language. However, applying these approaches becomes challenging in scenarios where 2D images are absent. In this work, we introduce a completely new pipeline, namely, OpenIns3D, which requires no 2D image inputs, for 3D open-vocabulary scene understanding at the instance level. The OpenIns3D framework employs a "Mask-Snap-Lookup" scheme. The "Mask" module learns class-agnostic mask proposals in 3D point clouds. The "Snap" module generates synthetic scene-level images at multiple scales and leverages 2D vision language models to extract interesting objects. The "Lookup" module searches through the outcomes of "Snap" with the help of Mask2Pixel maps, which contain the precise correspondence between 3D masks and synthetic images, to assign category names to the proposed masks. This 2D input-free, easy-to-train, and flexible approach achieved state-of-the-art results on a wide range of indoor and outdoor datasets with a large margin. Furthermore, OpenIns3D allows for effortless switching of 2D detectors without re-training. When integrated with state-of-the-art 2D open-world models such as ODISE and GroundingDINO, superb results are observed on open-vocabulary instance segmentation. When integrated with LLM-powered 2D models like LISA, it demonstrates a remarkable capacity to process highly complex text queries, including those that require intricate reasoning and world knowledge. Project page: https://zheninghuang.github.io/OpenIns3D/

READ FULL TEXT

page 14

page 17

page 19

page 20

page 21

page 22

page 23

page 24

research
06/23/2023

OpenMask3D: Open-Vocabulary 3D Instance Segmentation

We introduce the task of open-vocabulary 3D instance segmentation. Tradi...
research
03/29/2023

Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations

Existing instance segmentation models learn task-specific information us...
research
08/18/2022

Open-Vocabulary Panoptic Segmentation with MaskCLIP

In this paper, we tackle a new computer vision task, open-vocabulary pan...
research
04/10/2023

Instance Neural Radiance Field

This paper presents one of the first learning-based NeRF 3D instance seg...
research
03/16/2023

LERF: Language Embedded Radiance Fields

Humans describe the physical world using natural language to refer to sp...
research
07/23/2022

Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models

We study open-world 3D scene understanding, a family of tasks that requi...
research
11/12/2018

Learning Segmentation Masks with the Independence Prior

An instance with a bad mask might make a composite image that uses it lo...

Please sign up or login with your details

Forgot password? Click here to reset