Lowis3D: Language-Driven Open-World Instance-Level 3D Scene Understanding

08/01/2023
by Runyu Ding, et al.

Open-world instance-level scene understanding aims to locate and recognize unseen object categories that are not present in the annotated dataset. This task is challenging because the model needs to both localize novel 3D objects and infer their semantic categories. A key factor in the recent progress of 2D open-world perception is the availability of large-scale image-text pairs from the Internet, which cover a wide range of vocabulary concepts. However, this success is hard to replicate in 3D scenarios due to the scarcity of 3D-text pairs. To address this challenge, we propose to harness pre-trained vision-language (VL) foundation models that encode extensive knowledge from image-text pairs to generate captions for multi-view images of 3D scenes. This allows us to establish explicit associations between 3D shapes and semantic-rich captions. Moreover, to enhance fine-grained visual-semantic representation learning from captions for object-level categorization, we design hierarchical point-caption association methods that learn semantic-aware embeddings by exploiting the 3D geometry between 3D points and multi-view images. In addition, to tackle the localization challenge for novel classes in the open-world setting, we develop debiased instance localization, which trains object grouping modules on unlabeled data using instance-level pseudo supervision. This significantly improves the generalization of instance grouping and thus the ability to accurately locate novel objects. We conduct extensive experiments on 3D semantic, instance, and panoptic segmentation tasks, covering indoor and outdoor scenes across three datasets. Our method outperforms baseline methods by a significant margin in semantic segmentation (e.g. 34.5…), instance segmentation (e.g. 21.8…), and panoptic segmentation (e.g. 14.7…).
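To make the idea of associating 3D points with image captions more concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes a pinhole camera with known intrinsics `K` and a 4x4 world-to-camera matrix, ignores occlusion, mean-pools the point features selected for each caption, and scores them against caption embeddings with an InfoNCE-style cross-entropy. The function names `points_in_view` and `caption_contrastive_loss`, the temperature of 0.07, and the two illustrative levels (scene level = all points, view level = points inside one camera frustum) are hypothetical choices made for this example. In a real pipeline the point features would come from a 3D backbone and the caption embeddings from a text encoder.

```python
import numpy as np


def points_in_view(points, K, world_to_cam, width, height):
    """Boolean mask of 3D points that project inside an image (hypothetical helper).

    points: (N, 3) world coordinates; K: (3, 3) pinhole intrinsics;
    world_to_cam: (4, 4) extrinsics. Occlusion is ignored (simplification).
    """
    homo = np.hstack([points, np.ones((len(points), 1))])  # (N, 4) homogeneous coords
    cam = (world_to_cam @ homo.T).T[:, :3]                  # (N, 3) camera-frame coords
    in_front = cam[:, 2] > 1e-6                             # keep points with positive depth
    z = np.where(in_front, cam[:, 2], 1.0)                  # avoid dividing by zero or negative depth
    pix = (K @ cam.T).T                                     # (N, 3) before perspective divide
    u, v = pix[:, 0] / z, pix[:, 1] / z                     # pixel coordinates
    return in_front & (u >= 0) & (u < width) & (v >= 0) & (v < height)


def caption_contrastive_loss(point_feats, masks, caption_embs, temperature=0.07):
    """Cross-entropy matching pooled point features to caption embeddings (hypothetical).

    point_feats: (N, D); masks: list of M boolean (N,) arrays, one per caption;
    caption_embs: (M, D); caption i is the positive for mask i, the others are negatives.
    """
    if len(masks) != len(caption_embs):
        raise ValueError("need exactly one caption embedding per point mask")
    if not all(m.any() for m in masks):
        raise ValueError("every mask must select at least one point")
    pooled = np.stack([point_feats[m].mean(axis=0) for m in masks])  # (M, D) mean pooling
    pooled /= np.linalg.norm(pooled, axis=1, keepdims=True)          # L2-normalize points
    text = caption_embs / np.linalg.norm(caption_embs, axis=1, keepdims=True)
    logits = pooled @ text.T / temperature                           # (M, M) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)                      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                              # matched pairs on the diagonal


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.uniform(-2, 2, size=(500, 3)) + np.array([0.0, 0.0, 5.0])  # in front of the camera
    K = np.array([[300.0, 0, 160], [0, 300.0, 120], [0, 0, 1]])
    view_mask = points_in_view(pts, K, np.eye(4), 320, 240)  # view level: frustum points
    scene_mask = np.ones(len(pts), dtype=bool)               # scene level: every point
    feats = rng.normal(size=(500, 16))                       # stand-in point features
    caps = rng.normal(size=(2, 16))                          # stand-in caption embeddings
    print(caption_contrastive_loss(feats, [scene_mask, view_mask], caps))
```

Mean pooling keeps the sketch short. A trained system would need occlusion-aware projection (for example, depth testing against the rendered view) so that hidden points are not linked to a caption describing what the camera actually sees.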

Related research

Language-driven Open-Vocabulary 3D Scene Understanding (11/29/2022)
Open-vocabulary scene understanding aims to localize and recognize unsee...

RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding (04/03/2023)
Existing 3D scene understanding tasks have achieved high performance on ...

Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations (03/29/2023)
Existing instance segmentation models learn task-specific information us...

Going Denser with Open-Vocabulary Part Segmentation (05/18/2023)
Object detection has been expanded from a limited number of categories t...

Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation (01/02/2023)
In this work, we focus on instance-level open vocabulary segmentation, i...

TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision (03/23/2023)
In this paper, we investigate an open research task of generating contro...

Knowledge-rich Image Gist Understanding Beyond Literal Meaning (04/18/2019)
We investigate the problem of understanding the message (gist) conveyed ...