OpenScene: 3D Scene Understanding with Open Vocabularies

11/28/2022
by Songyou Peng, et al.

Traditional 3D scene understanding approaches rely on labeled 3D datasets to train a model for a single task with supervision. We propose OpenScene, an alternative approach where a model predicts dense features for 3D scene points that are co-embedded with text and image pixels in CLIP feature space. This zero-shot approach enables task-agnostic training and open-vocabulary queries. For example, to perform state-of-the-art zero-shot 3D semantic segmentation, it first infers CLIP features for every 3D point and then classifies each point based on the similarity of its feature to the embeddings of arbitrary class labels. More interestingly, it enables a suite of open-vocabulary scene understanding applications that have never been done before. For example, it allows a user to enter an arbitrary text query and then see a heat map indicating which parts of a scene match. Our approach is effective at identifying objects, materials, affordances, activities, and room types in complex 3D scenes, all using a single model trained without any labeled 3D data.
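Both the segmentation step and the text-query heat map described above come down to comparing each point's feature with text embeddings in a shared space. The following is a minimal sketch, assuming the per-point features and the text embeddings have already been computed. In OpenScene these live in CLIP feature space; here tiny hand-written 3-D vectors stand in for them. Cosine similarity is assumed as the similarity measure, and the function names `classify_points` and `query_heatmap`, along with min-max scaling of the heat map for display, are hypothetical illustration choices, not the paper's code.

```python
import math
from typing import Dict, List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector stays zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors."""
    a_n, b_n = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def classify_points(point_feats: Sequence[Sequence[float]],
                    label_embeds: Dict[str, Sequence[float]]) -> List[str]:
    """Hypothetical helper: assign each point the label whose embedding is most similar."""
    labels = list(label_embeds)
    predictions = []
    for feat in point_feats:
        scores = [cosine(feat, label_embeds[name]) for name in labels]
        best = max(range(len(labels)), key=scores.__getitem__)
        predictions.append(labels[best])
    return predictions


def query_heatmap(point_feats: Sequence[Sequence[float]],
                  query_embed: Sequence[float]) -> List[float]:
    """Hypothetical helper: per-point similarity to one text query, min-max scaled to [0, 1]."""
    sims = [cosine(feat, query_embed) for feat in point_feats]
    lo, hi = min(sims), max(sims)
    if hi == lo:
        return [0.0 for _ in sims]
    return [(s - lo) / (hi - lo) for s in sims]


# Toy stand-ins for per-point features and text embeddings (not real CLIP vectors).
points = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
labels = {"chair": [1.0, 0.0, 0.0], "table": [0.0, 1.0, 0.0]}

print(classify_points(points, labels))     # ['chair', 'table', 'chair']
print(query_heatmap(points, [1.0, 0.0, 0.0]))
```

Because the label set is just a dictionary of embeddings, swapping in new class names (or a free-form query) requires no retraining, which is the property the abstract refers to as open-vocabulary querying.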
