Neural Implicit Vision-Language Feature Fields

03/20/2023
by Kenneth Blomqvist, et al.

Recently, groundbreaking results have been presented on open-vocabulary semantic image segmentation. Such methods assign each pixel in an image to arbitrary categories provided at run time as text prompts, rather than to a fixed set of classes defined at training time. In this work, we present a zero-shot volumetric open-vocabulary semantic scene segmentation method. Our method builds on the insight that image features from a vision-language model can be fused into a neural implicit representation. We show that the resulting feature field can be segmented into different classes by assigning points to natural language text prompts. The implicit volumetric representation enables us to segment the scene both in 3D and in 2D by rendering feature maps from any given viewpoint of the scene. We show that our method works on noisy real-world data and can run in real time on live sensor data, dynamically adjusting to text prompts. We also present quantitative comparisons on the ScanNet dataset.
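
The idea of assigning points of a feature field to text prompts can be illustrated with a small sketch. It is not the authors' implementation: it assumes that point features and prompt embeddings share one embedding space and that each point takes the prompt with the highest cosine similarity. The function names and the toy 3-dimensional vectors are hypothetical stand-ins for features queried from the field and for prompts encoded by a vision-language model.

```python
import math


def normalize(v):
    """Scale a vector to unit length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def segment_by_prompts(point_features, prompt_embeddings):
    """Return, for each point feature, the index of the most similar prompt.

    Both arguments are lists of equal-length float vectors that are assumed
    to live in the same embedding space.
    """
    prompts = [normalize(p) for p in prompt_embeddings]
    labels = []
    for feature in point_features:
        f = normalize(feature)
        # On unit vectors, cosine similarity reduces to a dot product.
        scores = [sum(a * b for a, b in zip(f, p)) for p in prompts]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


# Toy embeddings, made up for illustration only.
prompt_names = ["chair", "table"]
prompt_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
point_features = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [2.0, 0.5, 0.3]]

labels = segment_by_prompts(point_features, prompt_embeddings)
print([prompt_names[i] for i in labels])  # ['chair', 'table', 'chair']
```

Under these assumptions, changing the prompts at run time only requires calling `segment_by_prompts` again with new prompt embeddings; the stored point features are left untouched.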

research
09/11/2023

Panoptic Vision-Language Feature Fields

Recently, methods have been proposed for 3D open-vocabulary semantic seg...
research
05/21/2023

VL-Fields: Towards Language-Grounded Neural Implicit Spatial Representations

We present Visual-Language Fields (VL-Fields), a neural implicit spatial...
research
09/26/2022

Baking in the Feature: Accelerating Volumetric Segmentation by Rendering Feature Maps

Methods have recently been proposed that densely segment 3D volumes into...
research
10/06/2022

Feature-Realistic Neural Fusion for Real-Time, Open Set Scene Understanding

General scene understanding for robotics requires flexible semantic repr...
research
03/23/2023

Zero-guidance Segmentation Using Zero Segment Labels

CLIP has enabled new and exciting joint vision-language applications, on...
research
11/28/2022

OpenScene: 3D Scene Understanding with Open Vocabularies

Traditional 3D scene understanding approaches rely on labeled 3D dataset...
research
03/08/2023

CROSSFIRE: Camera Relocalization On Self-Supervised Features from an Implicit Representation

Beyond novel view synthesis, Neural Radiance Fields are useful for appli...
