CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory

We propose CLIP-Fields, an implicit scene model that can be trained with no direct human supervision. This model learns a mapping from spatial locations to semantic embedding vectors. The mapping can then be used for a variety of tasks, such as segmentation, instance identification, semantic search over space, and view localization. Most importantly, the mapping can be trained with supervision coming only from models trained on web images and web text, such as CLIP, Detic, and Sentence-BERT. Compared to baselines like Mask-RCNN, our method performs better on few-shot instance identification or semantic segmentation on the HM3D dataset while using only a fraction of the examples. Finally, we show that, using CLIP-Fields as a scene memory, robots can perform semantic navigation in real-world environments. Our code and demonstrations are available here: https://mahis.life/clip-fields/
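
To make "semantic search over space" concrete, the following is a minimal sketch, not the authors' implementation. It assumes the field can be queried as a mapping from 3D points to embedding vectors and that a text query has already been embedded into the same space. The lookup table `TOY_FIELD`, the 3-dimensional vectors, and the helper names `cosine` and `semantic_search` are all hypothetical stand-ins; in the setting described above, the vectors would come from a trained implicit model and from encoders such as CLIP or Sentence-BERT.

```python
import math

# Hypothetical toy field: 3D points paired with made-up 3-d "semantic" vectors.
# A real field would be a learned function, not a lookup table.
TOY_FIELD = {
    (0.0, 0.0, 0.0): [0.9, 0.1, 0.0],  # illustrative: region near a mug
    (2.0, 0.5, 0.0): [0.1, 0.9, 0.1],  # illustrative: region near a sofa
    (4.0, 1.0, 0.5): [0.0, 0.2, 0.9],  # illustrative: region near a plant
}


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(field, query_vec):
    """Return the point whose stored embedding is most similar to the query."""
    return max(field, key=lambda point: cosine(field[point], query_vec))


# Stand-in for an embedded text query such as "sofa" (assumed, not a CLIP output).
query = [0.05, 1.0, 0.0]
print(semantic_search(TOY_FIELD, query))  # prints (2.0, 0.5, 0.0)
```

Under these assumptions, a robot using the field as scene memory would treat the returned point as a navigation goal for the query.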


