Visual Language Maps for Robot Navigation

10/11/2022
by Chenguang Huang et al.

Grounding language in the visual observations of a navigating agent can be performed with off-the-shelf visual-language models pretrained on Internet-scale data (e.g., image captions). While this is useful for matching images to natural language descriptions of object goals, it remains disjoint from the process of mapping the environment and thus lacks the spatial precision of classic geometric maps. To address this problem, we propose VLMaps, a spatial map representation that directly fuses pretrained visual-language features with a 3D reconstruction of the physical world. VLMaps can be built autonomously from a robot's video feed using standard exploration approaches and enable natural language indexing of the map without additional labeled data. Specifically, when combined with large language models (LLMs), VLMaps can (i) be used to translate natural language commands into a sequence of open-vocabulary navigation goals (which, going beyond prior work, can be spatial by construction, e.g., "in between the sofa and TV" or "three meters to the right of the chair") localized directly in the map, and (ii) be shared among multiple robots with different embodiments to generate new obstacle maps on the fly (from a list of obstacle categories). Extensive experiments in simulated and real-world environments show that VLMaps enable navigation according to more complex language instructions than existing methods. Videos are available at https://vlmaps.github.io.
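The core mechanics described above can be sketched in a few lines: per-observation visual-language features are fused (averaged) into a top-down grid, and a text query is localized by cosine similarity against every cell. This is a minimal illustration, not the paper's implementation: the `VLMap` class, the hash-seeded `embed` stand-in (the paper uses per-pixel features from a pretrained model), and the grid coordinates are all hypothetical.

```python
import hashlib
import numpy as np

# Hypothetical stand-in for a pretrained visual-language embedder.
# VLMaps fuses per-pixel features from a real pretrained model; here a
# deterministic hash-seeded random unit vector just makes the demo runnable.
def embed(text: str, dim: int = 16) -> np.ndarray:
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)

class VLMap:
    """Top-down grid where each cell holds a fused visual-language feature."""

    def __init__(self, h: int, w: int, dim: int = 16):
        self.feats = np.zeros((h, w, dim))  # running-average feature per cell
        self.counts = np.zeros((h, w))      # observations fused per cell

    def fuse(self, row: int, col: int, feat: np.ndarray) -> None:
        # Average all observed features whose back-projection lands in this cell.
        self.counts[row, col] += 1
        self.feats[row, col] += (feat - self.feats[row, col]) / self.counts[row, col]

    def localize(self, query: str) -> tuple:
        # Open-vocabulary indexing: cosine similarity between the text
        # embedding and every cell feature; return the best-matching cell.
        q = embed(query, self.feats.shape[-1])
        sims = self.feats @ q / np.maximum(np.linalg.norm(self.feats, axis=-1), 1e-8)
        idx = np.unravel_index(np.argmax(sims), sims.shape)
        return int(idx[0]), int(idx[1])

vlmap = VLMap(8, 8)
vlmap.fuse(2, 3, embed("sofa"))        # pretend a sofa was observed at (2, 3)
vlmap.fuse(5, 6, embed("television"))  # and a TV at (5, 6)
print(vlmap.localize("sofa"))  # → (2, 3)
```

Because queries resolve to map coordinates, spatial goals like "in between the sofa and TV" reduce to arithmetic on localized cells, which is what lets an LLM compose such phrases into concrete navigation goals.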


Related research

Audio Visual Language Maps for Robot Navigation (03/13/2023)
While interacting in the world is a multi-sensory experience, many robot...

CARTIER: Cartographic lAnguage Reasoning Targeted at Instruction Execution for Robots (07/21/2023)
This work explores the capacity of large language models (LLMs) to addre...

Language-enhanced RNR-Map: Querying Renderable Neural Radiance Field maps with natural language (08/17/2023)
We present Le-RNR-Map, a Language-enhanced Renderable Neural Radiance ma...

Translating Videos to Natural Language Using Deep Recurrent Neural Networks (12/15/2014)
Solving the visual symbol grounding problem has long been a goal of arti...

Tell Me Where to Go: A Composable Framework for Context-Aware Embodied Robot Navigation (06/15/2023)
Humans have the remarkable ability to navigate through unfamiliar enviro...

Instance-Level Semantic Maps for Vision Language Navigation (05/21/2023)
Humans have a natural ability to perform semantic associations with the ...

ConceptFusion: Open-set Multimodal 3D Mapping (02/14/2023)
Building 3D maps of the environment is central to robot navigation, plan...
