Can an Embodied Agent Find Your "Cat-shaped Mug"? LLM-Based Zero-Shot Object Navigation

03/06/2023
by Vishnu Sashank Dorbala, et al.

We present LGX, a novel algorithm for Object Goal Navigation in a "language-driven, zero-shot manner", where an embodied agent navigates to an arbitrarily described target object in a previously unexplored environment. Our approach leverages the capabilities of Large Language Models (LLMs) for making navigational decisions by mapping the LLMs' implicit knowledge about the semantic context of the environment into sequential inputs for robot motion planning. Simultaneously, we also conduct generalized target object detection using a pre-trained Vision-Language grounding model. We achieve state-of-the-art zero-shot object navigation results on RoboTHOR, with a success rate (SR) improvement of over 27% over the current baseline, OWL on Wheels (OWL-CoW). Furthermore, we study the usage of LLMs for robot navigation and present an analysis of the various semantic factors affecting model output. Finally, we showcase the benefits of our approach via real-world experiments that indicate the superior performance of LGX when navigating to and detecting visually unique objects.
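The abstract's core idea, using an LLM's semantic priors to choose where to explore next while a grounding model checks for the target, can be illustrated with a minimal sketch. This is not the authors' implementation; all names (`build_prompt`, `parse_direction`, `stub_llm`) are illustrative, and the LLM and vision-language detector are replaced with stubs so the decision loop stands alone.

```python
"""Hypothetical sketch of an LGX-style decision loop: an LLM is prompted
with the objects visible in each direction and asked which way is most
likely to lead to the target. All names here are illustrative, not from
the paper; the real system would call an actual LLM and a pre-trained
vision-language grounding model."""


def build_prompt(target: str, objects_by_direction: dict) -> str:
    """Summarize the scene context for the LLM and ask for one direction."""
    lines = [f"You are a robot looking for a '{target}'."]
    for direction, objects in objects_by_direction.items():
        seen = ", ".join(objects) if objects else "nothing"
        lines.append(f"To the {direction} you see: {seen}.")
    options = ", ".join(objects_by_direction)
    lines.append(f"Which direction is most promising? Answer with one word: {options}.")
    return "\n".join(lines)


def parse_direction(reply: str, directions) -> str:
    """Extract the first valid direction mentioned in the LLM's free-form reply."""
    reply = reply.lower()
    for d in directions:
        if d in reply:
            return d
    return next(iter(directions))  # fall back to the first option


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call. It naively prefers the direction whose
    scene description mentions a 'counter' -- a plausible semantic cue when
    searching for a mug, mimicking the commonsense priors an LLM provides."""
    for line in prompt.splitlines():
        if line.startswith("To the ") and "counter" in line:
            return line.split()[2]  # the direction word in "To the <dir> you see"
    return "north"


if __name__ == "__main__":
    scene = {
        "north": ["sofa", "tv"],
        "east": ["counter", "sink"],
        "south": [],
        "west": ["bed"],
    }
    prompt = build_prompt("cat-shaped mug", scene)
    chosen = parse_direction(stub_llm(prompt), scene)
    print(chosen)  # steers toward the kitchen-like context
```

In the full system, each chosen direction would be translated into motion-planner waypoints, and an open-vocabulary detector would be queried at every step for the free-form target description ("cat-shaped mug") to decide when to stop.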


research · 11/30/2022
CLIP-Nav: Using CLIP for Zero-Shot Vision-and-Language Navigation
Household environments are visually diverse. Embodied agents performing ...

research · 01/30/2023
ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation
The ability to accurately locate and navigate to a specific object is a ...

research · 06/24/2022
ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings
We present a scalable approach for learning open-world object-goal navig...

research · 09/20/2023
Discuss Before Moving: Visual Language Navigation via Multi-expert Discussions
Visual language navigation (VLN) is an embodied task demanding a wide ra...

research · 02/03/2023
Multiple Thinking Achieving Meta-Ability Decoupling for Object Navigation
We propose a meta-ability decoupling (MAD) paradigm, which brings togeth...

research · 04/11/2023
L3MVN: Leveraging Large Language Models for Visual Target Navigation
Visual target navigation in unknown environments is a crucial problem in...

research · 03/28/2017
A Deep Compositional Framework for Human-like Language Acquisition in Virtual Environment
We tackle a task where an agent learns to navigate in a 2D maze-like env...
