ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation

01/30/2023
by Kaiwen Zhou, et al.

The ability to accurately locate and navigate to a specific object is a crucial capability for embodied agents that operate in the real world and interact with objects to complete tasks. Such object navigation tasks usually require large-scale training in visual environments with labeled objects, and the resulting policies generalize poorly to novel objects in unknown environments. In this work, we present a novel zero-shot object navigation method, Exploration with Soft Commonsense constraints (ESC), that transfers commonsense knowledge in pre-trained models to open-world object navigation without any navigation experience or any other training on the visual environments. First, ESC leverages a pre-trained vision-and-language model for open-world prompt-based grounding and a pre-trained commonsense language model for room and object reasoning. Then ESC converts commonsense knowledge into navigation actions by modeling it as soft logic predicates for efficient exploration. Extensive experiments on the MP3D, HM3D, and RoboTHOR benchmarks show that ESC improves significantly over baselines and achieves new state-of-the-art results for zero-shot object navigation (e.g., a 225% relative Success Rate improvement over CoW on MP3D).
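To make the idea of "soft logic predicates" more concrete, here is a minimal sketch of how commonsense priors could rank exploration frontiers. It is not the authors' implementation. It assumes a Lukasiewicz-style soft AND, and every name in it (`Frontier`, `score_frontier`, `choose_frontier`, the prior values, the frontier labels) is hypothetical and chosen only for illustration. In a real system the detector confidences would come from the grounding model and the priors from the commonsense language model.

```python
from dataclasses import dataclass
from typing import Dict, List


def soft_and(a: float, b: float) -> float:
    """Lukasiewicz t-norm: a soft version of logical AND on truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


@dataclass
class Frontier:
    """A candidate exploration frontier (hypothetical structure)."""
    name: str
    # Assumed detector confidences in [0, 1] that each object or room
    # has been observed near this frontier.
    nearby: Dict[str, float]


def score_frontier(frontier: Frontier, prior: Dict[str, float]) -> float:
    """Sum the activations of soft rules of the form
    'X is near the frontier AND the goal is usually near X'."""
    return sum(
        soft_and(conf, prior.get(label, 0.0))
        for label, conf in frontier.nearby.items()
    )


def choose_frontier(frontiers: List[Frontier], prior: Dict[str, float]) -> Frontier:
    """Pick the frontier whose soft-rule score is highest."""
    return max(frontiers, key=lambda f: score_frontier(f, prior))


if __name__ == "__main__":
    # Assumed commonsense prior for the goal "pillow": how likely the goal
    # is to be found near each object. These numbers are made up.
    prior = {"bed": 0.9, "sofa": 0.7, "sink": 0.1}
    frontiers = [
        Frontier("hallway_left", {"bed": 0.8}),
        Frontier("hallway_right", {"sink": 0.9, "sofa": 0.4}),
    ]
    print(choose_frontier(frontiers, prior).name)
```

Because the rules are soft, a confidently detected but irrelevant object (the sink) contributes nothing, while a moderately confident detection of a relevant object (the bed) still steers exploration.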
