Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding

03/07/2023
by Minyoung Hwang et al.

The main challenge in vision-and-language navigation (VLN) is understanding natural-language instructions in an unseen environment. A key limitation of conventional VLN algorithms is that if an action is mistaken, the agent either fails to follow the instructions or explores unnecessary regions, ending up on an irrecoverable path. To tackle this problem, we propose Meta-Explore, a hierarchical navigation method that deploys an exploitation policy to correct misled recent actions. We show that an exploitation policy that moves the agent toward a well-chosen local goal among unvisited but observable states outperforms a method that moves the agent back to a previously visited state. We also highlight the need to detect regretful explorations using semantically meaningful clues. The key to our approach is understanding the object placements around the agent in the spectral domain. Specifically, we present a novel visual representation, called scene object spectrum (SOS), which performs a category-wise 2D Fourier transform of detected objects. Combining the exploitation policy with SOS features, the agent can correct its path by choosing a promising local goal. We evaluate our method on three VLN benchmarks: R2R, SOON, and REVERIE. Meta-Explore outperforms other baselines and shows significant generalization performance. In addition, local goal search using the proposed spectral-domain SOS features significantly improves the success rate by 17.1%.
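The abstract describes SOS as a category-wise 2D Fourier transform over detected objects. As a rough illustration only (not the paper's implementation), the sketch below assumes one binary detection mask per object category and takes the magnitude of each mask's 2D FFT; the function name and mask format are hypothetical:

```python
import numpy as np

def scene_object_spectrum(category_masks):
    """Toy SOS-style feature: per-category 2D Fourier magnitude.

    category_masks: list of HxW binary arrays, one per object category
    (a hypothetical stand-in for detector output). Returns an array of
    shape (num_categories, H, W) holding the shifted FFT magnitudes.
    """
    spectra = []
    for mask in category_masks:
        # 2D FFT of the category mask; the magnitude encodes the
        # spatial layout of that object category in the frequency domain.
        freq = np.fft.fft2(mask)
        spectra.append(np.abs(np.fft.fftshift(freq)))
    return np.stack(spectra)

# Usage with random stand-in masks for 3 categories on a 32x32 grid.
masks = [(np.random.rand(32, 32) > 0.9).astype(float) for _ in range(3)]
sos = scene_object_spectrum(masks)
print(sos.shape)  # (3, 32, 32)
```

How such features are matched against instruction clues to score candidate local goals is specific to the paper and not reproduced here.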


