Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation

03/05/2022
by Yicong Hong, et al.

Most existing work in vision-and-language navigation (VLN) focuses on either discrete or continuous environments, training agents that cannot generalize across the two. The fundamental difference between the two setups is that discrete navigation assumes prior knowledge of the environment's connectivity graph, so the agent can effectively reduce the problem of navigating with low-level controls to jumping from node to node with high-level actions, grounding each action to an image of a navigable direction. To bridge the discrete-to-continuous gap, we propose a predictor that generates a set of candidate waypoints during navigation, so that agents designed with high-level actions can be transferred to and trained in continuous environments. We refine the connectivity graphs of Matterport3D to fit the continuous Habitat-Matterport3D, and train the waypoint predictor with the refined graphs to produce accessible waypoints at each time step. Moreover, we demonstrate that the predicted waypoints can be augmented during training to diversify the views and paths, thereby enhancing the agent's generalization ability. Through extensive experiments we show that agents navigating in continuous environments with predicted waypoints perform significantly better than agents using low-level actions, reducing the absolute discrete-to-continuous gap by 11.76 and 18.24, respectively. Moreover, trained with a simple imitation learning objective, our agents outperform previous methods by a large margin, achieving new state-of-the-art results on the test environments of the R2R-CE and RxR-CE datasets.
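To make the idea concrete, here is a minimal sketch of the waypoint-based navigation loop the abstract describes. All names and numbers below are illustrative assumptions, not the paper's actual model: the learned predictor's output is stood in for by a score grid over a polar neighborhood of the agent (angle bins × distance bins), from which the top-k cells become the candidate waypoints offered to the high-level agent.

```python
import math
import random

# Illustrative parameters (hypothetical, not from the paper).
NUM_ANGLES = 12      # heading bins around the agent (every 30 degrees)
NUM_DISTANCES = 3    # distance bins: 0.75 m, 1.5 m, 2.25 m
TOP_K = 4            # number of candidate waypoints exposed to the agent


def predict_waypoints(scores, top_k=TOP_K):
    """Turn a polar score grid into the top-k candidate waypoints.

    `scores[a][d]` is a predicted accessibility score for the waypoint at
    angle bin `a` and distance bin `d` -- a stand-in for the heatmap a
    trained waypoint predictor would output from visual observations.
    Returns (x, y) offsets, in meters, relative to the agent.
    """
    candidates = []
    for a in range(NUM_ANGLES):
        for d in range(NUM_DISTANCES):
            angle = 2 * math.pi * a / NUM_ANGLES
            dist = 0.75 * (d + 1)
            candidates.append((scores[a][d], angle, dist))
    # Keep the k best-scoring cells as navigable waypoint proposals.
    candidates.sort(reverse=True)
    return [(dist * math.cos(angle), dist * math.sin(angle))
            for _, angle, dist in candidates[:top_k]]


# Usage: random scores stand in for the learned predictor's output.
random.seed(0)
scores = [[random.random() for _ in range(NUM_DISTANCES)]
          for _ in range(NUM_ANGLES)]
waypoints = predict_waypoints(scores)
```

A high-level agent would then score these candidates against the instruction and select one, with a low-level controller executing the move in the continuous simulator; that selection and control logic is omitted here.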


Related research

04/20/2022 | Sim-2-Sim Transfer for Vision-and-Language Navigation in Continuous Environments
Recent work in Vision-and-Language Navigation (VLN) has presented two en...

04/06/2020 | Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments
We develop a language-guided navigation task set in a continuous 3D envi...

04/22/2021 | Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation
Deep Learning has revolutionized our ability to solve complex problems s...

07/05/2019 | Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters
In Vision-and-Language Navigation (VLN), an embodied agent needs to reac...

10/05/2021 | Waypoint Models for Instruction-guided Navigation in Continuous Environments
Little inquiry has explicitly addressed the role of action spaces in lan...

05/10/2020 | BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps
Learning to follow instructions is of fundamental importance to autonomo...

11/10/2018 | Mapping Navigation Instructions to Continuous Control Actions with Position-Visitation Prediction
We propose an approach for mapping natural language instructions and raw...
