Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters

07/05/2019
by   Federico Landi, et al.

In Vision-and-Language Navigation (VLN), an embodied agent must reach a target destination guided only by a natural language instruction. To explore the environment and progress towards the target location, the agent must perform a series of low-level actions, such as rotating before stepping ahead. In this paper, we propose to exploit dynamic convolutional filters to encode the visual information and the linguistic description efficiently. Unlike some previous works, which abstract away from the agent's perspective and use high-level navigation spaces, we design a policy that decodes the information provided by dynamic convolution into a series of low-level, agent-friendly actions. Results show that our model with dynamic filters outperforms other architectures based on traditional convolution, setting a new state of the art for embodied VLN in the low-level action space. Additionally, we categorize recent work on VLN according to its architectural choices and distinguish two main groups, which we call low-level actions models and high-level actions models. To the best of our knowledge, we are the first to propose this analysis and categorization for VLN.
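
To make the idea of dynamic convolutional filters more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that filters are produced from an instruction embedding by a single linear layer followed by tanh and L2 normalization, and that they are applied as 1x1 convolutions over a visual feature map. These choices, along with the names `generate_dynamic_filters` and `dynamic_conv_1x1` and all dimensions, are illustrative assumptions that the abstract does not specify.

```python
import numpy as np


def generate_dynamic_filters(instruction_vec, weight, bias, num_filters, channels):
    """Map an instruction embedding to `num_filters` 1x1 filters.

    Hypothetical generator: linear layer -> tanh -> per-filter L2 normalization.
    """
    raw = np.tanh(weight @ instruction_vec + bias)      # (num_filters * channels,)
    filters = raw.reshape(num_filters, channels)        # one row per filter
    norms = np.linalg.norm(filters, axis=1, keepdims=True)
    return filters / np.maximum(norms, 1e-8)            # avoid division by zero


def dynamic_conv_1x1(feature_map, filters):
    """Apply instruction-conditioned 1x1 filters to a (C, H, W) feature map.

    Returns one response map per filter, with shape (K, H, W).
    """
    return np.einsum("kc,chw->khw", filters, feature_map)


# Toy dimensions, chosen only for illustration.
rng = np.random.default_rng(0)
embed_dim, channels, height, width, num_filters = 16, 8, 4, 5, 3

weight = rng.normal(size=(num_filters * channels, embed_dim))
bias = np.zeros(num_filters * channels)
instruction_vec = rng.normal(size=embed_dim)             # stand-in for an encoded instruction
feature_map = rng.normal(size=(channels, height, width)) # stand-in for CNN image features

filters = generate_dynamic_filters(instruction_vec, weight, bias, num_filters, channels)
response = dynamic_conv_1x1(feature_map, filters)
print(response.shape)  # (3, 4, 5)
```

Unlike a traditional convolution, whose weights are fixed after training, the filters here change with every instruction, so the same image features yield different response maps for different instructions. In a full model, a policy would then decode these response maps into low-level actions.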


