Attention Based Natural Language Grounding by Navigating Virtual Environment

04/23/2018
by Abhishek Sinha, et al.

In this work, we focus on the problem of grounding language by training an agent to follow a set of natural language instructions and navigate to a target object in an environment. The agent receives visual information through raw pixels and a natural language instruction specifying the task to be achieved. Beyond these two sources of information, our model has no prior knowledge of either the visual or the textual modality and is end-to-end trainable. We develop an attention mechanism for multi-modal fusion of the visual and textual modalities that allows the agent to learn to complete the task and also achieve language grounding. Our experimental results show that our attention mechanism outperforms existing multi-modal fusion mechanisms proposed for this task in both 2D and 3D environments. We show that the learnt textual representations are semantically meaningful, as they follow vector arithmetic and are consistent enough to induce translation between instructions in different natural languages. We also show that our model generalizes effectively to unseen scenarios and exhibits zero-shot generalization capabilities in both 2D and 3D environments. The code for our 2D environment, as well as the models we developed for both 2D and 3D, is available at https://github.com/rl-lang-grounding/rl-lang-ground
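
The abstract does not spell out the exact fusion architecture, so the following is only a minimal sketch of what attention-based multi-modal fusion between raw-pixel features and an instruction could look like, not the authors' actual model. The framework (PyTorch), the class name AttentionFusion, the GRU instruction encoder, the spatial-softmax attention, and all dimensions are illustrative assumptions.

```python
# Hypothetical sketch of attention-based multi-modal fusion.
# NOT the paper's architecture; names and dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionFusion(nn.Module):
    """Fuse a CNN feature map with an instruction embedding via spatial attention."""

    def __init__(self, vocab_size, text_dim=64, visual_channels=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, text_dim)
        self.gru = nn.GRU(text_dim, text_dim, batch_first=True)
        # Project the instruction embedding to the visual channel dimension
        # so it can be compared against each spatial location.
        self.text_proj = nn.Linear(text_dim, visual_channels)

    def forward(self, visual_feats, instruction_tokens):
        # visual_feats: (B, C, H, W) convolutional features of the raw frame
        # instruction_tokens: (B, T) integer-encoded instruction
        _, h = self.gru(self.embed(instruction_tokens))   # h: (1, B, text_dim)
        text = self.text_proj(h.squeeze(0))               # (B, C)

        B, C, H, W = visual_feats.shape
        flat = visual_feats.view(B, C, H * W)             # (B, C, H*W)
        # Attention scores: dot product between the instruction vector and
        # each spatial location, normalised with a softmax over locations.
        scores = torch.bmm(text.unsqueeze(1), flat)       # (B, 1, H*W)
        attn = F.softmax(scores, dim=-1)
        # Attention-weighted sum over spatial locations gives a fused
        # representation that a policy head could consume.
        fused = torch.bmm(flat, attn.transpose(1, 2)).squeeze(-1)  # (B, C)
        return fused, attn.view(B, H, W)
```

In such a setup, the fused vector would typically feed an actor-critic policy head trained end-to-end with reinforcement learning, and the returned attention map offers one way to inspect which image regions the instruction attends to.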
