Robust and Interpretable Grounding of Spatial References with Relation Networks

05/02/2020
by Tsung-Yen Yang, et al.

Handling spatial references in natural language is a key challenge in tasks like autonomous navigation and robotic manipulation. Recent work has investigated various neural architectures for learning multi-modal representations of spatial concepts that generalize well across a variety of observations and text instructions. In this work, we develop accurate models for understanding spatial references in text that are also robust and interpretable. We design a text-conditioned relation network whose parameters are dynamically computed with a cross-modal attention module to capture fine-grained spatial relations between entities. Our experiments across three different prediction tasks demonstrate the effectiveness of our model compared to existing state-of-the-art systems. Our model is robust to both observational and instructional noise, and lends itself to easy interpretation through visualization of intermediate outputs.
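The abstract gives no architectural details, so the following is only a minimal sketch of the general idea it names: a relation function over entity pairs whose parameters are generated from the instruction through attention. Everything in it is an assumption made for illustration, including the names `attend`, `relation_scores`, `W_query`, and `W_gen`, the toy dimensions, and the random vectors standing in for learned embeddings. It should not be read as the authors' model.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(M, v):
    return [dot(row, v) for row in M]

def attend(query, words):
    # Attention of one query over the word vectors of the instruction.
    # Returns the attention weights and the weighted sum of word vectors.
    weights = softmax([dot(query, w) for w in words])
    dim = len(words[0])
    context = [sum(a * w[k] for a, w in zip(weights, words)) for k in range(dim)]
    return weights, context

def relation_scores(entities, words, W_query, W_gen):
    # Hypothetical text-conditioned relation function:
    # for each ordered pair (i, j), the pair attends over the words, and the
    # resulting text context generates the parameters that score the pair.
    scores, attn_maps = [], {}
    for i, e_i in enumerate(entities):
        total = 0.0
        for j, e_j in enumerate(entities):
            if i == j:
                continue
            pair = e_i + e_j                      # concatenate the two entity vectors
            query = matvec(W_query, pair)         # project the pair into word space
            weights, context = attend(query, words)
            rel_params = matvec(W_gen, context)   # parameters computed from the text
            total += dot(rel_params, pair)
            attn_maps[(i, j)] = weights           # kept for later inspection
        scores.append(total)
    return scores, attn_maps

# Toy inputs: three entities with 2-D positions, four 3-D "word" vectors.
random.seed(0)
entities = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
words = [[random.gauss(0, 1) for _ in range(3)] for _ in range(4)]
W_query = [[random.gauss(0, 1) for _ in range(4)] for _ in range(3)]  # 3 x 4
W_gen = [[random.gauss(0, 1) for _ in range(3)] for _ in range(4)]    # 4 x 3
scores, attn = relation_scores(entities, words, W_query, W_gen)
```

In this sketch, `attn[(i, j)]` holds one weight per instruction word for each entity pair. That is the kind of intermediate output one could plot to see which words drive a given relation, in the spirit of the interpretability claim above.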

research
11/17/2022

Language Conditioned Spatial Relation Reasoning for 3D Object Grounding

Localizing objects in 3D scenes based on natural language requires under...
research
07/13/2017

Representation Learning for Grounded Spatial Reasoning

The interpretation of spatial references is highly contextual, requiring...
research
06/15/2023

Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding

Current Vision and Language Models (VLMs) demonstrate strong performance...
research
12/23/2018

Multi-modal Learning with Prior Visual Relation Reasoning

Visual relation reasoning is a central component in recent cross-modal a...
research
10/14/2019

Dynamic Attention Networks for Task Oriented Grounding

In order to successfully perform tasks specified by natural language ins...
research
06/17/2019

ParNet: Position-aware Aggregated Relation Network for Image-Text matching

Exploring fine-grained relationship between entities (e.g. objects in ima...
research
09/18/2021

ReaSCAN: Compositional Reasoning in Language Grounding

The ability to compositionally map language to referents, relations, and...
