Object Ordering with Bidirectional Matchings for Visual Reasoning

04/18/2018
by Hao Tan, et al.

Visual reasoning with compositional natural language instructions, e.g., based on the newly released Cornell Natural Language Visual Reasoning (NLVR) dataset, is a challenging task in which the model must create an accurate mapping between the diverse phrases and the several objects placed in complex arrangements in the image. Further, this mapping needs to be processed to decide whether the statement holds, given the ordering and relationship of the objects across three similar images. In this paper, we propose a novel end-to-end neural model for the NLVR task, where we first use joint bidirectional attention to build a two-way conditioning between the visual information and the language phrases. Next, we use an RL-based pointer network to sort and process the varying number of unordered objects (so as to match the order of the statement phrases) in each of the three images, and then pool over the three decisions. Our model achieves strong improvements (of 4-6% absolute) over the state of the art on both the structured-representation and raw-image versions of the dataset.
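
As a rough illustration of the two components named in the abstract, the sketch below pairs a joint bidirectional attention step (phrases attend over objects and objects attend over phrases) with a pointer-network decoder that produces an object ordering. This is not the authors' implementation: the module names, dimensions, scoring functions, and the greedy decoding (the paper trains the pointer network with RL) are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class BidirectionalAttention(nn.Module):
    """Two-way conditioning: phrases attend over objects and objects attend over phrases."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim, bias=False)  # assumed bilinear-style scoring

    def forward(self, phrases, objects):
        # phrases: (num_phrases, dim); objects: (num_objects, dim)
        sim = self.proj(phrases) @ objects.t()              # pairwise phrase-object scores
        phrase_ctx = F.softmax(sim, dim=1) @ objects        # each phrase summarizes the objects
        object_ctx = F.softmax(sim.t(), dim=1) @ phrases    # each object summarizes the phrases
        return phrase_ctx, object_ctx


class PointerOrderer(nn.Module):
    """Pointer-style decoder that emits one object index per step, yielding an ordering."""

    def __init__(self, dim):
        super().__init__()
        self.rnn = nn.GRUCell(dim, dim)
        self.attn = nn.Linear(dim, dim, bias=False)

    def forward(self, objects):
        # objects: (num_objects, dim) -> list of indices in the predicted order
        n, _ = objects.shape
        state = objects.mean(dim=0)                          # start from a pooled summary
        mask = torch.zeros(n, dtype=torch.bool)
        order = []
        for _ in range(n):
            scores = objects @ self.attn(state)              # attention over remaining objects
            scores = scores.masked_fill(mask, float("-inf"))
            idx = int(scores.argmax())                       # greedy pick (the paper samples + RL)
            order.append(idx)
            mask[idx] = True
            state = self.rnn(objects[idx].unsqueeze(0), state.unsqueeze(0)).squeeze(0)
        return order


if __name__ == "__main__":
    dim = 32
    phrases = torch.randn(5, dim)                            # toy phrase embeddings
    objects = torch.randn(7, dim)                            # toy object embeddings (one image)
    phrase_ctx, object_ctx = BidirectionalAttention(dim)(phrases, objects)
    print(PointerOrderer(dim)(object_ctx))                   # one ordering per image; the paper pools over 3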

Related research

11/01/2018
A Corpus for Reasoning About Natural Language Grounded in Photographs
We introduce a new dataset for joint reasoning about language and vision...

11/28/2016
Learning a Natural Language Interface with Neural Programmer
Learning a natural language interface for database tables is a challengi...

01/29/2018
Object-based reasoning in VQA
Visual Question Answering (VQA) is a novel problem domain where multi-mo...

06/05/2019
Learning to Compose and Reason with Language Tree Structures for Visual Grounding
Grounding natural language in images, such as localizing "the black dog ...

04/12/2017
Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries
Associating image regions with text queries has been recently explored a...

05/19/2021
VSGM - Enhance robot task understanding ability through visual semantic graph
In recent years, developing AI for robotics has raised much attention. T...

07/10/2017
Learning Visual Reasoning Without Strong Priors
Achieving artificial visual reasoning - the ability to answer image-rela...
