Using Syntax to Ground Referring Expressions in Natural Images

05/26/2018
by Volkan Cirik, et al.

We introduce GroundNet, a neural network for referring expression recognition -- the task of localizing (or grounding) in an image the object referred to by a natural language expression. Our approach to this task is the first to rely on a syntactic analysis of the input referring expression in order to inform the structure of the computation graph. Given a parse tree for an input expression, we explicitly map the syntactic constituents and relationships present in the tree to a composed graph of neural modules that defines our architecture for performing localization. This syntax-based approach aids localization of both the target object and auxiliary supporting objects mentioned in the expression. As a result, GroundNet is more interpretable than previous methods: we can (1) determine which phrase of the referring expression points to which object in the image and (2) track how the localization of the target object is determined by the network. We study this property empirically by introducing a new set of annotations on the GoogleRef dataset to evaluate localization of supporting objects. Our experiments show that GroundNet achieves state-of-the-art accuracy in identifying supporting objects, while maintaining comparable performance in the localization of target objects.
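
To make the idea of mapping a parse tree onto a composed graph of modules concrete, the following is a minimal toy sketch and not the GroundNet implementation. Everything in it is an assumption made for illustration: the module names `locate`, `relate`, and `intersect`, the `Node` and `Box` types, the word-match scoring that stands in for learned visual features, and the two hand-coded spatial relations. It shows only how a recursive walk over the tree can yield both a score for the target object and a record of which phrase was grounded to which box.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Box:
    label: str  # toy stand-in for the visual features of a candidate region
    x: float    # horizontal centre of the region


@dataclass
class Node:
    phrase: str                     # noun phrase of this constituent
    relation: Optional[str] = None  # relation to the child, e.g. "left of"
    child: Optional["Node"] = None  # constituent naming a supporting object


def locate(phrase: str, boxes: List[Box]) -> List[float]:
    """Score each box by whether its label occurs in the phrase (toy matcher)."""
    words = set(phrase.split())
    return [1.0 if b.label in words else 0.0 for b in boxes]


def relate(relation: str, support: List[float], boxes: List[Box]) -> List[float]:
    """Score each box by the best-scoring other box it stands in `relation` to."""
    out = []
    for b in boxes:
        best = 0.0
        for s, other in zip(support, boxes):
            if other is b:
                continue
            holds = (relation == "left of" and b.x < other.x) or (
                relation == "right of" and b.x > other.x
            )
            if holds:
                best = max(best, s)
        out.append(best)
    return out


def intersect(a: List[float], b: List[float]) -> List[float]:
    """Combine two score vectors element-wise."""
    return [p * q for p, q in zip(a, b)]


def ground(node: Node, boxes: List[Box], trace: Dict[str, int]) -> List[float]:
    """Compose modules by recursing over the tree; log each phrase's best box."""
    scores = locate(node.phrase, boxes)
    if node.child is not None:
        support = ground(node.child, boxes, trace)
        scores = intersect(scores, relate(node.relation, support, boxes))
    trace[node.phrase] = scores.index(max(scores))
    return scores


# "the cat left of the table", with two cats and one table in the image
boxes = [Box("cat", 0.2), Box("table", 0.5), Box("cat", 0.8)]
tree = Node("the cat", "left of", Node("the table"))
trace: Dict[str, int] = {}
scores = ground(tree, boxes, trace)
```

In this toy setup `trace` maps each phrase to the index of the box it was grounded to, which mirrors the two interpretability properties described above: the supporting object ("the table") receives its own localization, and the target's final score can be traced back through the intermediate module outputs.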


