Parallel Attention: A Unified Framework for Visual Object Discovery through Dialogs and Queries

11/17/2017
by Bohan Zhuang, et al.

Recognising objects according to a pre-defined, fixed set of class labels has been well studied in computer vision. There are, however, a great many practical applications where the subjects that may be of interest are not known beforehand, or are not so easily delineated. In many of these cases natural language dialog is a natural way to specify the subject of interest, and the task of achieving this capability (a.k.a. Referring Expression Comprehension) has recently attracted attention. To this end we propose a unified framework, the ParalleL AttentioN (PLAN) network, to discover the object in an image that is being referred to in natural language expressions of variable length, from short phrase queries to long multi-round dialogs. The PLAN network has two attention mechanisms that relate parts of the expressions to both the global visual content and also directly to object candidates. Furthermore, the attention mechanisms are recurrent, making the referring process visualizable and explainable. The attended information from these dual sources is combined to reason about the referred object. These two attention mechanisms can be trained in parallel, and we find that the combined system outperforms the state of the art on several benchmark datasets with language input of different lengths, such as RefCOCO, RefCOCO+ and GuessWhat?!.
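
The two-branch idea described above can be illustrated with a toy sketch. This is not the authors' implementation: it assumes plain dot-product soft attention over hand-made feature vectors, and every name in it (`attend`, `locate`, `steps`, the toy features) is hypothetical. It only shows how a language query might attend to global image regions and to object candidates in parallel, combine the two attended vectors, and repeat the process for a few recurrent steps before scoring the candidates.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, items):
    """Soft attention: return (weights, weighted sum of items)."""
    weights = softmax([dot(query, item) for item in items])
    dim = len(items[0])
    context = [sum(w * item[d] for w, item in zip(weights, items))
               for d in range(dim)]
    return weights, context


def locate(words, regions, candidates, steps=2):
    """Score object candidates for an expression (illustrative only).

    words, regions and candidates are lists of equal-length vectors.
    Assumption: the query starts as the mean word vector and is refined
    by re-attending to the words with the combined visual evidence.
    """
    dim = len(words[0])
    query = [sum(w[d] for w in words) / len(words) for d in range(dim)]
    for _ in range(steps):
        # Branch 1: relate the expression to the global visual content.
        _, global_ctx = attend(query, regions)
        # Branch 2: relate the expression directly to object candidates.
        _, cand_ctx = attend(query, candidates)
        # Combine both attended sources, then refocus on the expression.
        combined = [g + c for g, c in zip(global_ctx, cand_ctx)]
        _, query = attend(combined, words)
    scores, _ = attend(query, candidates)
    return scores


# Toy usage: the first word vector points along the first feature axis,
# as does the first candidate, so that candidate should score highest.
words = [[4.0, 0.0], [0.0, 0.5]]
regions = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[3.0, 0.0], [0.0, 3.0]]
scores = locate(words, regions, candidates)
```

The per-step attention weights returned by `attend` are the kind of intermediate quantity one could inspect to visualise which words, regions and candidates the model focuses on at each step.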

