Dual Attention Networks for Visual Reference Resolution in Visual Dialog

02/25/2019
by   Gi-Cheon Kang, et al.
0

Visual dialog (VisDial) is a task which requires an AI agent to answer a series of questions grounded in an image. Unlike in visual question answering (VQA), the series of questions should be able to capture a temporal context from a dialog history and exploit visually-grounded information. A problem called visual reference resolution involves these challenges, requiring the agent to resolve ambiguous references in a given question and find the references in a given image. In this paper, we propose Dual Attention Networks (DAN) for visual reference resolution. DAN consists of two kinds of attention networks, REFER and FIND. Specifically, REFER module learns latent relationships between a given question and a dialog history by employing a self-attention mechanism. FIND module takes image features and reference-aware representations (i.e., the output of REFER module) as input, and performs visual grounding via bottom-up attention mechanism. We qualitatively and quantitatively evaluate our model on VisDial v1.0 and v0.9 datasets, showing that DAN outperforms the previous state-of-the-art model by a significant margin (2.0

READ FULL TEXT

page 2

page 6

research
12/06/2018

Recursive Visual Attention in Visual Dialog

Visual dialog is a challenging vision-language task, which requires the ...
research
09/06/2018

Visual Coreference Resolution in Visual Dialog using Neural Module Networks

Visual dialog entails answering a series of questions grounded in an ima...
research
09/23/2017

Visual Reference Resolution using Attention Memory for Visual Dialog

Visual dialog is a task of answering a series of inter-dependent questio...
research
08/22/2022

Neuro-Symbolic Visual Dialog

We propose Neuro-Symbolic Visual Dialog (NSVD) -the first method to comb...
research
11/02/2020

Reasoning Over History: Context Aware Visual Dialog

While neural models have been shown to exhibit strong performance on sin...
research
06/15/2020

ORD: Object Relationship Discovery for Visual Dialogue Generation

With the rapid advancement of image captioning and visual question answe...
research
03/23/2019

Referring to the recently seen: reference and perceptual memory in situated dialog

From theoretical linguistic and cognitive perspectives, situated dialog ...

Please sign up or login with your details

Forgot password? Click here to reset