Visual Reference Resolution using Attention Memory for Visual Dialog

09/23/2017
by Paul Hongsuck Seo, et al.

Visual dialog is the task of answering a series of interdependent questions about an input image, and it often requires resolving visual references among the questions. This problem differs from visual question answering (VQA), which relies on spatial attention (a.k.a. visual grounding) estimated from a single image and question pair. We propose a novel attention mechanism that exploits past visual attentions to resolve the current reference in the visual dialog scenario. The proposed model is equipped with an associative attention memory that stores a sequence of previous (attention, key) pairs. From this memory, the model retrieves the previous attention that is most relevant to the current question, taking recency into account, in order to resolve potentially ambiguous references. The model then merges the retrieved attention with a tentative one to obtain the final attention for the current question; specifically, we use dynamic parameter prediction to combine the two attentions conditioned on the question. Through extensive experiments on a new synthetic visual dialog dataset, we show that our model significantly outperforms the state of the art (by about 16 percentage points) in situations where visual reference resolution plays an important role. Moreover, the proposed model achieves superior performance (about 2 percentage points improvement) on the Visual Dialog dataset, despite having significantly fewer parameters than the baselines.
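The retrieve-then-merge idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The names `AttentionMemory`, `retrieve`, `merge`, the scalar `recency_weight`, and the matrix `W` are hypothetical and chosen only for illustration. The sketch makes three simplifying assumptions: relevance is a dot product between the question embedding and each stored key, recency is a fixed linear penalty on age, and the question-conditioned merge is reduced to two mixing weights predicted from the question. The dynamic parameter prediction used in the paper is likely richer than this.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()


class AttentionMemory:
    """Stores (attention map, key) pairs from earlier dialog turns."""

    def __init__(self, recency_weight=0.1):
        self.attentions = []
        self.keys = []
        # Hypothetical fixed penalty per turn of age; a real model
        # would more plausibly learn how recency enters the score.
        self.recency_weight = recency_weight

    def write(self, attention, key):
        """Append the attention map and key of the turn just answered."""
        self.attentions.append(np.asarray(attention, dtype=float))
        self.keys.append(np.asarray(key, dtype=float))

    def retrieve(self, query):
        """Return a relevance-weighted average of the stored attention maps."""
        keys = np.stack(self.keys)            # shape (T, d)
        maps = np.stack(self.attentions)      # shape (T, H, W)
        age = np.arange(len(self.keys))[::-1]  # 0 for the most recent turn
        scores = keys @ query - self.recency_weight * age
        weights = softmax(scores)             # shape (T,), sums to 1
        retrieved = np.tensordot(weights, maps, axes=1)  # shape (H, W)
        return retrieved, weights


def merge(tentative, retrieved, question, W):
    """Mix two attention maps using weights predicted from the question.

    W is a hypothetical (2, d) matrix that maps the question embedding
    to two mixing logits; this is a simplification of the merge step.
    """
    mix = softmax(W @ question)
    merged = mix[0] * tentative + mix[1] * retrieved
    return merged / merged.sum()
```

A short usage example under the same assumptions: after each turn, call `memory.write(final_attention, question_key)`; at the next turn, call `retrieved, _ = memory.retrieve(question_embedding)` and then `merge(tentative_attention, retrieved, question_embedding, W)` to obtain the attention map used for answering. A soft weighted average is used for retrieval here so that the lookup stays differentiable.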


