Recursive Visual Attention in Visual Dialog

12/06/2018
by   Yulei Niu, et al.
14

Visual dialog is a challenging vision-language task, which requires the agent to answer multi-round questions about an image. It typically needs to address two major problems: (1) How to answer visually-grounded questions, which is the core challenge in visual question answering (VQA); (2) How to infer the co-reference between questions and the dialog history. An example of visual co-reference is: pronouns (e.g., `they') in the question (e.g., `Are they on or off?') are linked with nouns (e.g., `lamps') appearing in the dialog history (e.g., `How many lamps are there?') and the object grounded in the image. In this work, to resolve the visual co-reference for visual dialog, we propose a novel attention mechanism called Recursive Visual Attention (RvA). Specifically, our dialog agent browses the dialog history until the agent has sufficient confidence in the visual co-reference resolution, and refines the visual attention recursively. The quantitative and qualitative experimental results on the large-scale VisDial v0.9 and v1.0 datasets demonstrate that the proposed RvA not only outperforms the state-of-the-art methods, but also achieves reasonable recursion and interpretable attention maps without additional annotations.

READ FULL TEXT

page 8

page 10

page 11

research
02/25/2019

Dual Attention Networks for Visual Reference Resolution in Visual Dialog

Visual dialog (VisDial) is a task which requires an AI agent to answer a...
research
04/29/2020

Multi-View Attention Networks for Visual Dialog

Visual dialog is a challenging vision-language task in which a series of...
research
03/06/2022

Modeling Coreference Relations in Visual Dialog

Visual dialog is a vision-language task where an agent needs to answer a...
research
04/11/2019

Factor Graph Attention

Dialog is an effective way to exchange information, but subtle details a...
research
11/02/2020

Reasoning Over History: Context Aware Visual Dialog

While neural models have been shown to exhibit strong performance on sin...
research
09/13/2021

Learning to Ground Visual Objects for Visual Dialog

Visual dialog is challenging since it needs to answer a series of cohere...
research
02/25/2019

Making History Matter: Gold-Critic Sequence Training for Visual Dialog

We study the multi-round response generation in visual dialog systems, w...

Please sign up or login with your details

Forgot password? Click here to reset