Reasoning Over History: Context Aware Visual Dialog

11/02/2020
by   Muhammad A. Shah, et al.

While neural models have been shown to exhibit strong performance on single-turn visual question answering (VQA) tasks, extending VQA to a multi-turn, conversational setting remains a challenge. One way to address this challenge is to augment existing strong neural VQA models with mechanisms that allow them to retain information from previous dialog turns. One strong VQA model is the MAC network, which decomposes a task into a series of attention-based reasoning steps. However, since the MAC network is designed for single-turn question answering, it is not capable of referring to past dialog turns. More specifically, it struggles with tasks that require reasoning over the dialog history, particularly coreference resolution. We extend the MAC network architecture with Context-aware Attention and Memory (CAM), which attends over control states in past dialog turns to determine the necessary reasoning operations for the current question. MAC nets with CAM achieve up to 98.25% accuracy, improving on the state of the art by 30%; the model's performance particularly improved on questions that required coreference resolution.
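To make the idea of attending over control states from past dialog turns more concrete, the following is a minimal sketch and not the authors' implementation. It assumes each past turn is summarized by one fixed-size control vector and that relevance is scored with scaled dot-product attention; the abstract does not specify the scoring function, and the names `attend_over_history`, `query`, and `past_controls` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_history(query, past_controls):
    """Summarize past control states with respect to the current query.

    query:         list of d floats, a representation of the current question
                   (hypothetical; stands in for the current control query).
    past_controls: list of T lists of d floats, one control state per past turn.

    Returns (context, weights): a d-dimensional weighted sum of the past
    control states and the T attention weights used to form it.
    Scaled dot-product scoring is an assumption made for this sketch.
    """
    d = len(query)
    scores = [
        sum(q * c for q, c in zip(query, ctrl)) / math.sqrt(d)
        for ctrl in past_controls
    ]
    weights = softmax(scores)
    context = [
        sum(w * ctrl[i] for w, ctrl in zip(weights, past_controls))
        for i in range(d)
    ]
    return context, weights


if __name__ == "__main__":
    # Three toy past-turn control states in a 2-dimensional space.
    history = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    current = [2.0, 0.0]
    context, weights = attend_over_history(current, history)
    print("weights:", weights)
    print("context:", context)
```

In this toy setup, the past turn whose control state is most aligned with the current query receives the largest weight, which is the behavior one would want when a pronoun in the current question refers back to an entity handled in an earlier turn.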


