
Reciprocal Attention Fusion for Visual Question Answering

05/11/2018
by   Moshiur R Farazi, et al.

Existing attention mechanisms for Visual Question Answering (VQA) attend to either local image-grid features or object-level features. Motivated by the observation that questions can relate to both object instances and their parts, we propose a novel attention mechanism that jointly considers reciprocal relationships between the two levels of visual detail. The bottom-up attention thus generated is then combined with top-down information to focus only on the scene elements that are most relevant to a given question. Our design hierarchically fuses multi-modal information, i.e., language, object-level, and grid-level features, through an efficient tensor decomposition scheme. The proposed model improves the state-of-the-art single-model performance from 67.9 [...] significant boost.
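The abstract does not spell out the tensor decomposition it uses, so the following is only an illustrative sketch of one common idea behind such schemes: replacing a full three-way bilinear weight tensor with low-rank factors. The function names, dimensions, and the two-modality setup (one question vector, one visual vector) are assumptions made for illustration; the paper itself fuses language, object-level, and grid-level features hierarchically and may use a different factorization.

```python
import random


def matvec_t(M, x):
    """Return M^T x for a matrix M stored as a list of rows."""
    cols = len(M[0])
    return [sum(M[i][j] * x[i] for i in range(len(M))) for j in range(cols)]


def low_rank_fusion(q, v, U, V, P):
    """Fuse a question vector q and a visual vector v (hypothetical helper).

    U: (dim_q x rank), V: (dim_v x rank), P: (rank x dim_out).
    This equals a bilinear map whose weight tensor is
    W[i][j][k] = sum_r U[i][r] * V[j][r] * P[r][k],
    but the full tensor is never materialized.
    """
    q_proj = matvec_t(U, q)                          # project question to rank dims
    v_proj = matvec_t(V, v)                          # project visual feature to rank dims
    joint = [a * b for a, b in zip(q_proj, v_proj)]  # element-wise interaction
    return matvec_t(P, joint)                        # map to the output space


def full_bilinear(q, v, U, V, P):
    """Reference: build each tensor entry explicitly and contract with q and v."""
    rank, dim_out = len(P), len(P[0])
    out = []
    for k in range(dim_out):
        total = 0.0
        for i, qi in enumerate(q):
            for j, vj in enumerate(v):
                w_ijk = sum(U[i][r] * V[j][r] * P[r][k] for r in range(rank))
                total += qi * w_ijk * vj
        out.append(total)
    return out


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Toy sizes chosen for illustration only.
rng = random.Random(0)
dim_q, dim_v, rank, dim_out = 4, 5, 3, 2
U = random_matrix(dim_q, rank, rng)
V = random_matrix(dim_v, rank, rng)
P = random_matrix(rank, dim_out, rng)
q = [rng.uniform(-1, 1) for _ in range(dim_q)]
v = [rng.uniform(-1, 1) for _ in range(dim_v)]

fused = low_rank_fusion(q, v, U, V, P)
reference = full_bilinear(q, v, U, V, P)
```

In this sketch the factorized form stores `dim_q*rank + dim_v*rank + rank*dim_out` weights instead of `dim_q*dim_v*dim_out`, which is the kind of saving that makes fusing several feature levels tractable; `fused` and `reference` agree up to floating-point error.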


06/01/2020

Multimodal grid features and cell pointers for Scene Text Visual Question Answering

This paper presents a new model for the task of scene text visual questi...
09/17/2019

Inverse Visual Question Answering with Multi-Level Attentions

In this paper, we propose a novel deep multi-level attention model to ad...
01/22/2021

Visual Question Answering based on Local-Scene-Aware Referring Expression Generation

Visual question answering requires a deep understanding of both images a...
10/05/2020

Attention Guided Semantic Relationship Parsing for Visual Question Answering

Humans explain inter-object relationships with semantic labels that demo...
06/15/2020

ORD: Object Relationship Discovery for Visual Dialogue Generation

With the rapid advancement of image captioning and visual question answe...
08/01/2018

Learning Visual Question Answering by Bootstrapping Hard Attention

Attention mechanisms in biological perception are thought to select subs...
08/09/2019

Question-Agnostic Attention for Visual Question Answering

Visual Question Answering (VQA) models employ attention mechanisms to di...