Hierarchical Question-Image Co-Attention for Visual Question Answering

05/31/2016
by Jiasen Lu, et al.

A number of recent works have proposed attention models for Visual Question Answering (VQA) that generate spatial maps highlighting image regions relevant to answering the question. In this paper, we argue that in addition to modeling "where to look" or visual attention, it is equally important to model "what words to listen to" or question attention. We present a novel co-attention model for VQA that jointly reasons about image and question attention. In addition, our model reasons about the question (and consequently the image via the co-attention mechanism) in a hierarchical fashion via a novel 1-dimensional convolutional neural network (CNN). Our model improves the state-of-the-art on the VQA dataset from 60.3% to 60.5%, and from 61.6% to 63.3% on the COCO-QA dataset. By using ResNet, the performance is further improved to 62.1% for VQA and 65.4% for COCO-QA.
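As a rough illustration of the co-attention idea, the sketch below shows a parallel co-attention step in PyTorch, in which question and image features attend to each other through a shared affinity matrix. The layer names, dimensions, and hyperparameters are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelCoAttention(nn.Module):
    """Minimal sketch of parallel co-attention: image regions and question
    tokens attend to each other via an affinity matrix. Layer names and
    sizes are assumptions for illustration only."""

    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        self.W_b = nn.Linear(feat_dim, feat_dim, bias=False)    # affinity weights
        self.W_v = nn.Linear(feat_dim, hidden_dim, bias=False)  # image projection
        self.W_q = nn.Linear(feat_dim, hidden_dim, bias=False)  # question projection
        self.w_hv = nn.Linear(hidden_dim, 1)                    # image attention scores
        self.w_hq = nn.Linear(hidden_dim, 1)                    # question attention scores

    def forward(self, V, Q):
        # V: (batch, N, feat_dim) image region features
        # Q: (batch, T, feat_dim) question token features
        C = torch.tanh(Q @ self.W_b(V).transpose(1, 2))                   # (batch, T, N) affinity
        H_v = torch.tanh(self.W_v(V) + C.transpose(1, 2) @ self.W_q(Q))   # (batch, N, hidden)
        H_q = torch.tanh(self.W_q(Q) + C @ self.W_v(V))                   # (batch, T, hidden)
        a_v = F.softmax(self.w_hv(H_v), dim=1)                  # attention over image regions
        a_q = F.softmax(self.w_hq(H_q), dim=1)                  # attention over question tokens
        v_hat = (a_v * V).sum(dim=1)                            # attended image feature
        q_hat = (a_q * Q).sum(dim=1)                            # attended question feature
        return v_hat, q_hat, a_v, a_q

# Toy usage: 196 image regions (14x14 grid) and 20 question tokens.
coattn = ParallelCoAttention()
V = torch.randn(2, 196, 512)
Q = torch.randn(2, 20, 512)
v_hat, q_hat, a_v, a_q = coattn(V, Q)
print(v_hat.shape, q_hat.shape)  # torch.Size([2, 512]) torch.Size([2, 512])

In the hierarchical variant described in the abstract, a step like this would be applied at word, phrase, and question levels, with the phrase level built from 1-dimensional convolutions over word embeddings.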


Related research

06/17/2016
Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?
We conduct large-scale studies on `human attention' in Visual Question A...

08/31/2016
Towards Transparent AI Systems: Interpreting Visual Question Answering Models
Deep neural networks have shown striking progress and obtained state-of-...

04/03/2018
Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
A key solution to visual question answering (VQA) exists in how to fuse ...

11/17/2015
Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering
We address the problem of Visual Question Answering (VQA), which require...

11/18/2017
Co-attending Free-form Regions and Detections with Multi-modal Multiplicative Feature Embedding for Visual Question Answering
Recently, the Visual Question Answering (VQA) task has gained increasing...

02/01/2018
Dual Recurrent Attention Units for Visual Question Answering
We propose an architecture for VQA which utilizes recurrent layers to ge...

06/04/2022
From Pixels to Objects: Cubic Visual Attention for Visual Question Answering
Recently, attention-based Visual Question Answering (VQA) has achieved g...
