Task-driven Visual Saliency and Attention-based Visual Question Answering

02/22/2017
by Yuetan Lin, et al.

Visual question answering (VQA) has witnessed great progress since May 2015 as a classic problem unifying visual and textual data in a single system. Many enlightening VQA works explore image and question encodings and their fusion in depth, among which attention is the most effective and widely adopted mechanism. Current attention-based methods focus on adequately fusing visual and textual features, but pay no attention to where people actually look when asking questions about an image. Moreover, traditional attention-based methods attach a single scalar value to the feature at each spatial location, which loses much useful information. To remedy these problems, we propose a general method that performs saliency-like pre-selection on overlapped region features using the interrelations captured by a bidirectional LSTM (BiLSTM), together with a novel element-wise multiplication based attention method that captures richer correlation information between visual and textual features. We conduct experiments on the large-scale COCO-VQA dataset, and strong empirical results demonstrate the effectiveness of our model.
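The element-wise multiplication idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `elementwise_attention`, the feature shapes, and the choice of softmax normalization are all assumptions. The point of contrast is that scalar attention assigns one weight per region, whereas here each feature dimension gets its own attention distribution over regions, derived from the element-wise product of visual and textual features.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def elementwise_attention(regions, question):
    """Attend over R region features with a question vector.

    regions:  (R, d) visual features for R (possibly overlapped) regions
    question: (d,)   encoded question feature

    Unlike scalar attention (one weight per region), every feature
    dimension receives its own attention distribution over regions,
    so no per-dimension correlation is collapsed away.
    """
    corr = regions * question               # (R, d) per-dimension correlation
    weights = softmax(corr, axis=0)         # (R, d) attention over regions, per dimension
    return (weights * regions).sum(axis=0)  # (d,) attended visual feature

# Hypothetical usage with random features:
rng = np.random.default_rng(0)
attended = elementwise_attention(rng.standard_normal((5, 8)),
                                 rng.standard_normal(8))
```

Summing `weights * regions` over the region axis yields one attended value per feature dimension, rather than a single scalar-weighted average of whole region vectors.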


Related research

- Exploring Human-like Attention Supervision in Visual Question Answering (09/19/2017): Attention mechanisms have been widely applied in the Visual Question Ans...
- An Improved Attention for Visual Question Answering (11/04/2020): We consider the problem of Visual Question Answering (VQA). Given an ima...
- Question-Guided Hybrid Convolution for Visual Question Answering (08/08/2018): In this paper, we propose a novel Question-Guided Hybrid Convolution (QG...
- From Pixels to Objects: Cubic Visual Attention for Visual Question Answering (06/04/2022): Recently, attention-based Visual Question Answering (VQA) has achieved g...
- Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering (11/17/2015): We address the problem of Visual Question Answering (VQA), which require...
- Improving Visual Question Answering by Referring to Generated Paragraph Captions (06/14/2019): Paragraph-style image captions describe diverse aspects of an image as o...
- Training Recurrent Answering Units with Joint Loss Minimization for VQA (06/12/2016): We propose a novel algorithm for visual question answering based on a re...