Exploring Human-like Attention Supervision in Visual Question Answering

09/19/2017
by Tingting Qiao, et al.

Attention mechanisms have been widely applied to the Visual Question Answering (VQA) task, as they help the model focus on the areas of interest in both the visual and textual information. To answer questions correctly, a model needs to selectively attend to different areas of an image, which suggests that an attention-based model may benefit from explicit attention supervision. In this work, we aim to address the problem of adding attention supervision to VQA models. Since human attention data are scarce, we first propose a Human Attention Network (HAN) to generate human-like attention maps, trained on the recently released Human ATtention Dataset (VQA-HAT). We then apply the pre-trained HAN to the VQA v2.0 dataset to automatically produce human-like attention maps for all image-question pairs; we name the resulting dataset the Human-Like ATtention (HLAT) dataset. Finally, we apply human-like attention supervision to an attention-based VQA model. Experiments show that adding human-like supervision yields both more accurate attention and better answering performance, which suggests a promising future for human-like attention supervision in VQA.
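The abstract does not spell out the supervision objective, but a common way to realize attention supervision is to penalize the divergence between the model's attention distribution over image regions and the human-like attention map. The Python (PyTorch) sketch below illustrates one such setup; the KL-divergence term, the trade-off weight lam, and all tensor names and sizes are illustrative assumptions, not the paper's exact formulation.

    # Hypothetical sketch of attention supervision for a VQA model (PyTorch).
    # Names such as vqa_logits, model_att, and human_att are illustrative;
    # the paper's exact loss may differ.
    import torch
    import torch.nn.functional as F

    def attention_supervision_loss(model_att, human_att, eps=1e-8):
        # KL(human || model): F.kl_div expects the first argument as
        # log-probabilities and the second as probabilities. Both tensors
        # have shape (batch, num_regions) and sum to 1 per row.
        return F.kl_div((model_att + eps).log(), human_att,
                        reduction="batchmean")

    def total_loss(vqa_logits, answer_targets, model_att, human_att,
                   lam=0.5):
        # Answer-classification loss plus a weighted attention term;
        # lam is an assumed hyperparameter balancing the two objectives.
        ans_loss = F.cross_entropy(vqa_logits, answer_targets)
        att_loss = attention_supervision_loss(model_att, human_att)
        return ans_loss + lam * att_loss

    # Toy usage with random tensors standing in for real model outputs.
    batch, regions, answers = 4, 36, 1000
    vqa_logits = torch.randn(batch, answers)
    targets = torch.randint(0, answers, (batch,))
    model_att = torch.softmax(torch.randn(batch, regions), dim=1)
    human_att = torch.softmax(torch.randn(batch, regions), dim=1)
    print(total_loss(vqa_logits, targets, model_att, human_att))

A soft divergence term like this leaves the model free to deviate from the human-like map when the answer loss demands it, which is one plausible reason such supervision can improve both attention quality and accuracy.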

Related research

09/27/2021
Multimodal Integration of Human-Like Attention in Visual Question Answering
Human-like attention as a supervisory signal to guide neural attention h...

02/22/2017
Task-driven Visual Saliency and Attention-based Visual Question Answering
Visual question answering (VQA) has witnessed great progress since May, ...

11/19/2019
Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA
In this paper, we aim to obtain improved attention for a visual question...

09/27/2021
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
We present VQA-MHUG - a novel 49-participant dataset of multimodal human...

02/03/2021
Answer Questions with Right Image Regions: A Visual Attention Regularization Approach
Visual attention in Visual Question Answering (VQA) targets at locating ...

01/10/2020
Visual Question Answering on 360° Images
In this work, we introduce VQA 360, a novel task of visual question answ...

09/21/2020
Regularizing Attention Networks for Anomaly Detection in Visual Question Answering
For stability and reliability of real-world applications, the robustness...
