Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?

06/17/2016
by Abhishek Das et al.

We conduct large-scale studies on 'human attention' in Visual Question Answering (VQA) to understand where humans choose to look when answering questions about images. We design and test multiple novel, game-inspired attention-annotation interfaces that require the subject to sharpen regions of a blurred image in order to answer a question; the resulting annotations form the VQA-HAT (Human ATtention) dataset. We evaluate attention maps generated by state-of-the-art VQA models against human attention both qualitatively (via visualizations) and quantitatively (via rank-order correlation). Overall, our experiments show that current attention models in VQA do not seem to be looking at the same regions as humans.
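The quantitative comparison mentioned above is a rank-order correlation between human and model attention maps. The sketch below shows one way such a score could be computed using Spearman's rank correlation; the 14x14 grid size, function names, and random example data are illustrative assumptions, not the authors' exact evaluation pipeline.

```python
# Minimal sketch: Spearman rank-order correlation between a human
# attention map and a model-generated attention map.
# Assumptions (not from the paper): both maps have already been
# resampled to the same 2D grid; 14x14 mirrors a typical CNN
# feature-map size.
import numpy as np
from scipy.stats import spearmanr

def rank_correlation(human_map: np.ndarray, model_map: np.ndarray) -> float:
    """Spearman rank correlation between two same-sized attention maps."""
    assert human_map.shape == model_map.shape, "maps must share a grid size"
    rho, _ = spearmanr(human_map.ravel(), model_map.ravel())
    return float(rho)

# Usage with random stand-in data (real inputs would be VQA-HAT maps
# and a model's attention weights over image regions).
rng = np.random.default_rng(0)
human = rng.random((14, 14))
model = rng.random((14, 14))
print(f"rank-order correlation: {rank_correlation(human, model):.3f}")
```

A score near 1 means the model ranks image regions by importance much as humans do; a score near 0 means the two orderings are unrelated.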

Related research

05/31/2016 · Hierarchical Question-Image Co-Attention for Visual Question Answering
A number of recent works have proposed attention models for Visual Question Answering...

02/03/2021 · Answer Questions with Right Image Regions: A Visual Attention Regularization Approach
Visual attention in Visual Question Answering (VQA) aims at locating...

10/09/2018 · Knowing Where to Look? Analysis on Attention of Visual Question Answering System
Attention mechanisms have been widely used in Visual Question Answering...

01/11/2022 · On the Efficacy of Co-Attention Transformer Layers in Visual Question Answering
In recent years, multi-modal transformers have shown significant progress...

04/01/2018 · Differential Attention for Visual Question Answering
In this paper we aim to answer questions based on images when provided with...

08/23/2022 · How good are deep models in understanding the generated images?
My goal in this paper is twofold: to study how well deep models can understand...

02/11/2019 · Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded
Many vision and language models suffer from poor visual grounding, often...
