Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering

07/28/2023
by Nandita Naik, et al.

Visual question answering (VQA) has the potential to make the Internet more accessible in an interactive way, allowing people who cannot see images to ask questions about them. However, multiple studies have shown that people who are blind or have low vision prefer image descriptions that incorporate the context in which an image appears, yet current VQA datasets focus on images in isolation. We argue that VQA models will not fully succeed at meeting people's needs unless they take context into account. To motivate and analyze the distinction between different contexts, we introduce Context-VQA, a VQA dataset that pairs images with contexts, specifically types of websites (e.g., a shopping website). We find that the types of questions asked vary systematically across contexts. For example, images presented in a travel context garner 2 times more "Where?" questions, and images on social media and news garner 2.8 and 1.8 times more "Who?" questions, respectively, than the average. We also find that context effects are especially pronounced when participants cannot see the image. These results demonstrate that context affects the types of questions asked and that VQA models should be context-sensitive to better meet people's needs, especially in accessibility settings.
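The per-context multipliers reported in the abstract (e.g., 2x more "Where?" questions in travel contexts) can be understood as a question type's rate within one context divided by its rate overall. A minimal sketch of that computation follows; the `(context, question)` pairs and the first-word question-type heuristic are illustrative assumptions, not the Context-VQA data or methodology.

```python
from collections import Counter

# Hypothetical (context, question) pairs standing in for a VQA dataset.
data = [
    ("travel", "Where is this beach?"),
    ("travel", "Where was this photo taken?"),
    ("shopping", "What color is the shirt?"),
    ("social", "Who is in this picture?"),
    ("news", "Who is speaking at the podium?"),
    ("social", "Who took this photo?"),
]

def question_type(question):
    # Crude heuristic: label a question by its first word.
    return question.split()[0].lower()

def type_ratio(context, qtype):
    """Rate of `qtype` questions within `context`, divided by the
    overall rate of `qtype` questions across all contexts."""
    overall = Counter(question_type(q) for _, q in data)
    in_ctx = Counter(question_type(q) for c, q in data if c == context)
    ctx_rate = in_ctx[qtype] / sum(in_ctx.values())
    overall_rate = overall[qtype] / sum(overall.values())
    return ctx_rate / overall_rate
```

With the toy data above, both travel questions are "Where?" questions while only 2 of 6 questions are overall, so `type_ratio("travel", "where")` is 3.0; the paper's reported multipliers are the analogous quantities on the real dataset.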


Related research

- VizWiz Grand Challenge: Answering Visual Questions from Blind People (02/22/2018)
- Learning by Asking Questions (12/04/2017)
- Polar-VQA: Visual Question Answering on Remote Sensed Ice Sheet Imagery from Polar Region (03/13/2023)
- Relation-aware Graph Attention Network for Visual Question Answering (03/29/2019)
- Adversarial Regularization for Visual Question Answering: Strengths, Shortcomings, and Side Effects (06/20/2019)
- Characterizing Datasets for Social Visual Question Answering, and the New TinySocial Dataset (10/08/2020)
- Reasoning Over History: Context Aware Visual Dialog (11/02/2020)
