On the Flip Side: Identifying Counterexamples in Visual Question Answering

06/03/2018
by Gabriel Grand, et al.

Visual question answering (VQA) models respond to open-ended natural language questions about images. While VQA is an increasingly popular area of research, it is unclear to what extent current VQA architectures learn key semantic distinctions between visually similar images. To investigate this question, we explore a reformulation of the VQA task that challenges models to identify counterexamples: images that result in a different answer to the original question. We introduce two plug-and-play methods for evaluating existing VQA models against a supervised counterexample prediction task, VQA-CX. While our models surpass existing benchmarks on VQA-CX, we find that the multimodal representations learned by an existing state-of-the-art VQA model contribute only marginally to performance on this task. These results call into question the assumption that successful performance on the VQA benchmark is indicative of general visual-semantic reasoning abilities.


