'Just because you are right, doesn't mean I am wrong': Overcoming a Bottleneck in the Development and Evaluation of Open-Ended Visual Question Answering (VQA) Tasks

03/28/2021
by Man Luo, et al.

GQA (Hudson and Manning, 2019) is a dataset for real-world visual reasoning and compositional question answering. We found that many answers predicted by the best vision-language models on the GQA dataset do not match the ground-truth answer but are still semantically meaningful and correct in the given context. In fact, this is the case for most existing visual question answering (VQA) datasets, which assume only one ground-truth answer for each question. To address this limitation, we propose Alternative Answer Sets (AAS) of ground-truth answers, which are created automatically using off-the-shelf NLP tools. We introduce a semantic metric based on AAS and modify top VQA solvers to support multiple plausible answers for a question. We implement this approach on the GQA dataset and show the performance improvements.
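To make the idea of scoring against a set of acceptable answers concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names (`is_correct`, `accuracy`), the case-insensitive matching, and the hand-written alternative sets are all assumptions made for illustration, whereas the paper builds its sets automatically with off-the-shelf NLP tools and defines its own semantic metric.

```python
from typing import Dict, List, Optional, Set


def _norm(answer: str) -> str:
    """Normalize an answer for comparison (assumption: case-insensitive match)."""
    return answer.strip().lower()


def is_correct(prediction: str, ground_truth: str,
               aas: Dict[str, Set[str]]) -> bool:
    """Return True if the prediction is the ground truth or one of its alternatives.

    `aas` is a hypothetical mapping from a ground-truth answer to its
    alternative answers; an answer with no entry falls back to exact match.
    """
    accepted = {_norm(ground_truth)}
    accepted |= {_norm(a) for a in aas.get(ground_truth, set())}
    return _norm(prediction) in accepted


def accuracy(predictions: List[str], ground_truths: List[str],
             aas: Optional[Dict[str, Set[str]]] = None) -> float:
    """Fraction of predictions accepted; with aas=None this is exact-match accuracy."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have equal length")
    if not ground_truths:
        return 0.0
    aas = aas or {}
    hits = sum(is_correct(p, g, aas) for p, g in zip(predictions, ground_truths))
    return hits / len(ground_truths)


if __name__ == "__main__":
    # Toy data, invented for illustration only (not taken from GQA).
    golds = ["sofa", "man", "yes"]
    preds = ["couch", "person", "no"]
    toy_aas = {"sofa": {"couch"}, "man": {"person", "guy"}}

    print("exact match:", accuracy(preds, golds))
    print("with alternative sets:", accuracy(preds, golds, toy_aas))
```

On this toy data, exact matching rejects all three predictions, while the set-based check accepts "couch" and "person" and still rejects the wrong "no". This is the kind of gap between the two scores that the abstract describes.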

research
08/21/2023

VQA Therapy: Exploring Answer Differences by Visually Grounding Answers

Visual question answering is a task of predicting the answer to a questi...
research
08/02/2017

A Simple Loss Function for Improving the Convergence and Accuracy of Visual Question Answering Models

Visual question answering as recently proposed multimodal learning task ...
research
04/12/2020

Which visual questions are difficult to answer? Analysis with Entropy of Answer Distributions

We propose a novel approach to identify the difficulty of visual questio...
research
04/20/2020

A Revised Generative Evaluation of Visual Dialogue

Evaluating Visual Dialogue, the task of answering a sequence of question...
research
06/29/2022

What Can Secondary Predictions Tell Us? An Exploration on Question-Answering with SQuAD-v2.0

Performance in natural language processing, and specifically for the que...
research
09/12/2018

The Wisdom of MaSSeS: Majority, Subjectivity, and Semantic Similarity in the Evaluation of VQA

We introduce MASSES, a simple evaluation metric for the task of Visual Q...
research
02/21/2021

Learning Compositional Representation for Few-shot Visual Question Answering

Current methods of Visual Question Answering perform well on the answers...
