Measuring Faithful and Plausible Visual Grounding in VQA

05/24/2023
by   Daniel Reich, et al.

Metrics for Visual Grounding (VG) in Visual Question Answering (VQA) systems primarily aim to measure a system's reliance on relevant parts of the image when inferring an answer to the given question. Lack of VG has been a common problem among state-of-the-art VQA systems and can manifest as over-reliance on irrelevant image parts or as disregard for the visual modality entirely. Although the inference capabilities of VQA models are often demonstrated with a few qualitative examples, most systems are not quantitatively assessed for their VG properties. We believe that an easily calculated criterion for meaningfully measuring a system's VG can help remedy this shortcoming and add another valuable dimension to model evaluation and analysis. To this end, we propose a new VG metric that captures whether a model a) identifies question-relevant objects in the scene, and b) actually relies on the information contained in those objects when producing its answer, i.e., whether its visual grounding is both "faithful" and "plausible". Our metric, called "Faithful and Plausible Visual Grounding" (FPVG), is straightforward to determine for most VQA model designs. We give a detailed description of FPVG and evaluate several reference systems spanning various VQA architectures. Code to support the metric calculations on the GQA data set is available on GitHub.
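The abstract does not spell out how the two criteria are computed, so the following Python sketch is only one possible reading, not the authors' released code. It assumes that relevance annotations are available per question and that a sample counts as grounded when the answer is unchanged if the model sees only the relevant objects, but changes if it sees only the irrelevant ones. The model interface, the function names, and the toy models are all hypothetical.

```python
from typing import Callable, Sequence, Set, Tuple

# Hypothetical model interface (an assumption, not an API from the paper):
# takes a question and a list of object descriptions, returns an answer.
Model = Callable[[str, Sequence[str]], str]
Sample = Tuple[str, Sequence[str], Set[int]]


def is_faithful_and_plausible(
    model: Model, question: str, objects: Sequence[str], relevant_idx: Set[int]
) -> bool:
    """Assumed criterion: the answer survives when only relevant objects are
    shown, and changes when only irrelevant objects are shown."""
    relevant = [o for i, o in enumerate(objects) if i in relevant_idx]
    irrelevant = [o for i, o in enumerate(objects) if i not in relevant_idx]
    answer_all = model(question, objects)
    answer_relevant = model(question, relevant)
    answer_irrelevant = model(question, irrelevant)
    return answer_relevant == answer_all and answer_irrelevant != answer_all


def grounded_fraction(model: Model, samples: Sequence[Sample]) -> float:
    """Share of samples that pass the assumed check (0.0 for an empty set)."""
    if not samples:
        return 0.0
    passed = sum(
        is_faithful_and_plausible(model, q, objs, rel) for q, objs, rel in samples
    )
    return passed / len(samples)


# Toy models for illustration only.
def toy_grounded_model(question: str, objects: Sequence[str]) -> str:
    # Answers from the visual input: needs the apple to say "red".
    return "red" if "red apple" in objects else "unknown"


def toy_biased_model(question: str, objects: Sequence[str]) -> str:
    # Ignores the image entirely and always gives the same answer.
    return "red"


if __name__ == "__main__":
    data = [("What color is the apple?", ["red apple", "table", "window"], {0})]
    print(grounded_fraction(toy_grounded_model, data))
    print(grounded_fraction(toy_biased_model, data))
```

In this toy setup, the model that ignores the image fails the check because its answer does not change when the relevant object is removed, which is the kind of behavior the abstract describes as a lack of VG.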


