Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking

10/11/2021
by   Dirk Väth, et al.
0

On the way towards general Visual Question Answering (VQA) systems that are able to answer arbitrary questions, the need arises for evaluation beyond single-metric leaderboards for specific datasets. To this end, we propose a browser-based benchmarking tool for researchers and challenge organizers, with an API for easy integration of new models and datasets to keep up with the fast-changing landscape of VQA. Our tool helps test generalization capabilities of models across multiple datasets, evaluating not just accuracy, but also performance in more realistic real-world scenarios such as robustness to input noise. Additionally, we include metrics that measure biases and uncertainty, to further explain model behavior. Interactive filtering facilitates discovery of problematic behavior, down to the data sample level. As proof of concept, we perform a case study on four models. We find that state-of-the-art VQA models are optimized for specific tasks or datasets, but fail to generalize even to other in-domain test sets, for example they cannot recognize text in images. Our metrics allow us to quantify which image and question embeddings provide most robustness to a model. All code is publicly available.

READ FULL TEXT

page 5

page 10

research
10/26/2022

What's Different between Visual Question Answering for Machine "Understanding" Versus for Accessibility?

In visual question answering (VQA), a machine must answer a question giv...
research
03/28/2017

An Analysis of Visual Question Answering Algorithms

In visual question answering (VQA), an algorithm must answer text-based ...
research
11/30/2019

Assessing the Robustness of Visual Question Answering

Deep neural networks have been playing an essential role in the task of ...
research
11/19/2018

Explicit Bias Discovery in Visual Question Answering Models

Researchers have observed that Visual Question Answering (VQA) models te...
research
04/06/2023

Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions

Deep neural networks have been critical in the task of Visual Question A...
research
06/08/2021

Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions

Deep learning algorithms have shown promising results in visual question...
research
03/15/2022

CARETS: A Consistency And Robustness Evaluative Test Suite for VQA

We introduce CARETS, a systematic test suite to measure consistency and ...

Please sign up or login with your details

Forgot password? Click here to reset