Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly

04/28/2022
by   Spencer Whitehead, et al.

Machine learning has advanced dramatically, narrowing the accuracy gap to humans in multimodal tasks like visual question answering (VQA). However, while humans can say "I don't know" when they are uncertain (i.e., abstain from answering a question), such ability has been largely neglected in multimodal research, despite the importance of this problem to the usage of VQA in real settings. In this work, we promote a problem formulation for reliable VQA, where we prefer abstention over providing an incorrect answer. We first enable abstention capabilities for several VQA models, and analyze both their coverage, the portion of questions answered, and risk, the error on that portion. For that we explore several abstention approaches. We find that although the best performing models achieve over 71% accuracy on the VQA v2 dataset, introducing the option to abstain by directly using a model's softmax scores limits them to answering less than 8% of the questions at a low risk of error (i.e., 1% risk). This motivates us to utilize a multimodal selection function to directly estimate the correctness of the predicted answers, which we show can triple the coverage from, for example, 5.0% to 16.7% at 1% risk. While it is important to analyze both coverage and risk, these metrics have a trade-off which makes comparing VQA models challenging. To address this, we also propose an Effective Reliability metric for VQA that places a larger cost on incorrect answers compared to abstentions. This new problem formulation, metric, and analysis for VQA provide the groundwork for building effective and reliable VQA models that have the self-awareness to abstain if and only if they don't know the answer.
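The coverage/risk analysis described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a simple softmax-threshold abstention policy, and the specific penalty value `cost` in the effective-reliability score is a hypothetical choice (the abstract only states that incorrect answers incur a larger cost than abstentions).

```python
def evaluate_selective_vqa(confidences, correct, threshold=0.5, cost=1.0):
    """Selective VQA evaluation sketch.

    confidences: per-question confidence in the model's predicted answer.
    correct: per-question accuracy in [0, 1] (1/0 or soft VQA accuracy).
    threshold: abstain whenever confidence falls below this (assumed policy).
    cost: penalty for an incorrect answer (hypothetical value).
    """
    answered = [conf >= threshold for conf in confidences]
    n_answered = sum(answered)

    # Coverage: the portion of questions the model chooses to answer.
    coverage = n_answered / len(confidences)

    # Risk: the error rate on that answered portion only.
    if n_answered:
        risk = sum(1 - acc for ans, acc in zip(answered, correct) if ans) / n_answered
    else:
        risk = 0.0

    # Effective reliability: correct answers earn their accuracy, incorrect
    # answers are penalized by `cost`, and abstentions contribute zero.
    total = 0.0
    for ans, acc in zip(answered, correct):
        if not ans:
            continue  # abstention contributes 0
        total += acc if acc > 0 else -cost
    effective_reliability = total / len(confidences)

    return coverage, risk, effective_reliability
```

Sweeping `threshold` traces out the coverage/risk trade-off the abstract describes: a stricter threshold lowers risk but also lowers coverage, which is why a single scalar like effective reliability is useful for comparing models.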


Related research

- Towards a Unified Model for Generating Answers and Explanations in Visual Question Answering (01/25/2023)
- Improving Selective Visual Question Answering by Learning from Your Peers (06/14/2023)
- Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering (04/07/2021)
- The Wisdom of MaSSeS: Majority, Subjectivity, and Semantic Similarity in the Evaluation of VQA (09/12/2018)
- Analyzing the Behavior of Visual Question Answering Models (06/23/2016)
- A Free Lunch in Generating Datasets: Building a VQG and VQA System with Attention and Humans in the Loop (11/30/2019)
- SimVQA: Exploring Simulated Environments for Visual Question Answering (03/31/2022)
