Improving Selective Visual Question Answering by Learning from Your Peers

06/14/2023
by Corentin Dancette, et al.

Despite advances in Visual Question Answering (VQA), the ability of models to assess their own correctness remains underexplored. Recent work has shown that VQA models, out-of-the-box, can have difficulty abstaining from answering when they are wrong. The option to abstain, also called Selective Prediction, is highly relevant when deploying systems to users who must trust the system's output (e.g., VQA assistants for users with visual impairments). For such scenarios, abstention can be especially important, as users may provide out-of-distribution (OOD) or adversarial inputs that make incorrect answers more likely. In this work, we explore Selective VQA in both in-distribution (ID) and OOD scenarios, where models are presented with mixtures of ID and OOD data. The goal is to maximize the number of questions answered while minimizing the risk of error on those questions. We propose a simple yet effective Learning from Your Peers (LYP) approach for training multimodal selection functions for making abstention decisions. Our approach uses predictions from models trained on distinct subsets of the training data as targets for optimizing a Selective VQA model. It does not require additional manual labels or held-out data, and it provides a signal for identifying examples that are easy or difficult to generalize to. In our extensive evaluations, we show this benefits a number of models across different architectures and scales. Overall, for ID, we reach 32.92% coverage at 1% risk of error (C@1%) on this task. For mixed ID/OOD, using models' softmax confidences for abstention decisions performs very poorly, answering <5% of questions even when faced with only 10% OOD examples, but a learned selection function with LYP can increase that to 25.38% C@1%.
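To make the setup concrete, the sketch below shows (1) how the coverage-at-risk metric C@1% can be computed from per-question confidences and correctness labels, and (2) one plausible cross-validation-style reading of the LYP target construction, where each training example is scored by peer models that never saw it. This is a minimal sketch under assumed interfaces, not the paper's released code: train_fn and eval_fn are hypothetical stand-ins, and the k-fold split scheme is an assumption about what "models trained on distinct subsets" means.

```python
import numpy as np

def coverage_at_risk(confidences, correct, max_risk=0.01):
    """C@risk: the largest fraction of questions that can be answered,
    keeping only the most confident predictions, while the empirical
    error rate on the answered set stays <= max_risk (1% for C@1%)."""
    order = np.argsort(-np.asarray(confidences))   # most confident first
    correct = np.asarray(correct, dtype=float)[order]
    errors = np.cumsum(1.0 - correct)              # errors among the top-k answered
    ks = np.arange(1, len(correct) + 1)
    risk = errors / ks                             # empirical risk at each coverage level
    ok = np.nonzero(risk <= max_risk)[0]
    return 0.0 if len(ok) == 0 else ks[ok[-1]] / len(correct)

def lyp_targets(dataset, n_splits, train_fn, eval_fn, seed=0):
    """Assumed sketch of LYP target construction: split the training data
    into n_splits folds, train a peer model on all-but-one fold, and score
    each held-out example with the peer that did NOT train on it. The
    resulting scores (e.g., the peer's VQA accuracy on that example) serve
    as regression targets for the multimodal selection function, without
    extra manual labels or held-out data."""
    rng = np.random.default_rng(seed)
    splits = np.array_split(rng.permutation(len(dataset)), n_splits)
    targets = np.zeros(len(dataset))
    for i, held_out in enumerate(splits):
        train_idx = np.concatenate([s for j, s in enumerate(splits) if j != i])
        peer = train_fn([dataset[k] for k in train_idx])  # peer model without fold i
        for k in held_out:
            targets[k] = eval_fn(peer, dataset[k])        # hypothetical: accuracy of peer's answer
    return targets

# Example: C@1% on synthetic data. The top-3 most confident answers are
# all correct, so 60% of questions can be answered at <=1% risk.
conf = np.array([0.99, 0.95, 0.90, 0.40, 0.30])
corr = np.array([1, 1, 1, 0, 1])
print(coverage_at_risk(conf, corr, max_risk=0.01))  # -> 0.6
```

In the paper's framing, targets of this kind supervise a selection function that decides whether to answer or abstain; examples that peers consistently get wrong are exactly the ones that are hard to generalize to, which is the signal the abstract describes.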
