Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering

04/07/2021
by Corentin Dancette et al.

We introduce an evaluation methodology for visual question answering (VQA) to better diagnose cases of shortcut learning. These cases happen when a model exploits spurious statistical regularities to produce correct answers but does not actually exhibit the desired behavior. There is a need to identify possible shortcuts in a dataset and assess their use before deploying a model in the real world. The research community in VQA has focused exclusively on question-based shortcuts, where a model might, for example, answer "What is the color of the sky?" with "blue" by relying mostly on the question-conditional training prior while giving little weight to visual evidence. We go a step further and consider multimodal shortcuts that involve both questions and images. We first identify potential shortcuts in the popular VQA v2 training set by mining trivial predictive rules such as co-occurrences of words and visual elements. We then create VQA-CE, a new evaluation set made of CounterExamples, i.e., questions where the mined rules lead to incorrect answers. We use this new evaluation in a large-scale study of existing models. We demonstrate that even state-of-the-art models perform poorly and that existing techniques for reducing biases are largely ineffective in this context. Our findings suggest that past work on question-based biases in VQA has addressed only one facet of a complex issue. The code for our method is available at https://github.com/cdancette/detect-shortcuts
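To make the two-step procedure described above concrete, here is a minimal sketch of mining co-occurrence rules and selecting counterexamples. This is not the authors' actual pipeline (see the repository above for that); the function names, the input format (sets of question tokens and detected object labels per example), and the support/confidence thresholds are all illustrative assumptions.

```python
# Minimal sketch: mine (question word, object label) -> answer rules from
# training data, then flag evaluation examples where every matching rule
# predicts a wrong answer. Thresholds and data format are assumptions.
from collections import Counter, defaultdict
from itertools import product

def mine_rules(examples, min_support=50, min_confidence=0.7):
    """Mine trivial predictive rules from (word, object) co-occurrences.

    `examples` is an iterable of dicts with keys:
      "q_words": set of question tokens,
      "objects": set of object labels detected in the image,
      "answer":  the ground-truth answer string.
    """
    pair_counts = Counter()                    # occurrences of each (word, object) pair
    pair_answer_counts = defaultdict(Counter)  # answer distribution for each pair

    for ex in examples:
        for pair in product(ex["q_words"], ex["objects"]):
            pair_counts[pair] += 1
            pair_answer_counts[pair][ex["answer"]] += 1

    rules = {}
    for pair, count in pair_counts.items():
        if count < min_support:
            continue  # rule must be frequent enough to count as a shortcut
        answer, ans_count = pair_answer_counts[pair].most_common(1)[0]
        confidence = ans_count / count
        if confidence >= min_confidence:
            rules[pair] = (answer, confidence)
    return rules

def split_counterexamples(examples, rules):
    """An example is a counterexample if at least one rule matches it and
    every matching rule predicts an answer different from the ground truth."""
    easy, counterexamples = [], []
    for ex in examples:
        predictions = [rules[pair][0]
                       for pair in product(ex["q_words"], ex["objects"])
                       if pair in rules]
        if predictions and all(p != ex["answer"] for p in predictions):
            counterexamples.append(ex)  # shortcuts actively mislead here
        else:
            easy.append(ex)
    return easy, counterexamples
```

A model that relies on these shortcuts will score well on the "easy" split but fail on the counterexamples, which is exactly the gap this style of evaluation is designed to expose.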


