MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering

09/18/2020
by Tejas Gokhale, et al.

While progress has been made on visual question answering leaderboards, models often exploit spurious correlations and priors in datasets under the i.i.d. setting. As such, evaluation on out-of-distribution (OOD) test samples has emerged as a proxy for generalization. In this paper, we present MUTANT, a training paradigm that exposes the model to perceptually similar, yet semantically distinct mutations of the input, to improve OOD generalization on benchmarks such as the VQA-CP challenge. Under this paradigm, models use a consistency-constrained training objective to understand the effect of semantic changes in the input (question-image pair) on the output (answer). Unlike existing methods on VQA-CP, MUTANT does not rely on knowledge about the nature of the train and test answer distributions. MUTANT establishes a new state-of-the-art accuracy on VQA-CP with a 10.57% improvement. Our work opens up avenues for the use of semantic input mutations for OOD generalization in question answering.
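
The sketch below is only a rough illustration of the consistency-constrained training idea described in the abstract: the model is trained on each input together with a mutated version of it, and an extra term penalizes predictions that ignore a mutation that changes the ground-truth answer. The model and mutate callables, the loss weight lambda_cons, and the exact form of the consistency term are illustrative assumptions, not the paper's released objective.

    import torch
    import torch.nn.functional as F

    def mutant_step(model, image, question, answer, mutate, lambda_cons=1.0):
        """One training step on an (image, question, answer) batch and its mutation.

        Assumes `model(image, question)` returns answer logits and `mutate(...)`
        returns a semantically mutated (image, question, answer) triple; both
        are hypothetical placeholders for illustration.
        """
        # Forward pass on the original sample and on a semantically mutated copy.
        logits_orig = model(image, question)
        image_m, question_m, answer_m = mutate(image, question, answer)
        logits_mut = model(image_m, question_m)

        # Standard answer-classification losses on both versions of the input.
        ce_orig = F.cross_entropy(logits_orig, answer)
        ce_mut = F.cross_entropy(logits_mut, answer_m)

        # Consistency term: when the mutation changes the ground-truth answer,
        # penalize the probability mass the model still places on the old answer,
        # tying the change in output to the semantic change in input.
        p_mut = logits_mut.softmax(dim=-1)
        stale_mass = p_mut.gather(1, answer.unsqueeze(1)).squeeze(1)
        changed = (answer_m != answer).float()
        consistency = (changed * stale_mass).mean()

        return ce_orig + ce_mut + lambda_cons * consistency

In a training loop, the returned loss would simply replace the usual cross-entropy term before the backward pass.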

Related research

12/01/2017 - Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
  A number of studies have found that today's Visual Question Answering (V...

05/24/2019 - Self-Critical Reasoning for Robust Visual Question Answering
  Visual Question Answering (VQA) deep-learning systems tend to capture su...

10/10/2022 - Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning
  Models for Visual Question Answering (VQA) often rely on the spurious co...

06/08/2021 - Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions
  Deep learning algorithms have shown promising results in visual question...

05/06/2023 - Adaptive loose optimization for robust question answering
  Question answering methods are well-known for leveraging data bias, such...

10/10/2022 - Language Prior Is Not the Only Shortcut: A Benchmark for Shortcut Learning in VQA
  Visual Question Answering (VQA) models are prone to learn the shortcut s...

12/04/2020 - Self-Supervised VQA: Answering Visual Questions using Images and Captions
  Methodologies for training VQA models assume the availability of dataset...
