A negative case analysis of visual grounding methods for VQA

04/12/2020
by Robik Shrestha, et al.

Existing Visual Question Answering (VQA) methods tend to exploit dataset biases and spurious statistical correlations instead of producing the right answers for the right reasons. To address this issue, recent bias mitigation methods for VQA incorporate visual cues (e.g., human attention maps) to better ground the models, and report impressive gains. However, we show that these performance improvements result not from improved visual grounding but from a regularization effect that prevents over-fitting to linguistic priors. For instance, we find that proper, human-based cues are not actually necessary: random, nonsensical cues yield similar improvements. Based on this observation, we propose a simpler regularization scheme that requires no external annotations and yet achieves near state-of-the-art performance on VQA-CPv2.
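
The control mentioned in the abstract, replacing human-based cues with random ones, can be illustrated with a short sketch. This is not the authors' code: the function name `random_cue_like`, the four-region example values, and the choice to match the total mass of the human cue are all assumptions made for illustration.

```python
import numpy as np


def random_cue_like(human_cue, rng):
    """Return a random cue with the same shape and total mass as human_cue.

    Hypothetical helper: it ignores where the human cue places its
    importance and only preserves its shape and sum.
    """
    human_cue = np.asarray(human_cue, dtype=float)
    # Uniform noise in [0, 1), one value per image region.
    noise = rng.random(human_cue.shape)
    # Rescale so the random cue sums to the same total as the human cue.
    return noise * (human_cue.sum() / noise.sum())


rng = np.random.default_rng(0)

# Hypothetical human attention over four image regions (made-up values).
human_cue = np.array([0.70, 0.20, 0.05, 0.05])
random_cue = random_cue_like(human_cue, rng)
```

Under these assumptions, a grounding method would be trained once with `human_cue` and once with `random_cue`; the abstract reports that both kinds of cue lead to similar improvements.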


