Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder

07/13/2020
by   Gouthaman KV, et al.

Recent studies have shown that current VQA models rely heavily on the language priors in the training set to answer questions, irrespective of the image; e.g., they overwhelmingly answer "what sport is" with "tennis" or "what color banana" with "yellow." This behavior limits their use in real-world application scenarios. In this work, we propose a novel model-agnostic question encoder for VQA, the Visually-Grounded Question Encoder (VGQE), that reduces this effect. VGQE utilizes the visual and language modalities equally while encoding the question. The question representation itself therefore receives sufficient visual grounding, which reduces the model's dependency on language priors. We demonstrate the effect of VGQE on three recent VQA models and achieve state-of-the-art results on VQA-CPv2, the bias-sensitive split of the VQAv2 dataset. Further, unlike existing bias-reduction techniques, our approach does not drop accuracy on the standard VQAv2 benchmark; instead, it improves performance.
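The core idea, fusing an attended image feature with each word before the recurrent update so the question encoding is visually grounded at every step, can be sketched as follows. This is a minimal toy illustration in numpy, not the paper's exact architecture: the dimensions, the concatenation-based fusion, the single-layer attention, and the plain GRU cell are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # word-embedding / hidden size (illustrative)
K = 8   # number of image regions (illustrative)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attend(word, regions, Wa):
    """Attend over image-region features conditioned on the current word."""
    scores = regions @ (Wa @ word)     # (K,) relevance of each region
    return softmax(scores) @ regions   # (D,) attended visual feature

def gru_step(x, h, P):
    """One GRU update over the fused word+visual input x (size 2*D)."""
    z = sigmoid(P['Wz'] @ x + P['Uz'] @ h)        # update gate
    r = sigmoid(P['Wr'] @ x + P['Ur'] @ h)        # reset gate
    n = np.tanh(P['Wn'] @ x + P['Un'] @ (r * h))  # candidate state
    return (1 - z) * h + z * n

def init_params(d):
    s = 1.0 / np.sqrt(d)
    P = {}
    for g in ('z', 'r', 'n'):
        P['W' + g] = rng.normal(0, s, (d, 2 * d))  # input is word || visual
        P['U' + g] = rng.normal(0, s, (d, d))
    P['Wa'] = rng.normal(0, s, (d, d))             # attention projection
    return P

def encode_question(word_embs, regions, P):
    """Visually-grounded question encoding: each word is fused with an
    attended image feature BEFORE the recurrent step, so the final
    question vector depends on the image, not on language alone."""
    h = np.zeros(D)
    for w in word_embs:
        v = attend(w, regions, P['Wa'])
        h = gru_step(np.concatenate([w, v]), h, P)
    return h

P = init_params(D)
words = rng.normal(size=(5, D))     # toy 5-word question
regions = rng.normal(size=(K, D))   # toy image-region features

q1 = encode_question(words, regions, P)
q2 = encode_question(words, rng.normal(size=(K, D)), P)
print(q1.shape)  # (16,)
```

Note that the same question encodes to different vectors for different images (`q1 != q2`), which is the intended contrast with a language-only encoder, where the question representation would be identical regardless of the image.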


Related research:

- 10/08/2018, Overcoming Language Priors in Visual Question Answering with Adversarial Regularization. "Modern Visual Question Answering (VQA) models have been shown to rely he..."
- 08/28/2021, On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering. "Generalizing beyond the experiences has a significant role in developing..."
- 06/10/2020, Estimating semantic structure for the VQA answer space. "Since its appearance, Visual Question Answering (VQA, i.e. answering a q..."
- 06/28/2021, Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs. "With the expressed goal of improving system transparency and visual grou..."
- 05/17/2023, An Empirical Study on the Language Modal in Visual Question Answering. "Generalization beyond in-domain experience to out-of-distribution data i..."
- 04/12/2020, A negative case analysis of visual grounding methods for VQA. "Existing Visual Question Answering (VQA) methods tend to exploit dataset..."
- 11/15/2022, Visually Grounded VQA by Lattice-based Retrieval. "Visual Grounding (VG) in Visual Question Answering (VQA) systems describ..."
