An Empirical Study on the Language Modal in Visual Question Answering

05/17/2023
by Daowan Peng, et al.

Generalizing beyond in-domain experience to out-of-distribution data is of paramount importance in AI. Recently, state-of-the-art Visual Question Answering (VQA) models have shown impressive performance on in-domain data, partly owing to language-prior bias, which, however, hinders their ability to generalize in practice. This paper offers new insights into how the language modality influences VQA performance through an empirical study. To this end, we conducted a series of experiments on six models. The results revealed that 1) apart from the prior bias caused by question types, postfix-related bias plays a notable role in inducing biases, and 2) training VQA models with word-sequence-related variants of questions improved performance on the out-of-distribution benchmark, with LXMERT even achieving a 10-point gain without adopting any debiasing method. We examined the underlying reasons for these results and put forward some simple proposals to reduce the models' dependency on language priors. The experimental results demonstrated the effectiveness of our proposed method in improving performance on the out-of-distribution benchmark VQA-CPv2. We hope this study can inspire novel insights for future research on designing bias-reduction approaches.
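To make the contrast between question-type priors and postfix-related priors concrete, here is a minimal sketch that is not the authors' method: a "blind" baseline that never looks at the image and answers each question with the most frequent training answer for its first few words (a rough proxy for question type) or for its last word (a postfix). The toy data, the function names `prior_accuracy`, `prefix_key` and `postfix_key`, and the chosen prefix/postfix lengths are all hypothetical.

```python
from collections import Counter, defaultdict

def prefix_key(question, k=3):
    # Hypothetical proxy for "question type": the first k lowercased words.
    return tuple(question.lower().rstrip("?").split()[:k])

def postfix_key(question, k=1):
    # Hypothetical postfix key: the last k lowercased words.
    return tuple(question.lower().rstrip("?").split()[-k:])

def prior_accuracy(train, test, key):
    """Answer each test question with the most frequent training answer
    for its key, ignoring the image entirely (a language-only prior)."""
    table = defaultdict(Counter)
    for question, answer in train:
        table[key(question)][answer] += 1
    correct = 0
    for question, answer in test:
        counts = table.get(key(question))  # .get avoids creating empty entries
        if counts and counts.most_common(1)[0][0] == answer:
            correct += 1
    return correct / len(test)

# Toy, made-up (question, answer) pairs for illustration only.
train = [
    ("What color is the banana?", "yellow"),
    ("What color is the ripe banana?", "yellow"),
    ("What color is the snow?", "white"),
    ("What color is the cloud?", "white"),
    ("What color is the plate?", "white"),
    ("Is there a banana?", "yes"),
]
test = [
    ("What color is this banana?", "yellow"),
    ("What color is the green banana?", "green"),
]

print(prior_accuracy(train, test, prefix_key))   # 0.0: prefix prior says "white"
print(prior_accuracy(train, test, postfix_key))  # 0.5: postfix prior says "yellow"
```

In this toy setting the prefix prior always guesses "white", while the postfix prior latches onto "banana" and guesses "yellow", which is right once and wrong on the unusual green banana. A gap of this kind is one simple way to probe whether answers are predictable from question endings alone.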
