On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering

08/28/2021
by Gouthaman KV, et al.

Generalizing beyond training experience is essential for building practical AI systems. Current Visual Question Answering (VQA) models have been shown to depend heavily on language priors (spurious correlations between question types and their most frequent answers) in the training set, and they perform poorly on Out-of-Distribution (OOD) test sets. This behavior limits their generalizability and restricts their use in real-world settings. This paper shows that the sequence model architecture used in the question encoder plays a significant role in the generalizability of VQA models. To demonstrate this, we perform a detailed analysis of various existing RNN-based and Transformer-based question encoders and, alongside these, propose a novel Graph Attention Network (GAT)-based question encoder. Our study finds that a better choice of sequence model in the question encoder improves the generalizability of VQA models, even without any additional, relatively complex bias-mitigation approaches.
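To make the notion of a language prior concrete, the following is a minimal sketch of a "blind" baseline that ignores the image and answers with the most frequent training answer for each question type. It is not the paper's method or data: the function names `fit_prior` and `accuracy`, the question types, and all counts are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

def fit_prior(train):
    """Map each question type to its most frequent training answer."""
    counts = defaultdict(Counter)
    for qtype, answer in train:
        counts[qtype][answer] += 1
    return {qtype: c.most_common(1)[0][0] for qtype, c in counts.items()}

def accuracy(prior, data):
    """Fraction of examples where the prior-only guess matches the answer."""
    hits = sum(prior.get(qtype) == answer for qtype, answer in data)
    return hits / len(data)

# Made-up toy data: (question type, answer) pairs, no images involved.
train = (
    [("what color", "white")] * 8 + [("what color", "red")] * 2
    + [("is there", "yes")] * 9 + [("is there", "no")] * 1
)
# In-distribution test set: same answer distribution as the training set.
iid_test = list(train)
# OOD test set: the per-type answer distribution is reversed.
ood_test = (
    [("what color", "white")] * 2 + [("what color", "red")] * 8
    + [("is there", "yes")] * 1 + [("is there", "no")] * 9
)

prior = fit_prior(train)
iid_acc = accuracy(prior, iid_test)
ood_acc = accuracy(prior, ood_test)
print(prior, iid_acc, ood_acc)
```

On this toy data the prior-only baseline scores well when the test answers follow the training distribution and poorly when they are shifted, which is the failure pattern the abstract attributes to models that lean on language priors.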

research
07/13/2020

Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder

Recent studies have shown that current VQA models are heavily biased on ...
research
10/08/2018

Overcoming Language Priors in Visual Question Answering with Adversarial Regularization

Modern Visual Question Answering (VQA) models have been shown to rely he...
research
05/17/2023

An Empirical Study on the Language Modal in Visual Question Answering

Generalization beyond in-domain experience to out-of-distribution data i...
research
03/24/2022

Towards Efficient and Elastic Visual Question Answering with Doubly Slimmable Transformer

Transformer-based approaches have shown great success in visual question...
research
05/06/2023

Adaptive loose optimization for robust question answering

Question answering methods are well-known for leveraging data bias, such...
research
11/17/2021

Achieving Human Parity on Visual Question Answering

The Visual Question Answering (VQA) task utilizes both visual image and ...
research
12/19/2019

Deep Exemplar Networks for VQA and VQG

In this paper, we consider the problem of solving semantic tasks such as...
