Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances

09/18/2022
by Yike Wu, et al.

Despite the great progress of Visual Question Answering (VQA), current VQA models rely heavily on superficial correlations between a question type and its most frequent answers (i.e., language priors) to make predictions, without truly understanding the input. In this work, we define training instances with the same question type but different answers as superficially similar instances, and attribute language priors to the VQA model's confusion on such instances. To solve this problem, we propose a novel training framework that explicitly encourages the VQA model to distinguish between superficially similar instances. Specifically, for each training instance, we first construct a set that contains its superficially similar counterparts. Then we exploit the proposed distinguishing module to increase the distance between the instance and its counterparts in the answer space. In this way, the VQA model is forced to attend to parts of the input beyond the question type, which helps it overcome language priors. Experimental results show that our method achieves state-of-the-art performance on VQA-CP v2. Code is available at https://github.com/wyk-nku/Distinguishing-VQA.git.
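For intuition only, the core training signal described above (pushing an instance away from its superficially similar counterparts in the answer space) can be sketched as a contrastive-style penalty. The snippet below is a minimal sketch in PyTorch; the function name distinguishing_loss, the cosine-distance measure, and the hinge margin are our illustrative assumptions, not the paper's exact distinguishing module.

```python
# Hypothetical sketch: penalize an instance whose answer-space scores stay too
# close to those of its superficially similar counterparts (same question type,
# different ground-truth answer). Names and loss form are illustrative.
import torch
import torch.nn.functional as F

def distinguishing_loss(anchor_logits, counterpart_logits, margin=0.5):
    """
    anchor_logits:      (num_answers,) answer-space scores for one instance
    counterpart_logits: (k, num_answers) scores for its superficially similar set
    Returns a hinge-style loss that vanishes once the anchor's answer
    representation is at least `margin` (in cosine distance) from every counterpart.
    """
    anchor = F.normalize(anchor_logits.unsqueeze(0), dim=-1)   # (1, A)
    counterparts = F.normalize(counterpart_logits, dim=-1)     # (k, A)
    cos_sim = (anchor * counterparts).sum(dim=-1)              # (k,)
    # Penalize counterparts whose similarity exceeds the allowed threshold.
    return F.relu(cos_sim - (1.0 - margin)).mean()

# Toy usage: 3 superficially similar counterparts over a 10-answer vocabulary.
anchor = torch.randn(10)
similar_set = torch.randn(3, 10)
loss = distinguishing_loss(anchor, similar_set)
```

In practice such a term would be added to the standard VQA answer-classification loss, so the model is rewarded both for predicting the correct answer and for separating instances that share only a question type.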

Related research

12/01/2017 · Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
A number of studies have found that today's Visual Question Answering (V...

10/08/2018 · Overcoming Language Priors in Visual Question Answering with Adversarial Regularization
Modern Visual Question Answering (VQA) models have been shown to rely he...

05/05/2021 · AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss
A number of studies point out that current Visual Question Answering (VQ...

07/24/2022 · Visual Perturbation-aware Collaborative Learning for Overcoming the Language Prior Problem
Several studies have recently pointed that existing Visual Question Answ...

12/02/2016 · Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Problems at the intersection of vision and language are of significant i...

10/30/2020 · Loss-rescaling VQA: Revisiting Language Prior Problem from a Class-imbalance View
Recent studies have pointed out that many well-developed Visual Question...

10/10/2022 · Language Prior Is Not the Only Shortcut: A Benchmark for Shortcut Learning in VQA
Visual Question Answering (VQA) models are prone to learn the shortcut s...
