Estimating semantic structure for the VQA answer space

06/10/2020
by Corentin Kervadec et al.

Since its appearance, Visual Question Answering (VQA, i.e. answering a question posed over an image) has always been treated as a classification problem over a set of predefined answers. Despite its convenience, this classification approach poorly reflects the semantics of the problem: it limits answering to a choice between independent proposals, without taking into account the similarity between them (e.g. equally penalizing a model for answering "cat" or "German shepherd" instead of "dog"). We address this issue by proposing (1) two measures of proximity between VQA classes, and (2) a corresponding loss which takes the estimated proximity into account. This significantly improves the generalization of VQA models by reducing their language bias. In particular, we show that our approach is completely model-agnostic, as it yields consistent improvements with three different VQA models. Finally, by combining our method with a language bias reduction approach, we report SOTA-level performance on the challenging VQAv2-CP dataset.
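
To make the idea concrete, below is a minimal PyTorch sketch of how a proximity matrix over the answer vocabulary could be estimated from precomputed answer embeddings and plugged into a proximity-aware classification loss. The cosine-similarity proximity measure, the soft-target formulation, and all function names here are illustrative assumptions, not the authors' exact definitions.

```python
import torch
import torch.nn.functional as F


def semantic_proximity(answer_embeddings: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine similarity between answer classes, clamped to [0, 1].

    `answer_embeddings` is a (num_answers, dim) tensor of precomputed word
    embeddings for the answer vocabulary (e.g. averaged GloVe vectors for
    multi-word answers); how these are built is an assumption of this sketch.
    """
    normed = F.normalize(answer_embeddings, dim=-1)
    return (normed @ normed.t()).clamp(min=0.0)


def proximity_aware_loss(logits: torch.Tensor,
                         targets: torch.Tensor,
                         proximity: torch.Tensor,
                         temperature: float = 0.1) -> torch.Tensor:
    """Cross-entropy against a soft target distribution derived from proximity.

    Each ground-truth answer is smeared over semantically close answers, so
    predicting "cat" when the answer is "dog" costs less than predicting
    "car". This is a generic soft-target formulation, not necessarily the
    exact loss proposed in the paper.
    """
    soft_targets = F.softmax(proximity[targets] / temperature, dim=-1)  # (B, A)
    log_probs = F.log_softmax(logits, dim=-1)                           # (B, A)
    return -(soft_targets * log_probs).sum(dim=-1).mean()


# Usage with placeholder tensors (real answer embeddings would come from
# pretrained word vectors over the answer vocabulary):
emb = torch.randn(3000, 300)          # 3000 candidate answers, 300-d embeddings
prox = semantic_proximity(emb)        # (3000, 3000) proximity matrix
logits = torch.randn(32, 3000)        # classifier output for a batch of 32
answer_ids = torch.randint(0, 3000, (32,))
loss = proximity_aware_loss(logits, answer_ids, prox)
```

The soft-target construction is what replaces the usual one-hot cross-entropy: the temperature controls how much credit spills over to semantically close answers, recovering the standard loss as it approaches zero.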

Related research

- 12/21/2020: Learning content and context with language bias for Visual Question Answering
- 05/29/2021: LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering
- 07/13/2020: Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder
- 06/20/2019: Adversarial Regularization for Visual Question Answering: Strengths, Shortcomings, and Side Effects
- 05/05/2021: AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss
- 01/31/2020: Augmenting Visual Question Answering with Semantic Frame Information in a Multitask Learning Approach
- 09/12/2018: The Wisdom of MaSSeS: Majority, Subjectivity, and Semantic Similarity in the Evaluation of VQA
