Visual Perturbation-aware Collaborative Learning for Overcoming the Language Prior Problem

07/24/2022
by   Yudong Han, et al.

Several studies have recently pointed out that existing Visual Question Answering (VQA) models heavily suffer from the language prior problem, which refers to capturing superficial statistical correlations between the question type and the answer while ignoring the image contents. Numerous efforts have been dedicated to strengthening image dependency by designing delicate models or introducing extra visual annotations. However, these methods cannot sufficiently explore how visual cues explicitly affect the learned answer representation, which is vital for alleviating language reliance. Moreover, they generally emphasize class-level discrimination of the learned answer representation, overlooking the finer-grained instance-level patterns that demand further optimization. In this paper, we propose a novel collaborative learning scheme from the viewpoint of visual perturbation calibration, which can better investigate fine-grained visual effects and mitigate the language prior problem by learning instance-level characteristics. Specifically, we devise a visual controller to construct two sorts of curated images with different degrees of perturbation, based on which the collaborative learning of intra-instance invariance and inter-instance discrimination is implemented by two well-designed discriminators. In addition, we apply an information bottleneck modulator to the latent space for further bias alleviation and representation calibration. We apply our visual perturbation-aware framework to three established baselines, and the experimental results on two diagnostic VQA-CP benchmark datasets clearly demonstrate its effectiveness. We further verify its robustness on the balanced VQA benchmark.
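To make the collaborative objective concrete, here is a minimal numpy sketch of the idea described above. All names (`perturb`, `collaborative_loss`) are hypothetical: Gaussian noise stands in for the paper's learned visual controller, and a simple hinge-style contrastive loss stands in for the two trained discriminators. A mildly perturbed view of an instance is pulled toward its anchor (intra-instance invariance), while strongly perturbed views and other instances are pushed away (inter-instance discrimination).

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(feats, strength):
    # Stand-in for the visual controller: Gaussian noise of a given strength.
    return feats + strength * rng.standard_normal(feats.shape)

def collaborative_loss(anchor, mild, strong, others, margin=1.0):
    """Hypothetical intra-instance invariance + inter-instance discrimination.

    anchor, mild, strong: (d,) feature vectors for one instance;
    others: (n, d) feature vectors of different instances.
    """
    # Invariance: a mildly perturbed view should stay close to its anchor.
    invariance = np.sum((anchor - mild) ** 2)
    # Discrimination: other instances and the strongly perturbed view
    # should sit at least `margin` away (hinge on squared distance).
    negatives = np.vstack([others, strong[None]])
    dists = np.sum((negatives - anchor) ** 2, axis=1)
    discrimination = np.maximum(0.0, margin - dists).mean()
    return invariance + discrimination

d = 8
anchor = rng.standard_normal(d)
mild = perturb(anchor, 0.05)   # small perturbation: semantics preserved
strong = perturb(anchor, 5.0)  # large perturbation: treated as a negative
others = rng.standard_normal((4, d))
loss = collaborative_loss(anchor, mild, strong, others)
print(loss)
```

In the paper this pulling/pushing is done by learned discriminators over answer representations rather than a fixed margin loss; the sketch only illustrates the intra/inter-instance structure of the objective.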


Related research

- 10/30/2020: Loss-rescaling VQA: Revisiting Language Prior Problem from a Class-imbalance View
- 12/17/2020: Overcoming Language Priors with Self-supervised Learning for Visual Question Answering
- 09/18/2022: Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances
- 05/05/2021: AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss
- 05/13/2019: Quantifying and Alleviating the Language Prior Problem in Visual Question Answering
- 03/09/2023: Toward Unsupervised Realistic Visual Question Answering
