Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning

10/10/2022
by   Qingyi Si, et al.

Models for Visual Question Answering (VQA) often rely on spurious correlations, i.e., language priors, that appear in the biased samples of the training set, which makes them brittle against out-of-distribution (OOD) test data. Recent methods have made promising progress on this problem by reducing the impact of biased samples on model training. However, these methods reveal a trade-off: the improvements on OOD data come at the cost of severely degraded performance on the in-distribution (ID) data, which is dominated by the biased samples. We therefore propose MMBS, a novel contrastive learning approach for building robust VQA models by Making the Most of Biased Samples. Specifically, we construct positive samples for contrastive learning by eliminating the information related to spurious correlations from the original training samples, and we explore several strategies for using the constructed positive samples during training. Instead of downplaying the importance of biased samples in model training, our approach precisely exploits the biased samples for the unbiased information they contain that contributes to reasoning. The proposed method is compatible with various VQA backbones. We validate our contributions by achieving competitive performance on the OOD dataset VQA-CP v2 while preserving robust performance on the ID dataset VQA v2.
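The abstract describes the positive-sample construction and the contrastive objective only at a high level. The Python sketch below illustrates one plausible reading of the idea: a positive view of each question is built by stripping the question-type words that carry most of the language prior, and an InfoNCE-style loss pulls each sample toward its positive view while pushing it away from other samples in the batch. The QUESTION_TYPE_WORDS list, the make_positive helper, and the toy embeddings are illustrative assumptions, not the paper's released implementation.

```python
# Hedged sketch (not the authors' code): build a "debiased" positive view of each
# VQA question and apply an InfoNCE-style contrastive loss between original and
# positive views. The word list and encoder dimensions are illustrative only.
import torch
import torch.nn.functional as F

# Toy set of question-type prefixes assumed to carry most of the language prior.
QUESTION_TYPE_WORDS = {"what", "is", "are", "how", "many", "color", "does", "do"}

def make_positive(question: str) -> str:
    """Build a positive view by dropping question-type words, so the remaining
    tokens keep the question content but lose the prior-inducing cue."""
    kept = [w for w in question.lower().split() if w not in QUESTION_TYPE_WORDS]
    return " ".join(kept) if kept else question

def contrastive_loss(z_orig: torch.Tensor, z_pos: torch.Tensor,
                     temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE loss: each original sample should be most similar to its own
    positive view; the other samples in the batch act as negatives."""
    z_orig = F.normalize(z_orig, dim=-1)
    z_pos = F.normalize(z_pos, dim=-1)
    logits = z_orig @ z_pos.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(z_orig.size(0))      # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

# Minimal usage example with random embeddings standing in for a VQA backbone.
if __name__ == "__main__":
    print(make_positive("what color is the dog"))   # -> "the dog"
    B, D = 8, 64
    z_orig = torch.randn(B, D, requires_grad=True)  # embeddings of original samples
    z_pos = torch.randn(B, D)                       # embeddings of positive views
    loss = contrastive_loss(z_orig, z_pos)
    loss.backward()
    print(f"contrastive loss: {loss.item():.4f}")
```

In practice this contrastive term would be added to the standard VQA classification loss, so the model keeps learning to answer while being encouraged to represent each sample by its prior-independent content.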
