SC-ML: Self-supervised Counterfactual Metric Learning for Debiased Visual Question Answering

04/04/2023
by   Xinyao Shu, et al.
0

Visual question answering (VQA) is a critical multimodal task in which an agent must answer questions according to the visual cue. Unfortunately, language bias is a common problem in VQA, which refers to the model generating answers only by associating with the questions while ignoring the visual content, resulting in biased results. We tackle the language bias problem by proposing a self-supervised counterfactual metric learning (SC-ML) method to focus the image features better. SC-ML can adaptively select the question-relevant visual features to answer the question, reducing the negative influence of question-irrelevant visual features on inferring answers. In addition, question-irrelevant visual features can be seamlessly incorporated into counterfactual training schemes to further boost robustness. Extensive experiments have proved the effectiveness of our method with improved results on the VQA-CP dataset. Our code will be made publicly available.

READ FULL TEXT

page 3

page 5

research
12/21/2020

Learning content and context with language bias for Visual Question Answering

Visual Question Answering (VQA) is a challenging multimodal task to answ...
research
06/25/2021

A Picture May Be Worth a Hundred Words for Visual Question Answering

How far can we go with textual representations for understanding picture...
research
05/01/2017

The Promise of Premise: Harnessing Question Premises in Visual Question Answering

In this paper, we make a simple observation that questions about images ...
research
04/05/2022

SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering

While Visual Question Answering (VQA) has progressed rapidly, previous w...
research
12/17/2020

Overcoming Language Priors with Self-supervised Learning for Visual Question Answering

Most Visual Question Answering (VQA) models suffer from the language pri...
research
03/14/2020

Counterfactual Samples Synthesizing for Robust Visual Question Answering

Despite Visual Question Answering (VQA) has realized impressive progress...
research
10/03/2021

Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering

Today's VQA models still tend to capture superficial linguistic correlat...

Please sign up or login with your details

Forgot password? Click here to reset