Overcoming Language Priors with Self-supervised Learning for Visual Question Answering

12/17/2020
by   Xi Zhu, et al.

Most Visual Question Answering (VQA) models suffer from the language prior problem, which is caused by inherent data biases. Specifically, VQA models tend to answer a question (e.g., "What color is the banana?") with the high-frequency answer (e.g., "yellow") while ignoring the image content. Existing approaches tackle this problem by designing elaborate models or introducing additional visual annotations to reduce question dependency while strengthening image dependency. However, they remain subject to the language prior problem because the underlying data biases are not actually alleviated. In this paper, we introduce a self-supervised learning framework to address this problem. Concretely, we first automatically generate labeled data to balance the biased training data, and then propose a self-supervised auxiliary task that uses the balanced data to help the base VQA model overcome language priors. Our method compensates for the data biases by generating balanced data without introducing external annotations. Experimental results show that our method significantly outperforms the state of the art, improving the overall accuracy over a baseline of 49.50%. In other words, we can improve on annotation-based methods by 16% without using external annotations.
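The balanced-data step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: every matched (image, question) pair is kept as a "relevant" example, and each question is additionally paired with a randomly sampled different image to form an "irrelevant" example, yielding a balanced relevance-labeled set. The dictionary field names (`image`, `question`, `relevant`) are illustrative assumptions.

```python
import random

def generate_balanced_pairs(vqa_samples, seed=0):
    """Sketch of balanced-data generation for a self-supervised
    question-image relevance task (field names are hypothetical).

    Each original pair becomes a relevant example (label 1); for each
    question, a random *different* image forms an irrelevant example
    (label 0). Assumes at least two distinct images in the dataset.
    """
    rng = random.Random(seed)
    images = [s["image"] for s in vqa_samples]
    balanced = []
    for s in vqa_samples:
        # Keep the matched pair as a positive (relevant) example.
        balanced.append({"image": s["image"],
                         "question": s["question"],
                         "relevant": 1})
        # Sample a mismatched image to build a negative example.
        other = rng.choice(images)
        while other == s["image"]:
            other = rng.choice(images)
        balanced.append({"image": other,
                         "question": s["question"],
                         "relevant": 0})
    return balanced
```

The auxiliary task would then train the model to predict the `relevant` label, so that answering a question without looking at the image is penalized.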

Related research

06/24/2019 · RUBi: Reducing Unimodal Biases in Visual Question Answering
Visual Question Answering (VQA) is the task of answering questions about...

10/08/2018 · Overcoming Language Priors in Visual Question Answering with Adversarial Regularization
Modern Visual Question Answering (VQA) models have been shown to rely he...

07/24/2022 · Visual Perturbation-aware Collaborative Learning for Overcoming the Language Prior Problem
Several studies have recently pointed that existing Visual Question Answ...

12/02/2016 · Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Problems at the intersection of vision and language are of significant i...

07/27/2021 · Greedy Gradient Ensemble for Robust Visual Question Answering
Language bias is a critical issue in Visual Question Answering (VQA), wh...

04/04/2023 · SC-ML: Self-supervised Counterfactual Metric Learning for Debiased Visual Question Answering
Visual question answering (VQA) is a critical multimodal task in which a...

12/04/2020 · Self-Supervised VQA: Answering Visual Questions using Images and Captions
Methodologies for training VQA models assume the availability of dataset...
