Learning from Lexical Perturbations for Consistent Visual Question Answering

11/26/2020
by Spencer Whitehead, et al.

Existing Visual Question Answering (VQA) models are often fragile and sensitive to input variations. In this paper, we propose a novel approach to address this issue, based on modular networks, which pairs each question with a linguistically perturbed variant and regularizes the visual reasoning process between the two to be consistent during training. We show that our framework markedly improves consistency and generalization, demonstrating the value of controlled linguistic perturbations as a useful and currently underutilized training and regularization tool for VQA models. We also present VQA Perturbed Pairings (VQA P2), a new, low-cost benchmark and augmentation pipeline that creates controllable linguistic variations of VQA questions. Our benchmark uniquely draws from large-scale linguistic resources, avoiding human annotation effort while maintaining data quality relative to generative approaches. We benchmark existing VQA models on VQA P2 and provide a robustness analysis for each type of linguistic variation.
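To make the training idea concrete, the sketch below shows a consistency-regularized training step in which a VQA model answers both an original question and its linguistically perturbed pairing, and a divergence penalty encourages the two answer distributions to agree. This is a minimal illustration, not the paper's implementation: the `model` interface, the pre-encoded `perturbed_question` input, and the symmetric-KL consistency term are assumptions; the paper regularizes the reasoning process of a modular network rather than only the output distribution.

```python
import torch.nn.functional as F


def consistency_training_step(model, image_feats, question, perturbed_question,
                              answer_label, optimizer, lambda_consistency=1.0):
    """One training step pairing a question with a perturbed paraphrase.

    `model(image_feats, question_tokens)` is assumed to return answer logits;
    the symmetric-KL consistency term is an illustrative stand-in for the
    paper's modular-network reasoning regularizer.
    """
    optimizer.zero_grad()

    # Answer the same image with the original and the perturbed question.
    logits_orig = model(image_feats, question)            # [batch, num_answers]
    logits_pert = model(image_feats, perturbed_question)  # [batch, num_answers]

    # Standard VQA supervision on both members of the pair.
    task_loss = (F.cross_entropy(logits_orig, answer_label)
                 + F.cross_entropy(logits_pert, answer_label))

    # Consistency regularizer: the two answer distributions should agree.
    log_p = F.log_softmax(logits_orig, dim=-1)
    log_q = F.log_softmax(logits_pert, dim=-1)
    consistency = 0.5 * (
        F.kl_div(log_p, log_q, log_target=True, reduction="batchmean")
        + F.kl_div(log_q, log_p, log_target=True, reduction="batchmean")
    )

    loss = task_loss + lambda_consistency * consistency
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, the perturbed questions would be supplied by an augmentation pipeline such as VQA P2 (controllable lexical variations drawn from large-scale linguistic resources), and `lambda_consistency` is a tunable hyperparameter balancing answer accuracy against cross-pair consistency.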

Related research

- Cycle-Consistency for Robust Visual Question Answering (02/15/2019): Despite significant progress in Visual Question Answering over the years...
- Contrast and Classify: Alternate Training for Robust VQA (10/13/2020): Recent Visual Question Answering (VQA) models have shown impressive perf...
- Natural Perturbation for Robust Question Answering (04/09/2020): While recent models have achieved human-level scores on many NLP dataset...
- Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing (12/16/2019): Despite significant success in Visual Question Answering (VQA), VQA mode...
- Finetuning Pretrained Vision-Language Models with Correlation Information Bottleneck for Robust Visual Question Answering (09/14/2022): Benefiting from large-scale Pretrained Vision-Language Models (VL-PMs), ...
- Knowledge-Based Counterfactual Queries for Visual Question Answering (03/05/2023): Visual Question Answering (VQA) has been a popular task that combines vi...
- Modulated Self-attention Convolutional Network for VQA (10/08/2019): As new data-sets for real-world visual reasoning and compositional quest...
