SOrT-ing VQA Models : Contrastive Gradient Learning for Improved Consistency

10/20/2020
by Sameer Dharur, et al.

Recent research in Visual Question Answering (VQA) has revealed state-of-the-art models to be inconsistent in their understanding of the world: they correctly answer seemingly difficult questions that require reasoning, but get simpler associated sub-questions wrong. These sub-questions pertain to lower-level visual concepts in the image that models ideally should understand in order to answer the higher-level question correctly. To address this, we first present a gradient-based interpretability approach to determine the questions most strongly correlated with the reasoning question on an image, and use this to evaluate VQA models on their ability to identify the relevant sub-questions needed to answer a reasoning question. Next, we propose a contrastive gradient learning-based approach called Sub-question Oriented Tuning (SOrT), which encourages models to rank relevant sub-questions higher than irrelevant questions for an <image, reasoning-question> pair. We show that SOrT improves model consistency by up to 6.5% points over existing baselines, while also improving visual grounding.
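
The ranking idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact gradient representation or loss, so everything below is an assumption made for illustration: each question is represented by a toy gradient vector, similarity is cosine similarity, and the contrastive objective is a simple margin-based stand-in. The function names (`cosine`, `rank_questions`, `margin_ranking_loss`) and the `margin` value are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_questions(reasoning_grad, candidate_grads):
    """Sort candidate question names by descending similarity to the reasoning question."""
    scores = {name: cosine(reasoning_grad, grad) for name, grad in candidate_grads.items()}
    return sorted(scores, key=scores.get, reverse=True)


def margin_ranking_loss(reasoning_grad, relevant_grad, irrelevant_grad, margin=0.2):
    """Assumed stand-in loss: zero once the relevant sub-question beats the
    irrelevant question by at least `margin` in similarity, positive otherwise."""
    pos = cosine(reasoning_grad, relevant_grad)
    neg = cosine(reasoning_grad, irrelevant_grad)
    return max(0.0, margin - pos + neg)


# Toy gradient vectors (made up for illustration, not from any model).
reasoning = [1.0, 0.0, 1.0]
candidates = {
    "sub_question": [0.9, 0.1, 0.8],
    "irrelevant_question": [0.0, 1.0, 0.0],
}

ranking = rank_questions(reasoning, candidates)
loss = margin_ranking_loss(
    reasoning, candidates["sub_question"], candidates["irrelevant_question"]
)
```

With these toy values the sub-question is ranked first and the loss is zero; swapping the relevant and irrelevant vectors makes the loss positive, which is the signal a training procedure of this kind would use to push the ranking back in the desired direction.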

research
01/20/2020

SQuINTing at VQA Models: Interrogating VQA Models with Sub-Questions

Existing VQA datasets contain questions with varying levels of complexit...
research
04/02/2022

Co-VQA : Answering by Interactive Sub Question Sequence

Most existing approaches to Visual Question Answering (VQA) answer quest...
research
06/21/2016

Question Relevance in VQA: Identifying Non-Visual And False-Premise Questions

Visual Question Answering (VQA) is the task of answering natural-languag...
research
06/27/2022

Consistency-preserving Visual Question Answering in Medical Imaging

Visual Question Answering (VQA) models take an image and a natural-langu...
research
02/25/2019

GQA: a new dataset for compositional question answering over real-world images

We introduce GQA, a new dataset for real-world visual reasoning and comp...
research
07/19/2023

A reinforcement learning approach for VQA validation: an application to diabetic macular edema grading

Recent advances in machine learning models have greatly increased the pe...
research
01/13/2018

Benchmark Visual Question Answer Models by using Focus Map

Inferring and Executing Programs for Visual Reasoning proposes a model f...
