Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions

06/08/2021
by Daniel Rosenberg, et al.

Deep learning algorithms have shown promising results in visual question answering (VQA) tasks, but a more careful look reveals that they often do not understand the rich signal they are fed. To understand and better measure the generalization capabilities of VQA systems, we look at their robustness to counterfactually augmented data. Our proposed augmentations are designed to make a focused intervention on a specific property of the question such that the answer changes. Using these augmentations, we propose a new robustness measure, Robustness to Augmented Data (RAD), which measures the consistency of model predictions between original and augmented examples. Through extensive experimentation, we show that RAD, unlike classical accuracy measures, can quantify when state-of-the-art systems are not robust to counterfactuals. We find substantial failure cases which reveal that current VQA systems are still brittle. Finally, we connect robustness to generalization, demonstrating the predictive power of RAD for performance on unseen augmentations.
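
To make the idea of a consistency measure concrete, here is a minimal sketch of one way such a score could be computed. The abstract does not state the formula, so the definition used here is an assumption: among original examples the model answers correctly, the fraction whose augmented counterpart is also answered correctly. The function name `rad` and the boolean-list inputs are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def rad(orig_correct: Sequence[bool], aug_correct: Sequence[bool]) -> float:
    """Consistency between original and augmented predictions.

    Assumed definition (not taken from the abstract): among the original
    examples answered correctly, the share whose augmented counterpart is
    also answered correctly. Index i in both sequences must refer to the
    same original/augmented pair.
    """
    if len(orig_correct) != len(aug_correct):
        raise ValueError("inputs must be aligned pair by pair")
    # Keep only pairs where the original example was answered correctly.
    kept = [a for o, a in zip(orig_correct, aug_correct) if o]
    if not kept:
        # The score is undefined when no original example is answered correctly.
        return math.nan
    return sum(kept) / len(kept)


# Toy data: three originals answered correctly, one of them stays correct
# after augmentation.
orig = [True, True, False, True]
aug = [True, False, True, False]
score = rad(orig, aug)
```

Under this assumed definition, a model can have high accuracy on both the original and the augmented set and still score low, because the score conditions on each individual pair rather than on aggregate accuracy.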

research
12/16/2019

Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing

Despite significant success in Visual Question Answering (VQA), VQA mode...
research
09/14/2017

Robustness Analysis of Visual QA Models by Basic Questions

Visual Question Answering (VQA) models should have both high robustness ...
research
09/18/2020

MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering

While progress has been made on the visual question answering leaderboar...
research
10/09/2018

Knowing Where to Look? Analysis on Attention of Visual Question Answering System

Attention mechanisms have been widely used in Visual Question Answering ...
research
10/11/2021

Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking

On the way towards general Visual Question Answering (VQA) systems that ...
research
11/30/2019

Assessing the Robustness of Visual Question Answering

Deep neural networks have been playing an essential role in the task of ...
research
03/15/2022

CARETS: A Consistency And Robustness Evaluative Test Suite for VQA

We introduce CARETS, a systematic test suite to measure consistency and ...
