CARETS: A Consistency And Robustness Evaluative Test Suite for VQA

03/15/2022
by Carlos E. Jimenez, et al.

We introduce CARETS, a systematic test suite to measure consistency and robustness of modern VQA models through a series of six fine-grained capability tests. In contrast to existing VQA test sets, CARETS features balanced question generation to create pairs of instances to test models, with each pair focusing on a specific capability such as rephrasing, logical symmetry or image obfuscation. We evaluate six modern VQA systems on CARETS and identify several actionable weaknesses in model comprehension, especially with concepts such as negation, disjunction, or hypernym invariance. Interestingly, even the most sophisticated models are sensitive to aspects such as swapping the order of terms in a conjunction or varying the number of answer choices mentioned in the question. We release CARETS to be used as an extensible tool for evaluating multi-modal model robustness.
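The paired-instance idea can be illustrated with a minimal Python sketch. It is not the CARETS implementation: the `QuestionPair` type, the `score_pairs` function, the two metric names, and the toy model are all hypothetical, introduced only for illustration. The sketch assumes a model is a function from an image identifier and a question to an answer string, and it reports per-question accuracy alongside the fraction of pairs in which both questions are answered correctly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical container for one test pair; not the CARETS data format.
@dataclass(frozen=True)
class QuestionPair:
    capability: str        # e.g. "rephrasing" or "symmetry"
    image_id: str
    original: str          # first question of the pair
    perturbed: str         # second question, targeting the same capability
    original_answer: str
    perturbed_answer: str

# Assumed model interface: (image_id, question) -> answer string.
Model = Callable[[str, str], str]

def score_pairs(model: Model, pairs: List[QuestionPair]) -> Dict[str, float]:
    """Return per-question accuracy and the share of pairs answered fully correctly."""
    if not pairs:
        raise ValueError("at least one pair is required")
    correct = 0
    both_correct = 0
    for pair in pairs:
        ok_a = model(pair.image_id, pair.original) == pair.original_answer
        ok_b = model(pair.image_id, pair.perturbed) == pair.perturbed_answer
        correct += int(ok_a) + int(ok_b)
        both_correct += int(ok_a and ok_b)
    return {
        "accuracy": correct / (2 * len(pairs)),
        "pair_accuracy": both_correct / len(pairs),
    }

# Toy model that is sensitive to word order: it says "yes" only when the
# question begins with "Is there a dog".
def toy_model(image_id: str, question: str) -> str:
    return "yes" if question.startswith("Is there a dog") else "no"

pairs = [
    QuestionPair("symmetry", "img1",
                 "Is there a dog and a cat?", "Is there a cat and a dog?",
                 "yes", "yes"),
    QuestionPair("rephrasing", "img1",
                 "Is there a dog?", "Is there a dog in the image?",
                 "yes", "yes"),
]

scores = score_pairs(toy_model, pairs)
print(scores)  # {'accuracy': 0.75, 'pair_accuracy': 0.5}
```

In this toy run the model answers three of four questions correctly but gets only one of the two pairs fully right. A pair-level score exposes this kind of order sensitivity, which per-question accuracy alone would understate.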


