Measuring CLEVRness: Blackbox testing of Visual Reasoning Models

02/24/2022
by Spyridon Mouselinos, et al.

How can we measure the reasoning capabilities of intelligent systems? Visual question answering provides a convenient framework for testing a model's abilities by interrogating it with questions about a scene. However, despite the many visual QA datasets and architectures, some of which even achieve super-human performance, whether those architectures can actually reason remains open to debate. To address this, we extend the visual question answering framework and propose a behavioral test in the form of a two-player game. We consider black-box neural models trained on CLEVR, a diagnostic dataset for benchmarking reasoning. We then train an adversarial player that re-configures the scene to fool the CLEVR model. We show that CLEVR models, which otherwise can perform at a human level, are easily fooled by our agent. Our results cast doubt on whether data-driven approaches can reason without exploiting the numerous biases often present in those datasets. Finally, we also propose a controlled experiment measuring how efficiently such models learn and perform reasoning.
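The two-player game described in the abstract can be illustrated with a toy sketch. This is not the authors' method: the abstract describes a trained adversarial agent, whereas the sketch below swaps in plain random search, and every name in it (`Obj`, `ground_truth`, `biased_model`, `adversary`) is hypothetical. It also assumes that the adversary may only move objects, so the correct answer to a counting question cannot change. The "model" is a deliberately biased stand-in that ignores objects in the right half of the scene, and it is queried only through its answers, as a black box.

```python
import random
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Obj:
    """A toy scene object (hypothetical; not the CLEVR scene format)."""
    color: str
    x: float
    y: float


Scene = List[Obj]
Model = Callable[[Scene, str], int]


def ground_truth(scene: Scene, color: str) -> int:
    """Oracle answer to 'How many <color> objects are there?'."""
    return sum(1 for o in scene if o.color == color)


def biased_model(scene: Scene, color: str) -> int:
    """Toy black-box stand-in: it only counts objects with x < 0."""
    return sum(1 for o in scene if o.color == color and o.x < 0)


def adversary(model: Model, scene: Scene, color: str,
              tries: int = 200, seed: int = 0) -> Optional[Scene]:
    """Random-search adversary (an assumption; the paper trains an agent).

    Objects are only moved, never added, removed, or recolored, so the
    ground-truth count stays fixed while the model's answer may change.
    """
    rng = random.Random(seed)
    truth = ground_truth(scene, color)
    for _ in range(tries):
        candidate = [replace(o, x=rng.uniform(-1, 1), y=rng.uniform(-1, 1))
                     for o in scene]
        if model(candidate, color) != truth:
            return candidate  # the model is fooled on this layout
    return None


# The model answers correctly on the original layout...
scene = [Obj("red", -0.5, 0.0), Obj("red", -0.2, 0.3), Obj("blue", -0.1, 0.1)]
# ...but the adversary looks for a rearrangement that breaks it.
fooled = adversary(biased_model, scene, "red")
```

The point of the sketch is only the interface: the adversary never inspects the model's internals, and any layout it returns has the same correct answer as the original scene, so a changed model answer exposes a reliance on layout rather than on counting.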

research
12/20/2016

CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

When building artificial intelligence systems that can reason and answer...
research
04/11/2020

Exploring The Spatial Reasoning Ability of Neural Models in Human IQ Tests

Although neural models have performed impressively well on various tasks...
research
05/06/2022

QLEVR: A Diagnostic Dataset for Quantificational Language and Elementary Visual Reasoning

Synthetic datasets have successfully been used to probe visual question-...
research
10/31/2019

TAB-VCR: Tags and Attributes based VCR Baselines

Reasoning is an important ability that we learn from a very early age. Y...
research
05/10/2017

Inferring and Executing Programs for Visual Reasoning

Existing methods for visual reasoning attempt to directly map inputs to ...
research
08/31/2016

Measuring Machine Intelligence Through Visual Question Answering

As machines have become more intelligent, there has been a renewed inter...
research
03/14/2022

ScienceWorld: Is your Agent Smarter than a 5th Grader?

This paper presents a new benchmark, ScienceWorld, to test agents' scien...