Measuring CLEVRness: Blackbox testing of Visual Reasoning Models

by Spyridon Mouselinos, et al.

How can we measure the reasoning capabilities of intelligent systems? Visual question answering provides a convenient framework for testing a model's abilities by interrogating it with questions about a scene. However, despite the many visual QA datasets and architectures, some of which even reach super-human performance, the question of whether those architectures can actually reason remains open to debate. To answer this, we extend the visual question answering framework and propose a behavioral test in the form of a two-player game. We consider black-box neural models trained on CLEVR, a diagnostic dataset for benchmarking reasoning. Next, we train an adversarial player that re-configures the scene to fool the CLEVR model. We show that CLEVR models, which otherwise could perform at a human level, can easily be fooled by our agent. Our results cast doubt on whether data-driven approaches can reason without exploiting the numerous biases that are often present in those datasets. Finally, we also propose a controlled experiment measuring how efficiently such models learn and perform reasoning.



CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

When building artificial intelligence systems that can reason and answer...

Exploring The Spatial Reasoning Ability of Neural Models in Human IQ Tests

Although neural models have performed impressively well on various tasks...

QLEVR: A Diagnostic Dataset for Quantificational Language and Elementary Visual Reasoning

Synthetic datasets have successfully been used to probe visual question-...

TAB-VCR: Tags and Attributes based VCR Baselines

Reasoning is an important ability that we learn from a very early age. Y...

Inferring and Executing Programs for Visual Reasoning

Existing methods for visual reasoning attempt to directly map inputs to ...

Measuring Machine Intelligence Through Visual Question Answering

As machines have become more intelligent, there has been a renewed inter...

ScienceWorld: Is your Agent Smarter than a 5th Grader?

This paper presents a new benchmark, ScienceWorld, to test agents' scien...
