CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning

12/20/2016
by Justin Johnson, et al.

When building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings. Existing benchmarks for visual question answering can help, but have strong biases that models can exploit to correctly answer questions without reasoning. They also conflate multiple sources of error, making it hard to pinpoint model weaknesses. We present a diagnostic dataset that tests a range of visual reasoning abilities. It contains minimal biases and has detailed annotations describing the kind of reasoning each question requires. We use this dataset to analyze a variety of modern visual reasoning systems, providing novel insights into their abilities and limitations.

research
01/01/2021

DVD: A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue

A video-grounded dialogue system is required to understand both dialogue...
research
05/06/2022

QLEVR: A Diagnostic Dataset for Quantificational Language and Elementary Visual Reasoning

Synthetic datasets have successfully been used to probe visual question-...
research
02/24/2022

Measuring CLEVRness: Blackbox testing of Visual Reasoning Models

How can we measure the reasoning capabilities of intelligence systems? V...
research
10/31/2019

TAB-VCR: Tags and Attributes based VCR Baselines

Reasoning is an important ability that we learn from a very early age. Y...
research
03/18/2023

Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning

Humans have the innate capability to answer diverse questions, which is ...
research
03/30/2021

AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning

Visual events are a composition of temporal actions involving actors spa...
research
04/12/2022

AGQA 2.0: An Updated Benchmark for Compositional Spatio-Temporal Reasoning

Prior benchmarks have analyzed models' answers to questions about videos...
