PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning

12/09/2021
by Yining Hong, et al.

A critical aspect of human visual perception is the ability to parse visual scenes into individual objects and further into object parts, forming part-whole hierarchies. Such composite structures can induce a rich set of semantic concepts and relations, and thus play an important role in the interpretation and organization of visual signals, as well as in the generalization of visual perception and reasoning. However, existing visual reasoning benchmarks mostly focus on objects rather than parts. Visual reasoning based on the full part-whole hierarchy is much more challenging than object-centric reasoning because of finer-grained concepts, richer geometric relations, and more complex physics. Therefore, to better serve part-based conceptual, relational, and physical reasoning, we introduce a new large-scale diagnostic visual reasoning dataset named PTR. PTR contains around 70k synthetic RGBD images with ground-truth object- and part-level annotations covering semantic instance segmentation, color attributes, spatial and geometric relationships, and certain physical properties such as stability. These images are paired with 700k machine-generated questions covering various reasoning types, making them a good testbed for visual reasoning models. We examine several state-of-the-art visual reasoning models on this dataset and observe that they still make many surprising mistakes in situations where humans can easily infer the correct answer. We believe this dataset will open up new opportunities for part-based reasoning.
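To make the idea of a part-whole hierarchy concrete, here is a minimal sketch of how a scene with object- and part-level annotations might be represented and queried with a part-level question. The classes `Part` and `Obj`, their fields, the `count_parts` helper, and the example scene are all hypothetical illustrations; they are not PTR's actual annotation schema or question programs.

```python
from dataclasses import dataclass, field


# Hypothetical schema for illustration only; PTR's real annotation format may differ.
@dataclass
class Part:
    name: str   # part category, e.g. "leg"
    color: str  # part-level color attribute


@dataclass
class Obj:
    category: str                                     # object category, e.g. "chair"
    parts: list[Part] = field(default_factory=list)   # the object's parts (the "whole -> part" link)
    stable: bool = True                               # example physical property (hypothetical encoding)


def count_parts(scene: list[Obj], part_name: str, color: str) -> int:
    """Answer a part-level counting question by walking the object -> part hierarchy."""
    return sum(
        1
        for obj in scene
        for part in obj.parts
        if part.name == part_name and part.color == color
    )


scene = [
    Obj("chair", [Part("leg", "red"), Part("leg", "red"), Part("back", "blue")]),
    Obj("table", [Part("leg", "red"), Part("top", "green")], stable=False),
]

# Hypothetical question: "How many red legs are there in the scene?"
print(count_parts(scene, "leg", "red"))  # 3
```

Answering even this simple question requires descending from objects to their parts, which an object-only scene representation cannot express.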
