Physion: Evaluating Physical Prediction from Vision in Humans and Machines

by Daniel M. Bear, et al.

While machine learning algorithms excel at many challenging visual tasks, it is unclear whether they can make predictions about commonplace real-world physical events. Here, we present a visual and physical prediction benchmark that precisely measures this capability. In realistically simulating a wide variety of physical phenomena – rigid and soft-body collisions, stable multi-object configurations, rolling and sliding, projectile motion – our dataset presents a more comprehensive challenge than existing benchmarks. Moreover, we have collected human responses for our stimuli so that model predictions can be directly compared to human judgments. We compare an array of algorithms – varying in their architecture, learning objective, input-output structure, and training data – on their ability to make diverse physical predictions. We find that graph neural networks with access to the physical state best capture human behavior, whereas among models that receive only visual input, those with object-centric representations or pretraining do best but fall far short of human accuracy. This suggests that extracting physically meaningful representations of scenes is the main bottleneck to achieving human-like visual prediction. We thus demonstrate how our benchmark can identify areas for improvement and measure progress on this key aspect of physical understanding.




Physion++: Evaluating Physical Scene Understanding that Requires Online Inference of Different Physical Properties

General physical scene understanding requires more than simply localizin...

To Fall Or Not To Fall: A Visual Approach to Physical Stability Prediction

Understanding physical phenomena is a key competence that enables humans...

Sensory Optimization: Neural Networks as a Model for Understanding and Creating Art

This article is about the cognitive science of visual art. Artists creat...

ShapeStacks: Learning Vision-Based Physical Intuition for Generalised Object Stacking

Physical intuition is pivotal for intelligent agents to perform complex ...

Learning Physical Graph Representations from Visual Scenes

Convolutional Neural Networks (CNNs) have proved exceptional at learning...

Visual Stability Prediction and Its Application to Manipulation

Understanding physical phenomena is a key competence that enables humans...

An Objective Laboratory Protocol for Evaluating Cognition of Non-Human Systems Against Human Cognition

In this paper I describe and reduce to practice an objective protocol fo...
