ComPhy: Compositional Physical Reasoning of Objects and Events from Videos

by Zhenfang Chen, et al.

Objects' motions in nature are governed by complex interactions and their properties. While some properties, such as shape and material, can be identified via the object's visual appearance, others, like mass and electric charge, are not directly visible. The compositionality between the visible and hidden properties poses unique challenges for AI models to reason about the physical world, whereas humans can effortlessly infer them with limited observations. Existing studies on video reasoning mainly focus on visually observable elements such as object appearance, movement, and contact interaction. In this paper, we take an initial step to highlight the importance of inferring the hidden physical properties not directly observable from visual appearances, by introducing the Compositional Physical Reasoning (ComPhy) dataset. For a given set of objects, ComPhy includes a few videos of them moving and interacting under different initial conditions. The model is evaluated based on its capability to unravel the compositional hidden properties, such as mass and charge, and use this knowledge to answer a set of questions posed about one of the videos. Evaluation results of several state-of-the-art video reasoning models on ComPhy show unsatisfactory performance as they fail to capture these hidden properties. We further propose an oracle neural-symbolic framework named Compositional Physics Learner (CPL), combining visual perception, physical property learning, dynamic prediction, and symbolic execution into a unified framework. CPL can effectively identify objects' physical properties from their interactions and predict their dynamics to answer questions.




