Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language

10/28/2021
by Mingyu Ding, et al.

In this work, we propose a unified framework, called Visual Reasoning with Differentiable Physics (VRDP), that can jointly learn visual concepts and infer physics models of objects and their interactions from videos and language. This is achieved by seamlessly integrating three components: a visual perception module, a concept learner, and a differentiable physics engine. The visual perception module parses each video frame into object-centric trajectories and represents them as latent scene representations. The concept learner grounds visual concepts (e.g., color, shape, and material) from these object-centric representations based on the language, thus providing prior knowledge for the physics engine. The differentiable physics model, implemented as an impulse-based differentiable rigid-body simulator, performs differentiable physical simulation based on the grounded concepts to infer physical properties, such as mass, restitution, and velocity, by fitting the simulated trajectories to the video observations. Consequently, these learned concepts and physical models can explain what we have seen and imagine what is about to happen in future and counterfactual scenarios. Integrating differentiable physics into the dynamic reasoning framework offers several appealing benefits. More accurate dynamics prediction in learned physics models enables state-of-the-art performance on both synthetic and real-world benchmarks while still maintaining high transparency and interpretability; most notably, VRDP improves the accuracy of predictive and counterfactual questions by 4.5% and 11.5% compared to its best counterpart. VRDP is also highly data-efficient: physical parameters can be optimized from very few videos, and even a single video can be sufficient. Finally, with all physical parameters inferred, VRDP can quickly learn new concepts from a few examples.
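To make the idea of "fitting the simulated trajectories to the video observations" concrete, here is a minimal sketch in pure Python. It is not the authors' implementation: it assumes a toy 1D scene with two balls, a single impulse-based collision, and a small forward-mode automatic differentiation class standing in for a real differentiable rigid-body engine. All names (`Dual`, `simulate`, `loss_and_grad`, `fit`) and all constants are hypothetical and chosen only for illustration. The "observed" trajectory is synthesized from known parameters rather than extracted from video.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """A value plus its derivative w.r.t. one chosen parameter (forward-mode AD)."""
    val: float
    dot: float = 0.0

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, o):
        o = Dual.lift(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__

    def __sub__(self, o):
        o = Dual.lift(o)
        return Dual(self.val - o.val, self.dot - o.dot)

    def __rsub__(self, o):
        return Dual.lift(o) - self

    def __mul__(self, o):
        o = Dual.lift(o)
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

    def __truediv__(self, o):
        o = Dual.lift(o)
        return Dual(self.val / o.val,
                    (self.dot * o.val - self.val * o.dot) / (o.val ** 2))

    def __rtruediv__(self, o):
        return Dual.lift(o) / self


# Assumed toy constants: mass of ball 1, ball radius, time step, rollout length.
M1, RADIUS, DT, STEPS = 1.0, 0.1, 0.05, 60


def simulate(m2, e, x1=0.0, x2=1.0, v1=1.0, v2=0.0):
    """Roll out two 1D balls; m2 (mass of ball 2) and e (restitution) are unknowns."""
    x1, x2, v1, v2 = map(Dual.lift, (x1, x2, v1, v2))
    m2, e = Dual.lift(m2), Dual.lift(e)
    traj = []
    for _ in range(STEPS):
        touching = x2.val - x1.val <= 2 * RADIUS
        approaching = v1.val > v2.val
        if touching and approaching:
            # Impulse magnitude for a 1D collision with restitution e.
            j = (1.0 + e) * (v1 - v2) / (1.0 / M1 + 1.0 / m2)
            v1 = v1 - j / M1
            v2 = v2 + j / m2
        x1 = x1 + v1 * DT
        x2 = x2 + v2 * DT
        traj.append((x1, x2))
    return traj


def loss_and_grad(m2, e, observed):
    """Mean squared position error and its gradient w.r.t. (m2, e)."""
    grads = []
    for seed_m2, seed_e in ((1.0, 0.0), (0.0, 1.0)):  # one pass per parameter
        traj = simulate(Dual(m2, seed_m2), Dual(e, seed_e))
        loss = Dual(0.0)
        for (x1, x2), (o1, o2) in zip(traj, observed):
            loss = loss + (x1 - o1) * (x1 - o1) + (x2 - o2) * (x2 - o2)
        loss = loss / len(traj)
        grads.append(loss.dot)
    return loss.val, grads


def fit(observed, m2=1.0, e=0.5, lr=0.5, iters=400):
    """Plain gradient descent on the trajectory-matching loss."""
    for _ in range(iters):
        _, (g_m2, g_e) = loss_and_grad(m2, e, observed)
        m2 -= lr * g_m2
        e -= lr * g_e
    return m2, e


# Synthetic "observation" generated with m2 = 2.0 and e = 0.8.
observed = [(a.val, b.val) for a, b in simulate(2.0, 0.8)]
m2_hat, e_hat = fit(observed)
print(f"estimated m2 = {m2_hat:.3f}, estimated restitution = {e_hat:.3f}")
```

In this sketch a single observed rollout is enough to recover both unknowns, because the two post-collision velocities give two independent constraints. A real system would face perception noise, 2D or 3D contact geometry, and many objects, none of which this toy example handles.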


research
03/30/2021

Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning

We study the problem of dynamic visual reasoning on raw videos. This is ...
research
04/15/2019

Bounce and Learn: Modeling Scene Dynamics with Real-World Bounces

We introduce an approach to model surface properties governing bounces i...
research
05/27/2019

Physics-as-Inverse-Graphics: Joint Unsupervised Learning of Objects and Physics from Video

We aim to perform unsupervised discovery of objects and their states suc...
research
06/02/2022

Predicting Physical Object Properties from Video

We present a novel approach to estimating physical properties of objects...
research
02/01/2022

Filtered-CoPhy: Unsupervised Learning of Counterfactual Physics in Pixel Space

Learning causal relationships in high-dimensional data (images, videos) ...
research
11/30/2021

DiffSDFSim: Differentiable Rigid-Body Dynamics With Implicit Shapes

Differentiable physics is a powerful tool in computer vision and robotic...
research
03/03/2023

Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models

The ability to discover abstract physical concepts and understand how th...
