Filtered-CoPhy: Unsupervised Learning of Counterfactual Physics in Pixel Space

02/01/2022
by Steeven Janny, et al.

Learning causal relationships in high-dimensional data (images, videos) is hard: these relationships are often defined on low-dimensional manifolds and must be extracted from complex signals dominated by appearance, lighting, textures and spurious correlations in the data. We present a method for learning counterfactual reasoning about physical processes in pixel space, which requires predicting the impact of interventions on initial conditions. Going beyond the identification of structural relationships, we address the challenging problem of forecasting raw video over long horizons. Our method requires neither knowledge nor supervision of ground-truth positions or of any other object or scene properties. Our model learns and acts on a hybrid latent representation that combines dense features, sets of 2D keypoints, and an additional latent vector per keypoint. We show that this captures the dynamics of physical processes better than purely dense or purely sparse representations. We introduce a new, challenging and carefully designed counterfactual benchmark for predictions in pixel space, and we outperform strong baselines in physics-inspired ML and video prediction.
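
To make the hybrid latent representation described above more concrete, here is a minimal sketch of how its three parts (dense features, 2D keypoints, one latent vector per keypoint) could be held together for a single frame. The class name `HybridState`, its field names, and the `sparse_tokens` helper are illustrative assumptions; the abstract does not specify shapes, names, or how the parts are combined in the actual model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HybridState:
    """Hypothetical container for one frame's latent state (names are illustrative)."""
    dense_features: List[List[float]]      # dense feature map (rows x cols), assumed layout
    keypoints: List[Tuple[float, float]]   # K 2D keypoints as (x, y)
    keypoint_latents: List[List[float]]    # one additional latent vector per keypoint

    def __post_init__(self) -> None:
        # The abstract states one latent vector per keypoint, so the counts must match.
        if len(self.keypoints) != len(self.keypoint_latents):
            raise ValueError("need exactly one latent vector per keypoint")

    def sparse_tokens(self) -> List[List[float]]:
        """Concatenate each keypoint with its latent vector: [x, y, z_1, ..., z_d]."""
        return [[x, y, *z] for (x, y), z in zip(self.keypoints, self.keypoint_latents)]


# Toy values, chosen only to show the layout.
state = HybridState(
    dense_features=[[0.0, 0.1], [0.2, 0.3]],
    keypoints=[(0.25, 0.75), (0.5, 0.5)],
    keypoint_latents=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
)
tokens = state.sparse_tokens()
```

In this sketch the sparse part (keypoints plus their latent vectors) and the dense part stay separate fields, which mirrors the abstract's contrast between purely dense and purely sparse representations; how the paper's model actually fuses them is not stated here.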


research
09/26/2019

COPHY: Counterfactual Learning of Physical Dynamics

Understanding causes and effects in mechanical systems is an essential c...
research
03/30/2021

Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning

We study the problem of dynamic visual reasoning on raw videos. This is ...
research
07/01/2020

Causal Discovery in Physical Systems from Videos

Causal discovery is at the core of human cognition. It enables us to rea...
research
10/28/2021

Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language

In this work, we propose a unified framework, called Visual Reasoning wi...
research
10/16/2017

A Disentangled Recognition and Nonlinear Dynamics Model for Unsupervised Learning

This paper takes a step towards temporal reasoning in a dynamically chan...
research
07/16/2021

Towards an Interpretable Latent Space in Structured Models for Video Prediction

We focus on the task of future frame prediction in video governed by und...
research
10/05/2022

On the Learning Mechanisms in Physical Reasoning

Is dynamics prediction indispensable for physical reasoning? If so, what...
