Towards an Interpretable Latent Space in Structured Models for Video Prediction

07/16/2021
by Rushil Gupta, et al.

We focus on the task of future frame prediction in video governed by underlying physical dynamics. We work with object-centric models, i.e., models that operate explicitly on object representations and propagate a loss in the latent space. Specifically, our research builds on recent work by Kipf et al. <cit.>, which predicts the next state via contrastive learning of object interactions in a latent space using a Graph Neural Network. We argue that injecting explicit inductive bias into the model, in the form of general physical laws, can not only make the model more interpretable but also improve its overall predictions. As a natural by-product, our model can learn feature maps that closely resemble actual object positions in the image, without any explicit supervision of object positions at training time. In contrast to earlier works <cit.>, which assume complete knowledge of the dynamics governing the motion in the form of a physics engine, we rely only on knowledge of general physical laws, such as that the world consists of objects that have position and velocity. We propose an additional decoder-based loss in the pixel space, imposed in a curriculum manner, to further refine the latent space predictions. Experiments in multiple settings demonstrate that while the model of Kipf et al. is effective at capturing object interactions, our model can be significantly more effective at localising objects, resulting in improved performance in 3 out of the 4 domains that we experiment with. Additionally, our model can learn highly interpretable feature maps that resemble actual object positions.
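The ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: every function name, the linear ramp schedule, the margin value, and the hinge form of the contrastive loss (one common formulation) are assumptions made only for illustration. The sketch shows a latent object state split into position and velocity, a contrastive loss in the latent space, and a pixel-space loss whose weight is phased in gradually as a curriculum.

```python
# Illustrative sketch only: all names and hyperparameters are hypothetical.

def curriculum_weight(epoch, start_epoch=10, ramp_epochs=20, max_weight=1.0):
    """Weight of the pixel-space loss: 0 before start_epoch, then a linear
    ramp up to max_weight over ramp_epochs (assumed schedule)."""
    if epoch < start_epoch:
        return 0.0
    progress = min(1.0, (epoch - start_epoch) / ramp_epochs)
    return max_weight * progress


def step_state(position, velocity, delta_velocity):
    """Advance one object's latent state by one frame. The velocity change
    would come from a learned interaction model; here it is an input."""
    new_velocity = [v + dv for v, dv in zip(velocity, delta_velocity)]
    new_position = [p + v for p, v in zip(position, new_velocity)]
    return new_position, new_velocity


def squared_distance(a, b):
    """Squared Euclidean distance between two latent vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def contrastive_hinge_loss(predicted, target, negative, margin=1.0):
    """Pull the prediction towards the true next state and push a negative
    sample at least `margin` away from it (assumed hinge formulation)."""
    positive_term = squared_distance(predicted, target)
    negative_term = max(0.0, margin - squared_distance(negative, target))
    return positive_term + negative_term


def total_loss(latent_loss, pixel_loss, epoch):
    """Latent contrastive loss plus the curriculum-weighted pixel loss."""
    return latent_loss + curriculum_weight(epoch) * pixel_loss


if __name__ == "__main__":
    pos, vel = step_state([0.0, 0.0], [1.0, 0.0], [0.0, 0.5])
    latent = contrastive_hinge_loss(pos, [1.0, 0.5], [3.0, 3.0])
    for epoch in (0, 20, 40):
        print(epoch, total_loss(latent, pixel_loss=4.0, epoch=epoch))
```

With these assumed defaults the pixel term contributes nothing during the first 10 epochs, so the latent space is shaped by the contrastive loss alone before the decoder loss starts to refine it.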


research
02/24/2022

Learning Multi-Object Dynamics with Compositional Neural Radiance Fields

We present a method to learn compositional predictive models from image ...
research
06/03/2022

Reinforcement Learning with Neural Radiance Fields

It is a long-standing problem to find effective representations for trai...
research
10/06/2019

Structured Object-Aware Physics Prediction for Video Modeling and Planning

When humans observe a physical system, they can easily locate objects, u...
research
05/19/2020

Symbolic Pregression: Discovering Physical Laws from Raw Distorted Video

We present a method for unsupervised learning of equations of motion for...
research
05/04/2022

Zero-Episode Few-Shot Contrastive Predictive Coding: Solving intelligence tests without prior training

Video prediction models often combine three components: an encoder from ...
research
11/12/2020

3D-OES: Viewpoint-Invariant Object-Factorized Environment Simulators

We propose an action-conditioned dynamics model that predicts scene chan...
research
02/01/2022

Filtered-CoPhy: Unsupervised Learning of Counterfactual Physics in Pixel Space

Learning causal relationships in high-dimensional data (images, videos) ...
