3D-OES: Viewpoint-Invariant Object-Factorized Environment Simulators

11/12/2020
by Hsiao-Yu Fish Tung, et al.

We propose an action-conditioned dynamics model that predicts scene changes caused by object and agent interactions in a viewpoint-invariant 3D neural scene representation space, inferred from RGB-D videos. In this 3D feature space, objects do not interfere with one another, and their appearance persists over time and across viewpoints. This permits our model to predict scenes far into the future by simply "moving" 3D object features based on cumulative object motion predictions. Object motion predictions are computed by a graph neural network that operates over the object features extracted from the 3D neural scene representation. Our model's simulations can be decoded by a neural renderer into 2D image views from any desired viewpoint, which aids the interpretability of our latent 3D simulation space. We show that our model's predictions generalize well across varying numbers and appearances of interacting objects, as well as across camera viewpoints, outperforming existing 2D and 3D dynamics models. We further demonstrate sim-to-real transfer of the learnt dynamics by applying our model, trained solely in simulation, to model-based control for pushing objects to desired locations under clutter on a real robotic setup.
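The idea of "moving" 3D object features by cumulative motion can be illustrated with a small sketch. This is not the authors' implementation: the sparse voxel dictionary, the integer translations, and the function names `cumulative_motion`, `move_object`, and `rollout` are all assumptions made for illustration, and the per-step motions are hand-written stand-ins for what the graph neural network would predict. Rotation and feature resampling are omitted.

```python
from itertools import accumulate


def cumulative_motion(step_motions):
    """Sum per-step (dx, dy, dz) predictions into cumulative offsets."""
    return list(
        accumulate(step_motions, lambda a, b: tuple(p + q for p, q in zip(a, b)))
    )


def move_object(voxels, offset):
    """Translate a sparse set of object voxels by an integer offset.

    The feature vectors are carried along unchanged, so the object's
    appearance persists while only its location changes.
    """
    return {
        tuple(c + o for c, o in zip(coord, offset)): feat
        for coord, feat in voxels.items()
    }


def rollout(object_voxels, step_motions):
    """Return the object's voxels at each future step.

    Every step warps the ORIGINAL features by the cumulative offset,
    rather than re-warping the previous step's output.
    """
    return [move_object(object_voxels, off) for off in cumulative_motion(step_motions)]


# Toy object: two voxels, each with a 2-dimensional feature (hypothetical values).
obj = {(0, 0, 0): [1.0, 0.5], (1, 0, 0): [0.2, 0.9]}

# Stand-in for per-step motion predictions (assumed, not from a trained model).
motions = [(1, 0, 0), (1, 0, 0), (0, 2, 0)]

frames = rollout(obj, motions)
for t, frame in enumerate(frames, start=1):
    print(t, sorted(frame))
```

Warping the original features by the accumulated offset at every step, instead of chaining warps, means the features themselves are never altered during the rollout; in this sketch, only errors in the predicted motion can accumulate.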

research
06/07/2021

SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition

To help agents reason about scenes in terms of their building blocks, we...
research
02/24/2022

Learning Multi-Object Dynamics with Compositional Neural Radiance Fields

We present a method to learn compositional predictive models from image ...
research
09/07/2018

Neural Allocentric Intuitive Physics Prediction from Real Videos

Humans are able to make rich predictions about the future dynamics of ph...
research
06/10/2019

Embodied View-Contrastive 3D Feature Learning

Humans can effortlessly imagine the occluded side of objects in a photog...
research
07/08/2021

3D Neural Scene Representations for Visuomotor Control

Humans have a strong intuitive understanding of the 3D environment aroun...
research
12/31/2018

Learning Spatial Common Sense with Geometry-Aware Recurrent Networks

We integrate two powerful ideas, geometry and deep visual representation...
research
07/16/2021

Towards an Interpretable Latent Space in Structured Models for Video Prediction

We focus on the task of future frame prediction in video governed by und...
