Object-centric Forward Modeling for Model Predictive Control

by Yufei Ye, et al.
Carnegie Mellon University

We present an approach to learn an object-centric forward model, and show that this allows us to plan sequences of actions that achieve distant desired goals. We propose to model a scene as a collection of objects, each with an explicit spatial location and an implicit visual feature, and learn to model the effects of actions using random interaction data. Our model captures both robot-object and object-object interactions, and leads to more sample-efficient and accurate predictions. We show that this learned model can be leveraged to search for action sequences that lead to desired goal configurations, and that, in conjunction with a learned correction module, this allows for robust closed-loop execution. We present experiments both in simulation and in the real world, and show that our approach improves over alternative implicit or pixel-space forward models. Please see our project page for result videos.
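
To make the planning loop described in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation. The forward model is a hand-written stand-in (`toy_forward_model`, a hypothetical name) rather than a learned network, objects are reduced to 2D locations with no visual feature, the search is plain random shooting, and there is no correction module. The push parameterization (start point plus displacement), the workspace bounds, and all function names are assumptions made for illustration.

```python
import numpy as np


def toy_forward_model(locations, action, radius=0.15):
    """Hypothetical stand-in for a learned object-centric forward model.

    locations: (N, 2) array of object positions.
    action: (4,) array, push start (x, y) followed by displacement (dx, dy).
    Objects within `radius` of the push start move by the displacement.
    """
    start, delta = action[:2], action[2:]
    near = np.linalg.norm(locations - start, axis=1) < radius
    return locations + near[:, None] * delta


def goal_cost(locations, goal):
    """Sum of per-object Euclidean distances to the goal configuration."""
    return float(np.sum(np.linalg.norm(locations - goal, axis=1)))


def plan_action(locations, goal, rng, horizon=3, num_samples=256):
    """Random-shooting search: sample action sequences, roll each out with
    the forward model, and return the first action of the best sequence
    together with its predicted final cost."""
    best_cost = goal_cost(locations, goal)
    best_action = np.zeros(4)  # no-op fallback: zero displacement
    for _ in range(num_samples):
        starts = rng.uniform(0.0, 1.0, size=(horizon, 2))
        deltas = rng.uniform(-0.2, 0.2, size=(horizon, 2))
        sequence = np.concatenate([starts, deltas], axis=1)
        predicted = locations
        for action in sequence:
            predicted = toy_forward_model(predicted, action)
        cost = goal_cost(predicted, goal)
        if cost < best_cost:
            best_cost, best_action = cost, sequence[0]
    return best_action, best_cost


def run_mpc(locations, goal, steps=10, seed=0):
    """Closed-loop execution: replan after every executed action."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        action, _ = plan_action(locations, goal, rng)
        # On a real system this line would be "execute, then re-observe";
        # here the toy model doubles as the environment.
        locations = toy_forward_model(locations, action)
    return locations
```

Calling `run_mpc(np.array([[0.2, 0.2], [0.8, 0.5]]), np.array([[0.5, 0.5], [0.8, 0.5]]))` returns the object locations after ten replanning steps. Replanning from the newly observed state at every step is what makes the execution closed loop: errors in the model's predictions are absorbed at the next step instead of accumulating over the whole sequence.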



SORNet: Spatial Object-Centric Representations for Sequential Manipulation

Sequential manipulation tasks require a robot to perceive the state of a...

Robotic Visuomotor Control with Unsupervised Forward Model Learned from Videos

Learning an accurate model of the environment is essential for model-bas...

Scene Editing as Teleoperation: A Case Study in 6DoF Kit Assembly

Studies in robot teleoperation have been centered around action specific...

Hindsight for Foresight: Unsupervised Structured Dynamics Models from Physical Interaction

A key challenge for an agent learning to interact with the world is to r...

Uncertainty Averse Pushing with Model Predictive Path Integral Control

Planning robust robot manipulation requires good forward models that ena...

Low-Cost Scene Modeling using a Density Function Improves Segmentation Performance

We propose a low cost and effective way to combine a free simulation sof...

iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis

How would a static scene react to a local poke? What are the effects on ...