I Introduction
Physics engines are central to planning and control in robotics. To plan for a task, a robot may use a physics engine to simulate the effects of different actions on the environment and then select a sequence of actions that reaches a desired goal state. The utility of the resulting action sequence depends on the accuracy of the physics engine's predictions, so a high-fidelity physics engine is an important component of robot planning. Most physics engines used in robotics (such as MuJoCo [1] and Bullet [2]) use approximate contact models, and recent studies [3, 4, 5] have demonstrated discrepancies between their predictions and real-world data. These mismatches make contact-rich tasks hard to solve using these physics engines.
One way to increase the robustness of controllers and policies resulting from physics engines is to add perturbations to parameters that are difficult to estimate accurately (e.g., frictional variation as a function of position [4]). This approach leads to an ensemble of simulated predictions that covers a range of possible outcomes. Using the ensemble allows the planner to take more conservative actions and increases robustness, but does not address the limitation of using learned, approximate models [6, 7].
To correct for model errors due to approximations, we learn a residual model between real-world measurements and a physics engine's predictions. Combining the physics engine and residual model yields a data-augmented physics engine. This strategy is effective because learning the residual error of a reasonable approximation (here from a physics engine) is easier and more sample efficient than learning from scratch. This approach has been shown to be more data efficient, to generalize better, and to outperform its purely analytical or data-driven counterparts [8, 9, 10, 11].
Most residual-based approaches assume a fixed number of objects in the world states. This means they cannot be applied to states with a varied number of objects or generalize what they learn for one object to other similar ones. This problem has been addressed by approaches that use graph-structured network models, such as interaction networks [12] and neural physics engines [13]. These methods are effective at generalizing over objects, modeling interactions, and handling variable numbers of objects. However, as they are purely data-driven, in practice they require a large number of training examples to arrive at a good model.
In this paper, we propose simulator-augmented interaction networks (SAIN), incorporating interaction networks into a physical simulator for complex, real-world control problems. Specifically, we show:

Sample-efficient residual learning and improved prediction accuracy relative to the physics engine,

Accurate predictions for the dynamics and interaction of novel arrangements and numbers of objects, and the

Utility of the learned residual model for control in highly underactuated planar pushing tasks.
We demonstrate SAIN's performance on the experimental setup depicted in Fig. 1. Here, the robot's objective is to guide the second disk to a goal by pushing on the first. This task is challenging due to the presence of multiple complex frictional interactions and underactuation [14]. We demonstrate the step-by-step deployment of SAIN, from training in simulation to augmentation with real-world data, and finally control.
II Related Work
II-A Learning Contact Dynamics
In the field of contact dynamics, researchers have looked towards data-driven techniques to complement analytical models and/or directly learn dynamics. For example, Byravan and Fox [15] designed neural nets to predict rigid-body motions for planar pushing. Their approach does not exploit explicit physical knowledge. Kloss et al. [11] used neural net predictions as input to an analytical model; the output of the analytical model is used as the prediction. Here, the neural network learns to maximize the analytical model's performance. Fazeli et al. [8] also studied learning a residual model for predicting planar impacts. Zhou et al. [16] employed a data-efficient algorithm to capture the frictional interaction between an object and a support surface. They later extended it for simulating parametric variability in planar pushing and grasping [17].
The paper closest to ours is that of Ajay et al. [9], who used the analytical model as an approximation to the push outcomes and learned a residual neural model that makes corrections to its output. In contrast, our paper makes two key innovations. First, instead of using a feedforward network to model the dynamics of a single object, we employ an object-based network to learn residuals. Object-based networks build upon explicit object representations and learn how they interact; this enables capturing multi-object interactions. Second, we demonstrate that such a hybrid dynamics model can be used for control tasks both in simulation and on a real robot.
II-B Differentiable Physical Simulators
There has been increasing interest in building differentiable physics simulators [18]. For example, Degrave et al. [19] proposed to directly solve differentiable equations. Such systems have been deployed for manipulation and planning for tool use [20]. Battaglia et al. [12] and Chang et al. [13] have both studied learning object-based, differentiable neural simulators. Their systems explicitly model the state of each object and learn to predict future states based on object interactions. In this paper, we combine such a learned object-based simulator with a physics engine for better prediction and for controlling real-world objects.
II-C Control with a Learned Simulator
Recent papers have explored model-predictive control with deep networks [21, 22, 23, 24, 25]. These approaches learn an abstract-state transition function, not an explicit model of the environment [26, 27]. Eventually, they apply the learned value function or model to guide policy network training. In contrast, we employ an object-based physical simulator that takes raw object states (e.g., velocity, position) as input. Bauza et al. [28] also learned a residual model with an analytical model for model-predictive control, but their learned model is a task-specific Gaussian Process, while our model has the ability to generalize to new object shapes and materials.
A few papers have exploited the power of interaction networks for planning and control, mostly using interaction networks to help train policy networks via imagination, that is, by rolling out approximate predictions [29, 30, 31]. In contrast, we use interaction networks as a learned dynamics simulator, combine it with a physics engine, and directly search for actions in real-world control problems. Recently, Sanchez-Gonzalez et al. [32] also used interaction networks in control, though their model does not take into account explicit physical knowledge, and its performance is only demonstrated in simulation.
III Method
In this section, we describe SAIN's formulation and components. We also present our Model Predictive Controller (MPC), which uses SAIN to perform the pushing task.
III-A Formulation
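Let $\mathcal{S}$ be the state space and $\mathcal{A}$ be the action space. A dynamics model is a function $f: \mathcal{S} \times \mathcal{A} \to \mathcal{S}$ that predicts the next state given the current action and state: $f(s, a) \approx s'$ for $s, s' \in \mathcal{S}$ and $a \in \mathcal{A}$.
There are two general types of dynamics models: analytical (Fig. 2a) and data-driven (Fig. 2b). Our goal is to learn a hybrid dynamics model that combines the two (Fig. 2c). Here, conditioned on the state-action pair, the data-driven model learns the discrepancy between analytical model predictions and real-world data (i.e., the residual). Specifically, let $f_\theta$ represent the hybrid dynamics model, $f_p$ represent the physics engine, and $f_r$ represent the residual component. We have $f_\theta(s, a) = f_r(f_p(s, a), s, a)$. Intuitively, the residual model refines the physics engine's guess using the current state and action.
For long-term prediction, let $f_\theta^R$ represent the recurrent hybrid dynamics model (Fig. 2d). If $s_0$ is the initial state, $a_t$ the action at time $t$, $\bar{s}_t$ the prediction by the physics engine at time $t$, and $\hat{s}_t$ the prediction at time $t$, then
$$\bar{s}_{t+1} = f_p(\hat{s}_t, a_t), \qquad \hat{s}_0 = s_0, \tag{1}$$
$$\hat{s}_{t+1} = f_\theta^R(\hat{s}_t, a_t) = f_r(\bar{s}_{t+1}, \hat{s}_t, a_t). \tag{2}$$
For training, we collect observational data $\{(s_t, a_t, s_{t+1})\}_{t=0}^{T-1}$ and then solve the following optimization problem:
$$\theta^* = \arg\min_\theta \sum_{t=0}^{T-1} \|\hat{s}_{t+1} - s_{t+1}\|_2^2 + \lambda \|\theta\|_2^2, \tag{3}$$
where $\lambda$ is the weight for the regularization term.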
In this study, we choose a recurrent parametric model over a non-recurrent representation for two reasons. First, non-recurrent models are trained on observation data to make single-step predictions. Consequently, errors in prediction compound over a sequence of steps. Second, since these models recursively use their own predictions, the input data given during the simulation phase will have a different distribution than the input data during the training phase. This creates a data distribution mismatch between the training and test phases.
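The recurrent hybrid rollout described in this subsection can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the 1D point-mass state, the time step `DT`, and the functions `physics_step`, `residual_step`, and `hybrid_rollout` are hypothetical stand-ins, and the "residual model" is a hand-written correction rather than a trained network. The sketch also assumes that the physics engine is queried from the hybrid model's own previous prediction.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]  # hypothetical toy state: (position, velocity)
DT = 0.1                     # hypothetical time step in seconds


def physics_step(state: State, action: float) -> State:
    """Toy analytical model: a frictionless point mass driven by an acceleration."""
    pos, vel = state
    new_vel = vel + action * DT
    return (pos + new_vel * DT, new_vel)


def residual_step(physics_pred: State, state: State, action: float) -> State:
    """Stand-in for the learned residual model (hand-written, not trained).

    It refines the physics prediction using the current state; here the
    correction is a velocity loss that the toy physics model ignores.
    """
    pos, vel = physics_pred
    drag = 0.05 * state[1]  # hypothetical velocity loss per step
    return (pos - drag * DT, vel - drag)


def hybrid_rollout(
    initial_state: State,
    actions: Sequence[float],
    physics: Callable[[State, float], State],
    residual: Callable[[State, State, float], State],
) -> List[State]:
    """Multi-step prediction: each step feeds the previous hybrid prediction back in."""
    states = [initial_state]
    for action in actions:
        current = states[-1]
        physics_pred = physics(current, action)                  # analytical guess
        states.append(residual(physics_pred, current, action))   # corrected guess
    return states


trajectory = hybrid_rollout((0.0, 1.0), [0.0, 0.0, 0.0], physics_step, residual_step)
```

Because the rollout consumes its own predictions, training through `hybrid_rollout` exposes the residual model to the same input distribution it will see at test time, which is the motivation given above for the recurrent formulation.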
III-B Interaction Networks
We use interaction networks [12] as the data-driven model for multi-object interaction. An interaction network consists of two neural nets: $f_{\text{rel}}$ and $f_{\text{dyn}}$. The $f_{\text{rel}}$ network calculates pairwise forces between objects and the $f_{\text{dyn}}$ network calculates the next state of an object, based on the states of the objects it is interacting with and the nature of the interactions.
The original version of interaction networks was trained to make a single-step prediction; for improved accuracy, we extend them to make multi-step predictions. Let $s_t = \{s^1_t, s^2_t, \ldots, s^n_t\}$ be the state at time $t$, where $s^i_t$ is the state for object $i$ at time $t$. Similarly, let $\hat{s}_t = \{\hat{s}^1_t, \hat{s}^2_t, \ldots, \hat{s}^n_t\}$ be the predicted state at time $t$, where $\hat{s}^i_t$ is the predicted state for object $i$ at time $t$. In our work, $s^i_t = [p^i_t, v^i_t, m^i, R^i]$, where $p^i_t$ is the pose of object $i$ at time step $t$, $v^i_t$ the velocity of object $i$ at time step $t$, $m^i$ the mass of object $i$, and $R^i$ the radius of object $i$. Similarly, $\hat{s}^i_t = [\hat{p}^i_t, \hat{v}^i_t, m^i, R^i]$, where $\hat{p}^i_t$ is the predicted pose of object $i$ at time step $t$ and $\hat{v}^i_t$ the predicted velocity of object $i$ at time step $t$. Note that we do not predict any changes to static object properties such as mass and radius. Also, we note that while $s_t$ is a set of objects, the state of any individual object, $s^i_t$, is a vector. Now, let $a^i_t$ be the action applied to object $i$ at time $t$. The equations for the interaction network are:
(4)
(5)  
(6)  
(7) 
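The two-network structure described in this subsection (a pairwise network whose outputs are aggregated per object, followed by a per-object network that produces the next state) can be sketched as follows. This is a minimal sketch under stated assumptions: `relation_fn` and `dynamics_fn` are hypothetical hand-written stand-ins for the two neural nets, the objects live in 1D, and the stiffness and time step are made-up values. It illustrates the data flow only, not the exact equations of the paper.

```python
from typing import Callable, Dict, List, Sequence

Obj = Dict[str, float]  # hypothetical 1D object: position x, velocity v, mass m, radius r


def relation_fn(receiver: Obj, sender: Obj) -> float:
    """Stand-in for the pairwise network: spring-like repulsion when two disks overlap."""
    gap = abs(receiver["x"] - sender["x"]) - (receiver["r"] + sender["r"])
    if gap >= 0.0:
        return 0.0  # no contact, no effect
    direction = 1.0 if receiver["x"] >= sender["x"] else -1.0
    return direction * (-gap) * 100.0  # hypothetical stiffness


def dynamics_fn(obj: Obj, action: float, effect: float, dt: float = 0.01) -> Obj:
    """Stand-in for the per-object network: integrates the action and summed effects."""
    v = obj["v"] + (action + effect) / obj["m"] * dt
    # Static properties (mass, radius) are copied through, not predicted.
    return {"x": obj["x"] + v * dt, "v": v, "m": obj["m"], "r": obj["r"]}


def interaction_step(
    objects: Sequence[Obj],
    actions: Sequence[float],
    relation: Callable[[Obj, Obj], float] = relation_fn,
    dynamics: Callable[[Obj, float, float], Obj] = dynamics_fn,
) -> List[Obj]:
    """One prediction step for an arbitrary number of objects."""
    next_objects = []
    for i, obj in enumerate(objects):
        # Aggregate the pairwise effects of every other object on object i.
        effect = sum(relation(obj, other) for j, other in enumerate(objects) if j != i)
        next_objects.append(dynamics(obj, actions[i], effect))
    return next_objects
```

Because the same two functions are shared across all objects and pairs, `interaction_step` accepts two, three, or more objects without any change, which is the property that lets object-based models generalize over the number of objects.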
III-C Simulator-Augmented Interaction Networks (SAIN)
A simulator-augmented interaction network extends an interaction network, where the pairwise-force network $f_{\text{rel}}$ and the next-state network $f_{\text{dyn}}$ now also take in the prediction of a physics engine, $f_p$. We now learn the residual between the physics engine and the real world. Let $\bar{s}_t$ be the state at time $t$ and $\bar{s}^i_t$ be the state for object $i$ at time $t$ predicted by the physics engine. The equations for SAIN are
(8)  
(9)  
(10)  
(11)  
(12) 
These equations describe a single-step prediction. For multi-step prediction, we use the same equations, providing the true state at $t = 0$ and the predicted state at $t > 0$ as input.
III-D Control Algorithm
Our action space has two free parameters: the point where the robot contacts the first disk and the direction of the push. In our experiments, a successful execution requires searching for a trajectory of about 50 actions. Due to the size of the search space, we use an approximate receding horizon control algorithm with our dynamics model. The search algorithm maintains a priority queue of action sequences based on the heuristic below. For each expansion, let $s_t$ be the current state and $\hat{s}_{t+k}$ be the predicted state after $k$ steps with actions $a_t, \ldots, a_{t+k-1}$. Let $s_g$ be the goal state. We choose the control strategy that minimizes the cost function and insert the new action sequence into the queue.
IV Experiments
We demonstrate SAIN on a challenging planar manipulation task both in simulation and in the real world. We further evaluate how our model generalizes to handle control tasks that involve objects of new materials and shapes.
IV-A Task
In this manipulation task, we are given two disks with different masses and radii. Our goal is to guide the second disk to a target location, but we are constrained to push only the first disk. Here, a point in the state space is factored into a set of two object states, $s = \{s^1, s^2\}$, where each $s^i$ is an element of the object state space. The object state includes the mass, 2D position, rotation, velocity, and radius of the disk.
Target locations are generated at random and divided into two categories: easy and hard. A target location is produced by first sampling an angle $\alpha$ from an interval , then choosing the goal location to be at a distance of three times the radius of the second disk and at an angle of $\alpha$ with respect to the second disk. In easy pushes, the interval is . In hard pushes, the interval is . A push is considered a success if the distance between the goal location and the center of mass of the second disk is within the radius of the second disk.
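The goal-generation and success rules above can be sketched as follows. This is an illustrative sketch, not the authors' code: `sample_goal` and `is_success` are hypothetical names, the angle is assumed to be measured from the x-axis of the second disk's frame, and the angle interval is passed in as a parameter because the intervals used for easy and hard pushes are not reproduced here.

```python
import math
import random
from typing import Tuple

Point = Tuple[float, float]


def sample_goal(
    disk2_center: Point,
    disk2_radius: float,
    angle_interval: Tuple[float, float],
    rng: random.Random,
) -> Point:
    """Sample a goal at three disk radii from the second disk, at a random angle."""
    low, high = angle_interval          # hypothetical interval, supplied by the caller
    angle = rng.uniform(low, high)
    distance = 3.0 * disk2_radius       # "three times the radius of the second disk"
    return (
        disk2_center[0] + distance * math.cos(angle),
        disk2_center[1] + distance * math.sin(angle),
    )


def is_success(disk2_center: Point, goal: Point, disk2_radius: float) -> bool:
    """A push succeeds if the second disk's center ends within one radius of the goal."""
    return math.dist(disk2_center, goal) <= disk2_radius
```

For example, `sample_goal((0.0, 0.0), 0.058, (-0.5, 0.5), random.Random(0))` returns a point 0.174 m from the origin; the interval `(-0.5, 0.5)` here is a made-up value for illustration only.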
IV-B Simulation Setup
We use the Bullet physics engine [33] for simulation. For each trajectory, we vary the coefficient of friction between the surface and the disks, the mass of the disks, and their radius. The coefficient of friction is sampled from . The mass is sampled from Uniform(0.85 kg, 1.15 kg) and the radius is sampled from . We always fix the initial position of the first disk to the origin. The other disks are placed in front of the first disk at an angle, randomly sampled from , and just touch it. We ensure that disks do not overlap each other. The pusher is placed at the back of the first disk at an angle randomly sampled from , and just touches it. Then the pusher makes a straight-line push at an angle, randomly sampled from , for 2 s and covers a distance of about 1 cm. We experiment with two different simulation setups: (1) a direct-force simulation setup, in which we control the pusher with an external force, and (2) a robot control simulation setup, in which we control the pusher using position control. We use the first setup to show the benefits of SAIN over other models. In our real-world setup, however, we control the pusher using position-based control, so we designed the second simulation setup to match the real robot and use it to collect pretraining data.
For the direct-force simulation setup, we collect pushes with 2 disks for our training set, pushes with 2 disks and pushes with 3 disks for our test set. For the robot control simulation setup, we collect pushes with 2 disks for our training set and pushes with 2 disks for our test set.
IV-C Model and Implementation Details
Table I: Error on Object 1/Object 2
| Models | trans (%) | pos (mm) | rot (deg) |
| --- | --- | --- | --- |
| Physics | 2.82/2.31 | 6.57/6.05 | 0.91/0.45 |
| IN | 2.09/1.47 | 5.61/3.79 | 0.68/0.26 |
| SAIN (ours) | 1.62/1.38 | 4.38/3.34 | 0.38/0.2 |
We compare two models for simulation and control: the original interaction networks (IN) and our simulator-augmented interaction networks (SAIN). They share the same architecture. Each consists of two separate neural networks: $f_{\text{rel}}$ and $f_{\text{dyn}}$. Both $f_{\text{rel}}$ and $f_{\text{dyn}}$ have four linear layers with hidden sizes of 128, 64, 32, and 16, respectively. The first three linear layers are followed by a ReLU activation.
Training interaction networks in simulation is straightforward. It is more involved for SAIN, which learns a correction over the Bullet physics engine: since the simulated training data are themselves generated with Bullet, the problem of training "in simulation" is ill-posed. To address this problem, we fix the physics engine in SAIN with the mass and radius of the disks equal to those of the disks in the real world. We also fix the coefficient of friction in the physics engine to an estimated mean of the coefficient of friction of the real-world surface across space. The training data instead contain varied masses and radii for both disks and a varied coefficient of friction between the disks and the surface, and the model is trained to learn the residual.
We use ADAM [34] for optimization with a starting learning rate of 0.001. We halve it every 2,500 iterations. We train these models for 10,000 iterations with a batch size of 100. Let the predicted 2D position, rotation, and velocity at time $t$ of disk $i$ be $\hat{p}^i_t$, $\hat{r}^i_t$, and $\hat{v}^i_t$, respectively, and the corresponding true values be $p^i_t$, $r^i_t$, and $v^i_t$. Let $T$ be the length of all trajectories. The training loss function for a single trajectory is
$$\frac{1}{T} \sum_{i=1}^{2} \sum_{t=0}^{T-1} \left( \|p^i_t - \hat{p}^i_t\|_2^2 + \|v^i_t - \hat{v}^i_t\|_2^2 + \|\sin(r^i_t) - \sin(\hat{r}^i_t)\|_2^2 + \|\cos(r^i_t) - \cos(\hat{r}^i_t)\|_2^2 \right).$$
During training, we use a batch of trajectories and take a mean over the loss of those trajectories. We also use regularization with as regularization constant.
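As a concrete reading of the training setup above, the following sketch computes the per-trajectory loss and the step learning-rate schedule. It is an illustration under stated assumptions rather than the authors' code: the nested-list trajectory layout and the names `trajectory_loss` and `learning_rate` are hypothetical, the regularization term is omitted because its constant is not given here, and "decrease by 0.5" is interpreted as multiplying the rate by 0.5.

```python
import math
from typing import Dict, List

# Hypothetical layout: trajectory[t][i] is the state of disk i at time t,
# stored as {"p": (x, y), "v": (vx, vy), "r": rotation_in_radians}.
Trajectory = List[List[Dict[str, object]]]


def trajectory_loss(pred: Trajectory, true: Trajectory) -> float:
    """Squared errors on position, velocity, and sin/cos of rotation, averaged over time."""
    num_steps = len(true)
    total = 0.0
    for pred_t, true_t in zip(pred, true):
        for ph, gt in zip(pred_t, true_t):
            total += sum((a - b) ** 2 for a, b in zip(gt["p"], ph["p"]))
            total += sum((a - b) ** 2 for a, b in zip(gt["v"], ph["v"]))
            # Comparing sin and cos avoids a spurious penalty at the 2*pi wrap-around.
            total += (math.sin(gt["r"]) - math.sin(ph["r"])) ** 2
            total += (math.cos(gt["r"]) - math.cos(ph["r"])) ** 2
    return total / num_steps


def learning_rate(iteration: int, base: float = 1e-3, factor: float = 0.5, step: int = 2500) -> float:
    """Step schedule: start at `base` and multiply by `factor` every `step` iterations."""
    return base * factor ** (iteration // step)
```

One consequence of the sine and cosine terms is that a predicted rotation that differs from the true one by a full turn incurs no rotation loss, whereas a raw angle difference would be penalized heavily.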
In practice, we ask the models to predict the change in object states (relative values) rather than the absolute values. This enables them to generalize to arbitrary starting positions without overfitting.
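The effect of predicting relative rather than absolute values can be illustrated with a small sketch. The helper names `to_deltas` and `integrate` are hypothetical, and only 2D positions are shown. The point is that displacement targets are identical for a trajectory and for any translated copy of it, so a model trained on displacements does not need to see every starting position.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def to_deltas(trajectory: List[Point]) -> List[Point]:
    """Convert absolute 2D positions into per-step displacements (training targets)."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:])]


def integrate(start: Point, deltas: List[Point]) -> List[Point]:
    """Recover absolute positions by accumulating predicted displacements."""
    states = [start]
    for dx, dy in deltas:
        x, y = states[-1]
        states.append((x + dx, y + dy))
    return states


original = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0)]
translated = [(x + 10.0, y - 5.0) for x, y in original]
same_targets = to_deltas(original) == to_deltas(translated)
```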
IV-D Search Algorithm
Table II: Error on Object 1/2/3
| Models | trans (%) | pos (mm) | rot (deg) |
| --- | --- | --- | --- |
| Physics | 2.79/2.34/2.38 | 6.53/6.11/6.21 | 0.89/0.48/0.49 |
| IN | 2.12/1.63/1.67 | 5.68/4.41/4.52 | 0.70/0.34/0.38 |
| SAIN (ours) | 1.68/1.52/1.61 | 4.54/3.97/4.34 | 0.41/0.25/0.32 |
Table III: Error on Object 1/2
| Models | trans (%) | pos (mm) | rot (deg) |
| --- | --- | --- | --- |
| Physics | 2.52/2.19 | 6.27/5.81 | 0.85/0.29 |
| IN | 2.13/1.59 | 5.76/3.84 | 0.72/0.28 |
| SAIN (ours) | 1.82/1.50 | 4.66/3.47 | 0.40/0.21 |
As mentioned in Sec. III-D, an action is defined by the initial position of the pusher and the angle of the push, $\theta$, with respect to the first disk. After these parameters have been selected, the pusher starts at the initial position and moves at an angle of $\theta$ with respect to the first disk for . We discretize our action space as follows. For selecting $\theta$, we divide the interval into six bins and choose their midpoints. For selecting the initial position of the pusher, we choose an angle $\phi$ and place the pusher at the edge of the first disk at an angle $\phi$ such that the pusher touches the first disk. For selecting $\phi$, we divide the interval into 12 bins and choose the midpoint of one of these bins. Therefore, our action space consists of 72 discretized actions for each time step. We maintain a priority queue of action sequences based on heuristic where is the predicted 2D position of disk and is the 2D position of goal. is sum of and cosine distance between and . The cosine distance serves as a regularization cost to encourage the centers of both disks and the goal to stay in a straight line. To prevent the priority queue from blowing up, we do receding horizon greedy search with an initial horizon of 2, and increase it to 3 when the distance between the second disk and goal is less than .
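The discretization and queue ordering described above can be sketched as follows. This is a sketch under several loudly stated assumptions, not the authors' implementation: the two angle intervals are passed in as parameters because their values are not reproduced here, and the heuristic is assumed to be the Euclidean distance from the second disk to the goal plus the cosine distance between the disk-to-disk direction and the disk-to-goal direction. All function names are hypothetical.

```python
import heapq
import math
from itertools import product
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
Action = Tuple[float, float]  # (contact angle on the first disk, push angle)


def bin_midpoints(low: float, high: float, n_bins: int) -> List[float]:
    """Split [low, high] into equal bins and return the midpoint of each bin."""
    width = (high - low) / n_bins
    return [low + (k + 0.5) * width for k in range(n_bins)]


def build_actions(push_interval: Tuple[float, float], contact_interval: Tuple[float, float]) -> List[Action]:
    """6 push angles x 12 contact angles = 72 discrete actions per time step."""
    push_angles = bin_midpoints(push_interval[0], push_interval[1], 6)
    contact_angles = bin_midpoints(contact_interval[0], contact_interval[1], 12)
    return list(product(contact_angles, push_angles))


def cosine_distance(u: Point, v: Point) -> float:
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0.0:
        return 0.0  # degenerate case: treat coincident points as aligned
    return 1.0 - (u[0] * v[0] + u[1] * v[1]) / norm


def heuristic(p1: Point, p2: Point, goal: Point) -> float:
    """Assumed form: distance to goal plus an alignment penalty (see lead-in)."""
    to_goal = (goal[0] - p2[0], goal[1] - p2[1])
    disk_axis = (p2[0] - p1[0], p2[1] - p1[1])
    return math.hypot(*to_goal) + cosine_distance(disk_axis, to_goal)


def push_candidates(queue: list, candidates: Iterable[Tuple[tuple, Point, Point]], goal: Point) -> None:
    """candidates: (action_sequence, predicted_p1, predicted_p2) from the dynamics model."""
    for sequence, p1, p2 in candidates:
        heapq.heappush(queue, (heuristic(p1, p2, goal), sequence))
```

With 72 actions per step, a horizon of 2 already yields 5,184 candidate sequences and a horizon of 3 yields 373,248, which is why the horizon is kept short and only extended near the goal.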
Table IV: Error on Object 1/2
| Models | Fine-tuning | trans (%) | pos (mm) | rot (deg) |
| --- | --- | --- | --- | --- |
| Physics | N/A | 0.87/1.91 | 3.06/6.41 | 0.32/0.17 |
| IN | No | 0.86/1.84 | 2.96/5.75 | 0.96/0.32 |
| SAIN (ours) | No | 0.69/1.06 | 2.38/3.52 | 0.43/0.18 |
| IN | Yes | 0.63/0.61 | 2.23/2.05 | 0.41/0.19 |
| SAIN (ours) | Yes | 0.42/0.43 | 1.50/1.52 | 0.34/0.17 |
IV-E Prediction and Control Results in Simulation


The forward multi-step prediction errors of both interaction networks and SAIN for the direct-force simulation setup with 2 disks and 3 disks are reported in Table I and Table II. Note that errors on different objects are separated by "/" in all the tables. The training data for this setup consist of pushes with only 2 disks. The forward multi-step prediction errors of both interaction networks and SAIN for the robot control simulation setup are reported in Table III. Given an initial state and a sequence of actions, the models do forward prediction for the next 200 timesteps, where each timestep is 1/240 s. We see that SAIN outperforms interaction networks in both setups. We also list the results of the fixed physics engine used for training SAIN for reference.
We have also evaluated IN and SAIN on control tasks in simulation. We test each model on 25 easy and 25 hard pushes. For these pushes, we set the masses of the two disks to 0.9 kg and 1 kg and their radii to 54 mm and 59 mm, making them differ from those used in the internal physics engine of SAIN. This mimics real-world environments, where objects' size and shape cannot be precisely estimated, and ensures SAIN cannot cheat by just querying the internal simulator. Fig. 3 shows that SAIN performs better than IN. This suggests that learning the residual not only improves forward prediction but also benefits control.
IV-F Real-World Robot Setup




We now test our models on a real robot. The setup used for the real experiments is based on the system from the MIT Push dataset [4]. The pusher is a cylinder of radius 4.8 mm attached to the last joint of an ABB IRB 120 robot. The position of the pusher is recorded using the robot kinematics. The two disks being pushed are made of stainless steel, have radii of 52.5 mm and 58 mm, and weigh 0.896 kg and 1.1 kg. During the experiments, the smaller disk is the one pushed directly by the pusher. The position of both disks is tracked using a Vicon system of four cameras, so the disks' positions are highly accurate. Finally, the surface on which the objects lie is made of ABS (Acrylonitrile Butadiene Styrene), whose coefficient of friction is around 0.15. Each push is done at 50 mm/s and spans 10 mm. We collect 1,500 pushes, of which 1,200 are used for training and 300 for testing.
We evaluate two versions of interaction networks and SAIN. The first is an off-the-shelf version purely trained on synthetic data; the second is one trained on simulated data and later fine-tuned on real data. This helps us understand whether these models can exploit real data to adapt to new environments.
IV-G Results on Real-World Data
Results of forward simulation are shown in Table IV. SAIN outperforms IN on real data. While both models benefit from fine-tuning, SAIN achieves the best performance. This suggests residual learning also generalizes to real data well. All models achieve a lower error on real data than in simulation; this is because simulated data have a significant amount of noise to make the problem more challenging.
We then evaluate SAIN (both with and without fine-tuning) for control, on 25 easy and 25 hard pushes. The results are shown in Fig. 4. The model without fine-tuning achieves a 100% success rate on easy pushes and 68% on hard pushes. As shown in the rightmost columns of Fig. 3(a), it sometimes pushes the object too far and gets stuck in a local minimum. After fine-tuning, the model works well on both easy pushes (100%) and hard pushes (96%) (Fig. 3(b)).
While objects of different shapes and materials have different dynamics, the gap between their dynamics in simulation and in the real world might share similar patterns. This is the intuition behind the observation that residual learning allows easier generalization to novel scenarios. Ajay et al. [9] validated this for forward prediction. Here, we evaluate how our fine-tuned SAIN generalizes for control. We test our model on 25 hard pushes with a different surface (plywood, where the original surface is ABS), using the original disks. Our framework succeeds in 92% of the pushes; Fig. 4(a) shows qualitative results. We have also evaluated our model on another 25 hard pushes, where it pushes the large disk (58 mm) to direct the small one (52.5 mm). Our framework succeeds in 88% of the pushes. Fig. 4(b) shows qualitative results. These results suggest that SAIN can generalize to solve control tasks with new object shapes and materials.
V Conclusion
We have proposed a hybrid dynamics model, simulator-augmented interaction networks (SAIN), combining a physical simulator with a learned, object-centered neural network. Our underlying philosophy is to first use analytical models to model real-world processes as much as possible, and then learn the remaining residuals. Learned residual models are specific to the real-world scenario for which data are collected, and adapt the model accordingly. The combined physics engine and residual model requires little domain-specific knowledge or hand-crafting and can generalize well to unseen situations. We have demonstrated SAIN's efficacy when applied to a challenging control problem in both simulation and the real world. Our model also generalizes to setups where object shape and material vary and has potential applications in control tasks that involve complex contact dynamics.
Acknowledgements. This work is supported by NSF #1420316, #1523767, and #1723381, AFOSR grant FA95501710165, ONR MURI N000141612007, Honda Research, Facebook, and Draper Laboratory.
References
 [1] E. Todorov, T. Erez, and Y. Tassa, “MuJoCo: A physics engine for model-based control,” in IROS. IEEE, 2012, pp. 5026–5033.
 [2] E. Coumans, “Bullet physics simulation,” in SIGGRAPH, 2015.
 [3] R. Kolbert, N. Chavan Dafle, and A. Rodriguez, “Experimental Validation of Contact Dynamics for In-Hand Manipulation,” in ISER, 2016.
 [4] K.-T. Yu, M. Bauza, N. Fazeli, and A. Rodriguez, “More than a million ways to be pushed. A high-fidelity experimental dataset of planar pushing,” in IROS. IEEE, 2016, pp. 30–37.
 [5] N. Fazeli, S. Zapolsky, E. Drumwright, and A. Rodriguez, “Fundamental limitations in performance and interpretability of common planar rigid-body contact models,” in ISRR, 2017.
 [6] I. Mordatch, K. Lowrey, and E. Todorov, “Ensemble-CIO: Full-body dynamic motion planning that transfers to physical humanoids,” in IROS, 2015.
 [7] A. Becker and T. Bretl, “Approximate steering of a unicycle under bounded model perturbation using ensemble control,” IEEE TRO, vol. 28, no. 3, pp. 580–591, 2012.
 [8] N. Fazeli, S. Zapolsky, E. Drumwright, and A. Rodriguez, “Learning data-efficient rigid-body contact models: Case study of planar impact,” in CoRL, 2017, pp. 388–397.
 [9] A. Ajay, J. Wu, N. Fazeli, M. Bauza, L. P. Kaelbling, J. B. Tenenbaum, and A. Rodriguez, “Augmenting physical simulators with stochastic neural networks: Case study of planar pushing and bouncing,” in IROS, 2018.
 [10] K. Chatzilygeroudis and J.-B. Mouret, “Using parameterized black-box priors to scale up model-based policy search for robotics,” in ICRA, 2018.
 [11] A. Kloss, S. Schaal, and J. Bohg, “Combining learned and analytical models for predicting action effects,” arXiv:1710.04102, 2017.
 [12] P. W. Battaglia, R. Pascanu, M. Lai, D. Rezende, and K. Kavukcuoglu, “Interaction networks for learning about objects, relations and physics,” in NeurIPS, 2016.
 [13] M. B. Chang, T. Ullman, A. Torralba, and J. B. Tenenbaum, “A compositional object-based approach to learning physical dynamics,” in ICLR, 2017.
 [14] F. R. Hogan and A. Rodriguez, “Feedback control of the pusher-slider system: A story of hybrid and underactuated contact dynamics,” in WAFR, 2016.
 [15] A. Byravan and D. Fox, “SE3-Nets: Learning rigid body motion using deep neural networks,” in ICRA, 2017.
 [16] J. Zhou, R. Paolini, A. Bagnell, and M. T. Mason, “A convex polynomial force-motion model for planar sliding: Identification and application,” in ICRA, 2016, pp. 372–377.
 [17] J. Zhou, A. Bagnell, and M. T. Mason, “A fast stochastic contact model for planar pushing and grasping: Theory and experimental validation,” in RSS, 2017.
 [18] S. Ehrhardt, A. Monszpart, N. Mitra, and A. Vedaldi, “Taking visual motion prediction to new heightfields,” arXiv:1712.09448, 2017.
 [19] J. Degrave, M. Hermans, and J. Dambre, “A differentiable physics engine for deep learning in robotics,” in ICLR Workshop, 2016.
 [20] M. Toussaint, K. Allen, K. Smith, and J. Tenenbaum, “Differentiable physics and stable modes for tool-use and manipulation planning,” in RSS, 2018.
 [21] I. Lenz, R. A. Knepper, and A. Saxena, “DeepMPC: Learning deep latent features for model predictive control,” in RSS, 2015.
 [22] S. Gu, T. Lillicrap, I. Sutskever, and S. Levine, “Continuous deep Q-learning with model-based acceleration,” in ICML, 2016.
 [23] A. Nagabandi, G. Kahn, R. S. Fearing, and S. Levine, “Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning,” in ICRA, 2018.
 [24] G. Farquhar, T. Rocktäschel, M. Igl, and S. Whiteson, “TreeQN and ATreeC: Differentiable tree planning for deep reinforcement learning,” in ICLR, 2018.
 [25] A. Srinivas, A. Jabri, P. Abbeel, S. Levine, and C. Finn, “Universal planning networks,” in ICML, 2018.
 [26] D. Silver, H. van Hasselt, M. Hessel, T. Schaul, A. Guez, T. Harley, G. Dulac-Arnold, D. Reichert, N. Rabinowitz, A. Barreto, and T. Degris, “The predictron: End-to-end learning and planning,” in ICML, 2017.
 [27] J. Oh, S. Singh, and H. Lee, “Value prediction network,” in NeurIPS, 2017.
 [28] M. Bauza, F. R. Hogan, and A. Rodriguez, “A data-efficient approach to precise and controlled pushing,” in CoRL, 2018.
 [29] S. Racanière, T. Weber, D. Reichert, L. Buesing, A. Guez, D. J. Rezende, A. P. Badia, O. Vinyals, N. Heess, Y. Li, R. Pascanu, P. Battaglia, D. Silver, and D. Wierstra, “Imagination-augmented agents for deep reinforcement learning,” in NeurIPS, 2017.
 [30] J. B. Hamrick, A. J. Ballard, R. Pascanu, O. Vinyals, N. Heess, and P. W. Battaglia, “Metacontrol for adaptive imagination-based optimization,” in ICLR, 2017.
 [31] R. Pascanu, Y. Li, O. Vinyals, N. Heess, L. Buesing, S. Racanière, D. Reichert, T. Weber, D. Wierstra, and P. Battaglia, “Learning model-based planning from scratch,” arXiv:1707.06170, 2017.
 [32] A. Sanchez-Gonzalez, N. Heess, J. T. Springenberg, J. Merel, M. Riedmiller, R. Hadsell, and P. Battaglia, “Graph networks as learnable physics engines for inference and control,” in ICML, 2018.
 [33] E. Coumans, “Bullet physics engine,” Open Source Software: http://bulletphysics.org, 2010.
 [34] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in ICLR, 2015.