Physically-based reasoning is fundamental to successfully performing complex tasks in the physical world. This is particularly relevant to the domain of robot learning. There is a large body of mature work in robot dynamics, which need not be learned from scratch per task. In this work, we introduce a differentiable physics simulator for rigid body dynamics. We leverage this differentiability to estimate parameters that result in simulations that closely match the behavior of observed reference systems. Additionally, through trajectory optimization, we can efficiently generate control inputs that minimize cost functions that are expressed with respect to any quantity that is part of the physics computation.
Differentiable physics provides many advantages when used as part of a learning process. Physically accurate simulation obeys dynamical laws of real systems, including conservation of energy and momentum. Furthermore, joint constraints are enforced with no room for error. The parameters of physics engines, like link geometry and inertia matrices, are well-defined and correspond to properties of real systems. Learning these parameters provides a significantly interpretable parameter space, and can benefit classical control and estimation algorithms. These systems provide a high inductive bias, and model parameters need not be retrained for different tasks or reconfigured environments.
Our contributions are as follows:
We present a fully differentiable simulator for rigid body dynamics that supports a rich set of integrators for the accurate simulation of mechanisms over long time horizons.
We analyze the performance of gradient calculation methods on the problem of inferring parameters underlying the simulation of rigid-body dynamical systems.
We introduce an adaptive model-predictive control algorithm that leverages our differentiable model to perform trajectory optimization while finding the optimal parameters that fit a reference mechanism implemented in another simulator.
Ii Related Work
Differentiable physics has recently attracted significant research efforts. Degrave et al. [degrave2019physics]
implemented a differentiable physics engine in the automatic differentiation framework Theano. Giftthaler et al.[giftthaler2017autodiff] presented a rigid-body-dynamics simulator that allows for the computation of derivatives through code generation via RobCoGen [frigerio2016robcogen]. Similarly, we use Stan Math [carpenter2015stan], a C++ library for reverse-mode automatic differentiation to efficiently compute gradients, even in cases where the code branches significantly. Analytical gradients of rigid-body dynamics algorithms have been implemented in the Pinnocchio library [carpentier2018analytical] to facilitate optimal control and inverse kinematics. While such manually derived gradients can be computed efficiently, they are less general than our approach since they can only be used to optimize for a number of hand-engineered quantities. More recently, Koolen and Deits [koolen2019rbd-julia] have implemented rigid-body-dynamics algorithms in the programming language Julia where, among others, libraries for optimization, automatic differentiation, and numerical integration are available. Non-penetrative multi-point contacts between rigid bodies are often simulated by solving a linear complementarity problem (LCP), through which [peres2018lcp] differentiate using the differentiable quadratic program solver OptNet [amos2017optnet]. While our proposed model does not yet incorporate contact dynamics, we are able to demonstrate the scalability of our approach on versatile applications of differentiable physics to common 3D control domains.
Automatic differentiation of the solutions to differential equations is well studied, with applications to pharmacology, meteorology, and many other fields. Recent machine learning work by[chen2018neural]
recasts learning in long short-term memory networks and residual networks as approximations to this problem. Thorough comparisons of methods for computing parameter gradients of ODE solutions are given in[carpenter2015stan, rackauckas2018comparison, serban2003cvodes].
Learning dynamics models has a tradition in the field of robotics and control theory. Early works on forward models [moore1992forward] and locally weighted regression [atkeson1997lwl] yielded control algorithms that learn from previous experience. Computing gradients through the solution of differential equations has been further leveraged for system identification [rackauckas2018comparison].
More recently, a variety of novel deep learning architectures have been proposed to learnintuitive physics
models. Inductive bias has been introduced through graph neural networks[hadsell2018graphnet, li2018learning, liu2019psd], particularly interaction networks [battaglia2016interaction, chang2016physics, schenck2018spnet, mrowca2018flexible, xu2019physics] that are able to learn rigid and soft body dynamics. Vision-based machine learning approaches to predict the future outcomes of the state of the world have been proposed [wu2015galileo, wu2016phys101, wu2017deanimation, finn2016unsupervised, janner2018reasoning]. Physics-informed learning imposes a stronger inductive bias on the learning problem to model particular physical processes, such as cosmological structure formation [he2019physics] or turbulence models [raissi2018physics]. Deep Lagrangian Networks [lutter2018delan] and Hamiltonian Networks [greydanus2019hamiltonian] represent functions in the respective classical mechanics frameworks using deep neural networks.
The approach of adapting the simulator to real world dynamics, which we demonstrate through our adaptive MPC algorithm in subsection V-C, has been less explored. While many previous works have shown to adapt simulators to the real world using system identification and state estimation [kolev2015sysid, zhu2018fastmi], few have shown adaptive model-based control schemes that actively close the feedback loop between the real and the simulated system [reichenbach2009dynsim, farchy2013simback, chebotar2018sim2real]
. Instead of using a simulator, model-based reinforcement learning is a broader field[polydoros2017mbrl], where the system dynamics, and state-action transitions in particular, are learned to achieve higher sample efficiency compared to model-free methods. Within this framework, predominantly Gaussian Processes [ko2007gp, deisenroth2011pilco, boedecker2014sgp] and neural networks [williams2016mppi, yamaguchi2016neural] have been proposed to learn dynamics and optimize policies. Bayesian neural networks in particular have been used to learn dynamics in model-based reinforcement learning approaches [fu2016one, depeweg2016mbrl, gal2016improving, chua2018deep].
Throughout this work, we follow the following conventions.
denotes the system’s state vector from the state space. denotes the system’s control vector from the control space . denotes the system’s parameter vector from the parameter space . A rigid body system is entirely described by the generalized coordinates111
Generalized coordinates sparsely encode only particular degrees of freedom in the kinematic chain so that connected bodies remain connected., which denote the generalized forces, positions, velocities, and accelerations, respectively.
Iv-a Rigid Body Dynamics
To simulate the dynamics of a rigid body system, we integrate the Newton-Euler equation
gives the generalized inertial matrix of the system for configuration , and describes the centrifugal and Coriolis terms affecting motion. describes the contribution from gravity.
Given a descriptive model consisting of joints, bodies, and predecessor/successor relationships, we build a kinematic chain that specifies the dynamics of the system. From a mechanism’s position vector , the forward kinematics function computes the positions and orientation quaternions of the geometries attached to the kinematic chain (such as the end-effector of a robot arm) in world coordinates.
Forward dynamics, computed by is the mapping from positions, velocities and forces to accelerations. We efficiently compute forward dynamics using the Articulated Body Algorithm (ABA) [featherstone2007rbda], that propagates forces through the bodies while adhering to the motion subspaces defined by the joints that connect them. In our simulator, bodies comprise physical entities with mass, inertia, and attached rendering and collision geometries. Joints describe constraints on the relative motion of bodies in a model. Equipped with such a graph of bodies connected via joints with forces acting on them, ABA computes the joint accelerations in operations.
Unless specified otherwise, we represent the mechanism’s state by so that the change in state corresponds to . A mechanism is parameterized by the real vector . Such parameters can, depending on the particular system, contain values for the geometries of the links, inertia properties, and other settings that influence the dynamics of the mechanism.
Resulting from the forward dynamics , a new change in state is computed using ABA at each time step given the previous state and parameters
. Such relationship forms an ordinary differential equation (ODE), which is solved for the next state through an integrator. We leverage several methods to solve ODEs, from a simple Euler integrator, through explicit stepping schemes like fourth-order Runge-Kutta (RK4), to adaptive stepping algorithms, such as Dormand-Prince (commonly referred to as RK45) method.
In order to simulate a system, the ODE is solved for a sequence of time steps . Throughout this work we consider equidistant time intervals with an integration step size . The smaller the step size, the more accurate the simulation, but the more ODE system evaluations are necessary. Larger time steps improve the execution performance of the simulator but yield decreased accuracy, particularly in chaotic systems.
Iv-C Parameter Estimation11todo: 1complete
We are interested in the behavior of this ODE system with respect to changes in its parameters, , and to its control inputs, . Parameters of note are the continuous parameters describing the inertia and geometry of links and joint attachments. Discrete parameters, describing the structure of the system, are fundamentally interesting, but are not considered in this method.
Given a cost function evaluated over states evaluated at times , the overall loss is defined as follows:
Note that the gradient only becomes available by integrating over the dynamics so that the parameters influence the system states. Typically, for parameter estimation, the loss is the distance between the simulated states and the states from a reference trajectory , which can be given from a real physical system or another simulation with unknown parameters. This approach is known as an initial value problem
(IVP) and has an important application in simulation-to-reality (sim2real) transfer learning, where the reality gap between the agent’s dynamics model and the real world dynamics needs to be sufficiently small for the agent to operate in the real world (cf.[polydoros2017mbrl]).
Iv-D Analytical Differentiation
In order to estimate parameters, we seek to minimize Equation 2 through gradient-based optimization. Such an approach requires calculating gradients of the parameters with respect to the ODE solution , which is known as Continuous Sensitivity Analysis and has a wide range of applications [rackauckas2018comparison].
Historically, the two primary methods for computing derivatives through complex systems have been numerical and analytical derivation. Analytical (symbolic) derivation gives the user a chance to hand-optimize calculations, but is inflexible and error-prone, as gradients of any new dynamics elements must be determined separately.
Iv-E Numerical Differentiation
A numerical approximation to the analytical gradients can be obtained through finite differences. This approach is an one-at-a-time method (OAT) that, along each parameter dimension , adapts the parameter vector to approximate the gradient w.r.t. the final system state . A common finite differencing approach is the symmetric difference quotient that approximates the gradient of function at point symmetrically at two nearby points to using the step size . Its error is characterized as , while higher-order symmetric derivatives can be obtained that achieve higher accuracy at the cost of more function evaluations. In practice, step size cannot be reduced indefinitely due to floating point errors [jerrel1997ad], limiting the overall achievable accuracy of this method.
Iv-F Automatic Differentiation
Additionally, we may compute gradients by using forward or reverse mode automatic differentiation (AD) on the numerical integrator used to solve the ODE. Forward-mode differentiation performs arithmetic on dual numbers to compute functions and their derivatives simultaneously. Reverse-mode differentiation tracks the derivatives of function evaluations, storing them on a tape. Gradients are then computed in reverse by repeatedly applying the chain rule. Tape-based AD software may see high memory usage, especially for solvers with adaptive step sizes or systems with many outputs, while forward-mode differentiation scales poorly with input parameters. We present results for automatically differentiating ODE integrators in SectionV.
Iv-G Local Sensitivity Analysis
Local sensitivity analysis, is a method for computing gradients for ODE solutions by augmenting the model dynamics to include the dynamics for the gradient itself (local sensitivities)
where is the -th parameter. This approach adds a new equation to the system per parameter, and thus performs poorly for systems with many parameters. Fortunately, for many applications in robotics, we are interested in optimizing a few unknown parameters of a model.
Iv-H Adjoint Sensitivity Analysis
To compute derivatives in our simulator, we use techniques from automatic differentiation and sensitivity analysis. There are multiple ways to compute gradients of functions of the solutions to a system of differential equations. A concise overview is given by [rackauckas2018comparison]. One method, introduced by [pontryagin1962mathematical] and popularized recently by [chen2018neural], is a continuous method called Adjoint Sensitivity Analysis.
We can compute a parameter gradient by backwards solving
which is known as the adjoint problem. At every discrete point where cost is evaluated, the ODE solution is perturbed by where is solved in the forward pass. Then the loss gradient is22todo: 2verify
We evaluate the previously introduced methods for computing gradients through the ODE solutions. To this end, we first benchmark these approaches and subsequently compare them on parameter estimation problems involving a simulated compound pendulum and the automatic design of a robot arm. Next, we present an adaptive control algorithm that leverages the parameter estimation capabilities of our differentiable simulator and combines it with trajectory optimization to control a mechanism in a different simulator. Our simulator is implemented in C++ using the Eigen framework [eigenweb] for linear algebra and Stan Math [carpenter2015stan] for automatic differentiation (AD). Since the latter only provides an implementation of reverse-mode automatic differentiation, we limit our attention to this algorithm and leave considerations of forward-mode and other AD techniques for future work.
V-a Benchmarking Gradient Calculation Approaches
In our first experiment, we consider an -link compound pendulum that is simulated over a variety of time steps given its link lengths as parameters . We focus in our profiling of the approaches introduced in section IV on their computational efficiency, i.e., how many dynamics evaluations are necessary, how many variables are generated on the automatic differentiation stack, and the total computation time.
We first consider a double pendulum () and report the performance of the algorithms Numerical Differentiation (“Numerical”), reverse-mode AD (“AutoDiff”), Adjoint Sensitivity Method (“Adjoint”) and Coupled ODE System (“Coupled”) in Figure 2. Although the number of ODE evaluations (third plot) grows exponentially faster with finite differencing (orange) than the other methods, we note that AutoDiff takes the longest. We conduct the experiment using error-controlled adaptive time stepping methods, such as the Dormand-Price and the Fehlberg methods, and observe the same behavior. Reverse-mode AD records a copy of each participating variable per operation, resulting in a large stack of variables (second plot), that needs to be traversed in order to compute gradients. In contrast, Adjoint and Coupled both maintain a constant-size stack of variables while taking approximately the same computation time, while the later requires approximately twice as many dynamics evaluations as the former. Next, we investigate the scalability of the continuous sensitivity methods on a 100-link compound pendulum, requiring a parameter vector of size 100 to be estimated. We exclude AD from this comparison due to its prohibitively high computation time. Adjoint and Coupled both remain close in computation time, although Coupled requires sensitivites (1) compared to Adjoint’s augmented state dimensions (2), which might be offset due to the need of solving two ODE in the case of Adjoint at each time step.
V-B Automatic Robot Design
Industrial robotic applications often require a robot to follow a given path in task space with its end effector. In general, robotic arms with six or more degrees of freedom provide large workspaces and redundant configurations to reach any possible point within the workspace. However, motors are expensive to produce, maintain, and calibrate. Designing arms that contain a minimal number of motors required for a task provides economic and reliability benefits, but imposes constraints on the kinematic design of the arm.
One standard for specifying the kinematic configuration of a serial robot arm is the Denavit-Hartenberg (DH) parameterization [hartenberg1955kinematic]. For each joint , the DH parameters are , describing the distance from joint to motor axis , the rotation about axis , the distance of joint along motor axis , and the angle between motor axes and , respectively.
We specify a task-space trajectory for as the position in world coordinates of the robot’s end-effector. Given a joint-space trajectory , we seek to find the best -DOF robot arm design, parameterized by DH parameters , that most closely matches the specified end-effector trajectory:
where the forward kinematics function maps from joint space to the Cartesian coordinates of the end-effector, conditioned on DH parameters . Since we compute using our engine, we may compute derivatives of arbitrary inputs to this function, and use gradient-based optimization through L-BFGS [liu1989lbfgs] from the Ceres optimization library [ceres-solver] to converge to arm designs which accurately perform the trajectory-following task, as shown in Figure 3.
V-C Adaptive MPC
Besides parameter estimation and design, a key benefit of differentiable physics is its applicability to optimal control algorithms. In order to control a system within our simulator, we specify the control space , which is typically a subset of the system’s generalized forces , and the state space . Given a quadratic, i.e. twice-differentiable, cost function , we can linearize the dynamics at every time step, allowing efficient gradient-based optimal control techniques to be employed. Iterative Linear Quadratic Control [li2004iterative] (iLQR) is a direct trajectory optimization algorithm that uses a dynamic programming scheme on the linearized dynamics to derive the control inputs that successively move the trajectory of states and controls closer to the optimum of the cost function.
Throughout our control experiments, we optimize a trajectory for an -link cartpole to swing up from an arbitrary initial configuration of the joint angles. In the case of double cartpole, i.e. a double inverted pendulum on a cart, the state is defined as
where and refer to the cart’s position and velocity, to the joint angles, and to the velocities and accelerations of the revolute joints of the poles, respectively. For a single cartpole the state space is represented analogously, excluding the second revolute joint coordinates . The control input is a one-dimensional vector describing the force applied to the cart along the axis. As typical for finite-horizon, discrete-time LQR problems, the cost of a trajectory over time steps is defined as
where , and the matrices and weight the contributions of each dimension of the state and control input. Throughout this experiment, we set to be diagonal matrices. Minimizing the cost function drives the system to the defined goal state222The goal state is given for a double cartpole here, it is analogously defined for a single cartpole. at which the pole is upright at zero angular velocity and acceleration, and the cart is centered at the origin with zero positional velocity.
Trajectory optimization assumes that the dynamics model is accurate w.r.t the real world and generates sequences of actions that achieve optimal behavior toward a given goal state, leading to open-loop control. Model-predictive control (MPC) leverages trajectory optimization in a feedback loop where the next action is chosen as the first control computed by trajectory optimization over a shorter time horizon with the internal dynamics model. After some actions are executed in the real world and subsequent state samples are observed, adaptive MPC (Algorithm 3) fits the dynamics model to these samples to align it closer with the real-world dynamics. In this experiment, we want to investigate how differentiable physics can help overcome the domain shift that poses an essential challenge of model-based control algorithms that are employed in a different environment. To this end, we incorporate our simulator as dynamics model in such receding-horizon control algorithm to achieve swing-up motions of a single and double cartpole in the DeepMind Control Suite [tassa2018dm] environments that are based on the MuJoCo physics simulator.
We fit the parameters of the simulator by minimizing the prediction loss given the state-action transition from the real system:
Thanks to the low dimensionality of the model parameter vector (for a double cartpole there are 14 parameters, cf. Figure 5), efficient optimizers such as the quasi-Newton optimizer L-BFGS are applicable, leading to fast convergence of the fitting phase, typically within 10 optimization steps. The length of one episode is 140 time steps. During the first episode we fit the dynamics model more often, i.e. every 50 time steps, to warm-start the receding-horizon control scheme. Given a horizon size of 20 and 40 time steps, MPC is able to find the optimal swing-up trajectory for the single and double cartpole, respectively.
Within a handful of training episodes, adaptive MPC infers the correct model parameters involved in the dynamics of a double cartpole (Figure 5). As shown in Figure 1, the models we start from do not match their counterparts from DeepMind Control Suite. For example, the poles are represented by capsules where the mass is distributed across these elongated geometries, whereas initially in our model, the center of mass of the links is at the end of them, such that they have different inertia parameters. We set the masses, lengths of the links, and 3D coordinates of the center of masses to 2, and, using a few steps of the optimizer and less than 100 transition samples, converge to a much more accurate model of the true dynamics in the MuJoCo environment.
We introduced a novel differentiable physical simulator, and presented experiments for the inference of physical parameters, optimal control and system design. Since it is constrained to the laws of physics, such as conservation of energy and momentum, our proposed model provides a large, meaningful inductive bias on robot learning problems. Within a handful of trials in out test environment, our gradient-based representation of rigid-body dynamics allows an adaptive MPC scheme to infer the model parameters of the system thereby allowing it to make predictions and plan for actions many time steps ahead. We look forward to exercising this physics engine for learning and control to solve complex tasks on physical robot systems.