Automatic Differentiation and Continuous Sensitivity Analysis of Rigid Body Dynamics

01/22/2020 ∙ by David Millard, et al. ∙ University of Southern California Columbia University 0

A key ingredient to achieving intelligent behavior is physical understanding that equips robots with the ability to reason about the effects of their actions in a dynamic environment. Several methods have been proposed to learn dynamics models from data that inform model-based control algorithms. While such learning-based approaches can model locally observed behaviors, they fail to generalize to more complex dynamics and under long time horizons. In this work, we introduce a differentiable physics simulator for rigid body dynamics. Leveraging various techniques for differential equation integration and gradient calculation, we compare different methods for parameter estimation that allow us to infer the simulation parameters that are relevant to estimation and control of physical systems. In the context of trajectory optimization, we introduce a closed-loop model-predictive control algorithm that infers the simulation parameters through experience while achieving cost-minimizing performance.



There are no comments yet.


page 1

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Physically-based reasoning is fundamental to successfully performing complex tasks in the physical world. This is particularly relevant to the domain of robot learning. There is a large body of mature work in robot dynamics, which need not be learned from scratch per task. In this work, we introduce a differentiable physics simulator for rigid body dynamics. We leverage this differentiability to estimate parameters that result in simulations that closely match the behavior of observed reference systems. Additionally, through trajectory optimization, we can efficiently generate control inputs that minimize cost functions that are expressed with respect to any quantity that is part of the physics computation.

Differentiable physics provides many advantages when used as part of a learning process. Physically accurate simulation obeys dynamical laws of real systems, including conservation of energy and momentum. Furthermore, joint constraints are enforced with no room for error. The parameters of physics engines, like link geometry and inertia matrices, are well-defined and correspond to properties of real systems. Learning these parameters provides a significantly interpretable parameter space, and can benefit classical control and estimation algorithms. These systems provide a high inductive bias, and model parameters need not be retrained for different tasks or reconfigured environments.

Our contributions are as follows:

  1. We present a fully differentiable simulator for rigid body dynamics that supports a rich set of integrators for the accurate simulation of mechanisms over long time horizons.

  2. We analyze the performance of gradient calculation methods on the problem of inferring parameters underlying the simulation of rigid-body dynamical systems.

  3. We introduce an adaptive model-predictive control algorithm that leverages our differentiable model to perform trajectory optimization while finding the optimal parameters that fit a reference mechanism implemented in another simulator.

Fig. 1: Visualizations of the simulated cartpole systems. The carts are constrained to the rail, but may move infinitely in either direction. Both systems are actuated by a linear force applied to the cart. Upper row: Single cartpole environment from DeepMind Control Suite [tassa2018dm] in the MuJoCo [todorov2012mujoco] physics simulator (left) and in our simulator (right). Lower row: Double cartpole environment from DeepMind Control Suite (left) and in our simulator (right).

Ii Related Work

Differentiable physics has recently attracted significant research efforts. Degrave et al. [degrave2019physics]

implemented a differentiable physics engine in the automatic differentiation framework Theano. Giftthaler et al. 

[giftthaler2017autodiff] presented a rigid-body-dynamics simulator that allows for the computation of derivatives through code generation via RobCoGen [frigerio2016robcogen]. Similarly, we use Stan Math [carpenter2015stan], a C++ library for reverse-mode automatic differentiation to efficiently compute gradients, even in cases where the code branches significantly. Analytical gradients of rigid-body dynamics algorithms have been implemented in the Pinnocchio library [carpentier2018analytical] to facilitate optimal control and inverse kinematics. While such manually derived gradients can be computed efficiently, they are less general than our approach since they can only be used to optimize for a number of hand-engineered quantities. More recently, Koolen and Deits [koolen2019rbd-julia] have implemented rigid-body-dynamics algorithms in the programming language Julia where, among others, libraries for optimization, automatic differentiation, and numerical integration are available. Non-penetrative multi-point contacts between rigid bodies are often simulated by solving a linear complementarity problem (LCP), through which [peres2018lcp] differentiate using the differentiable quadratic program solver OptNet [amos2017optnet]. While our proposed model does not yet incorporate contact dynamics, we are able to demonstrate the scalability of our approach on versatile applications of differentiable physics to common 3D control domains.

Automatic differentiation of the solutions to differential equations is well studied, with applications to pharmacology, meteorology, and many other fields. Recent machine learning work by  


recasts learning in long short-term memory networks and residual networks as approximations to this problem. Thorough comparisons of methods for computing parameter gradients of ODE solutions are given in  

[carpenter2015stan, rackauckas2018comparison, serban2003cvodes].

Learning dynamics models has a tradition in the field of robotics and control theory. Early works on forward models [moore1992forward] and locally weighted regression [atkeson1997lwl] yielded control algorithms that learn from previous experience. Computing gradients through the solution of differential equations has been further leveraged for system identification [rackauckas2018comparison].

More recently, a variety of novel deep learning architectures have been proposed to learn

intuitive physics

models. Inductive bias has been introduced through graph neural networks 

[hadsell2018graphnet, li2018learning, liu2019psd], particularly interaction networks [battaglia2016interaction, chang2016physics, schenck2018spnet, mrowca2018flexible, xu2019physics] that are able to learn rigid and soft body dynamics. Vision-based machine learning approaches to predict the future outcomes of the state of the world have been proposed [wu2015galileo, wu2016phys101, wu2017deanimation, finn2016unsupervised, janner2018reasoning]. Physics-informed learning imposes a stronger inductive bias on the learning problem to model particular physical processes, such as cosmological structure formation [he2019physics] or turbulence models [raissi2018physics]. Deep Lagrangian Networks [lutter2018delan] and Hamiltonian Networks [greydanus2019hamiltonian] represent functions in the respective classical mechanics frameworks using deep neural networks.

The approach of adapting the simulator to real world dynamics, which we demonstrate through our adaptive MPC algorithm in subsection V-C, has been less explored. While many previous works have shown to adapt simulators to the real world using system identification and state estimation [kolev2015sysid, zhu2018fastmi], few have shown adaptive model-based control schemes that actively close the feedback loop between the real and the simulated system [reichenbach2009dynsim, farchy2013simback, chebotar2018sim2real]

. Instead of using a simulator, model-based reinforcement learning is a broader field 

[polydoros2017mbrl], where the system dynamics, and state-action transitions in particular, are learned to achieve higher sample efficiency compared to model-free methods. Within this framework, predominantly Gaussian Processes [ko2007gp, deisenroth2011pilco, boedecker2014sgp] and neural networks [williams2016mppi, yamaguchi2016neural] have been proposed to learn dynamics and optimize policies. Bayesian neural networks in particular have been used to learn dynamics in model-based reinforcement learning approaches [fu2016one, depeweg2016mbrl, gal2016improving, chua2018deep].

Iii Notation

Throughout this work, we follow the following conventions.

denotes the system’s state vector from the state space

. denotes the system’s control vector from the control space . denotes the system’s parameter vector from the parameter space . A rigid body system is entirely described by the generalized coordinates111

Generalized coordinates sparsely encode only particular degrees of freedom in the kinematic chain so that connected bodies remain connected.

, which denote the generalized forces, positions, velocities, and accelerations, respectively.

Iv Approach

Iv-a Rigid Body Dynamics

To simulate the dynamics of a rigid body system, we integrate the Newton-Euler equation


gives the generalized inertial matrix of the system for configuration , and describes the centrifugal and Coriolis terms affecting motion. describes the contribution from gravity.

Given a descriptive model consisting of joints, bodies, and predecessor/successor relationships, we build a kinematic chain that specifies the dynamics of the system. From a mechanism’s position vector , the forward kinematics function computes the positions and orientation quaternions of the geometries attached to the kinematic chain (such as the end-effector of a robot arm) in world coordinates.

Forward dynamics, computed by is the mapping from positions, velocities and forces to accelerations. We efficiently compute forward dynamics using the Articulated Body Algorithm (ABA) [featherstone2007rbda], that propagates forces through the bodies while adhering to the motion subspaces defined by the joints that connect them. In our simulator, bodies comprise physical entities with mass, inertia, and attached rendering and collision geometries. Joints describe constraints on the relative motion of bodies in a model. Equipped with such a graph of bodies connected via joints with forces acting on them, ABA computes the joint accelerations in operations.

Iv-B Integration

Unless specified otherwise, we represent the mechanism’s state by so that the change in state corresponds to . A mechanism is parameterized by the real vector . Such parameters can, depending on the particular system, contain values for the geometries of the links, inertia properties, and other settings that influence the dynamics of the mechanism.

Resulting from the forward dynamics , a new change in state is computed using ABA at each time step given the previous state and parameters

. Such relationship forms an ordinary differential equation (ODE)

, which is solved for the next state through an integrator. We leverage several methods to solve ODEs, from a simple Euler integrator, through explicit stepping schemes like fourth-order Runge-Kutta (RK4), to adaptive stepping algorithms, such as Dormand-Prince (commonly referred to as RK45) method.

In order to simulate a system, the ODE is solved for a sequence of time steps . Throughout this work we consider equidistant time intervals with an integration step size . The smaller the step size, the more accurate the simulation, but the more ODE system evaluations are necessary. Larger time steps improve the execution performance of the simulator but yield decreased accuracy, particularly in chaotic systems.

Iv-C Parameter Estimation

11todo: 1complete

We are interested in the behavior of this ODE system with respect to changes in its parameters, , and to its control inputs, . Parameters of note are the continuous parameters describing the inertia and geometry of links and joint attachments. Discrete parameters, describing the structure of the system, are fundamentally interesting, but are not considered in this method.

Given a cost function evaluated over states evaluated at times , the overall loss is defined as follows:


Note that the gradient only becomes available by integrating over the dynamics so that the parameters influence the system states. Typically, for parameter estimation, the loss is the distance between the simulated states and the states from a reference trajectory , which can be given from a real physical system or another simulation with unknown parameters. This approach is known as an initial value problem

(IVP) and has an important application in simulation-to-reality (sim2real) transfer learning, where the reality gap between the agent’s dynamics model and the real world dynamics needs to be sufficiently small for the agent to operate in the real world (cf. 


Iv-D Analytical Differentiation

In order to estimate parameters, we seek to minimize Equation 2 through gradient-based optimization. Such an approach requires calculating gradients of the parameters with respect to the ODE solution , which is known as Continuous Sensitivity Analysis and has a wide range of applications [rackauckas2018comparison].

Historically, the two primary methods for computing derivatives through complex systems have been numerical and analytical derivation. Analytical (symbolic) derivation gives the user a chance to hand-optimize calculations, but is inflexible and error-prone, as gradients of any new dynamics elements must be determined separately.

Iv-E Numerical Differentiation

A numerical approximation to the analytical gradients can be obtained through finite differences. This approach is an one-at-a-time method (OAT) that, along each parameter dimension , adapts the parameter vector to approximate the gradient w.r.t. the final system state . A common finite differencing approach is the symmetric difference quotient that approximates the gradient of function at point symmetrically at two nearby points to using the step size . Its error is characterized as , while higher-order symmetric derivatives can be obtained that achieve higher accuracy at the cost of more function evaluations. In practice, step size cannot be reduced indefinitely due to floating point errors [jerrel1997ad], limiting the overall achievable accuracy of this method.

Iv-F Automatic Differentiation

Additionally, we may compute gradients by using forward or reverse mode automatic differentiation (AD) on the numerical integrator used to solve the ODE. Forward-mode differentiation performs arithmetic on dual numbers to compute functions and their derivatives simultaneously. Reverse-mode differentiation tracks the derivatives of function evaluations, storing them on a tape. Gradients are then computed in reverse by repeatedly applying the chain rule. Tape-based AD software may see high memory usage, especially for solvers with adaptive step sizes or systems with many outputs, while forward-mode differentiation scales poorly with input parameters. We present results for automatically differentiating ODE integrators in Section 


Iv-G Local Sensitivity Analysis

Local sensitivity analysis, is a method for computing gradients for ODE solutions by augmenting the model dynamics to include the dynamics for the gradient itself (local sensitivities)

where is the -th parameter. This approach adds a new equation to the system per parameter, and thus performs poorly for systems with many parameters. Fortunately, for many applications in robotics, we are interested in optimizing a few unknown parameters of a model.

Mechanism parameters
Initial state
Start time , end time
function augment()
end function
Algorithm 1 Coupled ODE System

Iv-H Adjoint Sensitivity Analysis

To compute derivatives in our simulator, we use techniques from automatic differentiation and sensitivity analysis. There are multiple ways to compute gradients of functions of the solutions to a system of differential equations. A concise overview is given by [rackauckas2018comparison]. One method, introduced by [pontryagin1962mathematical] and popularized recently by [chen2018neural], is a continuous method called Adjoint Sensitivity Analysis.

We can compute a parameter gradient by backwards solving

which is known as the adjoint problem. At every discrete point where cost is evaluated, the ODE solution is perturbed by where is solved in the forward pass. Then the loss gradient is22todo: 2verify

Mechanism parameters
Final state
Loss gradient of the final state
Start time , end time
function augment()
end function
Algorithm 2 Adjoint Sensitivity Method

V Experiments

We evaluate the previously introduced methods for computing gradients through the ODE solutions. To this end, we first benchmark these approaches and subsequently compare them on parameter estimation problems involving a simulated compound pendulum and the automatic design of a robot arm. Next, we present an adaptive control algorithm that leverages the parameter estimation capabilities of our differentiable simulator and combines it with trajectory optimization to control a mechanism in a different simulator. Our simulator is implemented in C++ using the Eigen framework [eigenweb] for linear algebra and Stan Math [carpenter2015stan] for automatic differentiation (AD). Since the latter only provides an implementation of reverse-mode automatic differentiation, we limit our attention to this algorithm and leave considerations of forward-mode and other AD techniques for future work.

V-a Benchmarking Gradient Calculation Approaches

In our first experiment, we consider an -link compound pendulum that is simulated over a variety of time steps given its link lengths as parameters . We focus in our profiling of the approaches introduced in section IV on their computational efficiency, i.e., how many dynamics evaluations are necessary, how many variables are generated on the automatic differentiation stack, and the total computation time.

Fig. 2: Benchmarking results of the four Jacobian calculation methods considered in this work on the fourth-order Runge-Kutta integrator. First three plots: results for a double pendulum (logarithmic scale). Last plot: computation times for a 100-link compound pendulum.

We first consider a double pendulum () and report the performance of the algorithms Numerical Differentiation (“Numerical”), reverse-mode AD (“AutoDiff”), Adjoint Sensitivity Method (“Adjoint”) and Coupled ODE System (“Coupled”) in Figure 2. Although the number of ODE evaluations (third plot) grows exponentially faster with finite differencing (orange) than the other methods, we note that AutoDiff takes the longest. We conduct the experiment using error-controlled adaptive time stepping methods, such as the Dormand-Price and the Fehlberg methods, and observe the same behavior. Reverse-mode AD records a copy of each participating variable per operation, resulting in a large stack of variables (second plot), that needs to be traversed in order to compute gradients. In contrast, Adjoint and Coupled both maintain a constant-size stack of variables while taking approximately the same computation time, while the later requires approximately twice as many dynamics evaluations as the former. Next, we investigate the scalability of the continuous sensitivity methods on a 100-link compound pendulum, requiring a parameter vector of size 100 to be estimated. We exclude AD from this comparison due to its prohibitively high computation time. Adjoint and Coupled both remain close in computation time, although Coupled requires sensitivites (1) compared to Adjoint’s augmented state dimensions (2), which might be offset due to the need of solving two ODE in the case of Adjoint at each time step.

V-B Automatic Robot Design

Industrial robotic applications often require a robot to follow a given path in task space with its end effector. In general, robotic arms with six or more degrees of freedom provide large workspaces and redundant configurations to reach any possible point within the workspace. However, motors are expensive to produce, maintain, and calibrate. Designing arms that contain a minimal number of motors required for a task provides economic and reliability benefits, but imposes constraints on the kinematic design of the arm.

One standard for specifying the kinematic configuration of a serial robot arm is the Denavit-Hartenberg (DH) parameterization [hartenberg1955kinematic]. For each joint , the DH parameters are , describing the distance from joint to motor axis , the rotation about axis , the distance of joint along motor axis , and the angle between motor axes and , respectively.

Fig. 3: Optimization of a 4-DOF robot arm design parameterized by the Denavit-Hartenberg (DH) parameters to match the robot arm that generated a given trajectory. (left) Evolution of the DH parameters over the optimization iterations.

We specify a task-space trajectory for as the position in world coordinates of the robot’s end-effector. Given a joint-space trajectory , we seek to find the best -DOF robot arm design, parameterized by DH parameters , that most closely matches the specified end-effector trajectory:

where the forward kinematics function maps from joint space to the Cartesian coordinates of the end-effector, conditioned on DH parameters . Since we compute using our engine, we may compute derivatives of arbitrary inputs to this function, and use gradient-based optimization through L-BFGS [liu1989lbfgs] from the Ceres optimization library [ceres-solver] to converge to arm designs which accurately perform the trajectory-following task, as shown in Figure 3.

Fig. 4: Visualization of an exemplary 4-DOF robot arm and its trajectory in our simulator.
Cost function , episode length , trajectory length , horizon length
for episode =  do
         Replay buffer to store transition samples from the real environment
     Obtain initial state from the real environment
     for  do
          Trajectory optimization using iLQR with cost from Equation 3
         Take action in the real environment and obtain next state
         Store transition in
     end for
     Fit dynamics model to real data by minimizing the state-action prediction loss (Equation 4)
end for
Algorithm 3 Adaptive MPC algorithm using differentiable physics model .

V-C Adaptive MPC

Besides parameter estimation and design, a key benefit of differentiable physics is its applicability to optimal control algorithms. In order to control a system within our simulator, we specify the control space , which is typically a subset of the system’s generalized forces , and the state space . Given a quadratic, i.e. twice-differentiable, cost function , we can linearize the dynamics at every time step, allowing efficient gradient-based optimal control techniques to be employed. Iterative Linear Quadratic Control [li2004iterative] (iLQR) is a direct trajectory optimization algorithm that uses a dynamic programming scheme on the linearized dynamics to derive the control inputs that successively move the trajectory of states and controls closer to the optimum of the cost function.

Throughout our control experiments, we optimize a trajectory for an -link cartpole to swing up from an arbitrary initial configuration of the joint angles. In the case of double cartpole, i.e. a double inverted pendulum on a cart, the state is defined as

where and refer to the cart’s position and velocity, to the joint angles, and to the velocities and accelerations of the revolute joints of the poles, respectively. For a single cartpole the state space is represented analogously, excluding the second revolute joint coordinates . The control input is a one-dimensional vector describing the force applied to the cart along the axis. As typical for finite-horizon, discrete-time LQR problems, the cost of a trajectory over time steps is defined as


where , and the matrices and weight the contributions of each dimension of the state and control input. Throughout this experiment, we set to be diagonal matrices. Minimizing the cost function drives the system to the defined goal state222The goal state is given for a double cartpole here, it is analogously defined for a single cartpole. at which the pole is upright at zero angular velocity and acceleration, and the cart is centered at the origin with zero positional velocity.

Trajectory optimization assumes that the dynamics model is accurate w.r.t the real world and generates sequences of actions that achieve optimal behavior toward a given goal state, leading to open-loop control. Model-predictive control (MPC) leverages trajectory optimization in a feedback loop where the next action is chosen as the first control computed by trajectory optimization over a shorter time horizon with the internal dynamics model. After some actions are executed in the real world and subsequent state samples are observed, adaptive MPC (Algorithm 3) fits the dynamics model to these samples to align it closer with the real-world dynamics. In this experiment, we want to investigate how differentiable physics can help overcome the domain shift that poses an essential challenge of model-based control algorithms that are employed in a different environment. To this end, we incorporate our simulator as dynamics model in such receding-horizon control algorithm to achieve swing-up motions of a single and double cartpole in the DeepMind Control Suite [tassa2018dm] environments that are based on the MuJoCo physics simulator.

Fig. 5: Convergence of the physical parameters of a double cartpole, over all model fitting iterations combined, using Adaptive MPC (Algorithm 3) in the DeepMind Control Suite environment.

We fit the parameters of the simulator by minimizing the prediction loss given the state-action transition from the real system:


Thanks to the low dimensionality of the model parameter vector (for a double cartpole there are 14 parameters, cf. Figure 5), efficient optimizers such as the quasi-Newton optimizer L-BFGS are applicable, leading to fast convergence of the fitting phase, typically within 10 optimization steps. The length of one episode is 140 time steps. During the first episode we fit the dynamics model more often, i.e. every 50 time steps, to warm-start the receding-horizon control scheme. Given a horizon size of 20 and 40 time steps, MPC is able to find the optimal swing-up trajectory for the single and double cartpole, respectively.

Within a handful of training episodes, adaptive MPC infers the correct model parameters involved in the dynamics of a double cartpole (Figure 5). As shown in Figure 1, the models we start from do not match their counterparts from DeepMind Control Suite. For example, the poles are represented by capsules where the mass is distributed across these elongated geometries, whereas initially in our model, the center of mass of the links is at the end of them, such that they have different inertia parameters. We set the masses, lengths of the links, and 3D coordinates of the center of masses to 2, and, using a few steps of the optimizer and less than 100 transition samples, converge to a much more accurate model of the true dynamics in the MuJoCo environment.

Vi Conclusion

We introduced a novel differentiable physical simulator, and presented experiments for the inference of physical parameters, optimal control and system design. Since it is constrained to the laws of physics, such as conservation of energy and momentum, our proposed model provides a large, meaningful inductive bias on robot learning problems. Within a handful of trials in out test environment, our gradient-based representation of rigid-body dynamics allows an adaptive MPC scheme to infer the model parameters of the system thereby allowing it to make predictions and plan for actions many time steps ahead. We look forward to exercising this physics engine for learning and control to solve complex tasks on physical robot systems.