Robots interact with their environments primarily through contact, and planning of such systems (colloquially named planning through contact) is a long-standing challenge in manipulation and locomotion, especially in the absence of fixed mode sequences. Neither the tightly coupled problems of modeling the contact dynamics nor the planning that uses the prescribed model have straightforward answers.
A large body of previous work in planning through contact [posa, contactinvariantoptimization, add, carpentierddp, francois]
has been based on gradient-based optimizers. Combining different contact models, optimization methods for planning, and a variety of relaxation schemes, these methods have shown reasonable success. However, they still tend to remain brittle: they are sensitive to hyperparameters, and the quality of initial guesses becomes a make-or-break factor. When they fail, they can be notoriously hard to debug.
In another direction, recent works in Reinforcement Learning (RL) have shown impressive results in tackling problems with contact dynamics [fu2016oneshot, nagabandi2019deep, heess2017emergence]. The quality of the solutions obtained via RL is surprising, despite the fact that no underlying structure was used. This suggests that improvements can be made in time and sample complexity. More importantly; however, the fact that we do not understand the gap between the success of RL methods and the struggle of more classical gradient-based methods in planning through contact, is unsatisfying. What are the key improvements that RL made but the classical methods failed to consider? We believe that the answer lies in how RL fundamentally considers a stochastic formulation, as opposed to gradient-based methods that have largely been deterministic.
If this is true, what would it mean to take a probabilistic formulation of contact dynamics, and how would it help the planning problem? Can we learn from the answers and combine stochasticity with gradient-based approaches to obtain better convergence behavior? We aim to answer these questions through the lens of randomized smoothing [gradientsampling, duchi1, duchi2]. Results from non-smooth optimization [gradientsampling] tell us that the fast-changing gradients of non-smooth problems destabilize gradient-based optimizers such as gradient descent [yuxinchen]. By considering a probabilistic approach to contact, we aim to alleviate flatness of gradients, as well as stabilize fast-changing gradients of contact dynamics to produce a more globally informative direction of improvement.
We formalize gradients taken through randomized smoothing as the gradient bundle
, and show that it has many benefits over exact gradients of contact models, such as overcoming non-smoothness, alleviating nonconvexity, and supporting a zero-order variant. The gradient bundle can readily replace the exact gradient in optimal control, policy search, state estimation, and system identification. In our work, we apply bundled gradients on optimal control using iterative LQR (iLQR) to address planning-through-contact problems.
In addition, randomized smoothing allows obtaining the gradient bundle of contact dynamics that are defined through implicit optimization problems [stewarttrinkle, anitescupotra, carpentierddp], that are often preferred over penalty methods [huntcrossley] for its ability to take longer timesteps. To demonstrate this advantage, we utilize our method on a quasi-dynamic simulator that uses an implicit time-stepping scheme [Pang2020ACQ] and show that we can tractably tackle robotic manipulation problems.
Contributions. We formalize the notion of the gradient bundle by introducing randomized smoothing, which we then apply to penalty and implicit time-stepping models of contact dynamics. We replace the exact gradients of dynamics with the gradient bundle in iLQR, showing that we can more robustly solve manipulation problems.
Ii Literature Review
Ii-a Differentiable Modeling of Contact
We mainly address two classes of approaches for modeling contact: the penalty method [huntcrossley], and the implicit time-stepping method [stewarttrinkle, anitescupotra]. Although other classes of models exist (e.g. hybrid dynamics simulation through event-detection and impacts [brianmirtich]), they have yet to scale to simulating the complex geometry and dynamics of manipulation.
Works that utilize the penalty method [add, tobia2, terryfelix] approximate contact behavior at each timestep with forces from stiff spring and dampers. While the method can approximate a wide range of contact behavior such as impact, the dynamics often results in a stiff ODE, which requires small timesteps to approximate well. In addition, although penalty methods often have straightforward expressions for the derivative [add], the computation of long-horizon gradients may be expensive due to large number of timesteps required.
On the other hand, implicit time-stepping methods [posa, variablesmoothing, carpentierddp] can take larger timesteps without sacrificing integration accuracy, but integration requires solving a Linear Complementarity Problem (LCP) [cottle2009linear], whose per-step computation is harder and solutions may not be unique. Nevertheless, the LCP can be converted to a convex Quadratic Program (QP) by relaxing the exact Coulomb friction constraints [anitescu2006optimization, convexsmoothcontactmodel, Pang2020ACQ]. The implicit models also allow differentiation by computing the derivatives of the KKT conditions of the QP using the implicit function theorem [boot1963sensitivity].
Ii-B Smooth Approximations of Contact for Planning
As solving the non-smooth problem directly (e.g. via mixed-integer programming [tobia, francois]) is difficult, many existing works consider solving a smooth approximation of the problem. For instance, penalty-based methods make explicit smooth approximations of the contact dynamics [add, tobia2, terryfelix]. While the behavior of such approximations have intuitive interpretations at a dynamics-level, the choice of functions that are used is arbitrary. In addition, it is not clear how to apply such explicit smoothing to implicit dynamics whose result is defined by an optimization problem.
Other works [variablesmoothing, posa, scott, convexsmoothcontactmodel, contactinvariantoptimization, stochasticcomplementarity] do not directly smooth out the dynamics, but find an explicit relaxation of constraints that implicitly define the dynamics. While these works are often better connected to the underlying optimization, the interpretation of the resulting dynamics can be unintuitive which makes debugging cumbersome.
Our philosophy behind why stochasticity is necessary for planning is similar to the argument presented in [stochasticcomplementarity]. However, unlike [stochasticcomplementarity] which explicitly uses “rounded” complementarity constraints and interprets it as minimizing expected constraint violation, randomized smoothing is fully implicit [duchi1, duchi2], in the sense that we do not require explicit smoothing formulas for dynamics, nor the constraints that define them. This allows us to make smooth approximations to a more general class of models. In addition, the bundled dynamics from randomized smoothing is more physically-interpretable than numerical smoothing strategies that “sand off” the sharp corners of non-smooth constraints.
Ii-C Randomized Smoothing for Non-smooth Optimization
The destabilization of gradient descent for non-smooth functions was observed by early works such as Wolfe’s example [yuxinchen, clarke]. Stabilized gradient descent through gradient sampling was introduced in [gradientsampling, subdifferentials, sqpgs], which proposed to approximate an instance of a subgradient in the convex hull of sampled gradients around the current iterate. Based on these works, an optimal convergence guarantee was provided in [duchi1] where the term randomized smoothing was introduced, and a zero-order variant was presented in [duchi2].
Our work strongly builds upon existing works, but we analyze the consequences of randomized smoothing in the presence of contacts and aim to use the gradient bundle for optimal control. In addition, we show potential pitfalls of gradient sampling in the presence of dynamics of discontinuities, which has not been addressed by previous works.
Iii Randomized Smoothing
Motivated by [gradientsampling, duchi1], we first introduce the idea of randomized smoothing in the context of gradient descent.
Iii-a The Gradient Bundle and the Bundled Objective
Let be a non-smooth objective function that is differentiable a.e., and denote the gradient as . While the gradient suggests a local direction of descent, it is myopic. For smooth non-convex functions, gradient descent could lead to bad local minima. For non-smooth functions, the gradient may be subject to large jumps, stalling the descent on the cost function [clarke, yuxinchen] and compromising standard guarantees of convergence. To alleviate the problems caused by the locality of the gradient, we can aggregate gradient information in some neighborhood of the current , which we formalize with the term gradient bundle.
Gradient Bundle Let
be a symmetric probability density. Then, thefirst-order gradient bundle is defined as
The gradient bundle can be thought of as a convolution of and the kernel . Further insight of descent using the gradient bundle can be obtained by defining the bundled objective.
The Bundled Objective is defined as
The first-order gradient bundle is the exact gradient of the bundled objective ()
Exchange the expectation and the derivative using the Dominated Convergence Theorem. ∎
This suggests that using the bundled gradient in place of the exact one lets us implicitly operate on a different objective function defined by the bundled objective. The bundled objective has an intuitive explanation as result of filtering via convolution, such as Gaussian smoothing. Such smoothing schemes alleviate non-convexity by smoothing out local minima, and allows tackling non-smooth problems by stabilizing gradient descent [gradientsampling]
. Note that the gradient bundle is purposely biased to produce more global effects, and thus will not converge to a stationary point of the original objective, but rather that of the bundled objective. This can be mitigated by iteratively decreasing the variance, as done in typical stochastic approximation schemes [robbinsmonro, kieferwolfowitz].
Iii-B Zero-Order Variation
If is not available or costly to compute, one may instead compute an expected value of finite-differences (which we denote by ) to use as a direction of descent [duchi2]. We call this object the zero-order gradient bundle. Convergence of such schemes are well analyzed in Simultaneous Perturbation Stochastic Approximation (SPSA) [spsa].
Function with many Local Minina: Let
which has many bad local minima. Under a suitable choice for the Gaussian kernel , this function can be smoothed out to be convex, or gradient dominant [gradientdominance].
Iii-C Monte-Carlo Integration for the Gradient Bundle
Symbolic computation of the gradient bundle is prohibitive, and quadrature methods do not scale well with dimension. Thus, we propose use Monte-Carlo integration, which can scale to high dimensions under suitable conditions on the variance of the objective. This yields the gradient sampling algorithm [gradientsampling] and its zero order variant [duchi2],
Law of Large Numbers. If has no discontinuous jumps and , then and
However, one must be very careful with this Monte Carlo approximation when has discontinuous jumps, due to the dirac-delta that appears in the derivative [teg]. We formalize this through the following theorem:
Sampling Gradients of Discontinuities. Let
be a function differentiable a.e. with at least one jump discontinuity;. Then, the Monte-Carlo estimate of the first-order gradient bundle almost surely fails to converge to the gradient of the bundled dynamics,
Sampling from the dirac-delta is ill-defined, as it does not evaluate pointwise. In practice, however, such an attempt would always lead to almost surely. Without loss of generality, let . Then, s.t.
where . Then,
Given that samples drawn from have zero probability of satisfying ,
which completes the proof. ∎
Heaviside Function: Let
Then, , and the approximated gradient bundle almost surely. However, the analytical computation of the bundled objective with a Gaussian would result in , whose derivative is non-zero at . Thus .
Finally, first-order gradient sampling may compromise the quality of gradient bundle for non-smooth functions (even without discontinuities) in the regime of small samples.
Quantization of Gradients: Let
Then, for , and for . With samples, the approximated gradient bundle will be quantized and can only take a finite set of values.
Iii-D Jacobian Bundle and Bundled Dynamics
We extend the concept of randomized smoothing to dynamical systems with the Jacobian bundle and the bundled dynamics. Let be a discrete-time dynamical system with state , and input ,
Recall that the linearization of the dynamical system around the nominal trajectory can be formed by taking a first-order Taylor expansion:
While the linearization provides a local approximation, it suffers the same problem from the gradient in a sense that it is myopic. We apply randomized smoothing to the Jacobian of the linearized system, and aim to provide a more globally informative linear approximation to the system.
The Jacobian Bundle Let be a multivariate symmetric probability density with arguments . Then, the Jacobian bundle is defined as a convolution between and the Jacobian of :
The Bundled Dynamics The Bundled dynamics are defined as a multivariate convolution between the original dynamics and the probability density :
The linear system described by the Jacobian bundle is the exact linearization of the bundled dynamics .
Identical to Lemma 1. ∎
The above theorem establishes that using the bundled Jacobian, as opposed to the exact linearization of the system, is equivalent to implicitly operating on a smoother version of the dynamics that has been convolved with .
Iii-E Monte-Carlo Approximation and Zero-Order Variant
We propose to approximate the Jacobian bundle using Monte-carlo integration, where :
Finally, we introduce a zero-order variant that takes a least-squares approximation to the local dynamics, which can be useful when the Jacobians are impractical to compute:
Iv Randomized Smoothing of Contact Dynamics
What does it mean to apply randomized smoothing in the presence of contacts? Intuitively, different samples will encounter different contact modes. In expectation, the effect of sampling is a smoothed behavior where normal and frictional forces are applied at a distance, and the boundaries of stick and slip blur. We analyze this intuition through two classes of contact models: implicit contact defined by complementarity constraints [anitescupotra, stewart2000rigid], and the penalty method [huntcrossley].
Iv-a Contacts Defined by Complementarity Constraints
Implicit models of contact using complementarity constraints [stewarttrinkle, anitescupotra] are widely used in planning through contact for their ability to stably take bigger time steps [Landry2019BilevelOF, posa]. However, unlike penalty methods which permit explicit smoothing of forces [add, tobia2, terryfelix], smoothing results of implicit optimization problems is less straightforward; randomized smoothing is unique in that it provides the ability to do so without an explicit relaxation of constraints.
An accurate treatment of multibody dynamics with contact and friction in its full glory requires heavy notation and detracts from the intuition behind the advantage of smoothing. We instead demonstrate the effect of smoothing on contact dynamics through two simple, 2D examples, and refer the reader to [anitescupotra, stewart2000rigid, anitescu2006optimization, Pang2020ACQ] for a more detailed presentation of contact dynamics.
To simplify the dynamics aspect of the modeling and focus more on contact, we keep the full second-order dynamics for un-actuated objects, but make the assumption that the actuated bodies (robots) are gravity-compensated, PD-controlled and quasi-static [pang1996complementarity, Pang2020ACQ]: the robots are always in force equilibrium, with the generalized forces exerted by contact balanced by the virtual spring of the proportional part of the PD controller. The quasi-static assumption reduces the dimension of robot states by half, and is commonly made in systems where inertial forces are negligible compared to contact forces, such as robotic manipulation [francois, chavan2018stable]. Despite these simplifications, our analysis on the effect of contact smoothing does transfer to the full second-order dynamics of the entire robot-object system.
Randomized Smoothing of Normal Contact. We first consider a 1D system of two objects in Fig. 5a, where the boxes can only interact through normal contact forces. The dynamics of the robot (red box) is described by
where denotes the impulse generated by the normal force during the timestep ; and are the robot’s commanded and actual positions at the next timestep . Constraint (17) states that the contact force is balanced by the virtual spring force at .
The dynamics of the object (green box) is given by
where (18a) states that the gain in object momentum comes from the normal contact impulse , assuming that the robot has 0 velocity at the current timestep.
The contact impulse and the distance between the two objects satisfy the following complementarity constraint:
for vectorsand means that . The implication of (19) is three-fold: (i) the impulse can only push (); (ii) the two boxes cannot penetrate at (); and (iii) non-zero normal impulse exists only if the two objects are in contact ().
where . The explicit dynamics (20), which has the same form as (12), results in perfect position command being achieved for with no movement for if there is no contact; and being placed near the commanded position if there is contact.
As shown in Fig. 5b, before making contact, the control input has no effect on , producing zero gradients. In contrast, when sampling around , some of the samples will hit the un-actuated box, and in expectation, push them away from each other. Thus, the bundled dynamics give information that should approach to push the box.
Randomized Smoothing of Friction. In this example, we study the 2D system in Fig. 6 where the robot (red sphere) interacts with the green box through a frictional contact on the right of the box.
The Coulomb friction model constrains contact force to stay inside the friction cone. In addition, when the contact is sliding, the force needs to stay on the friction cone’s boundary and in the opposite direction of the relative sliding velocity. Similar to the normal contact constraint (19), Coulomb friction can be modeled with additional complementarity constraints [stewarttrinkle].
Assuming the same quasi-static dynamics for the robot, zero initial velocity for the box, and the initial condition , it can be shown, by solving the dynamics subject to the Coulomb friction constraints, that the box’s position at the next time step, , is again a piece-wise linear function of the control input . As shown Fig. 7a, the dynamics function consists of four pieces, where each piece corresponds to one contact mode. Note that the exact dynamics is flat for all , which provides no information about how the control affects the state. In contrast, the bundled dynamics (Fig. 7b) has non-zero gradient even for .
For faster speed and better numerics, multi-body simulators (such as [mujoco, Pang2020ACQ]) frequently relax the exact complementarity-based Coulomb friction constraints in [anitescupotra, stewart2000rigid]. One such relaxation is proposed by Anitescu [anitescu2006optimization], which allows the forward dynamics to be solved as a quadratic program (QP) instead of a linear complementarity problem (LCP). Using Anitescu’s relaxed constraints, the dynamics of with respect to (Fig. 7b) can be shown to have a very similar structure as the exact Coulomb friction contact dynamics in Fig. 7a. In particular, the relaxed dynamics is the same as the exact dynamics when the contact is sticking or separating. In sliding, a “boundary layer” is introduced, in which the ball drags the box even when they are not in contact.
Although relaxed contact models such as [anitescu2006optimization] sacrifice physical realism, the bundled dynamics of the relaxed contact models behave similar to the bundled dynamics of the exact contact models, as shown in Fig. 7c and 7d. This suggests that algorithms using bundled dynamics for planning can safely capitalize the speed and efficiency of the relaxed contact models despite their mild physical inaccuracy.
Iv-B Penalty-based Contacts
In the penalty method [huntcrossley], the forces in the normal direction are approximated using a stiff spring. Denoting the signed distance between two objects as , we can write the normal-direction force with two modes of contact and no-contact.
In the penalty approximation of Coulomb friction, sticking is often enforced by means of very viscous damping, while slipping is approximated by a constant multiple of the normal force. Let be the relative tangential velocity between two objects at the point of contact.
where controls the threshold of stick and slip. Simple models of Coulomb friction sets to make the approximation continuous (i.e. ), as done in [add]. However, the Stribeck effect [stribeck] tells us that dynamic friction is lesser in magnitude than static friction. Continuous approximations to the discontinuities of Stribeck effect is made in Drake [drake].
Iv-B2 Randomized Smoothing
We apply randomized smoothing by sampling and , and plot the results on Fig.8. We note several effects of randomized smoothing: (i) in the normal direction, even if the objects are not in contact, repulsive forces are applied at a distance; (ii) tangential friction is also applied at a distance due to smoothing that occurs in the direction of in Fig.8; (iii) discontinuous effects from the Stribeck effect is smoothed out.
Iv-C Key Insights
Iv-C1 Problems with Exact Gradients
The non-smoothness of both the implicit time-stepping scheme and the penalty method suggests that the algorithms that rely on the local validity of linearization will suffer. The flatness of the gradients in non-penetrating configurations also suggests no direction of improvement.
Iv-C2 Relaxing Coulomb friction leads to better planning
Relaxations of the LCP version of Coulomb friction (e.g. Anitescu) allow the application of friction at a distance and coincidentally provide meaningful gradients useful for planning. For instance, the slope of sliding in Fig.7.b suggests that the ball should be closer to the box in the direction of order to drag the box. This suggests that regularization used in popular simulators such as Mujoco [mujoco] may fundamentally help planning algorithms achieve better performance.
Iv-C3 Discontinuities of Contact Dynamics
Implicit time-stepping and the penalty method both result in non-smooth, yet continuous contact behavior with the exception of the Stribeck effect. The intuition behind the continuity of implicit time-stepping methods lies in the inelastic approximation of contact behavior, while the continuity of penalty methods come from considering an infinitesimal section of time.
However, we have yet to conclude the consequences of Theorem 1 to be a minor nuisance. In more complex examples with non-smooth geometries, the normal of the object surface can also change in a discontinuous manner [underactuated, Elandt2019APF]. In contrast, the zero-order gradient bundle can be free from the consequences of Theorem 1, which might suggest that the zero-order nature of popular RL algorithms such as the policy gradient [policygradienttheorem] might coincidentally help in overcoming pitfalls of sampling gradients directly.
V iterative Randomized Smoothing-LQR
The gradient bundle obtained via randomized smoothing can be used to replace the exact gradient in a variety of optimization problems, suggesting a new family of algorithms. In this work, we choose to replace the exact linearization with the Jacobian bundle in the trajectory optimization algorithm of iterative LQR (iLQR). [ddp, todorovilqr, tassa1]. We name our variant iterative Randomized Smoothing-LQR (iRS-LQR). Our resulting variant is similar to SaDDP [saddp] or Unscented DDP [plancher2017constrained], but we uniquely apply it in the presence of contact dynamics with the understanding of how sampling helps.
We first modify iLQR to handle more general constraints by solving a Model Predictive Control (MPC) problem on the locally linear dynamics at every time step, which results in solving a QP of decreasing length at every time step. Such approaches fit well within the computational budget of modern QP solvers [gurobi].
Here, are parameters of linear inequalities on state, and are parameters of linear inequalities on inputs, and represents the desired trajectory. Note that the time indices depend on ; the horizon of the MPC decreases with our progress along the trajectory. Using this subroutine, we roll out the system using MPC at every timestep (whose horizons get shorter at every timestep) and iteratively repeat until a satisfactory solution has been reached (Alg.1).
We introduce a variance scheduling function which takes the initial variance and the current iteration number to produce a reduced variance [robbinsmonro, spsa]. Convergence to a local minima of the original problem is guaranteed as long as .
Vi Experiments and Results
Here, we show experimental comparisons of iRS-LQR (Alg. 1) against the baseline of using the same algorithm with exact gradients, which we refer to as iLQR.
Vi-a Simulation and Parallelization
We test iRS-LQR on a quasi-dynamic simulation that relies on convex implicit time-stepping models of contact [Pang2020ACQ], as well as the simulation of Drake [drake]. We also parallelize iRS-LQR on the CPU by having different threads per knot point in , which are responsible for sampling around the designated points. We use 100 samples throughout all the experiments to demonstrate the effectiveness of Monte-Carlo integration in higher dimensions.
Vi-B Smooth Systems
Will the gradient bundle be more effective than the gradient even for smooth systems? We investigate this question for three smooth systems: the pendulum, Dubin’s car [dubin], and the quadrotor. In many cases, the performance of iRS-LQR is comparable to or better than that of exact linearization. However, when the initial guess supplied is not informative, the gradient bundle can power through local minima and still arrive at much better minima compared to exact gradients, as demonstrated by the example of Dubin’s car.
Vi-C Non-Smooth Systems with Contact
To test our hypothesis that the gradient bundle results in more stable behavior compared to using the exact gradient in the presence of contacts, we test iRS-LQR on three systems in robotic manipulation: a planar two-finger manipulation example with gravity, a planar pushing example [francois], and a box flipping (pivoting) example with gravity (Fig.1).
In the planar hand example, we start with an initial guess that makes contact. Although the exact gradient is able to drive the cost down, the fast changing gradients due to the switching of contact modes between iterations noticeably destabilize the algorithm. On the other hand, the first-order and zero-order gradient bundle are much more stable.
For the planar pushing and box pivoting example, we set the initial guess such that the robot is not in contact with the object. Due to flat gradients, iLQR is no longer able to make progress from this bad initialization. In contrast, iRS-LQR still succeeds in finding a valid descent direction and stably drives down the cost. These results demonstrate our insights in the problems of using exact gradients in Sec.IV-C.
Vi-D First-order vs. Zero-Order Gradient Bundle
Throughout our work, we were constantly surprised to see that the zero-order gradient bundle was on par with, sometimes outperformed, the first-order one. Despite only having samples, the zero-order variant also produced high-quality solutions in high-dimensional systems (e.g. quadrotor with states). which is consistent with the findings of [saddp].
Combined with the robustness of zero-order optimization for sampling non-smooth functions (Example 2), this may suggest that zero-order methods may be a promising alternative to first-order methods in tackling non-smooth contact problems. As a consequence, simulators, given enough computation speed and parallelizability, may not necessarily need to focus on end-to-end differentiability.
Vi-E Contact-dynamics formulations
We show that randomized smoothing is also able to stabilize second-order (fully-dynamic) systems with penalty-based contact models, whose gradients are obtained through Drake’s autodiff feature. However, using the implicit quasi-dynamic formulation for planning resulted in much faster (x36) computation compared to the second-order penalty method due to less number of timesteps.
In this work, we have presented a method to obtain the gradient bundle through contact dynamics by using randomized smoothing. By applying randomized smoothing on two different contact models of implicit time-stepping and penalty, we answered what it means to take a stochastic formulation of contact dynamics. We then applied the gradient bundle to planning-through-contact problems, and showed that the gradient bundle enables much more stable convergence behavior compared to exact gradients. Through this, we answered how a stochastic formulation of dynamics might help the planning process.
In addition to the qualitative analysis, our contribution includes taking the gradient bundle of implicit quasi-dynamic dynamics, whose explicit smooth approximations are difficult to make. Combining the stability of our gradients and the computational benefits of the quasi-dynamic simulation, we took a step towards stable and real-time solving of contact-rich manipulation problems.