 # Riemannian Motion Policies

This paper introduces a new mathematical object for modular motion generation called the Riemannian Motion Policy (RMP). An RMP is an acceleration field (dynamical system) coupled with a corresponding Riemannian metric defining directions of importance at each point, typically defined on a nonlinear task space. We derive operators for transforming RMPs through the network of nonlinear maps defining their task spaces and combining multiple RMPs residing at a given task space, and show that these operators optimally preserve the geometric defined by the metrics in a way strongly analogous to the covariant transformations of natural gradients. This framework enables us to bridge the literature by fusing motion policies together across differing motion generation paradigms, from Dynamical Systems and Dynamic Movement Primitives to Optimal Control and Model Predictive Control. RMPs are easy to implement and manipulate, simplify controller design, clarify a number of open questions around how to effectively combine existing techniques, and their properties of geometric consistency, for the first time, make feasible the generic application of a single smooth and reactive motion generation system across a range of robots with zero re-tuning.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## I Introduction

This paper introduces a new mathematical object for motion generation called the Riemannian Motion Policy (RMP) which significantly simplifies the design and implementation of modular motion generation components that can be shared between robots. RMPs are geometrically consistent under transformation and combination (e.g. they are covariant to reparameterization in the same sense as natural gradient operators ()), which makes them well-suited to parallel computation (computing different parts of the overall controller, such as motion optimization and local reactive control, on separate processes to be combined later) and enables straightforward reuse across different robots. These operators may be viewed as a generalization of the pseudoinverse, and in the same sense they are provably optimal in terms of the way they transform and combine.

There is a vast literature on motion generation, ranging from motion planning  to motion optimization [10, 14, 12, 4, 15, 11] to probabilistic inference [16, 8] or incremental trajectory generation from generator equations or dynamical systems . Here, we take a step back and analyze the problem from first principles, prioritizing efficiency and simplicity of behavioral design and reuse across robots, and build on insights from Gauss’s Principle of Least Constraint  to theorize that covariance and generalization across robots can be achieved by leveraging the same mathematics governing the way dynamics manifest across differing systems in classical mechanics . In doing so, we clarify what may be considered the “right” way to modularly represent motion, and unify many of these existing techniques withing a common framework.

We demonstrate in this paper that dynamical system representations of motion policies (i.e. second-order nonlinear differential equations mapping position and velocity to a desired acceleration) are very general and flexible and cover many different techniques from the literature, and that such motion policies alone are insufficient representations with residing alongside each other within a single motion generation systems. All motion policies should come equipped with an associated Riemannian metric to fully capture the geometric structure surrounding the vectors of the vector field that define the directions of importance for a given policy. RMPs unify a number of existing technique, are easy to implement and manipulate, simplify controller design, clarify a number of open questions on how to effectively combining different types of motion generation components into a single system (such as Operational Space Control, Dynamic Movement Primitives (DMPs), (continuous) motion optimization, nonlinear reactive controllers, etc.), and are the first of their kind to generalize from robot to robot out-of-the-box with zero re-tuning.

## Ii Mathematics of Robot Control

Notation: is the -dimensional configuration of the robot. are the generalized coordinates, a parameterization of the configuration space, typically joint values. Dots denote velocities. e.g. are the positions and velocities of the joints, respectively. We make ample use of task spaces. A differentiable map is often referred to as a task space, and is the -dimensional task space variable. As above, and denote velocities and accelerations, respectively, through the task space.

The Jacobian is the first derivative of a task map. We denote it as

 ∂ϕ∂q=Jϕ. (1)

The following relations hold from calculus:

 ˙x =ddtϕ(q)=Jϕ˙q (2) ¨x =d2dt2ϕ(q)=Jϕ¨q+˙Jϕ˙q≈Jϕ¨q. (3)

That last approximation, dropping the term, is similar to a Gauss-Newton approximation. It drops the term associated with the second-order curvature of . Specifically, is the curvature of in the direction of the velocity . In practice, integration is fine-grained enough in control loops (usually between 100Hz and 1kHz) that the second-order correction term is unnecessary.

We often drop the subscript on the Jacobian when the context is clear.

A Riemannian metric is a positive (semi-)definite matrix defined on the tangent space to measure the inner product between two tangent vectors and as ; likewise, . This means

 ∥˙x∥2A=∥˙q∥2JTϕAJϕ (4)

since . This operation of a metric becoming transformed to the domain of a differentiable map is known as a pullback. The pullback is the metric in the domain that mimics the geometry of the map’s co-domain under . I.e. algorithms defined on task space w.r.t. behave the same when run in w.r.t. the pullback (e.g. optimization). This is important when we have many task spaces with controllers running on them that need to be combined in the configuration space in a geometrically consistent way (see next section).

Note that when the state space consists of position and velocity , the tangent space can be parameterized by accelerations despite the dimensionality discrepancy. This is due to constraints on the actions. Using the approximation , we get the same math as above, so we typically refer to a positive (semi-)definite matrix in this context as a Riemannian metric as well.

### Ii-a Basics of controlling motion with acceleration policies

Motion policies are usually defined as second-order differential equations mapping position and velocity to acceleration . For any initial state , it’s common to integrate forward using Euler integration to get

 qt+1 =qt+Δt˙qt (5) ˙qt+1 =˙qt+Δtf(qt,˙qt). (6)

The robot’s underlying control system might consume any subset of , depending on whether it’s position controlled, velocity controlled, position and velocity controlled with inverse dynamics, etc. It’s often convenient to write , so that .

Many control systems control the robot very precisely. In those cases, reactions and compliance is implemented directly as modifications to the position signals. Force control can, too, be implemented by pushing the position into known collision and leveraging the underlying PD controllers of the robot. But ideally, the robot has its own force force controllers taking force-torque measurements as feedback since it’s easier to implement that at a lower level. Note that force control on external objects should be able to operate within the nullspace of position control since contact forces can only be generated when in contact. If the system is out of contact when it should be in contact, that’s a state estimation error, and it’s the job of the state-estimation system to update the estimate so the right controllers are run until the contact event is measurably encountered.

Innately compliant torque controlled systems are often controlled this way, too. Torque control requires a strong real-time 1kHz loop, so a common decomposition is to command and contact forces as needed, while monitoring deviations induced by compliance to control the desireds appropriately under perturbation.

#### Ii-A1 A new algebra for differential equation manipulation

If there are multiple policies defined across multiple task spaces, we need a formal way to combine them. Operational space control gives a nice formulation defined around quadratic programming, but it’s continued to be common practice to just superimpose (add together) differential equations on the same space or transform one through the robot’s forward kinematics (pull it from the end effector space to the configuration space) without considering the geometric implications. That often results in the different policies fighting each other. We, therefore, modularize the mathematics of operational space control, focusing on the often ignored Hessian or weighting matrix, to formalize a new mathematical algebra for combining and transforming differential equations. We call the resulting mathematical objects Riemannian Motion Policies (RMPs).

## Iii Riemannian Motion Policies (RMPs)

A Riemannian Motion Policy (RMP) is a second-order differential equation (or acceleration policy) defined on task space augmented with a Riemannian metric denoted . Associated with the definition are operations of summation (combining RMPs) and pullback (transforming from co-domain to domain of a nonlinear map).

A collection of RMPs are combined into a single RMP as a metric-weighted average

 (fc,Ac)=((∑iAi)†∑iAifi, ∑iAi). (7)

Often, is full rank, and the pseudoinverse reduces to the full inverse. Note that if each metric is of the form this reduces to the traditional weighted average, and the combined metric is just the sum of weights. We define this as the summation operation on RMPs and denote it simply as .

The pullback operation, denoted by a superscript of , pulls an RMP from the co-domain of a nonlinear differentiable map to the domain, forming an equivalent RMP on the domain as

 (8)

For generality, we use the dagger () notation since we make no restrictions on the dimensionality or rank of . When has full row rank, this differential equation transformation can be rewritten as the Moore-Penrose pseudoinverse (the metric factor drops out). See Section IV for details on the derivation. We write the expression as Equation 8

to emphasize it’s relationship to the natural gradient common in machine learning

[1, 3]. If is the gradient of a potential function , the parametric gradient of the composed function is . The differential equation transformation defined by Equation 8 is, therefore, the parametric gradient transformed by the pullback metric . Thus, we can view this pullback differential equation as the natural vector field, in analogy to the natural gradient, paired with the corresponding pullback metric.

The power of this representation is seen in the linearity of this pullback operation. Combining in the co-domain and pulling the result back into the domain is equivalent to pulling each RMP from the co-domain to the domain and then combining there:

 (∑i(fi,Ai))ϕ=∑i(fi,Ai)ϕ. (9)

This relation makes the pullback a group homomorphism. Note that the same linearity does not hold when transforming under the Jacobian and combining by summing. This is a property of the covariance of pullbacks: it doesn’t matter where we define the RMPs, their geometry is tracked so that the final combined controller is always the same.

We often have a network of task spaces such as

 x1 =ϕ1(q)       x2=ϕ2(q) z1 =ψ1(x1)       z2=ψ2(x1) z3 =ψ2(x2)

If each task space (including the configuration space) has a collection of RMPs defined, it doesn’t matter what order we combine and pull them back, the answer is always the same, equivalent to pulling them all back into the configuration space and combining them there.

## Iv Derivations

The mathematics of RMPs follow the mathematics of quadratic functions (similar to much of the calculational mathematics behind Riemannian geometry and operational space control). The algebra of RMPs modularizes that mathematics and makes it easier to conceptualize and decompose in systems as discussed below. Key to the derivations is the correspondence between an RMP and the centered form of the quadratic:

 (f,A) ⟷ 12∥f−¨x∥2A. (10)

### Iv-a Summation operation

Summation can be derived by solving for the equivalent minimizer-Hessian form of the quadratic. Let be a collection of RMPs defined on a task space . Then operational space control defines a quadratic of the form

 Q(¨x)=12n∑i=1∥¨xdi−¨x∥2Ai. (11)

It’s easy to show that the minimizer of this quadratic is , and the Hessian is . In most cases is full rank and the pseudoinverse reduces to the full inverse. The equivalent quadratic is, therefore,

 Q(¨x)=12∥¨x∗−¨x∥2A, (12)

with and defined as above, showing that is the desired RMP.

### Iv-B Pullbacks

Let (using the standard GN approximation). The pullback operation can be derived by solving for the centered form of the composed quadratic:

 min¨q12∥¨xd−J¨q∥2A. (13)

We can rewrite this quadratic as

 min¨q  12¨xdTA¨xd−¨xTdAJ¨q+12¨qT(JTAJ)¨q. (14)

Setting the gradient to zero gives

 JTA¨xd=JTAJ¨q∗  ⇒  ¨q∗=(JTAJ)†JTA¨xd, (15)

where we use the notation to denote the pseudoinverse to capture the most general case. When the Hessian is invertible, this generalized inverse reduces to the standard inverse.

Of note is the special case when is full row rank, but reduced column rank (i.e. with and full row rank), and is strictly positive definite. It’s easy to show that for matrices with this property . Defining , we can write the right hand side of Equation 15 as

 ((A12J)T(A12J))†(A12J)TA12¨xd (16) (17) =BT(BBT)−1A12¨xd (18) =JTA12(A−12(JJ)−1A−12)A12¨xd (19) =JT(JJT)−1¨xd=J†¨xd. (20)

In this case, the co-domain metric doesn’t affect the natural pullback vector field. It only shapes the pullback metric.

### Iv-C Associativity and commutativity of RMP addition

In this section, we show that RMPs can be added in any order by demonstrating associativity of the addition operation. Let be three RMPs defined on . Then

 ((¨x1,A1)+(¨x2,A2))+(¨x3,A3) =((A1+A2+A3)−1(A1¨x1+A2¨x2+A3¨x3), A1+A2+A3). (21)

This same algebra holds for any other ordering of addition, which shows that it’s independent of the ordering and grouping.

### Iv-D Associativity of the pullback operator

It is straightforward to show the associativity of the pullback operator. Let and so that is well defined, and suppose is an RMP on . We need to show that .

Note that , and that the equivalent quadratic for a pullback through can be expressed

 12∥¨xd−¨x∥2A  s.t.  ¨x=Jϕ¨z. (22)

Likewise, the subsequent pullback through introduces a second constraint , which in combination with the first, makes . This constraint is equivalent to the constraint introduced by pulling through the composed function , which is what we wanted to show.

### Iv-E Linearity of the pullback operator

Differential equations alone, where combination is implemented by superposition (simple addition) and transformation is accomplished by Jacobian-transpose, have traditionally been convenient representations of motion since they’re easy to manipulate and consistent. In particular, transformation is linear under superposition which means the ordering of operations doesn’t matter. Explicitly, let be differential equations on , and suppose is a differentiable map with Jacobian , then in general

 JT∑i¨xi=∑iJT¨xi. (23)

We show here, that RMPs have the same property with respect to the pullback operation (transformation) under metric-weighted averaging (addition).

Explicitly, this means that the pullback of a sum is the same as the sum of the individual pullbacks as shown in Equation 9. We can show this by calculation. Let and be two RMPs defined on , and let be a differentiable mapping from to . We want to show that

 ((¨x1,A1)+(¨x2,A2))ϕ=(¨x1,A1)ϕ+(¨x2,A2)ϕ. (24)

We expand the left hand side of Equation 24 first:

 ((¨x1,A1)+(¨x2,A2))ϕ (25) =((A1+A2)−1(A1¨x1+A2¨x2), A1+A2))ϕ. (26)

The pullback vector field becomes

 (JT(A1+A2)J)†JT(A1+A2)⋅ ((A1+A2)−1(A1¨x1+A2¨x2)) =(JT(A1+A2)J)†JT(A1¨x1+A2¨x2). (27)

Likewise, the metric is .

Now expanding the right hand side of Equation 24 we get

 ((JTA1J)†JTA1¨x1,JTA1J) (28) +((JTA2J)†JTA2¨x2,JTA2J).

The combined metric is which is the same as the Hessian from the right hand side. Similarly, the pullback vector field becomes

 (JTA1J+JTA2J)†[JTA1¨x1+JTA2¨x2] (29) =(JT(A1+A2)J)†JT(A1¨x1+A2¨x2),

which matches the expression in Equation 27. The simplified left and right side expressions match for the both the vector field and metric, showing the linearity of the pullback operation.

Linearity, in conjunction with the two associativity properties, means it doesn’t matter what order we combine or pullback the RMPs. This is a powerful result. It enables us to parallelize computation by computing parts of the solution separately before combining, and even, as described in Section V-D unify motion optimization and operational space control into a single geometrically consistent framework with each part computed in separate processes at vastly different update rates.

### Iv-F Covariance

Like related operations in Riemannian geometry, the pullback operation is covariant to reparameterization [2, 3]. That means the algorithm is unaffected by a change of coordinates. Specifically, if is a bijective differentiable map with Jacobian , and is an RMP on , then

 (¨qd,A)ζ=(JTζ(JζJTζ)−1¨qd, JTζAJζ)=(¨ud,B). (30)

Is equivalent up to numerical precision. I.e. for any , the following holds:

 JTζ¨ud(u,˙u)=¨qd(ζ(u),Jζ˙u). (31)

Therefore, integral curves created in and transformed to will match the corresponding integral curve found directly in . (Although, they may differ slightly if estimated numerically.)

Section V-C gives an example of the utility of this property.

## V Building a motion generation system

We typically model the robot mathematically as with . The nonlinear function is the forward kinematic function mapping to a collection of points on the robot’s body. We denote , and when relevant also use with to denote the axes of a frame with origin . E.g. the end-effector is typically represented as a full frame. The map satisfies the joint limits by squashing the unconstrained variable through a sigmoid applied dimension-wise (scaled appropriately so the range spans the robot’s joint limits).

### V-a Basic local reactive policies

Below we sometimes use soft normalization so we define it here first. Soft normalization of a vector is defined as where

 Z(x)=x+clog(1+exp(−2cx)),  c>0. (32)

is increasingly like when is large, but limits to the finite when .

The control system often consists of a collection of RMPs defined on these spaces of the following forms (in all cases, alternative smooth controllers with similar properties can be substituted—these are just one choice, but they exemplify the problem decomposition):

1. Collision. Let be a collection of obstacle points (sometimes reduced to key points on obstacles such as closest points). For each pair, we define

 fij(xi,˙xi)=α(dij)ˆv−β(dij)[ˆvˆvT]˙xi, (33)

where , the hat denotes normalization, and and is a distance function. This policy both pushes away from obstacles and damps velocities toward obstacles. Parameters are the distance function, , and . Obstacle controllers don’t care about motion orthogonal to , so we typically use a directionally stretched metric of the form , where denotes soft normalization as defined above and is a weighting function that decreases to zero further from the obstacle. Note that obstacles include world obstacles as well as the robot’s own body (including its other arm). All obstacles are treated in the same way. Handling each obstacle as a separate RMP and using the RMP combination operations to combine the (many) different policies makes a big difference in practice. Section V-B analyzes the differences between this technique and other related methods from the literature.

2. Target. Let be the 3D end-effector position task space, and let be a desired target. Then the target controller is defined as

 (34)

where is the soft normalization as defined above and

are scalar gains. This policy pulls directly toward the target with constant force and damping, but diminishes the pull term to zero close to the target and just damps. This results in well-behaved convergent behavior at the target. We’ve seen good behavior using either a similar directionally stretched metric as used for the collision controllers or just the identity matrix.

3. Orientation. Orientation controllers take the same form as the target controllers, but operate on the frame axes rather than the frame origin point. They can operate on one (partial constraint) or two axes (full constraint).

4. Redundancy resolution and damping. End-effector constraints constrain only between 3 and 6 dimensions, but these arms often have 7 dofs. We resolve that redundancy using a controller of the form:

 fd(q,˙q)=α(q0−q)−β˙q, (35)

where are constant gains and is some default posture configuration. We typically use the identity metric here. Note that this controller is not covariant. Ideally, we should define the controller across points on the arm to better generalize to other robots. However, this controller is relatively a minor contribution to the overall system behavior and defining it directly in the configuration space works well in practice.

These controllers are all local: they react without planning. They’re very effective for local navigation among obstacles, but more sophisticated techniques are needed to guide the arm across long distance that require deciding how to navigate large obstacles that significantly warp the workspace geometry.

### V-B Collision policies

As described in Section V-A, collision controllers are often described as motion policies acting on a point . Commonly, contributions from multiple obstacles are simply superimposed (added together) to form [5, 9]. A thought experiment shows that superposition may result in unintuitive behavior. Experiments from the literature often consider obstacles situated only to one side of the end-effector, which is a configuration that doesn’t exercise the most degenerate error modes, so they sufficed as a first demonstration on these techniques.

Consider a scenario where the robot reaches between two obstacles. When the desired acceleration contributions from the two obstacles are symmetric then their sum is zero . That can result in two types of behavior: 1. If contributing motion policies are simply superimposed in the configuration space, this means multiple obstacles interfere with each other, effectively removing their individual contributions causing the control system to ignore the obstacles. 2. Alternatively, if this desired acceleration is combined with other desired accelerations such as the target attractor’s desired acceleration within a traditional operational space control framework as

 ¨q∗=argmin¨qα∥¨xdtarget−¨xeff∥2+β∥¨xobs−¨xi∥2, (36)

then we have the flexibility to choose and to make sure the obstacle controller gets the appropriate weight, but the desired acceleration is zero. Mixing in a zero desired acceleration, slows the system down resulting in sluggish behavior around obstacles. This an extreme case, but the same phenomenon occurs close to any obstacle or obstacle collection since the desired cumulative acceleration is directionally indiscriminate. Rather, we’d want a way to say that these contributions are important in directions pointing toward obstacles, but not in directions orthogonal to obstacles. In the above two obstacle situation, the desired zero acceleration should only pertain to accelerations in the direction toward the obstacles; accelerations within the 2D orthogonal plane can be anything. In other words, we need to construct an appropriate metric to accompany the combined obstacle control contribution.

Similarly, adding contributions from multiple obstacle points from the same direction will inflate the contribution, while averaging them down-weights important contributions. These simple techniques all result in unintuitive artifacts and require per-problem tuning to get good performance. We see below that constructing and combining obstacle controllers as RMPs results in an obstacle controller that’s intuitively similar, but handles these details correctly and combines more naturally and consistently with the rest of the system’s controllers.

Under the RMP framework, we design individual controllers myopically to control only specifically a small portion of the problem where we understand the geometry. Pullbacks through non-linear maps connecting these spaces then handle transforming the geometry to common spaces where their contributions can be combined through metric-weighed averaging. In this case, we can design individual controllers from each relevant obstacle point (e.g. the closest point on an obstacle) to control specifically the distance to that obstacle point. Let be the distance from point on the robot to the obstacle point. Then is a one-dimensional task space that we can build an RMP in. In this case, we parameterize it simply as with effectively playing the roll of the controller’s weight, increasing from zero outside a radius from the obstacle to a max value as the the distance decreases. is the desired acceleration within the distance variable as a function of the distance from the obstacle point and the velocity of in the direction of the obstacle (). Often, we choose

 a(u,˙u)=α(u)|u|−β(u)˙u, (37)

i.e. one term that pushes explicitly away from the obstacle, and another that damps motion in the direction of the obstacle.

The pullback of this RMP into (suppressing the dependence on to simplify notation) is insightful:

 (aj,γj)dj=(aj^vj, γj^vj^vTj), (38)

where is a unit vector pointing away from the obstacle point. In , this RMP is a vector field that pushes away from the obstacle with a strength that both damps motion toward the obstacle and explicitly pushes away from it, along with a metric

with a single eigenvector

and corresponding eigenvalue

defining the direction toward the obstacle as important for the RMP and orthogonal directions as irrelevant.

The combined obstacle controller at this body point is the metric weighted average of all of these

 (¨xi,Ai)=((∑jAij)−1∑jAj¨xij, ∑jAij), (39)

where and from Equation 38.

These collision controllers solve both of the problems outlined above. They carve out the directions in the workspace that matter to the collision controllers by tracking how the metrics transform, and prevent over- or under-contribution by performing a metric-weighted average of the contributing controllers. Note that we commonly add a small amount to the diagonal of the metric in Equation 39 in practice to behave nicely in reduced rank settings (too few obstacles), and drop terms that are zero. Alternatively, we can solve the linear system by taking the pseudoinverse, as described in Section 7.

Note that these controllers are agnostic to the choice of distance function . Alternative nonlinear distances, such as the electric potentials studied in , can also be used.

### V-C A new way to handle joint limits

RMPs give us an elegant new way to handle joint limits. Rather than operating in , we can define , where is a dimension-wise sigmoid, with dimension scaled and shifted appropriately so that the maps range matches the joint limits. This gives us an unconstrained space to operate in while ensuring that always remains within the joint limits.

For simplicity here, we assume . In practice, joint limits will require scaled and shifted versions of this, but the mathematics remains the same. Denote its Jacobian by . Note that this matrix is square, diagonal, and invertible for all finite .

Let denote any RMP on the configuration space. To regulate joint limits, we add a simple RMP on of the form . A simple choice of regulating policy is , i.e. larger accelerations in are penalized more heavily. By using as a metric, the strength of this penalty is governed by the non-linearities of . Since

 ¨u=D−1σ¨q   and   D−1σ=diag(1σ(1−σ)), (40)

accelerations closer to joint limits are increasingly inflated, and thus look bigger in .

To see how this new term affects our original RMP in , we can pull it back from to through the inverse sigmoid map to get

 (¨ud,λI)σ−1=(Dσ¨ud,λD−2σ) (41)

and add it to the original RMP in :

 (¨q′d,B′) =(¨qd,B)+(Dσ¨ud,λD−2σ) (42) .=(B+λD−2σ)−1(B¨qd+λD−2σDσ¨ud) (43) =Dσ(DσBDσ+λI)−1(DσB¨qd+λ¨ud), (44)

where we use the to denote that we’ve dropped the metric portion to focus on the algebra of the differential equation transformation. Note that if is an aggregation space used to aggregate together many pullback RMPs we have

 (¨qd,B) (45)

Plugging that expression into Equation 44 above, we get

 (¨q′d,B′).=Dσ(Dσ(∑iJTiAiJi)Dσ+λI)−1⋅ (46) (Dσ∑iJTiAi¨xdi+λ¨ud) =Dσ((∑i˜JTiAi˜Ji)+λI)−1(∑i˜JTiAi¨xdi+λ¨ud),

where . In other words, there are three augmentations to the differential equation:

1. Scale each Jacobian as . Entries of are close to zero when is close to a joint limit, so this scaling increasingly weighs down columns (all the way toward zero) near limits.

2. Add directly to the differential equation. If , this is a null step, but it otherwise it encourages the acceleration to include components that push away from limits.

3. Scale the final resulting acceleration by . This step shrinks joint accelerations for joints close to limits.

This technique is smooth, computationally efficient, and numerically stable. It generalizes common Jacobian-pseudoinverse weighting techniques described in the literature and works well in practice.

Alternatively, since RMPs are covariant to reparameterization (see Section IV-F), we can also pull the RMP in back into and implement the differential equation there. The resulting behavior is the same in the limit as Euler integration steps get small, but it may be numerically nicer to take finite-sized step through the unconstrained space of rather than the constrained space of . The resulting RMP is:

 (D−1σ¨qd, DσBDσ)+(¨ud,λI) .=(DσBDσ+λI)−1(DσB¨qd+λ¨ud). (47)

Note that this expression is the same as Equation 44 but without the transformation by . This relation comes from the identity and demonstrates the covariance of representation (and the independence of the calculations on ordering). Additionally, due to the linearity of the pullback operator, even when operating in , other RMPs defined in can be added by pulling them back into and combining via the metric weighted average without affecting the solution.

Finally, we note that in practice we operate in and directly calculate a directionally aware form of the joint limit Jacobian as

 ˜di=σi(αidi+(1−αi)1)+(1−σi)((1−αi)di+αi1), (48)

where is the joint limit sigmoid of joint (0 close to the lower limit and 1 close to the upper limit), and are the diagonal entries of and , respectively, and for is a sigmoidal value that is close to 1 for large positive velocities and close to 0 for large negative velocities. This Jacobian behaves like the sigmoidal Jacobian when moving toward a nearby joint limit, but behaves like the identity Jacobian when moving away from a joint limit. This Jacobian works extremely well in practice even under . It’s what we use in the experiments.

### V-D Adding anticipatory behaviors: Continuous optimization over a horizon

Above, we focused on purely local behavior, but it’s easy also to define controllers that account for a short horizon anticipation by leveraging ideas from optimal control and model predictive control. In this case, we define a controller as the mapping from a state to the first acceleration along the locally optimal trajectory starting from where optimality is defined in terms of some time-varying collection of cost functions.

Let denote that mapping from state to optimized action. This is a policy like those defined above. Moreover, we leverage the Riemannian Motion Optimization (RieMO) framework  to properly account for the Riemannian geometry of the non-linear manipulator and workspace similar to the RMP framework above. Under RieMO Gauss-Newton Hessian approximations are shown to be pullback metrics, so this Gauss-Newton Hessian approximation defines a Riemannian metric associated with the policy making the pair an RMP which can be combined with the above controllers like any other RMP.

In practice, evaluating is expensive, and even with warm-starts can only be evaluated at a rate of 10-20 times a second. We, therefore, use the typical optimal control technique of using the optimization to calculating linearizations of the problem that can be re-evaluated at a much faster rate. This results in a time-varying sequence of linear RMPs of the form that are communicated to the local controllers and combined using the RMP framework. These controllers are are local policies, but they do a better job of anticipating the future. This helps in coordinating target and orientation controllers with each other, generating more sophisticated and effective local obstacle avoidance behaviors, coordinating with the other arm, etc.

However, these controllers are still local since they optimize and track a trajectory over time. More global navigation behaviors such as the effective choice of homotopy class (e.g. left vs right around an obstacle) requires more global policies such as the heuristic technique described above or the learning technique we map out below.

The above controllers make it quite easy to navigate heuristically over long-ranges across the workspace. We need only lay down heuristic way-points (like breadcrumbs) for the end-effector to follow. As an example, we describe here a very simple, but effective approach for long-range navigation that requires zero planning when the robot’s elbow is not blocked.111This assumption is reasonable in practice. Human work environments are often engineered to keep elbows free of obstacles; humans often prefer that as well and get annoyed when someone or something gets in the way of their elbow. Working under elbow constraints is considered “awkward”. This technique is also specific to 7-dof robots with the common joint layout defining an axis aligned 3-dof wrist, an axis aligned 3-dof shoulder and a 1-dof elbow, but it demonstrates that strong local controllers make the problem of long-range navigation much easier lending support to the hypothesis that we can learn robust long-range navigation policies that learn to effectively exploit the innate behavior of the local controllers.

The technique is built on two observations. Let be some retracted configuration with the elbow back and the end-effector up and at ready close to the robot’s side. Then

1. Retracting from any outstretched configuration to , even when is surrounded by obstacles, is quite easy given the above controllers using a simple controller of the form .

2. Suppose is some collision-free approximate IK solution reaching to a point close to the target , and let be a series of points along the physical IK solution ordered by increasing distance from the robot’s base, with (e.g. the elbow, wrist, and end-effector points). Since the IK solution, itself, defines a physical collision-free pathway to the target , then an effective technique for reaching the target from is to simply follow the path outlined by points with the end-effector.

Taken in combination, this suggests we can define a heuristic policy for navigating from any configuration to any other configuration by 1. finding a collision-free IK solution, 2. retracting, and 3. moving forward to the target following heuristic way-point from the IK solution. This strategy is very robust in practice. Additionally, IK need not be solved exactly since the controller itself solves it in the final step. In practice, we often use just a handful of demonstrated configurations mapping out the free space.

This technique is specific to the elbow begin unblocked and 6 or 7-dof arms with a clear elbow structure, but the key observation is that a successful and efficient policy for long-range navigation can be quite course and heuristic when operating under effective local controllers efficiently handle local behavior.

1. Provide details on the continuous motion optimization.

2. Spacial consistency: adaptive gains and time-varying spacial predictions.

3. Combined RMP behaves as though the entire system of motion policies had been optimized in a single process, despite being distributed.

## Vii System implementation

A number of videos of a motion generation system developed using Riemannian motion policies and running on a number of different robots can be found at the website https://www.lularobotics.com/

## References

• Amari  Shun-Ichi Amari. Natural gradient works efficiently in learning. Neural Comput., 10(2):251–276, February 1998. ISSN 0899-7667.
• Amari and Nagaoka  Shun-Ichi Amari and Hiroshi Nagaoka. Methods of Information Geometry. American Mathematical Society, 1994. Translated 2000, and renewed 2007.
• Bagnell and Schneider  J. Andrew (Drew) Bagnell and Jeff Schneider. Covariant policy search. In Proceeding of the International Joint Conference on Artifical Intelligence, August 2003.
• Erez et al.  Tom Erez, Kendall Lowrey, Yuval Tassa, Vikash Kumar, Svetoslav Kolev, and Emanuel Todorov. An integrated system for real-time model-predictive control of humanoid robots. In IEEE/RAS International Conference on Humanoid Robots, 2013.
• Ijspeert et al.  A. J. Ijspeert, J. Nakanishi, H. Hoffmann, P. Pastor, and S. Schaal. Dynamical movement primitives: Learning attractor models for motor behaviors. Neural Computation, 25(2):328–373, Feb 2013. ISSN 0899-7667. doi: 10.1162/NECO˙a˙00393.
• LaValle  Steven M. LaValle. Planning Algorithms. Cambridge University Press, Cambridge, U.K., 2006. Available at http://planning.cs.uiuc.edu/.
• Mainprice et al.  Jim Mainprice, Nathan Ratliff, and Stefan Schaal. Warping the workspace geometry with electric potentials for motion optimization of manipulation tasks. 10 2016.
• Mukadam et al.  Mustafa Mukadam, Ching-An Cheng, Xinyan Yan, and Byron Boots. Approximately optimal continuous-time motion planning and control via probabilistic inference. In Proceedings of the 2017 IEEE Conference on Robotics and Automation (ICRA), 2017.
• Park et al.  Dae-Hyung Park, Heiko Hoffmann, Peter Pastor, and Stefan Schaal. Movement reproduction and obstacle avoidance with dynamic movement primitives and potential fields. In IEEE-RAS International Conference on Humanoid Robots, pages 91–98. IEEE, 2008.
• Ratliff et al.  Nathan Ratliff, Matthew Zucker, J. Andrew (Drew) Bagnell, and Siddhartha Srinivasa. CHOMP: Gradient optimization techniques for efficient motion planning. In IEEE International Conference on Robotics and Automation (ICRA), May 2009.
• Ratliff et al.  Nathan Ratliff, Marc Toussaint, and Stefan Schaal. Understanding the geometry of workspace obstacles in motion optimization. In IEEE International Conference on Robotics and Automation, 2015.
• Schulman et al.  John D. Schulman, Jonathan Ho, Alex Lee, Ibrahim Awwal, Henry Bradlow, and Pieter Abbeel. Finding locally optimal, collision-free trajectories with sequential convex optimization. In In the proceedings of Robotics: Science and Systems (RSS), 2013.
• Taylor  John R. Taylor. Classical Mechanics. University Science Books, 2005.
• Toussaint  Marc Toussaint. Robot trajectory optimization using approximate inference. In (ICML 2009), pages 1049–1056. ACM, 2009. ISBN 978-1-60558-516-1.
• Toussaint  Marc Toussaint. Newton methods for k-order Markov constrained motion problems. CoRR, abs/1407.0414, 2014. URL http://arxiv.org/abs/1407.0414.
• Toussaint and Goerick  Marc Toussaint and Christian Goerick. A Bayesian view on motor control and planning. In Olivier Sigaud and Jan Peters, editors, From motor to interaction learning in robots. Springer, 2010.
• Udwadia and Kalaba  F.E. Udwadia and R.E. Kalaba. Analytical Dynamics: A New Approach. Cambridge University Press, 1996. ISBN 9780521482172.