Fast Joint Space Model-Predictive Control for Reactive Manipulation

04/28/2021 ∙ by Mohak Bhardwaj, et al.

Sampling-based model predictive control (MPC) is a promising tool for feedback control of robots with complex and non-smooth dynamics and cost functions. The computationally demanding nature of sampling-based MPC algorithms is a key bottleneck in their application to high-dimensional robotic manipulation problems. Previous methods have addressed this issue by running MPC in the task space while relying on a low-level operational space controller for joint control. However, by not using the joint space of the robot in the MPC formulation, existing methods cannot directly account for non-task space related constraints such as avoiding joint limits, singular configurations, and link collisions. In this paper, we develop a joint space sampling-based MPC for manipulators that can be efficiently parallelized using GPUs. Our approach can handle task and joint space constraints while taking less than 0.02 seconds (50Hz) to compute the next control command. Further, our method can integrate perception into the control problem by utilizing learned cost functions from raw sensor data. We validate our approach by deploying it on a Franka Panda robot for a variety of common manipulation tasks. We study the effect of different cost formulations and MPC parameters on the synthesized behavior and provide key insights that pave the way for the application of sampling-based MPC for manipulators in a principled manner. Videos of experiments can be found at:




Code Repositories


Stochastic Tensor Optimization for Robot Motion - A GPU Robot Motion Toolkit


I Introduction

Real-world robot manipulation can greatly benefit from real-time perception-driven feedback control [4, 21]. Consider an industrial robot tasked with stacking boxes from one pallet to another or a robot bartender moving drinks placed on a tray [28]. In both cases, the robot must ensure object stability via perception while respecting joint limits, avoiding singular configurations and collisions, and handling task constraints such as maintaining orientation during transfer. This leads to a complex control problem with competing objectives as well as nonlinear and sometimes discontinuous dynamics that is difficult to solve in real-time.

Classic approaches to solving these tasks include operational-space control (OSC) [31, 5, 10], where the different task costs are formulated in their respective spaces and then projected into the joint space (i.e., the control space of the robot) via a Jacobian map. OSC methods are very local as they only optimize for the next time step and do not consider future actions or states while optimizing for the current action.

Recent approaches based on model predictive control (MPC) attempt to find a locally-optimal policy over a finite horizon starting from the current state using a potentially imperfect dynamics model. An action from the policy is executed on the system and the optimization is performed again from the resulting next state, which helps overcome the effects of model bias. Model predictive control has been successfully applied on real robotic systems, allowing them to rapidly react to changes in the environment [2, 11, 35, 5, 44].

Model predictive control can be performed either by optimizing over states with a constraint on the dynamics function to enforce feasibility (“direct”) or by optimizing over the control space (“indirect” or “shooting”) with states computed by rolling out the dynamics. Direct methods are hard to optimize, especially when the dynamics is nonlinear or discontinuous, as the constraints on the dynamics become more difficult to satisfy. With shooting methods, dynamics feasibility is always guaranteed as the states are obtained by rolling out the dynamics from the controls. Additionally, for most real-world tasks, the optimization problem can be set up as an unconstrained problem that allows for very fast solutions, enabling real-time model predictive control on real robotic systems with high degrees of freedom [39, 45].

Shooting-based MPC methods can be solved with either a gradient-based [26] or a sampling-based optimization scheme [45]. Gradient-based methods require the cost and the dynamics function to be differentiable, which severely restricts the class of functions they can handle and makes them highly susceptible to local minima, often requiring extensive expertise in both modelling and tuning to obtain the desired behavior. These methods are also computationally expensive, as derivatives are often computed using finite differences on real systems. These key issues must be resolved to effectively employ MPC algorithms in manipulation problems.

In this work, we tackle the problem of developing a general framework for perception-driven model-predictive control that offers the flexibility of using generic cost and dynamics functions while simultaneously ensuring desirable properties such as smoothness, constraint satisfaction, and computational speed. To this end, we focus on sampling-based MPC algorithms that use simple policies to sample control sequences, followed by rolling out the dynamics function to compute a sample-based gradient of the objective function used to update the sampling policy. These algorithms make no restrictive assumptions about the cost, dynamics or policy class, are straightforward to parallelize, and can be effectively applied in highly dynamic systems and environments [44, 42]. Using complex dynamics and cost functions makes each rollout computationally expensive and can also increase the number of rollouts required to generate a good solution. In the literature, these problems have been tackled by either learning a neural-network model that is fast to simulate [46], learning a terminal cost function to reduce the planning horizon [3], or using an augmented control space such as the operational (task) space of a manipulator [7]. In this work, we make the case for using simple models coupled with simple behavior-based cost functions and fast parallel computation to achieve strong performance on manipulation tasks in complex environments. We would like to note that our approach can easily incorporate learned models or cost functions and does not suffer from errors induced by ignoring the full joint space of the robot in the optimization. In summary, our key contributions are:

  1. A framework for fast joint space sampling-based MPC on high degree-of-freedom manipulators that optimizes task objectives while satisfying task and joint-space constraints.

  2. Key insights into different components of sampling-based MPC that enable smooth, reactive and precise robot motions.

  3. Coupling perception with control by leveraging learned discrete/discontinuous cost terms from raw sensor data, which we validate using an existing learned function for collision checking [8] between the robot links and the environment point cloud.

  4. A highly parallelized implementation which achieves a control rate of 50Hz (without the learned collision checking function, we achieve a control rate of 80Hz with 500 particles and 50 timesteps) on a single GPU, a speedup of 50x compared to existing MPPI-based manipulation [8].

  5. Empirical evaluation in simulation and on a real-world Franka Panda robot.

We formally define the problem in Sec. II and build our approach in Sec. III, followed by our real time implementation in Sec. IV. We report experimental results in Sec. V and discuss our approach’s strengths and limitations in Sec. VI. We then share our method’s relation to previous work in Sec. VII.

II Problem Definition

We consider the problem of generating a feedback control law (or policy) for a robot manipulator with n joints performing user-specified tasks in an unstructured environment, where it is subject to non-linear constraints and must react in real-time to overcome errors due to its internal dynamics model, state estimation, and perception.

The problem can be modelled as optimal control of a discrete-time stochastic dynamical system described by the equation,

x_{t+1} ~ P(x_{t+1} | x_t, u_t)

where P defines the probability of transitioning into state x_{t+1} conditioned on the current state x_t and control input u_t. The robot chooses controls using a deterministic or stochastic closed-loop policy π(u_t | x_t), incurs an instantaneous cost c(x_t, u_t) and transitions into the next state, and the process continues for a finite horizon T. The goal is to design a policy that optimizes a task-specific objective function

π* = argmin_{π ∈ Π} E_π [ Σ_{t=0}^{T-1} c(x_t, u_t) ]

where Π is the space of all possible policies. The above setup is akin to optimizing a finite horizon Markov Decision Process (MDP) with continuous state and action spaces, as done in reinforcement learning approaches.


Solving for a complex globally optimal policy is a hard problem, especially since the task objective could be sparse or difficult to optimize. MPC can be viewed as a practically motivated strategy that simplifies the overall problem by focusing only on the states that the robot encounters online during execution and rapidly re-calculating a “simple” locally optimal policy. At state x_t, MPC performs a look-ahead for a finite horizon H using an approximate model P̂, an approximate cost function ĉ, and a parameterized policy π_θ to find the optimal parameters θ* that minimize an objective function

θ* = argmin_θ E_{π_θ} [ Σ_{h=0}^{H-1} ĉ(x_h, u_h) + ĉ_term(x_H) ]    (3)

where ĉ_term is a terminal cost function that serves as a coarse approximation of the cost-to-go beyond the horizon. An action is sampled from the optimized policy and the optimization is performed again from the resulting state after applying the action to the robot. The optimization is hot-started from the solution at the previous timestep by using a shift operator, which allows MPC to produce good solutions with few iterations of optimization (usually 1 in practice).

In the next section, we first introduce the sampling-based MPC technique for solving the optimization in Eq. 3 and discuss the objective function, policy classes, and update equations. We then present our approach for applying it to manipulation problems. An overview of the notation used in the paper is presented in Table I.

Variable              Description
q, q̇, q̈              joint position, velocity, acceleration
x_t                   robot state at time t
u_t                   joint space command at time t
H                     number of steps in horizon
dt_h                  change in time between steps in horizon
dt                    vector of dt_h across the horizon
θ_t                   parameters of policy at time t
π_θ(u | x)            distribution over control given state
μ_0, ..., μ_{H-1}     sequence of H Gaussian means
Σ_0, ..., Σ_{H-1}     sequence of H Gaussian covariances
U                     batch of N control sequences of length H
X                     batch of N state sequences of length H
C                     batch of N cost sequences of length H
J                     MPC loss function

TABLE I: Summary of Notations

III Sampling-Based Model Predictive Control

Sampling-based MPC can be viewed as iteratively optimizing simple policy representations such as time-independent Gaussians over open-loop controls, with parameters θ = (μ, Σ) where μ = (μ_0, ..., μ_{H-1}) and Σ = (Σ_0, ..., Σ_{H-1}) represent the sequence of means and covariances at every step along the horizon. At every iteration of optimization, a batch of N control sequences of length H, U, is sampled from the current distribution, followed by rolling out the approximate dynamics function using the sampled controls to get a batch of corresponding states X and costs C. The policy parameters are then updated using a sample-based gradient of the objective function. After K optimization iterations we can either sample an action from the resulting distribution or execute the mean action. A standard algorithm is shown in Alg. 1. We next describe how the distribution is updated. Consider the discounted cost-to-go of a sampled rollout,

S(X, U) = Σ_{h=0}^{H-1} γ^h c(x_h, u_h)

where γ ∈ (0, 1] is a discount factor that is used to favor immediate rewards. A widely used objective function is the exponentiated utility or the risk-seeking objective,

J(θ) = -λ log E_{π_θ} [ exp(-S(X, U) / λ) ]

where λ is a temperature parameter. The mean and covariance are updated using a sample-based gradient of the above objective as,

μ ← (1 - α_μ) μ + α_μ Σ_{i=1}^{N} w_i U_i
Σ ← (1 - α_σ) Σ + α_σ Σ_{i=1}^{N} w_i (U_i - μ)(U_i - μ)^T

where α_μ and α_σ are step-sizes that regularize the current solution to be close to the previous one and

w_i = exp(-S(X_i, U_i) / λ) / Σ_{j=1}^{N} exp(-S(X_j, U_j) / λ)
The update equation for the mean is the same as in the well-known Model-Predictive Path Integral Control (MPPI) algorithm [46]. We refer the reader to [43] for the connection and derivation of the update equations. While covariance adaptation has previously been explored in the context of Path Integral Policy Improvement [36] to automatically adjust exploration, standard implementations of MPPI on real systems generally do not update the covariance [46, 43]. However, we found that updating the covariance leads to better performance with fewer particles, such as stable behavior upon convergence to the goal. Once an action from the updated distribution is executed on the system, the mean and covariance are shifted forward with default values appended at the end to hot-start the optimization at the next timestep.
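The exponentiated-utility update above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation; the temperature and step sizes are illustrative values, and the covariance is kept diagonal for simplicity:

```python
import numpy as np

def exp_utility_update(costs, samples, mean, cov_diag,
                       temperature=0.5, step_mean=0.9, step_cov=0.5):
    """One sample-based update of a time-independent Gaussian policy.

    costs:    (N,) total discounted cost of each rolled-out control sequence
    samples:  (N, H, d) sampled control sequences
    mean:     (H, d) current sequence of Gaussian means
    cov_diag: (H, d) current diagonal covariance entries
    """
    # Exponentiated utility: lower-cost rollouts get exponentially more weight.
    shifted = costs - costs.min()              # shift for numerical stability
    w = np.exp(-shifted / temperature)
    w /= w.sum()                               # (N,) normalized weights

    # Sample-based gradient step, regularized toward the previous solution.
    new_mean = (1 - step_mean) * mean + step_mean * np.einsum('n,nhd->hd', w, samples)
    dev = samples - new_mean                   # deviations from updated mean
    new_cov = (1 - step_cov) * cov_diag + step_cov * np.einsum('n,nhd->hd', w, dev**2)
    return new_mean, new_cov

# Toy check: rollouts closest to u = 1 are cheapest, so the mean moves toward 1.
rng = np.random.default_rng(0)
H, d, N = 5, 2, 256
mean = np.zeros((H, d)); cov = np.ones((H, d))
samples = mean + np.sqrt(cov) * rng.standard_normal((N, H, d))
costs = np.sum((samples - 1.0) ** 2, axis=(1, 2))
new_mean, new_cov = exp_utility_update(costs, samples, mean, cov)
```

Shifting the costs by their minimum before exponentiation leaves the normalized weights unchanged but avoids underflow when all costs are large.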

The above formulation of MPC offers us the flexibility to extract different behaviors from our algorithm by tuning different knobs, such as the choice of approximate dynamics, running cost, terminal cost, and policy class, and parameters such as the horizon length H, number of particles N, step sizes, and discount factor γ. Next, we switch our focus to the domain of robot manipulation and build our approach for sampling-based MPC by systematically evaluating the effects of a subset of key design choices on the overall performance of the controller.

Input: Initial sampling policy parameters θ
Parameters: Horizon (H), number of samples (N), number of optimization iterations (K), algorithm-specific parameters
1  for t = 0, 1, 2, ... do
2      x_t ← get_state()
3      θ ← shift(θ)
4      for k = 1, ..., K do
5          U ← sample_controls(θ)
6          X, C ← generate_rollouts(x_t, U)
7          θ ← update_distribution(C, U, θ)
8      end for
9      u_t ← next_command(θ)
10     execute_command(u_t)
11 end for
Algorithm 1 Sampling-Based MPC

III-A Approximate Model

The MPC paradigm allows us to effectively leverage simple models that are both computationally cheap to query and easy to optimize, as re-optimizing the control policy at every timestep can help to overcome the effects of errors in the approximate model. We leverage this error-correcting property of MPC and utilize the kinematic model of the manipulator as our approximate transition model. Let the robot state be defined in the joint space as x = (q, q̇, q̈) and the commanded action be the joint acceleration q̈. At every optimization iteration, we compute the state of the robot across the horizon by integrating the sampled control sequences from the robot's initial state x_0 (i.e., the current state of the real robot). This can be efficiently implemented as a tensor product followed by a sum as,


q̇ = q̇_0 + L (q̈ ∘ dt),   q = q_0 + L (q̇ ∘ dt)

where L is a lower triangular matrix filled with 1, ∘ is the elementwise product, and dt is a vector of delta timesteps across the horizon. We use variable timesteps, with a smaller dt for the earlier steps along the horizon for higher resolution cost near the robot's current state and larger ones for later steps to get a better approximation of the cost-to-go, similar to [11]. We intentionally write the integration as a matrix product to highlight the fact that we compute the states across the batch and horizon with a single tensor operation, without the need to iteratively compute the states across the horizon.
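The single-tensor-operation integration can be sketched as follows. This is a minimal numpy version of the idea (the paper's implementation is a batched PyTorch version); the function name and shapes are illustrative:

```python
import numpy as np

def rollout_kinematics(qdd, q0, qd0, dt):
    """Integrate a batch of joint-acceleration sequences in one shot.

    qdd:      (N, H, d) sampled joint accelerations
    q0, qd0:  (d,) current joint position / velocity
    dt:       (H,) per-step time deltas (may vary along the horizon)

    A lower-triangular matrix of ones turns integration into a single
    matrix product instead of a sequential loop over the horizon.
    """
    H = dt.shape[0]
    L = np.tril(np.ones((H, H)))                            # running-sum operator
    qd = qd0 + np.einsum('ij,njd->nid', L, qdd * dt[None, :, None])
    q = q0 + np.einsum('ij,njd->nid', L, qd * dt[None, :, None])
    return q, qd

rng = np.random.default_rng(1)
N, H, d = 4, 6, 7
qdd = rng.standard_normal((N, H, d))
dt = np.full(H, 0.02)
q, qd = rollout_kinematics(qdd, np.zeros(d), np.zeros(d), dt)
```

The matrix product with L is mathematically a cumulative sum, but expressing it as one tensor operation lets all N rollouts and H steps be computed in parallel on a GPU.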

Given a batch of states X, the Cartesian poses, velocities, and accelerations of the end-effector are obtained by using the forward kinematics FK(·), the kinematic Jacobian J(q), and the derivative of the kinematic Jacobian J̇(q) respectively,

x_p = FK(q),   ẋ = J(q) q̇,   ẍ = J(q) q̈ + J̇(q) q̇

We can compute the forward kinematics in a batch as it depends only on the current state. This provides us an easily parallelizable formulation of the robot model that is amenable to GPU acceleration.

III-B Cost Function

The cost function allows us to encode high-level robot behavior directly into the MPC optimization. This can be viewed as a form of cost-shaping that allows MPC to achieve sparse task objectives while also satisfying auxiliary requirements such as avoiding joint limits, ensuring smooth motions and safety, and maximizing manipulability during execution. Further, sampling-based MPC approaches do not rely on the gradient of the cost function and can thus incorporate general cost terms that are not constrained to be smooth or differentiable.

We consider cost functions that are a weighted sum of simple cost terms, where each individual term encodes a desired behavior for manipulator control that can be easily implemented in a batched manner.

III-B1 Reaching Goal Poses

Given the desired and current Cartesian poses for the end-effector in the world frame, we compute a task-space cost term that penalizes the distance between the two,

C_pose = ‖w_t ∘ (t_g - t_e)‖ + ‖w_r ∘ e_R(R_g, R_e)‖

where R_g, t_g are the rotation and translation parts of the goal pose, R_e, t_e are the rotation and translation parts of the current end-effector pose, and e_R(·,·) is a rotation error. The weight vectors w_t, w_r allow us to weigh errors in different directions and orientations with respect to each other and can be set to different values to obtain qualitatively different behavior, such as partial pose reaching or enforcing partial pose constraints. For example, we can set a high weight in w_r along a desired axis to maintain the goal orientation throughout the motion of the robot, as we demonstrate in our experiments.
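A weighted pose cost of this kind can be sketched as below. This is an illustrative implementation, not the paper's exact error terms: the rotation error here is taken from the skew part of the relative rotation R_g^T R_e (zero when the orientations match), and the weight vectors are assumptions:

```python
import numpy as np

def pose_cost(t_cur, R_cur, t_goal, R_goal, w_t, w_r):
    """Weighted Cartesian pose cost, batched over rollout states.

    t_cur: (..., 3) end-effector positions;  R_cur: (..., 3, 3) rotations
    w_t:   (3,) per-axis translation weights; w_r: (3,) rotation weights
    """
    err_t = w_t * (t_goal - t_cur)                       # per-axis position error
    R_err = np.einsum('ij,...jk->...ik', R_goal.T, R_cur)
    # Skew-symmetric part of the relative rotation ~ axis-angle for small errors.
    rot_vec = np.stack([R_err[..., 2, 1] - R_err[..., 1, 2],
                        R_err[..., 0, 2] - R_err[..., 2, 0],
                        R_err[..., 1, 0] - R_err[..., 0, 1]], axis=-1)
    err_r = w_r * 0.5 * rot_vec
    return np.linalg.norm(err_t, axis=-1) + np.linalg.norm(err_r, axis=-1)

# At the goal the cost is zero; a 10cm offset along x costs 0.1 with unit weights.
I3 = np.eye(3)
w = np.ones(3)
c0 = pose_cost(np.zeros(3), I3, np.zeros(3), I3, w, w)
c1 = pose_cost(np.array([0.1, 0.0, 0.0]), I3, np.zeros(3), I3, w, w)
```

Setting one component of `w_r` much larger than the others penalizes rotation about that axis heavily, which is how a partial orientation constraint can be expressed.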

III-B2 Stopping for Contingencies

The finite horizon of MPC makes it myopic to events that can occur far in the future, especially in dynamic environments. Thus, it is desirable to ensure that the robot can safely stop within the horizon in reaction to events that might be observed at any timestep. We encode this behavior by computing a time-varying velocity limit q̇_max,h for every timestep h in the horizon based on the maximum acceleration of the robot (or a user-specified maximum acceleration) q̈_max and the time available until the end of the horizon. This means the joint velocity of the robot must allow it to come to a stop at the end of the horizon by applying the maximum acceleration. Any state that exceeds this velocity is penalized by a cost which is expressed as

q̇_max = q̈_max (U dt),   C_stop(x_h) = max(0, |q̇_h| - q̇_max,h)

where U is an upper triangular matrix filled with 1.
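The upper-triangular matrix gives the time remaining after each step, from which the stopping-velocity limits follow. A small sketch of this idea, with illustrative shapes and a quadratic penalty that is an assumption rather than the paper's exact form:

```python
import numpy as np

def stopping_velocity_limits(dt, a_max):
    """Per-step joint velocity limits that guarantee the robot can stop
    by the end of the horizon using at most a_max deceleration.

    dt:    (H,) time deltas along the horizon
    a_max: scalar maximum joint acceleration
    """
    H = dt.shape[0]
    U = np.triu(np.ones((H, H)))
    time_left = U @ dt            # time remaining from each step to horizon end
    return a_max * time_left      # (H,) velocity limit, shrinking along the horizon

def stop_cost(qd, limit):
    """Penalize joint speeds exceeding the stopping limit. qd: (N, H, d)."""
    excess = np.abs(qd) - limit[None, :, None]
    return np.sum(np.maximum(excess, 0.0) ** 2, axis=-1)   # (N, H)

dt = np.full(10, 0.05)
limit = stopping_velocity_limits(dt, a_max=2.0)
c_zero = stop_cost(np.zeros((2, 10, 3)), limit)     # slow rollout: no penalty
c_fast = stop_cost(np.full((2, 10, 3), 5.0), limit) # fast rollout: penalized
```

Note how the limit decreases monotonically along the horizon: states near the end of the horizon must already be nearly stopped.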

III-B3 Joint Limit Avoidance

Given minimum and maximum limits on the joint state, q_min and q_max respectively, we penalize the joint state q only if it exceeds a safety threshold defined by a ratio η of its full range,

C_limit(q) = max(0, (q_min + η r) - q) + max(0, q - (q_max - η r)),   r = q_max - q_min

where η ∈ (0, 0.5) adds a safety threshold from the actual bounds of the robot. In our experiments, we chose a fixed value of η.
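The shrunken-bounds penalty can be sketched as follows. The quadratic form of the penalty and the value of the safety ratio `eta` are illustrative assumptions (the paper's chosen value is not given in this excerpt):

```python
import numpy as np

def joint_limit_cost(q, q_min, q_max, eta=0.1):
    """Penalize joint positions only past a safety margin inside the limits.

    q:   (..., d) joint positions
    eta: fraction of each joint's range kept as a buffer (illustrative value)
    """
    span = q_max - q_min
    lo = q_min + eta * span            # shrunken lower bound
    hi = q_max - eta * span            # shrunken upper bound
    below = np.maximum(lo - q, 0.0)    # violation below the safe region
    above = np.maximum(q - hi, 0.0)    # violation above the safe region
    return np.sum(below ** 2 + above ** 2, axis=-1)

q_min = np.array([-2.0, -2.0]); q_max = np.array([2.0, 2.0])
c_mid = joint_limit_cost(np.zeros(2), q_min, q_max)          # well inside: 0
c_edge = joint_limit_cost(np.array([1.9, 0.0]), q_min, q_max)  # past the buffer
```

Because the penalty is zero everywhere inside the shrunken bounds, it does not interfere with the task costs until a rollout actually approaches a limit.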

III-B4 Avoiding Cartesian Local Minima

The manipulability score describes the ability of the end-effector to achieve an arbitrary velocity from a given joint configuration. It measures the volume of the ellipsoid formed by the kinematic Jacobian, which collapses to zero at singular configurations. Thus, to encourage the robot to optimize control policies that avoid future kinematic singularities, we employ a cost term that penalizes small manipulability scores [23, 41],

m(q) = sqrt(det(J(q) J(q)^T)),   C_manip = max(0, 1 - m(q)/m_thresh)

where we choose the threshold m_thresh based on values obtained by Haviland and Corke [14] for the Franka Panda robot.
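The manipulability score and a thresholded penalty can be sketched as below; the threshold value here is a placeholder, not the Haviland and Corke value cited above:

```python
import numpy as np

def manipulability_cost(J, thresh=0.1):
    """Penalize configurations with a low manipulability score.

    J: (..., 6, d) kinematic Jacobian(s). The score sqrt(det(J J^T))
    measures the volume of the velocity ellipsoid and collapses to zero
    at singular configurations.
    """
    score = np.sqrt(np.linalg.det(J @ np.swapaxes(J, -1, -2)))
    return np.maximum(0.0, 1.0 - score / thresh)   # 0 when well-conditioned

# A full-rank "Jacobian" has a positive score; a rank-deficient one is penalized.
J_good = np.eye(6)
J_singular = np.zeros((6, 7)); J_singular[0, 0] = 1.0
```

Because sampling-based MPC evaluates this over whole rollouts, it penalizes trajectories that would drift toward a singularity in the future, not just the current configuration.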

III-B5 Self Collision Avoidance

Computing self-collisions between the links of the robot has previously been shown to be computationally expensive, especially when we want to compute them for many joint configurations [34, 8]. Hence, similar to previous approaches [34, 8], we train a neural network that predicts the closest distance d̂(q) between the links of the robot given a joint configuration (the distance is positive when two links are penetrating and negative when not colliding). One difference in our approach is the use of positional encoding of the joint configuration, as we found this to improve the accuracy of the distance prediction [29]. Our network (which we call jointNERF) has three fully-connected layers with ReLU activations. We compute a cost term as shown below,

C_self(q) = max(0, d̂(q))
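The positional encoding used as input to the distance network can be sketched as below. This follows the standard sinusoidal encoding of [29]; the number of frequency bands is an illustrative choice, not the paper's:

```python
import numpy as np

def positional_encoding(q, n_freqs=4):
    """Sinusoidal positional encoding of a joint configuration.

    q: (d,) joint positions. Returns the feature vector
    [sin(2^0 q), ..., sin(2^{L-1} q), cos(2^0 q), ..., cos(2^{L-1} q)],
    which lets a small MLP represent higher-frequency variation in the
    predicted link distance than raw joint angles would allow.
    """
    freqs = 2.0 ** np.arange(n_freqs)                  # (L,)
    angles = q[None, :] * freqs[:, None]               # (L, d)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=0).ravel()

feat = positional_encoding(np.zeros(7))   # 7-DOF arm -> 2 * 4 * 7 features
```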
III-B6 Environment Collision Avoidance

Safe operation in unstructured environments requires a tight coupling between perception and control for collision avoidance. General gradient-based trajectory optimization and MPC approaches [50] rely on either known object shapes or pre-computed signed distance fields that provide gradient information for optimization. However, our sampling-based approach can handle discrete costs, and as such we explore collision avoidance without using signed distances. We specifically use a learned collision checking function from Danielczuk et al. [8] that operates directly on raw point cloud data. The method classifies whether an object point cloud is in collision with the scene point cloud, given the object's pose. The cost can be written as,

C_env(x) = 1 if a collision is predicted, 0 otherwise    (23)
Finally, our running cost function is given by the weighted sum

c(x, u) = α_1 C_pose + α_2 C_stop + α_3 C_limit + α_4 C_manip + α_5 C_self + α_6 C_env

In our experiments we chose large values for the joint limit and collision cost weights to enforce joint and collision avoidance constraints. For pose reaching, a smaller weight on the orientation term was used, as the orientation only needs to be maintained at the goal. For enforcing orientation constraints while reaching goals, however, a high orientation weight was chosen.

Fig. 2: We compare our sampling scheme on a planar robot reacher task [24]. The holonomic robot must move from its initial position (green cross) to the desired position (red cross) while avoiding obstacles shown as the grey regions. The path taken by the robot is given by the black dot-dashed line and red circles are positions of the robot after every 30 timesteps. The blue lines are the rolled out trajectories of the top 5 particles. Our Halton B-Spline is able to find a smooth optimal path to the goal while Random B-Spline takes a longer path. Halton with a comb filter is not smooth as shown by the sudden path changes.

III-C Sampling Strategy for Control Sequences

The method used for sampling controls from the Gaussian policy can have a great impact on the convergence of the optimization and can help embed desired behaviors such as always ensuring smooth motions. Pseudorandom sequences typically used in Monte Carlo integration exhibit an undesirable clustering of sampled points, which results in empty regions. Low-discrepancy sequences, where discrepancy refers to the degree of deviation from perfect uniform sampling, alleviate this problem by defining deterministic samplers that correlate the points to avoid such groupings [32]. Halton sequences [13] are a widely used family of low-discrepancy generators that attempt to improve the rate of convergence of Monte Carlo algorithms. In particular, the Halton sequence uses the first prime numbers, one per dimension, as bases: the i-th sample in a given dimension is the radical inverse of i in that dimension's base. It has been reported that Halton sequences, together with other low-discrepancy sequences, achieve superior performance in high-dimensional problems as compared to standard MC [9] (see -quasirandom-sequences/ for a visualization of different low-discrepancy methods). Thus, we leverage Halton sequences for sampling controls that can provide a better estimate of the objective function gradient. Controls from the Halton sequence are sampled once at the beginning and then transformed using the mean μ and covariance Σ of the current Gaussian policy.
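Generating Halton points and reshaping them by the current policy is straightforward. The sketch below uses a plain radical-inverse implementation and maps the uniform points through the inverse normal CDF; the function names are illustrative, not the paper's:

```python
import numpy as np
from statistics import NormalDist

def halton(n, dim):
    """First n points of the Halton low-discrepancy sequence in (0, 1)^dim,
    one prime base per dimension (plain radical-inverse construction)."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    out = np.empty((n, dim))
    for j in range(dim):
        b = primes[j]
        for i in range(n):
            f, r, k = 1.0, 0.0, i + 1
            while k > 0:
                f /= b
                r += f * (k % b)      # accumulate digits of k in base b, reversed
                k //= b
            out[i, j] = r
    return out

def gaussian_halton_samples(mean, std, n):
    """Map Halton points to samples from N(mean, diag(std^2)) via the inverse
    normal CDF; the deterministic points are generated once and then reshaped
    by the current policy's mean and (diagonal) covariance."""
    u = halton(n, mean.shape[0])
    z = np.vectorize(NormalDist().inv_cdf)(u)   # standard normal quantiles
    return mean + std * z

samples = gaussian_halton_samples(np.array([1.0, -1.0]), np.array([0.5, 0.5]), 200)
```

Because the Halton points are deterministic, they can be generated once at startup and reused at every control step, leaving only the cheap affine transform in the control loop.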

Further, we explore two different strategies for enforcing smoothness in sampled control sequences. The first is a comb filter that uses user-specified coefficients β_k to filter each sampled control trajectory along the horizon as,

u_h = Σ_k β_k ũ_{h-k}

This method has previously been used with sampling-based control techniques [47, 37]; however, it requires extensive hand tuning, and the filtered trajectories are not guaranteed to be smooth as neighboring action samples in the horizon can have large differences in magnitude.

We propose an alternate strategy to enforce smoothness by fitting B-splines of a chosen degree to controls sampled using a Halton sequence. The resulting curve is sub-sampled at a finer resolution to obtain smooth joint acceleration samples, which are then integrated to obtain corresponding joint velocity and position trajectories using Eqs. (12-14). In Fig. 2 we show a qualitative comparison between different combinations of sampling and smoothing strategies for a planar robot trying to reach a goal in an obstacle-filled environment. We use a small number of particles (200), a horizon of 30 steps, and run the controller for 100 time steps with replanning after every step. From the figure we can see that pseudorandom samples with a comb filter are unable to fill the control space adequately to escape the local minimum. Halton samples consistently succeed, and the resulting trajectory with B-spline interpolation goes around obstacles more smoothly than with comb filtering. Although pseudorandom sampling with B-splines is able to reach the goal, the non-uniformity in sampling does not explore the space well and produces a longer, sub-optimal path.
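The B-spline smoothing step can be sketched as below. This version uses `scipy.interpolate.make_interp_spline`; the knot count and spline degree are illustrative assumptions (the paper's degree is not given in this excerpt):

```python
import numpy as np
from scipy.interpolate import make_interp_spline

def smooth_controls(knots, horizon, degree=3):
    """Fit a B-spline through coarsely sampled control knots and resample it
    at the full horizon resolution, yielding smooth acceleration profiles.

    knots: (n_knots, d) control values (e.g. drawn from a Halton sequence)
           placed at evenly spaced points along the horizon
    """
    t_knots = np.linspace(0.0, 1.0, knots.shape[0])
    spline = make_interp_spline(t_knots, knots, k=degree)   # fits along axis 0
    t_fine = np.linspace(0.0, 1.0, horizon)
    return spline(t_fine)                                   # (horizon, d)

rng = np.random.default_rng(2)
knots = rng.standard_normal((8, 7))     # 8 coarse knots for a 7-DOF arm
controls = smooth_controls(knots, horizon=50)
```

Sampling only a handful of knots and interpolating means the effective search space per rollout is much smaller than the full horizon, while the resulting acceleration profile is smooth by construction.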

Our sampling strategy also offers the flexibility of incorporating a fixed set of action trajectories, which could be task-dependent or even a library of pre-defined desired motions. We leverage this by incorporating a set of zero-acceleration or “null” particles, which allow the robot to easily stop at the goal, as demonstrated in our experiments.

III-D Covariance Parameterization

Conventionally, sampling-based MPC algorithms such as MPPI parameterize the covariance of the Gaussian policy as Σ = σ² I, where σ is a scalar and I is the identity matrix, which forces the covariance to be the same across all control dimensions. However, in the case of manipulation problems it is desirable to allow the covariance of different joints to adapt independently, so they can optimize different cost terms such as pose reaching versus increasing manipulability. Thus we also consider a covariance of the form Σ = diag(σ_1², ..., σ_d²), where d is the number of control dimensions. Each term in the diagonal is then adapted based on the rollouts. Adapting the covariance along action dimensions has also been employed by [6] in the context of the Cross-Entropy Method (CEM).

In the next section, we discuss the implementation details of our real-time control system.

IV Real-Time Control Implementation

We implemented our MPC pipeline using PyTorch, with manipulator forward kinematics adapted from the open-source implementation provided by Sutanto et al. [38], and all cost terms and update equations implemented in a batched fashion. Further, multiprocessing is used to run MPC in a separate process to avoid latency issues. Table II shows a timing comparison of our system running on a Titan RTX GPU against leading manipulator control approaches in the literature.

Fig. 3: The compute graph shows the flow of information between the different components in our approach.
Fig. 4: We compare the control latency (time taken to complete one iteration of optimization) of our proposed robot model “gpu-tensor” and our model with learned collision checking “gpu-tensor-coll” against a serial implementation of the model on the CPU (“cpu-baseline”) using 8 threads and also on the GPU (“gpu-baseline”). The baseline models don't use the matrix multiplication from Sec. III-A for integration. The red dotted line illustrates the 25ms latency bar which we want to meet for reactive behaviour of the robot. From the right figure, our method without collision checking maintains a steady latency of 12.5ms across different horizon steps, highlighting the parallel nature of our robot model.
Method Latency (ms) Horizon
OSC [10, 5, 14, 34] 1-17 1
Motion Planning [40, 22, 1] 140-1000 N/A
Custom Chip Motion Planning [30] 1 N/A
Gradient MPC [12, 11, 18] 20 - 140 16
Sampling MPC [8] 100 40
Ours 25 30
TABLE II: Control Latency of methods that can handle collision avoidance in high dimensional systems is tabulated here. A more thorough description is available in Sec. VII.

IV-A Franka Panda Control System

The desired acceleration command from MPC is evaluated at 100Hz and integrated forward to obtain desired joint position q_d and joint velocity q̇_d commands respectively. These commands are sent to a custom low-level torque controller that computes desired torque commands at 1000Hz to control the Franka robot,

τ = M(q) (q̈_d + K_p (q_d - q) + K_d (q̇_d - q̇)) + C(q, q̇) q̇

where M and C are the inertia and Coriolis force matrices respectively, and K_p and K_d are gains for the position and velocity errors respectively. Fig. 3 shows the overall architecture of the control system.
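The low-level torque law can be sketched as follows. This is an illustrative inverse-dynamics-style implementation, assuming M and C come from the robot's dynamics library; the gain values are placeholders, not the deployed ones:

```python
import numpy as np

def joint_torque_command(q, qd, q_des, qd_des, qdd_des, M, C, kp=50.0, kd=10.0):
    """Torque law of the form
    tau = M (qdd_des + Kp (q_des - q) + Kd (qd_des - qd)) + C qd.

    q, qd:     (d,) measured joint position / velocity
    q_des, qd_des, qdd_des: (d,) references from the integrated MPC command
    M: (d, d) inertia matrix;  C: (d, d) Coriolis matrix
    """
    # PD correction in acceleration space around the reference acceleration.
    qdd_ref = qdd_des + kp * (q_des - q) + kd * (qd_des - qd)
    return M @ qdd_ref + C @ qd

# With zero tracking error and identity inertia, the command is the feedforward.
d = 3
M = np.eye(d); C = np.zeros((d, d))
tau = joint_torque_command(np.zeros(d), np.zeros(d), np.zeros(d),
                           np.zeros(d), np.ones(d), M, C)
```

Running this inner loop at 1000Hz while MPC replans at a slower rate is what lets the slower sampling-based optimizer command a torque-controlled arm smoothly.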

We found the noise in the joint state read from the robot, especially in the joint velocities, to be prohibitive for precise control with MPC. To circumvent this issue, we implemented a joint state filter that first predicts the state based on the previously commanded joint acceleration and then uses an exponential moving average filter to incorporate the sensor readings.
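The predict-then-blend filter can be sketched as below; the blend coefficient is an illustrative choice, not the deployed value:

```python
import numpy as np

class JointStateFilter:
    """Propagate the previous state with the last commanded acceleration,
    then mix in the (noisy) sensor reading with an exponential moving average."""

    def __init__(self, q0, qd0, alpha=0.3):
        self.q = np.asarray(q0, dtype=float)
        self.qd = np.asarray(qd0, dtype=float)
        self.alpha = alpha   # weight on the raw sensor reading

    def update(self, q_meas, qd_meas, qdd_cmd, dt):
        # Prediction from the commanded acceleration (the model step).
        qd_pred = self.qd + qdd_cmd * dt
        q_pred = self.q + self.qd * dt
        # EMA blend of prediction and measurement (the correction step).
        self.qd = (1 - self.alpha) * qd_pred + self.alpha * qd_meas
        self.q = (1 - self.alpha) * q_pred + self.alpha * q_meas
        return self.q, self.qd

f = JointStateFilter(np.zeros(2), np.zeros(2))
q, qd = f.update(q_meas=np.array([0.1, 0.1]),
                 qd_meas=np.array([1.0, 1.0]),
                 qdd_cmd=np.zeros(2), dt=0.01)
```

A small `alpha` trusts the commanded-acceleration prediction more, which suppresses the velocity noise at the cost of slower reaction to unmodeled disturbances.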

Iv-B Perception

We use an Intel Realsense D455 Depth Camera placed at a fixed location in the workspace with a known camera-to-robot transform to obtain depth data in the form of point clouds. The raw point cloud is filtered to remove all the points lying inside the robot body to obtain the scene point cloud. The robot URDF is used to sample points along the robot body to create a robot point cloud. Both these point clouds are fed as input to SceneCollisionNet [8] for computing the collision cost in Eq. 23.

V Results

We set up our experiments to study the efficacy of our proposed MPC framework on a real Franka Panda robot in the following manipulation scenarios: (1) accuracy and smoothness of the controller in reaching Cartesian poses, (2) reacting to changing goals observed from perceptual data, (3) handling task constraints in the form of orientation constraints [28], and (4) moving around obstacles to reach a target pose while avoiding local minima. In all our real-world experiments we use Halton sequences with B-spline interpolation and update all diagonal entries of the covariance. We provide qualitative ablations in simulation for different components of our framework, such as cost terms, sampling strategy, and policy parameterization, on our website.

V-A Reaching Cartesian Poses

Fig. 5: We show the robot trying to reach a very hard orientation across three cycles of the pose reaching experiment (Sec. V-A). Our manipulability cost (Sec. III-B4) acts as a regularizer and pushes the robot into a good manipulability region over time, which enables the robot to reach the orientation accurately in the third cycle (right image).
Fig. 6: Time response for reaching the 6 Cartesian poses, with new goal poses given every 12 seconds. (Top) Position of the end-effector, where the robot gets close to the desired position within seconds. (Middle) Here we can see that there is no position overshoot. (Bottom) We can see a slight overshoot at the 62 second mark, but this is MPC trying to compensate for inaccuracies in the robot model as it tries to reach the extremely hard pose shown in Fig. 5.

We select a sequence of six Cartesian poses for the robot to reach, with the next target provided every 12 seconds. This sequence is run three times in a loop. Some of the poses come from [14], where the authors demonstrate these poses to be very hard to reach with a standard OSC. One of the poses requires a very hard change in the end-effector's orientation, as shown in Fig. 5.

By leveraging simple kinematic models coupled with fast re-optimization, our MPC approach achieves a median position error of 0.6cm and a low median orientation error, where the orientation error is computed as the difference between the desired and reached quaternions. Further, we show the time response in Fig. 6 for a window of the experiment. Here we observe that the robot reaches the position well ahead of the 12-second mark. The robot does not oscillate after reaching the pose, demonstrating the stability of the resulting controller. This can be attributed to our use of explicit zero-acceleration particles as part of our sampling strategy (Sec. III-C), which allows the controller to latch onto these particles when near the goal. The accompanying video also qualitatively demonstrates the smoothness of motions optimized by the MPC algorithm using our Halton sequence with B-splines sampling strategy.

V-B Reacting to Perception

A ball held by a human is tracked using a depth camera via a point cloud color filter, and the robot tries to reach the ball while the human moves it to different positions in the tabletop workspace. In our experiments, we found that the robot reacts to changes in the ball position without sluggish behavior, as seen in the attached supplementary video. We show a plot of the ball's trajectory and the robot end-effector's trajectory in Fig. 7. During initial experiments, we found ourselves helping the robot reach the ball by adjusting the ball once the robot was close, which points to human-robot interaction as a promising direction for future work [12].

Fig. 7: The observed position of the ball is shown by the dotted lines for the x, y, and z axes respectively. In this experiment, the ball was moved very rapidly to stress-test whether our approximate model can sufficiently track the target given our 25ms control latency. The lag seen in the end-effector position with respect to the observed ball position reflects the latency of the control scheme and the inaccuracy of the approximate model.
Fig. 8: We move the ball across the workspace while placing a high weight on maintaining a specific orientation of the end-effector. Our control scheme maintains the orientation during motion, as seen by the very low orientation error. The position trajectory in the top plot shows some regions that the robot cannot reach, as doing so would violate the orientation constraint.
Fig. 9: We show two sequences from our collision avoidance experiment where the robot has to move between thin walls to reach the orange ball held by the human. In the top sequence, the robot tries to reach the ball but moves only as close as it can, since any further motion would cause the elbow to hit the right book. In the bottom sequence, the robot moves around the workspace with a stretched posture to avoid the two books and reach the ball. Videos of the experiments are provided in the supplementary material.

V-C Handling Task Constraints

We test the ability of our framework to satisfy multiple task objectives by imposing orientation constraints on the end-effector while tracking a moving ball detected from point cloud data. The orientation constraint is a soft constraint, imposed by setting the orientation weight in the pose cost to a high value. Our controller achieves a median quaternion error of while also tracking the ball with sufficient accuracy, as shown in Fig. 8. We refer the reader to the accompanying video for a qualitative evaluation.
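A soft orientation constraint of this kind can be sketched as a weighted pose cost, where a large orientation weight makes violations expensive relative to position error. The function and weight values below are illustrative assumptions, not the paper's actual cost implementation:

```python
import numpy as np

def pose_cost(pos_err, rot_err, w_pos=1.0, w_rot=50.0):
    """Weighted sum of position-error norm and scalar orientation error.

    Illustrative weights: a high w_rot turns the orientation term into a
    soft constraint that dominates position tracking, so the optimizer
    sacrifices position accuracy before violating the orientation target.
    """
    return w_pos * np.linalg.norm(pos_err) + w_rot * rot_err
```

With this weighting, a rollout that keeps the orientation fixed but stops short of an unreachable region scores better than one that reaches the position by tilting the end-effector, matching the behavior seen in Fig. 8.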

V-D Collision Avoidance

To test our method's capability to avoid obstacles, we set up two different tabletop environments: one consisting of two blocks, representing common pick-and-place settings as shown in Fig. 1, and another with thin walls, representing a heavily occupied space, as shown in Fig. 9. We again use our perception-based ball tracker to make the robot reach different positions in the environment. During the experiment, we move the ball to some positions that are not reachable by the robot due to the possibility of collision between the robot's links and the obstacles. As seen in Fig. 9, the robot handles these situations well, as it prioritizes collision avoidance over reaching the pose accurately. We also conduct several collision avoidance experiments in simulation, which are included as a video in the supplementary material.
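This prioritization of collision avoidance over pose accuracy is the natural outcome of a heavily weighted collision penalty in the running cost. As a hedged sketch (the function, margin, and weight values are our illustrative assumptions, not the paper's), such a term could look like:

```python
def collision_cost(signed_dist, margin=0.05, weight=500.0):
    """Hinge penalty that activates within a safety margin of an obstacle.

    signed_dist is the distance from a robot link to the nearest obstacle
    (negative when penetrating). With a large weight, any rollout that
    enters the margin is heavily penalized, so the optimizer prefers
    stopping short of the goal over colliding.
    """
    return weight * max(0.0, margin - signed_dist)
```

The cost is exactly zero for links that stay clear of the margin, so the term only influences the optimization near obstacles.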

Our experimental results indicate that accurate and reactive manipulation behavior can be obtained with relatively simple models and intuitive cost functions in an MPC framework. Sampling-based MPC also provides the flexibility to encode desired behaviors, such as smooth motions, directly in the optimization and to tightly couple perception with control, both key components of real-world robot control.

Vi Discussion

We presented a sampling-based stochastic MPC framework for manipulation that operates in the joint space and is able to achieve smooth, reactive, and precise motions while respecting joint and task space constraints. The first key component of our approach is a fully tensorized kinematic robot model that allows for GPU-based acceleration of rollouts. Second, we leverage intuitive cost terms that encourage different desirable behaviors; for example, the stop cost on velocity prevents the controller from suddenly ramping up the acceleration. Third, our formulation allows us to leverage different sampling strategies to embed desirable properties directly into the optimization. These components allowed us to obtain results similar to full motion planners based on stochastic optimization [19, 24] in a reactive MPC setting. However, there are a few key areas to be explored in the future. Inference on the learned collision-checking model we used is slow, which increases latency compared to simple pose-reaching. Furthermore, as we increase the speed of the robot, the kinematic model might not hold, leading to significant model bias. Here, learning a residual dynamics model [48] or a terminal Q-function [3] can help mitigate the effects of model bias while maintaining computational speed.

Vii Related Work

Perception-driven feedback control of high-dimensional systems is a large field of research with several existing approaches [20]. Operational Space Controllers (OSC) are among the fastest algorithms available for feedback control, with methods achieving control latencies of 1-2ms [10, 5, 14, 34]. However, OSC methods rely heavily on a higher-level planner for avoiding local minima (e.g., obstacles).

A more global approach has been explored by reformulating standard motion planning methods for feedback control via online replanning [27, 40, 22, 1, 30]. However, most of these methods run at slow rates on high-dimensional systems, with control latencies between 140ms and 1000ms [27, 40, 22, 1]. Murray et al. [30] leverage a custom chip to do fast parallel collision checking and use it with a PRM-style planner to replan at 1ms for reaching Cartesian poses in a semi-structured environment. Their chip-based collision checker requires a complete point cloud of the environment, obtained by placing many cameras in the environment and combining their point clouds, which is a highly unrealistic setting for real-world manipulation in unstructured environments.

Among model predictive control approaches, only gradient-based joint-space MPC methods have been shown to work on real manipulation systems, as their control latency is in an acceptable range (20ms-125ms) for feedback control [12, 11, 18]. Ishihara et al. [18] explore a two-stage hierarchical iLQR for fast MPC on a humanoid robot; their two-stage approach enables a very low control latency of 20ms. Erez et al. [11] explore iLQR on humanoid robots, leveraging a simulator for the dynamics model. Fishman et al. [12] use gradient-based MPC to find trajectories for a manipulator while simultaneously predicting user intent in a human-robot interaction setting; they solve the optimization problem with Levenberg-Marquardt at a control latency of 140ms. Hogan and Rodriguez [15] explore gradient-based MPC in the task space for planar non-prehensile manipulation, leveraging a learned mode switcher to select between different dynamics models; they obtain a control latency of 5ms as their dynamics models are smooth.

Sampling-based methods have a rich history in MPC. Model-Predictive Path Integral control (MPPI) [44] is one of the leading sampling-based MPC approaches and has shown great performance in real-world aggressive off-road autonomous driving by leveraging learned models [46] and GPU acceleration [45]. Wagener et al. [42] analyze MPC algorithms from the perspective of online learning, show connections between methods such as the Cross-Entropy Method (CEM) and MPPI, and demonstrate control rates of 40Hz with 1200 samples and a horizon of 2.5 seconds for off-road driving using GPU acceleration.
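The core MPPI update in [44] weights sampled control sequences by the exponentiated negative of their trajectory costs. A minimal sketch of that update (omitting the step-size and covariance-adaptation details used in practice) is:

```python
import numpy as np

def mppi_update(sampled_seqs, costs, temperature=1.0):
    """Return the cost-weighted average of sampled control sequences.

    sampled_seqs: (K, H, d) array of K sampled sequences over horizon H
    with d control dimensions. costs: (K,) trajectory costs.
    """
    beta = costs.min()  # subtract the minimum cost for numerical stability
    w = np.exp(-(costs - beta) / temperature)
    w /= w.sum()
    # Weighted average over the sample dimension.
    return np.einsum('k,khd->hd', w, sampled_seqs)
```

Low-cost samples receive exponentially larger weights, so the updated mean sequence concentrates on the best-performing rollouts; the temperature controls how sharply it does so.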

However, in the context of manipulation, sampling-based control in joint space has only been explored by optimizing over joint positions without considering velocity and acceleration limits [8, 17, 16]. Danielczuk et al. [8] learn a collision classifier and replan online in joint space, leveraging an inverse kinematics function to obtain a goal joint configuration and finding straight-line paths in joint space to reach the goal while avoiding collisions. Their approach does not handle different task spaces directly in the form of cost functions and also has a much larger control latency of 1000ms. Hyatt et al. [17, 16] compare sampling-based MPC with gradient-based MPC on high-dimensional robots with piecewise linear dynamics. They show that sampling-based MPC can run at 200Hz (5ms) even with a large number of dimensions due to its parallelizability on the GPU. However, they do not explore collision avoidance or task space cost terms in their approach, leaving these for future work.

Sampling-based optimization has also been used for motion planning on high-dimensional systems [19, 24, 25]. Kalakrishnan et al. [19] formulate planning as a stochastic trajectory optimization problem and plan over the joint position space to reach Cartesian positions while imposing orientation constraints on the end-effector. They structure their covariance matrix based on the finite-difference matrix to sample delta joint positions that start and stop at zero with a smooth profile in between. This sampling, along with projecting the weighted actions through this matrix, pushes the optimization towards a smooth trajectory for execution on the real robot. Kobilarov [24, 25] explored cross-entropy optimization for motion planning and showed global convergence in very tight environments on quadrotors and in simple planar environments.
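The STOMP-style sampling described above can be sketched as follows: build a second-order finite-difference matrix A, form R = AᵀA, and draw perturbations from N(0, R⁻¹), which yields smooth profiles that start and end near zero. This is our reconstruction of the scheme in [19], not code from either paper:

```python
import numpy as np

def smooth_noise_samples(horizon, n_samples, rng=None):
    """Draw smooth joint-perturbation samples from N(0, (A^T A)^-1)."""
    rng = np.random.default_rng() if rng is None else rng
    # Clamped second-order finite-difference matrix: each column's entries
    # approximate the acceleration of a trajectory pinned to zero at both
    # ends, so high-curvature (jerky) perturbations become improbable.
    A = np.zeros((horizon + 2, horizon))
    for i in range(horizon):
        A[i, i] = 1.0
        A[i + 1, i] = -2.0
        A[i + 2, i] = 1.0
    cov = np.linalg.inv(A.T @ A)  # large variance mid-horizon, small at ends
    return rng.multivariate_normal(np.zeros(horizon), cov, size=n_samples)
```

Because the covariance shrinks toward the trajectory endpoints, every sampled perturbation naturally "starts and stops at zero with a smooth profile in between," as the STOMP authors describe.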


  • Alwala and Mukadam [2020] Kalyan Vasudev Alwala and Mustafa Mukadam. Joint sampling and trajectory optimization over graphs for online motion planning. arXiv preprint arXiv:2011.07171, 2020.
  • Bangura and Mahony [2014] Moses Bangura and Robert Mahony. Real-time model predictive control for quadrotors. IFAC Proceedings Volumes, 47(3):11773–11780, 2014.
  • Bhardwaj et al. [2020] Mohak Bhardwaj, Ankur Handa, Dieter Fox, and Byron Boots. Information theoretic model predictive q-learning. In Learning for Dynamics and Control, pages 840–850. PMLR, 2020.
  • Bohg et al. [2017] Jeannette Bohg, Karol Hausman, Bharath Sankaran, Oliver Brock, Danica Kragic, Stefan Schaal, and Gaurav S Sukhatme. Interactive perception: Leveraging action in perception and perception in action. IEEE Transactions on Robotics, 33(6):1273–1291, 2017.
  • Cheng et al. [2018] Ching-An Cheng, Mustafa Mukadam, Jan Issac, Stan Birchfield, Dieter Fox, Byron Boots, and Nathan Ratliff. Rmpflow: A computational graph for automatic motion policy generation. In International Workshop on the Algorithmic Foundations of Robotics, pages 441–457. Springer, 2018.
  • Chua et al. [2018] Kurtland Chua, Roberto Calandra, Rowan McAllister, and Sergey Levine. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. arXiv preprint arXiv:1805.12114, 2018.
  • Cong et al. [2020] Lin Cong, Michael Görner, Philipp Ruppel, Hongzhuo Liang, Norman Hendrich, and Jianwei Zhang. Self-adapting recurrent models for object pushing from learning in simulation. arXiv preprint arXiv:2007.13421, 2020.
  • Danielczuk et al. [2020] Michael Danielczuk, Arsalan Mousavian, Clemens Eppner, and Dieter Fox. Object rearrangement using learned implicit collision functions. arXiv preprint arXiv:2011.10726, 2020.
  • Dick et al. [2013] J. Dick, F. Y. Kuo, and I. H. Sloan. High-dimensional integration: The quasi-monte carlo way. Acta Numerica, 22:133–288, 2013.
  • Dietrich et al. [2012] Alexander Dietrich, Thomas Wimbock, Alin Albu-Schaffer, and Gerd Hirzinger. Reactive whole-body control: Dynamic mobile manipulation using a large number of actuated degrees of freedom. IEEE Robotics & Automation Magazine, 19(2):20–33, 2012.
  • Erez et al. [2013] T. Erez, K. Lowrey, Y. Tassa, V. Kumar, S. Kolev, and E. Todorov. An integrated system for real-time model predictive control of humanoid robots. In 2013 13th IEEE-RAS International Conference on Humanoid Robots (Humanoids), pages 292–299, 2013. doi: 10.1109/HUMANOIDS.2013.7029990.
  • Fishman et al. [2019] Adam Fishman, Chris Paxton, Wei Yang, Dieter Fox, Byron Boots, and Nathan Ratliff. Collaborative Behavior Models for Optimized Human-Robot Teamwork. arXiv e-prints, art. arXiv:1910.04339, October 2019.
  • Halton and Smith [1964] J. H. Halton and G. B. Smith. Algorithm 247: Radical-inverse quasi-random point sequence. Communications ACM, 7(12):701–702, 1964.
  • Haviland and Corke [2020] Jesse Haviland and Peter Corke. A purely-reactive manipulability-maximising motion controller. arXiv e-prints, pages arXiv–2002, 2020.
  • Hogan and Rodriguez [2020] Francois R Hogan and Alberto Rodriguez. Reactive planar non-prehensile manipulation with hybrid model predictive control. The International Journal of Robotics Research, 39(7):755–773, 2020.
  • Hyatt and Killpack [2020] Phillip Hyatt and Marc D Killpack. Real-time nonlinear model predictive control of robots using a graphics processing unit. IEEE Robotics and Automation Letters, 5(2):1468–1475, 2020.
  • Hyatt et al. [2020] Phillip Hyatt, Connor S Williams, and Marc D Killpack. Parameterized and gpu-parallelized real-time model predictive control for high degree of freedom robots. arXiv preprint arXiv:2001.04931, 2020.
  • Ishihara et al. [2019] Koji Ishihara, Takeshi D Itoh, and Jun Morimoto. Full-body optimal control toward versatile and agile behaviors in a humanoid robot. IEEE Robotics and Automation Letters, 5(1):119–126, 2019.
  • Kalakrishnan et al. [2011] M. Kalakrishnan, S. Chitta, E. Theodorou, P. Pastor, and S. Schaal. Stomp: Stochastic trajectory optimization for motion planning. In 2011 IEEE International Conference on Robotics and Automation, pages 4569–4574, 2011. doi: 10.1109/ICRA.2011.5980280.
  • Kappler et al. [2018] D. Kappler, F. Meier, J. Issac, J. Mainprice, C. G. Cifuentes, M. Wüthrich, V. Berenz, S. Schaal, N. Ratliff, and J. Bohg. Real-time perception meets reactive motion generation. IEEE Robotics and Automation Letters, 3(3):1864–1871, 2018. doi: 10.1109/LRA.2018.2795645.
  • Karaman et al. [2011] Sertac Karaman, Matthew R Walter, Alejandro Perez, Emilio Frazzoli, and Seth Teller. Anytime motion planning using the rrt. In 2011 IEEE International Conference on Robotics and Automation, pages 1478–1483. IEEE, 2011.
  • Klein and Blaho [1987] Charles A Klein and Bruce E Blaho. Dexterity measures for the design and control of kinematically redundant manipulators. The international journal of robotics research, 6(2):72–83, 1987.
  • Kobilarov [2012a] Marin Kobilarov. Cross-entropy randomized motion planning. In Robotics: Science and Systems, volume 7, pages 153–160, 2012a.
  • Kobilarov [2012b] Marin Kobilarov. Cross-entropy motion planning. The International Journal of Robotics Research, 31(7):855–871, 2012b.
  • Koenemann et al. [2015] Jonas Koenemann, Andrea Del Prete, Yuval Tassa, Emanuel Todorov, Olivier Stasse, Maren Bennewitz, and Nicolas Mansard. Whole-body model-predictive control applied to the hrp-2 humanoid. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 3346–3351. IEEE, 2015.
  • Kuntz et al. [2020] Alan Kuntz, Chris Bowen, and Ron Alterovitz. Fast anytime motion planning in point clouds by interleaving sampling and interior point optimization. In Robotics Research, pages 929–945. Springer, 2020.
  • Luo and Hauser [2017] Jingru Luo and Kris Hauser. Robust trajectory optimization under frictional contact with iterative learning. Autonomous Robots, 41(6):1447–1461, 2017.
  • Mildenhall et al. [2020] Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. In European Conference on Computer Vision, pages 405–421. Springer, 2020.
  • Murray et al. [2016] Sean Murray, Will Floyd-Jones, Ying Qi, Daniel J Sorin, and George Dimitri Konidaris. Robot motion planning on a chip. In Robotics: Science and Systems, 2016.
  • Nakanishi et al. [2008] Jun Nakanishi, Rick Cory, Michael Mistry, Jan Peters, and Stefan Schaal. Operational space control: A theoretical and empirical comparison. The International Journal of Robotics Research, 27(6):737–757, 2008. doi: 10.1177/0278364908091463.
  • Niederreiter [1992] Harald Niederreiter. Random number generation and quasi-Monte Carlo methods, volume 63. Society for Industrial and Applied Mathematics, Philadelphia, US, 1992.
  • Paszke et al. [2019] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: An imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 8024–8035. 2019.
  • Rakita et al. [2018] Daniel Rakita, Bilge Mutlu, and Michael Gleicher. RelaxedIK: Real-time Synthesis of Accurate and Feasible Robot Arm Motion. In Proceedings of Robotics: Science and Systems, Pittsburgh, Pennsylvania, June 2018. doi: 10.15607/RSS.2018.XIV.043.
  • Scianca et al. [2020] N. Scianca, D. De Simone, L. Lanari, and G. Oriolo. Mpc for humanoid gait generation: Stability and feasibility. IEEE Transactions on Robotics, 36(4):1171–1188, 2020. doi: 10.1109/TRO.2019.2958483.
  • Stulp and Sigaud [2012] Freek Stulp and Olivier Sigaud. Path integral policy improvement with covariance matrix adaptation. arXiv preprint arXiv:1206.4621, 2012.
  • Summers et al. [2020] Colin Summers, Kendall Lowrey, Aravind Rajeswaran, Siddhartha Srinivasa, and Emanuel Todorov. Lyceum: An efficient and scalable ecosystem for robot learning. In Learning for Dynamics and Control, pages 793–803. PMLR, 2020.
  • Sutanto et al. [2020] Giovanni Sutanto, Austin Wang, Yixin Lin, Mustafa Mukadam, Gaurav Sukhatme, Akshara Rai, and Franziska Meier. Encoding physical constraints in differentiable newton-euler algorithm. In Proceedings of Machine Learning Research, volume 120, pages 804–813, The Cloud, 10–11 Jun 2020. PMLR.
  • Tassa et al. [2014] Yuval Tassa, Nicolas Mansard, and Emo Todorov. Control-limited differential dynamic programming. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pages 1168–1175. IEEE, 2014.
  • Torres et al. [2015] Luis G Torres, Alan Kuntz, Hunter B Gilbert, Philip J Swaney, Richard J Hendrick, Robert J Webster, and Ron Alterovitz. A motion planning approach to automatic obstacle avoidance during concentric tube robot teleoperation. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 2361–2367. IEEE, 2015.
  • Vahrenkamp et al. [2012] Nikolaus Vahrenkamp, Tamim Asfour, Giorgio Metta, Giulio Sandini, and Rüdiger Dillmann. Manipulability analysis. In 2012 12th ieee-ras international conference on humanoid robots (humanoids 2012), pages 568–573. IEEE, 2012.
  • Wagener et al. [2019] Nolan Wagener, Ching-An Cheng, Jacob Sacks, and Byron Boots. An online learning approach to model predictive control. arXiv preprint arXiv:1902.08967, 2019.
  • Williams et al. [2016] Grady Williams, Paul Drews, Brian Goldfain, James M Rehg, and Evangelos A Theodorou. Aggressive driving with model predictive path integral control. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 1433–1440. IEEE, 2016.
  • Williams et al. [2017a] Grady Williams, Andrew Aldrich, and Evangelos A Theodorou. Model predictive path integral control: From theory to parallel computation. Journal of Guidance, Control, and Dynamics, 40(2):344–357, 2017a.
  • Williams et al. [2017b] Grady Williams, Nolan Wagener, Brian Goldfain, Paul Drews, James M Rehg, Byron Boots, and Evangelos A Theodorou. Information theoretic mpc for model-based reinforcement learning. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 1714–1721. IEEE, 2017b.
  • Yang et al. [2020] Yuxiang Yang, Ken Caluwaerts, Atil Iscen, Tingnan Zhang, Jie Tan, and Vikas Sindhwani. Data efficient reinforcement learning for legged robots. In Conference on Robot Learning, pages 1–10. PMLR, 2020.
  • Zeng et al. [2020] Andy Zeng, Shuran Song, Johnny Lee, Alberto Rodriguez, and Thomas Funkhouser. Tossingbot: Learning to throw arbitrary objects with residual physics. IEEE Transactions on Robotics, 36(4):1307–1319, 2020.
  • Zhong et al. [2013] M. Zhong, M. Johnson, Y. Tassa, T. Erez, and E. Todorov. Value function approximation and model predictive control. In 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pages 100–107, 2013. doi: 10.1109/ADPRL.2013.6614995.
  • Zucker et al. [2013] Matt Zucker, Nathan Ratliff, Anca D Dragan, Mihail Pivtoraiko, Matthew Klingensmith, Christopher M Dellin, J Andrew Bagnell, and Siddhartha S Srinivasa. Chomp: Covariant hamiltonian optimization for motion planning. The International Journal of Robotics Research, 32(9-10):1164–1193, 2013.