Stochastic Tensor Optimization for Robot Motion - A GPU Robot Motion Toolkit
Sampling-based model predictive control (MPC) is a promising tool for feedback control of robots with complex and non-smooth dynamics and cost functions. The computationally demanding nature of sampling-based MPC algorithms is a key bottleneck in their application to high-dimensional robotic manipulation problems. Previous methods have addressed this issue by running MPC in the task space while relying on a low-level operational space controller for joint control. However, by not using the joint space of the robot in the MPC formulation, existing methods cannot directly account for non-task space related constraints such as avoiding joint limits, singular configurations, and link collisions. In this paper, we develop a joint space sampling-based MPC for manipulators that can be efficiently parallelized using GPUs. Our approach can handle task and joint space constraints while taking less than 0.02 seconds (50Hz) to compute the next control command. Further, our method can integrate perception into the control problem by utilizing learned cost functions from raw sensor data. We validate our approach by deploying it on a Franka Panda robot for a variety of common manipulation tasks. We study the effect of different cost formulations and MPC parameters on the synthesized behavior and provide key insights that pave the way for the application of sampling-based MPC for manipulators in a principled manner. Videos of experiments can be found at: https://sites.google.com/view/manipulation-mppi.READ FULL TEXT VIEW PDF
Modern, torque-controlled service robots can regulate contact forces whe...
This paper proposes an MPC-based controller to efficiently execute multi...
This paper presents a novel Representation-Free Model Predictive Control...
Autonomous mobile manipulation is the cutting edge of the modern robotic...
Model predictive control (MPC) has been successful in applications invol...
We propose a model predictive control approach to pushing based manipula...
The probabilistic velocity obstacle (PVO) extends the concept of velocit...
Stochastic Tensor Optimization for Robot Motion - A GPU Robot Motion Toolkit
Real-world robot manipulation can greatly benefit from real-time perception-driven feedback control [4, 21]. Consider an industrial robot tasked with stacking boxes from one pallet to another or a robot bartender moving drinks placed on a tray . In both cases, the robot must ensure object stability via perception while respecting joint limits, avoiding singular configurations and collisions, and handling task constraints such as maintaining orientation during transfer. This leads to a complex control problem with competing objectives as well as nonlinear and sometimes discontinuous dynamics that is difficult to solve in real-time.
Classic approaches to solving these tasks include operational-space control (OSC) [31, 5, 10], where the different task costs are formulated in their respective spaces and then projected into the joint space (i.e., the control space of the robot) via a Jacobian map. OSC methods are very local as they only optimize for the next time step and do not consider future actions or states while optimizing for the current action.
Recent approaches based on model predictive control (MPC) attempt to find a locally-optimal policy over a finite horizon starting from the current state using a potentially imperfect dynamics model. An action from the policy is executed on the system and the optimization is performed again from the resulting next state which can overcome the effects of model-bias. Model predictive control has been successfully applied on real robotic systems allowing them to rapidly react to changes in environment [2, 11, 35, 5, 44].
Model predictive control can be performed either by optimizing over states with a constraint on the dynamics function to enforce feasibility (“direct”) or by optimizing over the control space (“indirect” or “shooting”) with states computed by rolling out the dynamics. Direct methods are hard to optimize, especially when the dynamics is nonlinear or discontinuous as the constraints on the dynamics become more difficult to satisfy. With shooting methods, dynamics feasibility is always guaranteed as the states are obtained by rolling out dynamics from control. Additionally, for most real world tasks, the optimization problem can be set up as an unconstrained problem that allows for very fast solutions enabling real-time model predictive control on real robotic systems with high degrees of freedom[39, 45].
Shooting-based MPC methods can be solved with either a gradient-based  or a sampling-based optimization scheme . Gradient-based methods require the cost and the dynamics function to be differentiable for optimization which severely restricts the class of functions they can handle and makes them highly susceptible to local minima, often requiring extensive expertise in both modelling and tuning for desired behavior. These methods are also computationally expensive as derivatives are often computed using finite differences on real systems. These key issues must be resolved to effectively employ MPC algorithms in manipulation problems.
In this work, we tackle the problem of developing a general framework for perception driven model-predictive control that offers the flexibility of using generic cost and dynamics functions while simultaneously ensuring desirable properties such as smoothness, constraint satisfaction, and computational speed. To this end, we focus on sampling-based MPC algorithms that use simple policies to sample control sequences followed by rolling out the dynamics function to compute a sample-based gradient of the objective function used to update the sampling policy. These algorithms make no restrictive assumptions about the cost, dynamics or policy class, are straightforward to parallelize and can be effectively applied in highly dynamic systems and environments [44, 42]
. Using complex dynamics and cost functions makes each rollout computationally expensive and this can also increase the number of rollouts required to generate a good solution. In literature these problems have been tackled by either learning a neural-network model that is fast to simulate, learning a terminal cost function to reduce the planning horizon , or using an augmented control space such as the operational (task) space of a manipulator . In this work, we make the case for using simple models coupled with simple behavior-based cost functions and fast parallel computation to achieve strong performance on manipulation tasks in complex environments. We would like to note that our approach can easily incorporate learned models or cost functions and does not suffer from errors induced by ignoring the full joint space of the robot in the optimization. In summary, our key contributions are:
A framework for fast joint space sampling-based MPC on high degree-of-freedom manipulators that optimizes task objectives while satisfying task and joint-space constraints.
Key insights into different components of sampling-based MPC that enable smooth, reactive and precise robot motions.
Coupling perception with control by leveraging learned discrete/discontinuous cost terms from raw sensor data which we validate by leveraging an existing learned function for collision checking  between the robot links and the environment point cloud.
A highly parallelized implementation which achieves a control rate of 50Hz111Without the learned collision checking function, we achieve a control rate of 80hz with 500 particles and 50 timesteps. on a single GPU, a speedup of 50x compared to existing MPPI based manipulation .
Empirical evaluation in simulation and a real-world Franka Panda robot.
We formally define the problem in Sec. II and build our approach in Sec. III, followed by our real time implementation in Sec. IV. We report experimental results in Sec. V and discuss our approach’s strengths and limitations in Sec. VI. We then share our method’s relation to previous work in Sec. VII.
We consider the problem of generating a feedback control law (or policy) for a robot manipulator with
joints performing user-specified tasks in an unstructured environment where it is subject to non-linear constraints and must react in real-time to overcome errors due to its internal dynamics model, state estimation, and perception.
The problem can be modelled as optimal control of a discrete-time stochastic dynamical system described by the equation,
defines the probability of transitioning into stateconditioned on and control input . The robot chooses controls using a deterministic or stochastic closed-loop policy , incurs an instantaneous cost and transitions into the next state, and the process continues for a finite horizon . The goal is to design a policy that optimizes a task-specific objective function
is the space of all possible policies. The above setup is akin to optimizing a finite horizon Markov Decision Process (MDP) with continuous state and action spaces as done in reinforcement learning approaches.
Solving for a complex globally optimal policy is a hard problem especially since the task objective could be sparse or difficult to optimize. MPC can be viewed as a practically motivated strategy that simplifies the overall problem by focusing only on the states that the robot encounters online during execution and rapidly re-calculating a “simple” locally optimal policy. At state , MPC performs a look-ahead for a finite horizon using an approximate model , approximate cost function and parameterized policy to find the optimal parameters that minimize an objective function
where is a terminal cost function that serves as a coarse approximation of the cost-to-go beyond the horizon. An action is sampled from and the optimization is performed again from the resulting state after applying the action to the robot. The optimization is hot-started from the solution at the previous timestep by using a shift operator, which allows MPC to produce good solutions with few iterations of optimization (usually 1 in practice).
In the next section, we first introduce the sampling-based MPC technique to solving the optimization in Eq. 3 and discuss the objective function, policy classes, and update equations. We then present our approach for applying it to manipulation problems. An overview of the notation used in the paper are presented in Table I.
|joint position, velocity, acceleration|
|robot state at time|
|joint space command at time|
|number of steps in horizon|
|change in time between steps in horizon|
|parameters of policy at time t.|
|distribution over control given state|
|sequence of H Gaussian means|
|sequence of H Gaussian covariances|
|batch of N control sequences of length H|
|batch of N state sequences of length H|
|batch of N cost sequences of length H|
MPC loss function
Sampling based MPC can be viewed as iteratively optimizing simple policy representations such as time-independant Gaussians over open-loop controls with parameters such that . Here, represent the sequence of means and covariances at every step along the horizon. At every iteration of optimization, a batch of control sequences of length , , are sampled from the current distribution, followed by rolling out the approximate dynamics function using the sampled control to get a batch of corresponding states and costs . The policy parameters are then updated using a sample-based gradient of the objective function. After optimization iterations we can either sample an action from the resulting distribution or execute the mean action. A standard algorithm is shown in Alg. 1. We next describe how the distribution is updated. Consider the function ,
where is a discount factor that is used to favor immediate rewards. A widely used objective function is the exponentiated utility or the risk-seeking objective,
where is a temperature parameter. The mean and covariance are updated using a sample-based gradient of the above objective as,
where and are step-sizes that regularize the current solution to be close to the previous one and
The update equation for mean is the same as the well-known Model-Predictive Path Integral Control (MPPI) algorithm . We refer the reader to  for the connection and derivation of update equations. While, covariance adaptation has previously been explored in the context of Path Integral Policy Improvement  to automatically adjust exploration, standard implementations of MPPI on real systems generally do not update the covariance [46, 43]. However, we found that updating the covariance leads to better performance with a fewer number of particles, such as stable behavior upon convergence to the goal. Once an action from the updated distribution is executed on the system, the mean and covariance are shifted forward with default values appended at the end to hotstart the optimization at the next timestep.
The above formulation of MPC offers us the flexibility to extract different behaviors from our algorithm by tuning different knobs such as the choice of approximate dynamics, running cost, terminal cost, the policy class and parameters such as the horizon length, number of particles, step sizes and discount factor . Next, we switch our focus to the domain of robot manipulation and build our approach for sampling-based MPC by systematically evaluating the effects of a subset of key design choices on the overall performance of the controller.
The MPC paradigm allows us to effectively leverage simple models that are both computationally cheap to query and easy to optimize, as re-optimizing the control policy at every timestep can help to overcome the effects of errors in the approximate model. We leverage this error correcting property of MPC and utilize the kinematic model of the manipulator as our approximate transition model. Let the robot state be defined in the joint space as and the commanded action be the joint acceleration . At every optimization iteration, we compute the state of the robot across the horizon (, , ) by integrating the sampled control sequences from the robot’s initial state
(i.e., current state of the real robot). This can be efficiently implemented as a tensor product followed by a sum as,
where is a lower triangular matrix filled with 1, and is a vector of delta timesteps across the horizon. We use variable timesteps, with a smaller dt for the earlier steps along the horizon for higher resolution cost near the robot’s current state and larger ones for later steps to get a better approximation of cost-to-go, similar to . We intentionally write as to highlight the fact that we compute the states across the batch and horizon with a tensor operation without the need to iteratively compute the states across the horizon.
Given a batch of states , the Cartesian poses , velocities , and accelerations of the end-effector are obtained by using forward kinematics , the kinematic Jacobian , and the derivative of the kinematic Jacobian function respectively.
We can compute the forward kinematics in a batch as it depends only on the current state. This provides us an easily parallelizable formulation of the robot model that is amenable to GPU acceleration.
The cost function allows us to encode high-level robot behavior directly into the MPC optimization. This can be viewed as a form of cost-shaping that allows MPC to achieve sparse task objectives while also satisfying auxillary requirements such as avoiding joint limits, ensuring smooth motions, safety and maximizing manipulability during execution. Further, sampling-based MPC approaches do not rely on the gradient of the cost function and can thus incorporate general cost terms that are not constrained to be smooth or differentiable.
We consider cost functions that are a weighted sum of simple cost terms, where each individual term encodes a desired behavior for manipulator control that can be easily implemented in a batched manner.
Given the desired and current Cartesian poses for the end-effector respectively in the world frame , we compute a task-space cost term that penalizes the distance between the two
where are the rotation and translation part of the goal pose , and are the rotation and translation part of the current end-effector pose. The weight vectors allow us to weigh errors in different directions and orientations with respect to each other and can be set to different values to obtain qualitatively different behavior such as partial pose reaching or enforcing partial pose constraints. For example, we can set a high weight in along a desired axes to maintain the goal orientation throughout the motion of the robot as we demonstrate in our experiments.
The finite horizon of MPC makes it myopic to events that can occur far out in the future, especially in dynamic environments. Thus, it is desirable to ensure that the robot can safely stop within the horizon in reaction to events that might be observed at timestep . We encode this behavior by computing a time varying velocity limit for every timestep in the horizon based on the maximum acceleration of the robot (or a user-specifed maximum acceleration) and the time available until . This means the joint velocity of the robot must allow it to come to a stop at the end of the horizon by applying the max acceleration. Any state that exceeds this velocity is penalized by a cost which is expressed as
where is an upper triangular matrix filled with 1.
Given minimum and maximum limits on joint state respectively, we penalize the joint state only if it exceeds a safety threshold defined by a ratio of its full range.
where , adds a safety threshold from the actual bounds of the robot. In our experiments, we chose .
The manipulability score describes the ability of the end-effector to achieve any arbitrary velocity from a given joint configuration. It measures the volume of the ellipsoid formed by the kinematic Jacobian which collapses to zero at singular configurations. Thus, to encourage the robot to optimize control policies that avoid future kinematic singularities, we employ a cost term that penalizes small manipulability scores [23, 41]
where we choose based on values obtained by Haviland and Corke  for the Franka Panda robot.
Computing self-collision between the links of the robot has previously shown to be computationally expensive, especially when we want to compute for many joint configurations [34, 8]. Hence, similar to previous approaches [34, 8], we train a neural network that predicts the closest distance 222Distance is positive when two links are penetrating and negative when not colliding. between the links of the robot given a joint configuration (). One difference in our approach is the use of positional encoding (i.e.,) as we found this to improve the accuracy of the distance prediction . Our network (which we call jointNERF) has three layers with neurons and ReLU activations. We compute a cost term as shown below,
Safe operation in unstructured environments requires a tight coupling between perception and control for collision avoidance. General gradient-based trajectory optimization and MPC approaches  rely on either known object shapes or pre-computed signed distance fields that provide gradient information for optimization. However, our sampling-based approach can handle discrete costs and as such we explore collision avoidance without using signed distance. We specifically use a learned collision checking function from Danielczuk et al. 
that operates directly on raw pointcloud data. The method classifies if an object pointcloudis in collision or not with the pointcloud given the object’s pose . The cost can be written as,
Finally our running cost function is given by
In our experiments we chose large values for and to enforce joint and collision avoidance constraints. For pose reaching, was used with smaller value on orientation as it only needs to be maintained at the goal. For enforcing orientation constraints while reaching goals, however, was chosen.
The method used for sampling controls from the Gaussian policy can have a great impact on the convergence of the optimization and can help embed different desired behaviors such as always ensuring smooth motions. Pseudorandom sequences typically used in Monte Carlo integration exhibit an undesirable clustering of sampled points which results in empty regions. Whereas low-discrepancy sequences, where low discrepancy refers to the degree of deviation from perfect uniform sampling, alleviate this problem by defining deterministic samplers that correlate each point to avoid groupings . Halton sequences  are a widely used form of low discrepancy number generators that attempt to improve the rate of convergence of Monte Carlo algorithms. In particular, the Halton sequence uses the first prime numbers to define a sequence, , for integers and where and , with for . It has been reported that Halton sequences, together with other low-discrepancy sequences, achieve superior performance in high-dimensional problems as compared to standard MC 333See http://extremelearning.com.au/unreasonable-effectiveness-of
-quasirandom-sequences/ for a visualization of different low-discrepancy methods. Thus, we leverage Halton sequences for sampling controls that can provide a better estimate of the objective function gradient. Controls from the Halton sequence are sampled once at the beginning and then transformed using the mean () and covariance of the current Gaussian policy.
Further, we explore two different strategies for enforcing smoothness in sampled control sequences. The first method is a comb filter that uses user-specified coefficients to filter out each sampled control trajectory along the horizon as,
This method has previously been used with sampling-based control techniques [47, 37], however, it requires extensive hand tuning and the filtered trajectories are not guaranteed to be smooth as neighboring action samples in the horizon can have large difference in magnitude.
We propose an alternate strategy to enforce smoothness by fitting B-splines of degree to controls sampled using a Halton Sequence. The resulting curve is sub-sampled at a finer resolution to obtain smooth joint acceleration samples which are then integrated to obtain corresponding joint velocity and position trajectories using Eqs. (12-14). In Fig. 2
we show a qualitative comparison between using different combinations of sampling and smoothing strategies for a planar robot trying to reach a goal in an obstacle filled environment. We use a small number of particles (200), a horizon of 30 steps and run the controller for 100 time steps with replanning after every step. From the figure we can see that pseudorandom samples with a comb-filter are unable to fill the control space adequately to escape the local minimum. Halton samples consistently succeed, however the resulting trajectory with B-spline interpolation smoothly goes around obstacles as compared tocomb-filtering. Although, pseudorandom sampling with B-Splines is able to reach the goal, the non-uniformity in sampling does not explore the space and produces a longer, sub-optimal path.
Our sampling strategy also offers us the flexibility of incorporating certain fixed set of action trajectories which could be task-dependent or even a library of pre-defined desired motions. We leverage this fact by incorporating a set of zero acceleration or ”null” particles which allow the robot to easily stop at the goal as demonstrated in our experiments.
Conventionally, sampling-based MPC algorithms such as MPPI parameterize the covariance of the Gaussian policy to be of the form where is a scalar value and is a identity matrix, which forces the covariance to be the same across all control dimensions. However, in the case of manipulation problems it is desirable to allow the covariance of different joints to adapt independently so they can possibly optimize different cost terms such as pose reaching versus increasing manipulability. Thus we also consider covariance of the form where . Each term in is then adapted based on the rollouts. Adapting the covariance along action dimensions has also been employed by  in the context of Cross-Entropy Method (CEM).
In the next section, we discuss the implementation details of our real-time control system.
We implemented our MPC pipeline using PyTorch
with manipulator forward kinematics adapted from the open-source implementation provided by Sutantaet al.  and all cost terms and update equations implemented in a batched fashion. Further, multiprocessing is used to run MPC in a seperate process to avoid latency issues. Table II shows a timing comparison of our system running on a Titan RTX GPU against leading manipulator control approaches in literature.
|OSC [10, 5, 14, 34]||1-17||1|
|Motion Planning [40, 22, 1]||140-1000||N/A|
|Custom Chip Motion Planning ||1||N/A|
|Gradient MPC [12, 11, 18]||20 - 140||16|
|Sampling MPC ||100||40|
The desired acceleration command from MPC is evaulated at 100Hz and integrated forward to obtain desired joint position () and joint velocity commands respectively. These commands are sent to a custom low-level torque controller that computes desired torque commands at 1000Hz to control the Franka robot,
where and are the inertia and coriolis force matrices respectively provided by and are gains for the position and velocity errors respectively. Fig. 3 shows the overall architecture of the control system.
We found the noise in the joint state read from , especially in joint velocities, to be prohibitive for precise control with MPC. In order to circumvent this issue we implemented a joint state filter that first predicts the state based on the previous commanded joint acceleration and then uses an exponential moving average filter to incorporate sensor readings.
We use an Intel Realsense D455 Depth Camera placed at a fixed location in the workspace with a known camera-to-robot transform to obtain depth data in the form of point clouds. The raw point cloud is filtered to remove all the points lying inside the robot body to obtain the scene point cloud. The robot URDF is used to sample points along the robot body to create a robot point cloud. Both these point clouds are fed as input to SceneCollisionNet  for computing the collision cost in Eq. 23.
We setup our experiments to study the efficacy of our proposed MPC framework on a real Franka Panda robot in the following manipulation scenarios: (1) Accuracy and smoothness of the controller in reaching Cartesian poses, (2) Reacting to changing goals observed from perceptual data, (3) Handling task constraints in the form of orientation constraints , (4) Moving around obstacles to reach a target pose avoiding local minima. In all our real-world experiments we use Halton sequences with B-Spline interpolation and update all diagonal entries of the covariance. We provide qualitative ablations in simulation for different components of our framework such as cost terms, sampling strategy and policy parameterization on our website (https://sites.google.com/view/manipulation-mppi).
We select a sequence of six Cartesian poses for the robot to reach with a next target provided after every 12 seconds. This sequence is run three times in a loop. Some of the poses come from  where the authors demonstrate these poses to be very hard to reach with a standard OSC. One of the poses has a very hard change in the end-effector’s orientation as shown in Fig. 5.
By leveraging simple kinematic models coupled with fast re-optimization our MPC approach is able to achieve a median position error of 0.6cm and orientation error of , where we compute the orientation error % as difference between the desired and reached quaternion, orientation error%=. Further, we also show the time response in Fig. 6 for a window of the experiment. Here we observe that the robot reaches the position well ahead of 12 seconds. The robot does not oscillate after reaching the pose demonstrating stability of the resulting controller. This can be attributed to our used of explicit zero acceleration particles as part of our sampling strategy, as mentioned in III-C allowing our controller to latch onto these zero particles when near the goal. The accompanying video also qualitatively demonstrates the smoothness of motions optimized by the MPC algorithm using our Halton sequence with B-Splines sampling strategy.
A ball held by a human is tracked using a depth camera via a point cloud color filter and the robot tries to reach the ball while the human moves it to different positions in the tabletop workspace. In our experiments, we found the robot to react to changes in the ball position with no sluggish behavior as seen in the attached supplementary video. We show a plot of the ball’s trajectory and the robot end-effector’s trajectory in Fig. 7. During initial experiments, we found ourselves helping the robot reach the ball by adjusting the ball once the robot was close, which is a promising future direction for our work towards human robot interaction .
We test the ability of our framework to satisfy multiple task objectives by imposing orientation constraints on the end-effector while tracking a moving ball detected from point cloud data. The orientation constraint is a soft-constraint imposed by setting to a high value in the pose cost. Our controller achieves a median quaternion error of while also tracking the ball with sufficient accuracy as shown in Fig. 8. We refer the reader to the accompanying video for a qualitative evaluation.
To test our method’s capability to avoid obstacles, we setup two different table top environments, one consisting of two blocks representing common pick and place environments as shown in Fig. 1 and an environment with think walls to represent an heavily occupied space, as shown in Fig. 9. We again use our perception based ball tracker to make the robot reach different positions in the environment. During the experiment, we move the ball to some positions that are not reachable by the robot due to possibility of collision between the robot’s links and the obstacles. As seen in Fig. 9, the robot handles these situations very well as it prioritizes collision avoidance over reaching the pose accurately. We also conduct several collision avoidance experiments in simulation which are attached as a video in the supplementary material.
Our experimental results indicate that accurate and reactive manipulation behavior can be obtained by using relatively simple models and intuitive cost functions in an MPC framework. Sampling-based MPC also provides us flexibility to encode different desired behaviors such as smooth motions directly in the optimization and tightly couple perception with control which are key components for real-world robot control.
We presented a sampling-based stochastic MPC framework for manipulation that operates in the joint space and is able achieve smooth, reactive, and precise motions while respecting joint and task space constraints. The first key component of our approach is a fully tensorized kinematic robot models that allows for GPU-based acceleration of rollouts. Second, we leverage intuitive cost terms that encourage different desirable behaviors. For example, the stop cost on velocity prevents the controller from suddenly ramping up the acceleration. Third, our formulation allows us to leverage different sampling strategies to embed desirable properties directly into the optimization. These components allowed us to obtain similar results as full motion planners based on stochastic optimization [19, 24] in a reactive MPC setting. However, there a few key areas to be explored in the future. Inference on the learned collision checking model we used is slow, which increases latency as compared to simple pose-reaching. Furthermore, as we increase the speed of the robot, the kinematic model might not hold leading to significant model-bias. Here, learning a residual dynamics model  or a terminal Q-function  can help mitigate the effects of model-bias while still maintaining computational speed.
Perception driven feedback control on high dimensional systems is a large field of research with several existing approaches . Operation Space Controllers (OSC) are some of the fastest algorithms available for feedback control, with methods achieving control latency of 1-2 ms [10, 5, 14, 34]. However OSC methods rely heavily on a higher level planner for avoiding local minima (e.g.,obstacles).
A more global approach has been explored by reformulating standard motion planning methods to do feedback control via online replanning [27, 40, 22, 1, 30]. However most of these methods run at a slow rates on high dimensional systems, with control latencies between 140ms and 1000ms [27, 40, 22, 1]. Murray et al.  researched leveraging a custom chip to do fast parallel collision checking and use this with a PRM style planner to replan at 1ms for reaching Cartesian Poses in a semi-structured environment. Their chip based collision checker uses a complete pointcloud of the environment obtained by placing many cameras in the environment and combining their pointclouds. This is an highly unrealistic setting for real world manipulation in unstructured environments.
In the realm of Model predictive control approaches, only gradient based joint space MPC methods have shown to work on real manipulation systems as their control latency is in an acceptable range (20ms - 125ms ) for feedback control [12, 11, 18]. Ishihara et al.  explore two stage hierarchical ILQR for fast MPC on a humanoid robot. Their two stage approach enables a very low control latency of 20ms. Erez et al.  explore ILQR on humanoid robots leveraging a simulator for the dynamics model. Fishman et al.  use gradient based MPC for finding trajectories for a manipulator while simultaneously predicting user intent in a human robot interaction setting. They solve the optimization problem leveraging Levenberg-Marquardt at a control latency of 140ms. Hogan and Rodriguez  explore gradient based MPC in the task space for planar non-prehensile manipulation leveraging a learned mode switcher to switch between different dynamic models. They are able to obtain a control latency of 5ms as their dynamic models are smooth.
Sampling-based methods have a rich history in MPC. Model-Predictive Path Integral Control (MPPI)  is one of the leading sampling-based MPC approaches that has shown great performance on real-world aggressive off-road autonomous driving by leveraging learned models  and GPU acceleration . Wagener et al.  analyze MPC algorithms from the perspective of online learning and show connections between different methods such as Cross-Entropy method (CEM) and MPPI and have also demonstrated control rates of 40Hz with 1200 samples and a horizon of 2.5 seconds for off-road driving using GPU acceleration.
However, in the context of manipulation, sampling-based control in joint space has only been explored by optimizing in the joint position space without considering velocity and acceleration limits [8, 17, 16]. Danielczuk et al.  learn a collision classifier and do online replanning in joint space leveraging an inverse kinematic function to get a goal joint configuration and find straight line paths in joint space to reach the goal while avoiding collisions. Their approach doesn’t handle different task spaces directly in the form of cost functions and also has a much larger control latency of 1000 ms. Hyatt et al. [17, 16] compare sampling based MPC with gradient based MPC on large dimensional robots with piecewise linear functions. They show that sampling based MPC can run at 200Hz (5ms) even with large number of dimensions due to it’s parallelizability on the GPU. However, they do not explore collision avoidance or task space cost terms in their approach and leave it for future work.
formulate planning as a stochastic trajectory optimization problem and plan over the joint position space to reach Cartesian positions while orientation constraints on the end-effector. They structure their co-variance matrix based on the finite difference matrix to sample delta joint positions that start and stop at 0 with a smooth profile in between. This sampling along with projecting the weighted action through this matrix, pushes the optimization towards a smooth trajectory for execution on the real robot. Kobilarov[24, 25] explored using cross-entropy optimization for motion planning and showed global convergence in very tight environments on quadrotors and simple planar environments.
European Conference on Computer Vision, pages 405–421. Springer, 2020.
Pytorch: An imperative style, high-performance deep learning library.In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 8024–8035. 2019.
Proceedings of Machine Learning Research, pages 804–813, The Cloud, 10–11 Jun 2020. PMLR. URL http://proceedings.mlr.press/v120/sutanto20a.html.