I Introduction
As autonomous vehicles and other robots become increasingly common in our daily lives, a major concern is whether humans can trust the robots' ability to complete their assigned tasks safely. For autonomous vehicles, for example, neglecting or misinterpreting disturbances during driving can lead to serious consequences within milliseconds. The demand for safety has led many researchers to develop robust control and planning algorithms for autonomous robotic systems. For example, sampling-based planning algorithms that consider uncertainty and collision probability in the vertex selection or evaluation processes have been proposed in
[22, 16, 10, 12], and optimization-based planning algorithms that consider systematic disturbances and chance constraints explicitly by solving optimization problems have been developed in [1, 9, 20, 19].

In this paper, we propose an MPC-based robust trajectory planning approach that deals with environmental and plant uncertainty, while providing guarantees on the dispersion of the closed-loop system's future trajectories. Model Predictive Control (MPC) is an algorithmic, optimization-based control design [25] that has gained popularity for autonomous vehicle control over the past years [24, 13]. Deterministic MPC approaches are model-based and generate trajectories assuming there are no uncertainties in the dynamics. As a result, MPC controllers are typically not robust to model parameter variations. To improve the performance of MPC controllers by taking system uncertainty into account, Robust MPC (RMPC) controllers have been proposed to handle deterministic uncertainties residing in a given compact set. RMPC generates control commands by considering worst-case scenarios, so the resulting trajectories can be conservative. Reference [11] provides an extensive review summarizing all types of RMPC controllers. To achieve more aggressive planning, Stochastic MPC (SMPC) utilizes the probabilistic nature of the system uncertainty to account for the most likely disturbances, instead of considering only the worst-case disturbance, as with RMPC [17, 8]. There are two classes of SMPC approaches in the literature. The first is based on the analytical solutions of some optimization problem, such as [4, 5, 21], while the second relies on randomization to solve optimization problems, such as [2, 3, 29]. The proposed CC-MPPI controller lies somewhere in-between these two, as it analytically computes a controlled dynamics by considering the model uncertainty and then generates the optimal control using randomized rollouts of the controlled dynamics.
This is discussed in greater detail in Section IV.
Most current MPC implementations assume linear system dynamics and formulate the resulting MPC task as a quadratic optimization problem, which helps MPC meet the strict real-time requirements needed for safe control. However, these approaches depend on simplified linear models that may not accurately capture the dynamics of the real system. Model Predictive Path Integral (MPPI) control [28] is a type of MPC algorithm that repeatedly solves finite-horizon optimal control tasks while utilizing nonlinear dynamics and general cost functions. Specifically, MPPI is a simulation-based algorithm that samples thousands of trajectories around some mean control sequence in real-time, by taking advantage of the parallel computing capabilities of modern Graphics Processing Units (GPUs). It then produces an optimal trajectory and its corresponding control sequence by calculating a weighted average of the sampled trajectories, where the weights are determined by the cost of each trajectory rollout. One of the advantages of the MPPI approach over more traditional MPC controllers is that it does not restrict the form of the cost function of the optimization problem [26], which can be non-quadratic and even discontinuous.
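As a rough illustration of this sampling-and-averaging loop, the sketch below implements one MPPI iteration for a system with scalar controls. It is a minimal sketch under assumptions: the callables `dynamics` and `cost`, and the parameters `sigma`, `lam`, and `seed`, are our own illustrative names, not the paper's implementation.

```python
import numpy as np

def mppi_step(x0, v, dynamics, cost, K=256, sigma=0.5, lam=1.0, seed=None):
    """One MPPI iteration: sample K noisy control sequences around the mean
    sequence v, roll out the dynamics, and average the injected noise with
    weights that favor low-cost rollouts."""
    T = len(v)
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=(K, T))   # injected control disturbances
    S = np.zeros(K)                             # cost of each sampled trajectory
    for k in range(K):
        x = x0
        for t in range(T):
            x = dynamics(x, v[t] + eps[k, t])
            S[k] += cost(x)
    S -= S.min()                                # baseline; leaves weights unchanged
    w = np.exp(-S / lam)
    w /= w.sum()
    return v + w @ eps                          # updated mean control sequence
```

On a toy integrator `x_{t+1} = x_t + u_t` with cost `(x - 1)^2`, repeated calls pull the mean control sequence toward the goal.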
Despite its appealing characteristics, the MPPI algorithm may encounter problems when implemented in practice. In particular, when the mean control sequence lies inside an infeasible region, all the resulting MPPI sampled trajectories are concentrated within the same region, as illustrated in Fig. 1, and this may lead to a situation where the trajectories violate the constraints. Two cases where this may happen are: first, when the MPPI algorithm diverges because the environment changes too fast; and, second, when the algorithm fails because the predicted dynamics do not capture the noise and uncertainty of the actual dynamics. The reason the MPPI algorithm may perform poorly in these two cases is that it fails to take into account the disturbances (either from the dynamics or from the environment), so that all sampled trajectories end up violating the constraints. Figure 1 shows the influence of the noise on the MPPI sampled trajectories. In this figure, the gray curves are the MPPI sampled trajectories, the red curves show the boundaries of the trajectory sampling distribution, and the green curve represents the simulated trajectory of the robot following the optimal control sequence given the current distribution. In Fig. 1(a) the autonomous vehicle initially has a sampling distribution that lies mostly inside the track. In Fig. 1(b), the vehicle ends up in an unexpected pose due to unforeseen disturbances after it executes the control command. This further leads to the situation depicted in Fig. 1(c), where the algorithm diverges because all of the sampled trajectories violate the constraints.
To mitigate the previous shortcomings of the MPPI algorithm, prior works apply a controller to track the output of the MPPI controller in order to keep the actual trajectory as close as possible to the predicted nominal trajectory. These approaches separate the planning and control tasks so that MPPI acts similarly to a path planner. For example, in [27] an iterative Linear Quadratic Gaussian (iLQG) controller was used to track the planned trajectory provided by MPPI. In [23] the authors propose a method that utilizes a tracking controller with L1-adaptive augmentation to compensate for the mismatch between the nominal dynamics and the true dynamics. However, these methods do not improve the performance of the MPPI algorithm if there are significant changes in the environment within a short interval of time. The proposed CC-MPPI algorithm addresses some of these shortcomings by improving the performance of the MPPI algorithm under the scenarios mentioned above. This is achieved by introducing adjustable trajectory sampling distributions, and by directly controlling the evolution of these trajectory distributions to avoid an uncontrolled dispersion at the end of the control horizon.
II Problem Formulation
The goal of the proposed Covariance-Controlled MPPI (CC-MPPI) controller is to make the distributions of the sampled trajectories more flexible than the ones generated by MPPI, such that the CC-MPPI algorithm samples more efficiently and with a smaller probability of being trapped in local minima when the optimal trajectory from the previous time step lies inside some high-cost region, as illustrated in Fig. 1(c). To this end, we introduce a desired terminal state covariance $\Sigma_f$ for the states of the dynamics (1b) at the final time step as a hyperparameter of the CC-MPPI controller. The key idea is that the distribution of the sampled trajectories can be adjusted by a suitable choice of $\Sigma_f$, together with the control disturbance covariance $\Sigma_\epsilon$. The CC-MPPI controller solves the following optimization problem,

$$\min_{v}\; \mathbb{E}\Big[\phi(x_T) + \sum_{t=0}^{T-1}\big(q(x_t) + \tfrac{1}{2}\,u_t^\top R\,u_t\big)\Big], \tag{1a}$$
subject to,  
$$x_{t+1} = f(x_t, u_t), \tag{1b}$$
$$u_t = v_t + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \Sigma_\epsilon), \tag{1c}$$
$$\mathrm{Cov}(x_T) \preceq \Sigma_f, \tag{1d}$$
at each iteration, where the state terminal cost $\phi(\cdot)$ and the state portion $q(\cdot)$ of the running cost can be arbitrary functions. The objective function (1a) minimizes the expectation of the state and control costs, with the state $x_t$ being a random vector subject to the dynamics (1b).

III MPPI Algorithm Review
The MPPI controller, as described in [26], minimizes (1a) subject to (1b). As in Problem (1), the terminal cost and the state portion of the running cost of the MPPI can be arbitrary functions.
As in an MPC setting, the MPPI algorithm samples $K$ trajectories during each optimization iteration. Let $v = (v_0, \dots, v_{T-1})$ be the mean control sequence, let $u^k = (u_0^k, \dots, u_{T-1}^k)$ be the actual control sequence, and let $\epsilon_t^k$, $t = 0, \dots, T-1$, be the control disturbance sequence corresponding to the $k$-th sampled trajectory at the current iteration, such that $u_t^k = v_t + \epsilon_t^k$, where $\epsilon_t^k \sim \mathcal{N}(0, \Sigma_\epsilon)$. The cost $S_k$ of the $k$-th sampled trajectory is given by [29]

$$S_k = \phi(x_T^k) + \sum_{t=0}^{T-1} \tilde{q}(x_t^k, v_t, \epsilon_t^k), \tag{2}$$
where $\tilde{q}(x_t^k, v_t, \epsilon_t^k)$ is the cost of the $k$-th sampled trajectory at step $t$, and is given in [29] by

$$\tilde{q}(x_t, v_t, \epsilon_t) = q(x_t) + \frac{1-\nu^{-1}}{2}\,\epsilon_t^\top R\,\epsilon_t + v_t^\top R\,\epsilon_t + \frac{1}{2}\,v_t^\top R\,v_t, \tag{3}$$

where $\nu$ is the ratio between the covariance of the injected disturbance and the covariance of the disturbance of the original dynamics [29]. The term $\frac{1}{2}v_t^\top R\,v_t$ in (3) is the cost for the disturbance-free portion of the control input, and both $\frac{1-\nu^{-1}}{2}\epsilon_t^\top R\,\epsilon_t$ and $v_t^\top R\,\epsilon_t$ in (3) penalize large control disturbances and smooth out the resulting control signal. The weight $w_k$ of the $k$-th sampled trajectory is chosen as [30]
$$w_k = \frac{1}{\eta}\,\exp\!\Big(-\frac{1}{\lambda}\,(S_k - \rho)\Big), \tag{4}$$

where,

$$\eta = \sum_{k=1}^{K} \exp\!\Big(-\frac{1}{\lambda}\,(S_k - \rho)\Big), \tag{5}$$

and where the inverse temperature $\lambda$ determines how selective the weighted average of the sampled trajectories is. Note that the constant $\rho = \min_k S_k$ does not influence the solution, and it is introduced to prevent numerical instability of the algorithm. The MPPI algorithm generates the optimal control sequence and the mean sequence of the next iteration using the following equations,
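The weights in (4)-(5) form a softmax over negated trajectory costs. A minimal sketch (the helper name is ours) shows why subtracting the minimum cost prevents numerical underflow without changing the weights:

```python
import numpy as np

def mppi_weights(S, lam):
    """Trajectory weights of eqs. (4)-(5): a softmax over -S/lam.
    Subtracting rho = min(S) cancels in the normalization, so the weights
    are unchanged, but exp() no longer underflows for large costs."""
    rho = S.min()
    w = np.exp(-(S - rho) / lam)
    return w / w.sum()
```

Without the baseline, costs on the order of 1e4 would make every `exp()` underflow to zero and the normalization would divide by zero.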
$$v_t^{+} = v_t + \sum_{k=1}^{K} w_k\,\epsilon_t^k, \qquad t = 0, \dots, T-1. \tag{6}$$
In Section IV we discuss how the CC-MPPI controller satisfies the terminal constraint (1d), and then present the complete CC-MPPI algorithm.
IV Covariance-Controlled MPPI
In this section, we introduce the proposed CC-MPPI controller. Section IV-A discusses the linearization of the dynamics (1b), and Section IV-B uses this linearization to achieve the terminal constraint (1d). Section V presents the proposed CC-MPPI algorithm that solves the optimization Problem (1).
IV-A Linearized Model
We start by linearizing the system (1b) along some reference trajectory using the approach outlined in [7]. The reference trajectory of the first optimization iteration is a random trajectory; starting with the second iteration, the reference trajectory of the current iteration is the trajectory generated by the optimal control sequence from the previous iteration. To this end, let $\bar{u} = (\bar{u}_0, \dots, \bar{u}_{T-1})$ be the reference control sequence at the current iteration, and let $\bar{x} = (\bar{x}_0, \dots, \bar{x}_T)$ be the corresponding reference state sequence, such that

$$\bar{x}_{t+1} = f(\bar{x}_t, \bar{u}_t). \tag{7}$$
The dynamical system in (7) can then be approximated in the vicinity of $(\bar{x}, \bar{u})$ with a discrete-time linear time-varying (LTV) system as follows

$$x_{t+1} \approx A_t\,x_t + B_t\,u_t + d_t, \tag{8}$$

where $x_t$ and $u_t$ are the state and control input, respectively, of the LTV system at step $t$, and

$$A_t = \frac{\partial f}{\partial x}\Big|_{(\bar{x}_t, \bar{u}_t)}, \qquad B_t = \frac{\partial f}{\partial u}\Big|_{(\bar{x}_t, \bar{u}_t)}, \tag{9}$$

$$d_t = f(\bar{x}_t, \bar{u}_t) - A_t\,\bar{x}_t - B_t\,\bar{u}_t, \tag{10}$$

where $A_t$ and $B_t$ are the system matrices, and $d_t$ is the residual term of the linearization.
IV-B Covariance-Controlled Trajectory Sampling
As with the baseline MPPI algorithm, the CC-MPPI algorithm simulates $K$ trajectories during each iteration. Let $u = (u_0, \dots, u_{T-1})$ be the control sequence of the $k$-th CC-MPPI sampled trajectory during the current iteration of the algorithm, where we drop the superscript $k$ for simplicity. The optimal control sequence from the previous iteration, which is also the reference control sequence $\bar{u}$ of the current iteration, is injected with artificial noise $\epsilon_t \sim \mathcal{N}(0, \Sigma_\epsilon)$, and a feedback term $K_t\,y_t$ is added, such that

$$u_t = \bar{u}_t + K_t\,y_t + \epsilon_t, \tag{11}$$

where the state deviation $y_t$ follows the dynamics,

$$y_{t+1} = A_t\,y_t + B_t\,(K_t\,y_t + \epsilon_t), \tag{12}$$
where $y_0 = 0$, since we assume perfect observation of the initial state [21]. Substituting (11) into (8) yields,

$$x_{t+1} = A_t\,x_t + B_t\,(\bar{u}_t + K_t\,y_t + \epsilon_t) + d_t, \tag{13}$$

where $x_t$ is the state of the CC-MPPI sampled trajectory at step $t$, and $d_t$ is the residual term of the linearization at step $t$ as defined in (10). Let $x_0$ be the state at the beginning of the current iteration. We can then rewrite the system in (13) in the compact form,

$$X = \bar{A}\,x_0 + \bar{B}\,U + \bar{D}\,D, \tag{14}$$

where $X = [x_1^\top, \dots, x_T^\top]^\top$, $U = [u_0^\top, \dots, u_{T-1}^\top]^\top$, $E = [\epsilon_0^\top, \dots, \epsilon_{T-1}^\top]^\top$, $D = [d_0^\top, \dots, d_{T-1}^\top]^\top$, and the augmented system matrices $\bar{A}$, $\bar{B}$, $\bar{D}$, and the block-feedback matrix $\bar{K}$, are defined similarly as in [20]. In order to compute the feedback gains $\bar{K}$ that satisfy the terminal covariance constraint (1d), CC-MPPI solves Problem (15) at each optimization iteration.
$$\min_{\bar{K}}\; \mathbb{E}\big[X^\top \bar{Q}\,X + U^\top \bar{R}\,U\big], \tag{15a}$$
subject to,
$$X = \bar{A}\,x_0 + \bar{B}\,U + \bar{D}\,D, \tag{15b}$$
$$\mathrm{Cov}(x_T) \preceq \Sigma_f, \tag{15c}$$
where $U = \bar{U} + \bar{K}(X - \bar{X}) + E$, and the augmented cost parameter matrices $\bar{Q} = \mathrm{blkdiag}(Q, \dots, Q)$ and $\bar{R} = \mathrm{blkdiag}(R, \dots, R)$. Since $Q \succeq 0$ and $R \succ 0$, we have $\bar{Q} \succeq 0$ and $\bar{R} \succ 0$. It follows from (12) and (14) that
$$\bar{X} = \mathbb{E}[X] = \bar{A}\,x_0 + \bar{B}\,\bar{U} + \bar{D}\,D, \tag{16}$$

and,

$$X - \bar{X} = (I - \bar{B}\bar{K})^{-1}\bar{B}\,E. \tag{17}$$
The cost function in (15) can then be converted to the following equivalent form [19]

$$\mathbb{E}\big[(X-\bar{X})^\top \bar{Q}\,(X-\bar{X}) + (U-\bar{U})^\top \bar{R}\,(U-\bar{U})\big] + \bar{X}^\top \bar{Q}\,\bar{X} + \bar{U}^\top \bar{R}\,\bar{U}. \tag{18}$$
The reference control sequence $\bar{U}$ is fixed and is given by the optimal control sequence from the previous CC-MPPI iteration, which implies that $\bar{X}$ is fixed and is given by (16). For the optimization problem in (15), we can then drop the terms representing constant values in (18) and obtain the cost

$$J(\bar{K}) = \mathbb{E}\big[(X-\bar{X})^\top \bar{Q}\,(X-\bar{X}) + (U-\bar{U})^\top \bar{R}\,(U-\bar{U})\big]. \tag{19}$$
Substituting (17) into (19), and using the fact that $\mathbb{E}[E E^\top] = \bar{\Sigma}_\epsilon$, yields,

$$J(\bar{K}) = \mathrm{tr}\big(G^\top \bar{Q}\,G\,\bar{\Sigma}_\epsilon\big) + \mathrm{tr}\big((\bar{K}G + I)^\top \bar{R}\,(\bar{K}G + I)\,\bar{\Sigma}_\epsilon\big), \tag{20}$$
where $G = (I - \bar{B}\bar{K})^{-1}\bar{B}$ and $\bar{\Sigma}_\epsilon = \mathrm{blkdiag}(\Sigma_\epsilon, \dots, \Sigma_\epsilon)$. Substituting (14) into (15c), we obtain,

$$P_T\,(I - \bar{B}\bar{K})^{-1}\bar{B}\,\bar{\Sigma}_\epsilon\,\bar{B}^\top (I - \bar{B}\bar{K})^{-\top}\,P_T^\top \preceq \Sigma_f, \tag{21}$$

where $P_T$ denotes the matrix that extracts the terminal state $x_T$ from $X$.
Finally, after the change of variables described in [19, 20], Problem (15) can be converted into the following convex optimization problem,

$$\min_{\bar{K}}\; J(\bar{K}), \tag{22a}$$
subject to,
$$\begin{bmatrix} \Sigma_f & P_T\,(I-\bar{B}\bar{K})^{-1}\bar{B}\,\bar{\Sigma}_\epsilon^{1/2} \\ \bar{\Sigma}_\epsilon^{1/2}\,\bar{B}^\top (I-\bar{B}\bar{K})^{-\top} P_T^\top & I \end{bmatrix} \succeq 0, \tag{22b}$$

where (22b) is the Schur-complement form of the terminal covariance constraint (21).
Problem (22) can be easily solved by a convex optimization solver such as Mosek [18] to obtain the feedback gains $\bar{K}$. It follows from (11) that the control sequence of the $k$-th sampled trajectory is $u_t^k = \bar{u}_t + K_t\,y_t^k + \epsilon_t^k$. We can then rollout the sampled trajectories using these control sequences and the dynamical model (1b). The complete CC-MPPI algorithm is detailed in Section V.
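Given a candidate sequence of feedback gains, the closed-loop covariance propagation implied by the deviation dynamics (12) can be checked directly. The sketch below is our own helper, assuming the linearized matrices A_t, B_t and the injected-noise covariance are available; it propagates the state covariance step by step so the terminal value can be compared against the bound Sigma_f.

```python
import numpy as np

def terminal_covariance(A_seq, B_seq, K_seq, Sigma_eps):
    """Propagate the covariance of the closed-loop deviation dynamics
    y_{t+1} = (A_t + B_t K_t) y_t + B_t eps_t, with eps_t ~ N(0, Sigma_eps),
    starting from zero covariance (perfectly observed initial state)."""
    n = A_seq[0].shape[0]
    Sigma = np.zeros((n, n))
    for A, B, K in zip(A_seq, B_seq, K_seq):
        Acl = A + B @ K                                  # closed-loop matrix
        Sigma = Acl @ Sigma @ Acl.T + B @ Sigma_eps @ B.T
    return Sigma
```

For a scalar system with A = 1, B = 1, K = -0.5 and unit noise, the recursion converges to the fixed point 1/(1 - 0.25) = 4/3, while with zero feedback the covariance grows linearly with the horizon, which is exactly the uncontrolled dispersion the terminal constraint is meant to prevent.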
V The CC-MPPI Algorithm
The CC-MPPI algorithm is given in Algorithm 1. The algorithm first obtains the current estimate of the state $x_0$ at the beginning of the current optimization iteration. It then rolls out the reference trajectory using the discrete-time nonlinear dynamical model (1b), linearizes the model along the reference trajectory and its corresponding control sequence as described in (7)-(10), and calculates the augmented dynamical model matrices $\bar{A}$, $\bar{B}$, $\bar{D}$, along with the linearization residual term $D$. Next, it computes the feedback gains for the closed-loop system in (14) by solving the convex optimization problem in (22). The algorithm then samples the control sequences, performs the rollouts, and evaluates the sampled trajectories with the running cost (3); specifically, it introduces sampled trajectories of the closed-loop dynamics together with sampled trajectories of zero-mean input, so that the algorithm can balance between smoothness of the trajectories and low control cost [26]. It next computes the optimal control sequence following (4), (5), and (6), and sends the first control command of the optimal control sequence to the actuators. Finally, it removes the executed command and duplicates the last command at the end of the horizon to warm-start the next iteration.

VI Results
In this section, we show via a series of numerical examples that the CC-MPPI algorithm outperforms the baseline MPPI algorithm in the critical situations described in Section II. The terminal covariance $\Sigma_f$ in (22b) for the CC-MPPI should be determined based on the environment, and one could train a policy to compute it; the design of such a policy is beyond the scope of this paper.
VI-A Vehicle Model
We assume that the injected artificial noise in the CC-MPPI and MPPI algorithms is significantly greater than the noise of the vehicle model, so that the model noise is negligible. We model the vehicle using a single-track bicycle model,
$$\dot{x} = v\,\cos(\psi + \beta), \tag{23a}$$
$$\dot{y} = v\,\sin(\psi + \beta), \tag{23b}$$
$$\dot{\psi} = \frac{v}{l_r}\,\sin\beta, \tag{23c}$$
$$\dot{v} = a, \tag{23d}$$

where $\beta = \arctan\!\big(\frac{l_r}{l_f + l_r}\tan\delta\big)$, and the parameters $l_r$, $l_f$ are the distances from the COM to the rear and front wheels, respectively. The $x$, $y$ are position coordinates in a fixed world coordinate frame. The $\psi$ is the vehicle yaw angle, and $v$ is the velocity of the COM with respect to the world coordinate frame. The $a$ and $\delta$ are the throttle and steering inputs to the model, respectively. We discretize the system (23) with the Euler method, $x_{t+1} = x_t + f(x_t, u_t)\,\Delta t$, with time step $\Delta t$.
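A minimal sketch of one Euler step of a kinematic single-track model of this kind follows. The slip-angle formulation and the default parameter values `lf`, `lr`, and `dt` are common assumptions of ours, not the paper's identified vehicle parameters.

```python
import numpy as np

def bicycle_step(state, u, lf=0.09, lr=0.05, dt=0.02):
    """One explicit-Euler step of a kinematic single-track (bicycle) model.
    state = (x, y, psi, v); u = (a, delta) are throttle and steering."""
    x, y, psi, v = state
    a, delta = u
    beta = np.arctan(lr / (lf + lr) * np.tan(delta))  # slip angle at the COM
    return np.array([
        x + dt * v * np.cos(psi + beta),              # world-frame position
        y + dt * v * np.sin(psi + beta),
        psi + dt * (v / lr) * np.sin(beta),           # yaw rate from slip angle
        v + dt * a,                                   # throttle as acceleration
    ])
```

With zero steering the model reduces to straight-line motion, which gives a quick sanity check of the discretization.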
VI-B Controller Setup
Since the model noise is significantly smaller than the injected noise, the quadratic disturbance term in (3) is negligible [29]. It follows from (3) that the MPPI and CC-MPPI running cost for the $t$-th step of the $k$-th sampled trajectory takes the form,

$$\tilde{q}(x_t^k, v_t, \epsilon_t^k) = q(x_t^k) + v_t^\top R\,\epsilon_t^k + \frac{1}{2}\,v_t^\top R\,v_t, \tag{24}$$
for $t = 0, \dots, T-1$, where we take the state-dependent cost as,

$$q(x) = B(x) + c_o\,O(x). \tag{25}$$

The term $B(x)$ in (25) is the boundary cost, which prevents the vehicle from leaving the track, and it is given by

$$B(x) = \begin{cases} M, & \text{if the vehicle is outside the track boundary}, \\ 0, & \text{otherwise}, \end{cases} \tag{26}$$

where $M$ is a large positive constant.
The term $c_o\,O(x)$ in (25) penalizes collisions with obstacles, where $c_o$ is a weighting coefficient. We choose two different forms of $O(x)$ in our simulations. The first is discontinuous at the obstacles' edges,

$$O(x) = \begin{cases} 1, & \text{if } d(x) \le r, \\ 0, & \text{otherwise}, \end{cases} \tag{27}$$

and the second is continuous at the obstacles' edges,

$$O(x) = \min\Big\{1,\; \max\big\{0,\; 1 - \tfrac{d(x) - r}{w}\big\}\Big\}, \tag{28}$$

where $d(x)$ denotes the distance from the vehicle's COM to the center of the circular obstacle, $r$ is the radius of the obstacle, and $w > 0$ is a decay width. We take $r = 0.1$ m for all of the circular obstacles in this section. In our simulations, the terminal cost for the MPPI and CC-MPPI controllers has the form,
$$\phi(x_T) = -c_s\,\Delta s(x_T) + d_c(x_T)^2. \tag{29}$$

The first term in (29) is the progress cost, where $c_s$ is a weighting coefficient and $\Delta s(x_T)$ represents the distance between the current vehicle state and the terminal state of the sampled trajectory along the track centerline. The second term in (29), where $d_c(x_T)$ is the vehicle's lateral deviation from the track centerline, penalizes straying from the centerline. For both the MPPI and the CC-MPPI controllers, the control horizon $T$, the inverse temperature $\lambda$ [26], the number of sampled trajectories at each iteration, the portion of uncontrolled sampled trajectories, and the control cost matrix $R$ are fixed, and these parameter values are shared by all the controller setups in the simulations of this section.
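The practical difference between the discontinuous and continuous obstacle costs of (27) and (28) can be illustrated with two small helpers. The names and the linear decay band `width` are our own illustrative choices, not the paper's exact functional forms.

```python
import numpy as np

def obstacle_cost_discontinuous(d, r):
    """Indicator-style penalty: full cost inside the obstacle, zero outside,
    with a jump at the obstacle edge d = r."""
    return 1.0 if d <= r else 0.0

def obstacle_cost_continuous(d, r, width=0.05):
    """Penalty that decays linearly to zero over a band of `width` beyond the
    obstacle edge, so sampled rollouts see a cost gradient before an actual
    collision occurs."""
    return float(np.clip(1.0 - (d - r) / width, 0.0, 1.0))
```

With the discontinuous form, a rollout receives no penalty until it actually overlaps an obstacle, whereas the continuous form already penalizes rollouts that pass close by, which tends to steer the weighted average away from obstacles earlier.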
VI-C Planning in a Fast-changing Environment: Unpredictable Obstacles
This experiment tests the CC-MPPI controller's ability to respond to emergencies caused by the unpredictable appearance of obstacles. We test the CC-MPPI and benchmark its performance against a baseline MPPI controller in an environment where an obstacle suddenly appears in the traveling direction of an autonomous vehicle. In this simulation, the CC-MPPI and MPPI controllers have injected noises of the same covariance, and the same weighting coefficients in their trajectory costs. Figure 2 demonstrates that the MPPI controller fails to find a feasible solution, resulting in a collision with the obstacle. Figure 2 further shows that the CC-MPPI has a more effective trajectory sampling distribution strategy, which leads the vehicle to take a feasible trajectory that avoids the collision.
VI-D Aggressive Driving in a Cluttered Environment
To further examine the performance of the CC-MPPI controller in more complicated environments, we ran simulations using a CC-MPPI controller and an MPPI controller on a race track with obstacles densely scattered on the track. The track has a constant width of 0.6 m, the centerline has a length of 10.9 m, and each turn of the centerline has a radius of 0.3 m. Each obstacle has a radius of 0.1 m and uses the continuous obstacle cost (28). The simulations in this section were set up so that both controllers achieve minimum lap time while avoiding collisions with the cluttered obstacles. Both the CC-MPPI and the MPPI controllers have the same covariance for their injected noises. We then performed a grid search by varying the cost weight in (25), which corresponds to avoiding collisions with obstacles, and the cost weight in (29), which corresponds to optimizing the vehicle velocity along the track centerline. Table I shows the grid search parameters. Figure 4 presents the results of the grid search in a scatter plot showing the distribution of lap times and the number of collisions. Table II summarizes the grid search.
Cost Parameter  Min  Max  Interval
Obstacle cost weight in (25)  75  450  37.5
Progress cost weight in (29)  1.65  2.97  0.33
We define a collision as a situation where the vehicle state overlaps with an obstacle. We further define a failure as the situation where the vehicle comes to a complete stop, or where the vehicle strays too far from the track centerline. If the vehicle finishes its laps without a failure, the simulation is considered a success. Figure 4 shows that the data points corresponding to CC-MPPI occupy the bottom part of the scatter plot, which indicates that the CC-MPPI generates trajectories that are significantly faster than those generated by the MPPI controller. Table II shows that the CC-MPPI achieves a smaller average lap time, fewer collisions, and a higher success rate than the MPPI in the simulations. Moreover, the two data points in the red circles in Figure 4 are produced by MPPI and CC-MPPI with the same set of cost-weight values, and Figure 3 visualizes the trajectories that correspond to these two data points. We see that the CC-MPPI generates a driving maneuver that is more aggressive than the MPPI, which helps explain why the CC-MPPI achieves a significantly smaller average lap time.
The performance of CC-MPPI, however, comes with an increased computational overhead. Using our implementation, the CC-MPPI controller runs at 13 Hz, while the MPPI controller runs at 97 Hz. All simulations were done on a desktop computer equipped with an i9 3.5 GHz CPU and an RTX 3090 GPU. The main computational bottleneck of CC-MPPI is the computation of the feedback gains at each iteration. Possible remedies include updating the feedback gains less frequently, computing the feedback gains offline and storing them in a lookup table, or using a faster, dedicated convex optimization solver that is more suitable for real-time implementation [14, 15, 6].
Controller  Avg. lap time (s)  No. collisions/lap  Success rate
CC-MPPI  4.20  160.52  98.18%
MPPI  6.44  171.38  30.78%
VII Conclusions and Future Work
We have proposed the Covariance-Controlled Model Predictive Path Integral (CC-MPPI) algorithm, which incorporates covariance steering within the MPPI algorithm. The CC-MPPI algorithm has adjustable trajectory sampling distributions that can be tuned by changing the terminal covariance constraint in (1d) and the covariance of the injected noise in (1c), which makes it more flexible and robust than the MPPI algorithm. In the simulations, we showed that the CC-MPPI explores the environment and samples trajectories more efficiently than MPPI for the same level of exploration noise. This results in the vehicle responding faster to unpredictable obstacles and avoiding collisions in a cluttered environment better than MPPI. The CC-MPPI performance can be further improved if the terminal covariance and the injected noise covariance are tuned synchronously based on information about the robot's surrounding environment.
In the future, we plan to design a policy that judiciously chooses the terminal covariance constraint and the injected noise covariance on-the-fly. The policy should evaluate the environment and assign these covariances for the CC-MPPI controller, such that the trajectory sampling distribution of the controller can be tailored to carry out informed and efficient sampling in any environment.
References
[1] (2021) Covariance steering of discrete-time stochastic linear systems based on Wasserstein distance terminal cost. IEEE Control Systems Letters 5(6), pp. 2000–2005.
[2] (2009) Scenario-based model predictive control of stochastic constrained linear systems. In Proceedings of the 48th IEEE Conference on Decision and Control (held jointly with the 28th Chinese Control Conference), Shanghai, China, pp. 6333–6338.
[3] (2013) Robust model predictive control via scenario optimization. IEEE Transactions on Automatic Control 58(1), pp. 219–224.
[4] (2009) Probabilistic tubes in linear stochastic model predictive control. Systems & Control Letters 58, pp. 747–753.
[5] (2009) Probabilistic constrained MPC for multiplicative and additive stochastic uncertainty. IEEE Transactions on Automatic Control 54(7), pp. 1626–1632.
[6] (2014) Automated custom code generation for embedded, real-time second order cone programming. IFAC Proceedings Volumes 47(3), pp. 1605–1612.
[7] (2007) A linear time varying model predictive control approach to the integrated vehicle dynamics control problem in autonomous systems. In IEEE Conference on Decision and Control, pp. 2980–2985.
[8] (2016) Stochastic linear model predictive control with chance constraints – a review. Journal of Process Control 44, pp. 53–67.
[9] (2017) Finite-horizon covariance control of linear time-varying systems. In 56th IEEE Conference on Decision and Control (CDC), pp. 3606–3611.
[10] (2009) Collision-probability constrained PRM for a manipulator with base pose uncertainty. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1426–1432.
[11] (2006) A survey on robust model predictive control from 1999–2006. In International Conference on Computational Intelligence for Modelling, Control and Automation (CIMCA'06), pp. 207–207.
[12] (2009) Stochastic mobility-based path planning in uncertain environments. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 1183–1189.
[13] (2017) Racing miniature cars: enhancing performance using stochastic MPC and disturbance feedback. In American Control Conference, Seattle, WA, pp. 5642–5647.
[14] (2018) A tutorial on real-time convex optimization based guidance and control for aerospace applications. In Annual American Control Conference (ACC), pp. 2410–2416.
[15] (2012) CVXGEN: a code generator for embedded convex optimization. Optimization and Engineering 13(1), pp. 1–27.
[16] (2007) Particle RRT for path planning with uncertainty. In IEEE International Conference on Robotics and Automation, pp. 1617–1624.
[17] (2016) Stochastic model predictive control: an overview and perspectives for future research. IEEE Control Systems Magazine 36(6), pp. 30–44.
[18] (2017) MOSEK ApS, the MOSEK optimization toolbox for MATLAB manual, version 8.1.
[19] (2018) Optimal covariance control for stochastic systems under chance constraints. IEEE Control Systems Letters 2(2), pp. 266–271.
[20] (2019) Optimal stochastic vehicle path planning using covariance steering. IEEE Robotics and Automation Letters 4(3), pp. 2276–2281.
[21] (2019) Stochastic model predictive control for constrained linear systems using optimal covariance steering. arXiv:1905.13296.
[22] (2006) Safe path planning in an uncertain-configuration space using RRT. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5376–5381.
[23] (2020) L1-adaptive MPPI architecture for robust and agile control of multirotors. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 7661–7666.
[24] (2020) Learning how to autonomously race a car: a predictive control approach. IEEE Transactions on Control Systems Technology 28(6), pp. 2713–2719.
[25] (2017) Toward an algorithmic control theory. Journal of Guidance, Control, and Dynamics 40, pp. 1–3.
[26] (2018) Information-theoretic model predictive control: theory and applications to autonomous driving. IEEE Transactions on Robotics 34(6), pp. 1603–1622.
[27] (2018) Robust sampling based model predictive control with sparse objective information. In Robotics: Science and Systems, Pittsburgh, PA, pp. 42–51.
[28] (2017) Information theoretic MPC for model-based reinforcement learning. In IEEE International Conference on Robotics and Automation (ICRA), Singapore, pp. 1714–1721.
[29] (2017) Model predictive path integral control: from theory to parallel computation. Journal of Guidance, Control, and Dynamics 40(2), pp. 344–357.
[30] (2017) Autonomous racing with AutoRally vehicles and differential games. arXiv:1707.04540.