# Reactive Navigation under Non-Parametric Uncertainty through Hilbert Space Embedding of Probabilistic Velocity Obstacles

The probabilistic velocity obstacle (PVO) extends the concept of velocity obstacle (VO) to work in uncertain dynamic environments. In this paper, we show how a robust model predictive control (MPC) with PVO constraints under non-parametric uncertainty can be made computationally tractable. At the core of our formulation is a novel yet simple interpretation of our robust MPC as a problem of matching the distribution of PVO with a certain desired distribution. To this end, we propose two methods. Our first baseline method is based on approximating the distribution of PVO with a Gaussian Mixture Model (GMM) and subsequently performing distribution matching using Kullback Leibler (KL) divergence metric. Our second formulation is based on the possibility of representing arbitrary distributions as functions in Reproducing Kernel Hilbert Space (RKHS). We use this foundation to interpret our robust MPC as a problem of minimizing the distance between the desired distribution and the distribution of the PVO in the RKHS. Both the RKHS and GMM based formulation can work with any uncertainty distribution and thus allowing us to relax the prevalent Gaussian assumption in the existing works. We validate our formulation by taking an example of 2D navigation of quadrotors with a realistic noise model for perception and ego-motion uncertainty. In particular, we present a systematic comparison between the GMM and the RKHS approach and show that while both approaches can produce safe trajectories, the former is highly conservative and leads to poor tracking and control costs. Furthermore, RKHS based approach gives better computational times that are up to one order of magnitude lesser than the computation time of the GMM based approach.

## Authors

• 2 publications
• 3 publications
• 4 publications
• 15 publications
• 51 publications
• 154 publications
• ### Solving Chance Constrained Optimization under Non-Parametric Uncertainty Through Hilbert Space Embedding

In this paper, we present an efficient algorithm for solving a class of ...
11/22/2018 ∙ by Bharath Gopalakrishnan, et al. ∙ 0

• ### Safe and Fast Tracking Control on a Robot Manipulator: Robust MPC and Neural Network Control

Fast feedback control and safety guarantees are essential in modern robo...
12/22/2019 ∙ by Julian Nubert, et al. ∙ 0

• ### Model Predictive Control for Autonomous Driving considering Actuator Dynamics

In this paper, we propose a new model predictive control (MPC) formulati...
03/09/2018 ∙ by Mithun Babu, et al. ∙ 0

• ### Deep Learning Tubes for Tube MPC

Learning-based control aims to construct models of a system to use for p...
02/05/2020 ∙ by David D. Fan, et al. ∙ 0

• ### Path Integral Control by Reproducing Kernel Hilbert Space Embedding

We present an embedding of stochastic optimal control problems, of the s...
08/13/2012 ∙ by Konrad Rawlik, et al. ∙ 0

• ### On the Dual Implementation of Collision-Avoidance Constraints in Path-Following MPC for Underactuated Surface Vessels

A path-following collision-avoidance model predictive control (MPC) meth...
03/10/2021 ∙ by Simon Helling, et al. ∙ 0

• ### Distribution System Voltage Control under Uncertainties

Voltage control plays an important role in the operation of electricity ...
04/28/2017 ∙ by Pan Li, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## I Introduction

Reactive navigation in real-world environments requires robots to consider perception and ego-motion uncertainty explicitly while computing control inputs to ensure robust collision avoidance. Incorporating only the mean information of the uncertainty often turns to be inadequate. In contrast, bounding volume approaches like [7], [17] are simple and induce robustness but tend to be overly conservative [4]. Under uncertainty, we would typically like to compute low cost control inputs (based on some metric) and at the same time, ensure some upper bound on the risk of collision. In this paper, we formulate these requirements in a robust Model Predictive Control (MPC) framework, wherein the robustness stems from the constraints imposed by Probabilistic Velocity Obstacle (PVO) [8]. Essentially, PVOs are chance constraints defined over the deterministic VO [3]. Thus, a robust MPC formulation is, in fact, an instance of chance-constrained optimization (CCO). Recently CCO has been used as a general template for developing numerous navigation algorithms for a wide class of robots, ranging from aerial vehicles [9] to autonomous cars [10]. Although CCO provides a rigorous template for decision making under uncertainty, they are, in general, computationally intractable. In fact, under non-Gaussian or non-parametric uncertainty, it is difficult to even compute an analytical expression for the chance constraints (such as PVO in our case).

### I-a Contributions and Overview of the Proposed Approach

Our primary motivation in this paper is to reformulate CCO as a more tractable problem without making any assumption on the parametric form of the underlying uncertainty and/or resorting to any linearization of the constraints. To this end, we show that our robust MPC or CCO, in general, can be interpreted as a problem of distribution matching 111Although conceptually simple, to the best of our knowledge, there are no other works based on this interpretation.. To be precise, we construct a certain desired distribution and ensure that the distribution of the PVO matches the desired distribution by choosing low-cost control inputs. We present two methods to accomplish this. Our first baseline method builds upon existing works like [14], [2]

, and is based on approximating the desired distribution and PVO as a GMM and subsequently sampling control inputs to minimize the KL divergence between the two. Although this approach produces trajectories with a high probability of collision avoidance, it is computationally expensive and leads to conservative results with high tracking error and control costs.

Our second method is inspired by the result that local similarity of two given distributions can be bounded through their higher order moments

[11]. To accomplish moment matching in a tractable manner, we embed our desired distribution and PVO in Reproducing Kernel Hilbert Space (RKHS)[15], [16] wherein evaluating the similarity of higher-order moments of two given distribution can be performed through Kernel trick. The RKHS based approach provides following advantages

• It takes the form of a standard non-linear regression problem that can be solved easily through any gradient-based optimizer. Using the kernel trick, the evaluation of the cost or gradients reduces to simple matrix multiplication that can be accelerated using GPUs for large matrix sizes.

• The RKHS based approach provides an additional tuning parameter for trading off cost and robustness. The parameter has a clear physical interpretation and thus easier to choose.

• The RKHS based approach leads to substantially lower tracking cost and control cost than GMM, which in turn translates to lower time of traversal to the goal. The former also requires up to 8 times less computation time since it by-passes the need for performing expensive GMM fits.

## Ii Problem Formulation

Our Approach assumes a robot operating in 2D space, although extending the method to 3D is straightforward. Some of the essential symbols and notations are summarized in Table I

. We also define some notations in the first place of their use. We use regular faced lower case letters to represent scalars, while boldface variants would represent vectors. Upper faced letters represent the matrices.

### Ii-a Ego-motion uncertainty

We model the state at any time

as a random variable with an unknown probability distribution. The state evolves according to the following stochastic linear motion model:

 ξt=Aξt−1+B(ut−1+δt−1), (1)

where, is a random variable with unknown probability distribution that acts as a perturbation to the control input . We assume that we do not have access to the probability distribution of but we have a black-box process model which can generate different instances of . Now, given samples of at , we can adopt a particle filter like approach and motion model (1) to propagate uncertainty over time. Note that since we don’t know the probability distribution of and

, a Kalman Filter like approach is untenable even for a linear system like (

1).

### Ii-B Perception uncertainty

We assume that we have access to a predicted trajectory for each dynamic obstacle in the environment. Consequently, at any time instant , we have access to the state of the obstacle . We model perception uncertainty by treating as random variables with unknown probability distributions. As with ego-motion uncertainty, we assume that we can have access to samples of . For example, this can come from a separate particle filter which tracks obstacle state over time and is initialized with samples of at .

### Ii-C Velocity Obstacle (VO)

If both the robot and the obstacles are modeled as circular disks, VO is defined by the following set of equations:

 fjt(.)≤0:(rTjvj)2∥vj∥2−∥rj∥2+(ojR+R)2≤0.,∀j, (2)
 rj=[xt−ojxtyt−ojyt],vi=[˙xt−oj˙xt˙yt−oj˙yt],

where, , are the radii of the circular disks representing

the shapes of the robot and obstacle respectively.

### Ii-D Probabilistic Velocity Obstacle (PVO)

If the state of the robot evolves according to (1), then, given the current robot state , the predicted next instant obstacle state , and the perturbation , VO (2) can be represented as an explicit function of the control input.

 fjt(.)=h1(wt−1,ojξt)(uxt−1)2+h2(wt−1,ojξt)uxt−1uyt−1 +h3(wt−1,ojξt)(uyt−1)2+h4(wt−1,ojξt)uxt−1 +h5(wt−1,ojξt)uyt−1+h6(wt−1,ojξt), (3)

where222The concatenation of and to obtain is motivated by how ego-motion uncertainty is evolved using a particle filter like approach. At time , we independently sample samples of and to compute samples of , , are the components of . The functions and depend on random variables and, consequently, are random variables with some unknown probability distributions themselves. To model collision-free velocities under uncertainty, we need to formulate a probabilistic variant of VO. As mentioned earlier, PVOs are essentially chance constraints defined over VO, and thus we have where represents probability. Essentially, PVO ensures that the VO constraints are satisfied with some lower bound probability, . Note that our definition of PVO stems from [8] but is written in a slightly different form to highlight it as a chance constraint explicitly. For future use, we use the notation to denote the distribution of , parameterized in terms of control input for given random variables .

### Ii-E Robust MPC

We formulate the problem of reactive navigation in terms of the following robust MPC.

 argminut−1J(ut−1)=∥¯¯ξt−ξdt∥22+∥ut−1∥22 (4a) P(fjt(wt−1,ojξt,ut−1)≤0)≥η (4b) ut−1∈C (4c)

where represents the mean of and is used to keep the cost function deterministic333A more rigorous approach would be to define the cost function using the expectation operator. However, the formulation based on the mean state is simpler and acts as a lower bound for the cost based on the expectation operator. [1]. The first term in the cost function makes the mean state track some desired state , which in turn could be defined based on some desired trajectory for the robot. The second term penalizes the usage of control input with a high magnitude. The inequality (4b) ensures the probabilistic safety through the formulation of PVO. in (4c) models the feasible set of the control commands based on velocity and acceleration bounds and is assumed to be convex.

Optimizations of the form (4a)-(4c) are known as chance-constrained optimizations (CCO). Its computational complexity stems from the presence of the chance constraints (4b). As shown in (3), are highly non-linear functions of both control inputs and the random variables . In such a case, it is difficult to even compute an analytical expression for . In the next section, we present our main theoretical results, which allow reformulating (4a)-(4c) into a tractable, non-linear optimization problem while retaining low sample complexity.

## Iii Main Results

### Iii-a Intuitive interpretation of robust MPC

At an intuitive level, optimization (4a)-(4c) can be interpreted as a problem of ensuring that has such a shape that a specific portion of its mass lies to the left of (refer to Fig. 1). The chance constraint probability has a direct correlation with the amount of mass lying to the left of . A larger mass amounts to a higher . Given random variables , the distribution is parametrized by and therefore can be used to manipulate the shape of . However, each choice of incurs a cost in terms of deviation from the desired trajectory and magnitude of the control inputs.

### Iii-B Desired Distribution

Our goal is to ensure that achieves an appropriate shape. To this end, our desired distribution acts as a benchmark for ; in other words, a distribution that it should resemble as closely as possible for an appropriately chosen . We formalize the notion of desired distribution with the help of the following definitions:

###### Definition 1.

The nominal control input refers to any solution of the optimization (4a)-(4c) associated with a cost in the vicinity of that obtained in the deterministic setting.

###### Definition 2.

Let , be random variables that represent the same entity as and but belonging to a different distributions , , respectively. Further, when , , then . In such a case, is called the desired distribution if the following holds for some given nominal solution .

 P(fjt(˜wt−1,˜ojξt,unomt−1)≤0)≈1.0,˜wt−1∼Pdeswt−1,˜ojξt,∼Pdesojξt (5)

Essentially, equation (5) implies that if the uncertainty belongs to the distribution, , , then the shape and consequently, the entire mass of the distribution of can be manipulated to lie almost completely to the left of by choosing . This setting represents an ideal case because we have constructed uncertainties so appropriately that we can manipulate the distribution of the chance constraints while incurring a nominal cost. Moving forward, we assume that we have access to the samples , from the distribution , respectively and using it we can construct a desired distribution . We lay out the exact process of achieving this in Appendix.

### Iii-C Reformulating (4a)-(4c)

From Def.2 it can be deduced that as we make the distribution of chance constraints (PVO) more and more similar to , we ensure that more and more mass of gets shifted to the left of . As a result, we satisfy the chance constraints (4b) with higher . Building on these insights, we present the following reformulation of (4a)-(4c).

 argminJ(ut−1)+ρLdist(Pfjt(ut−1),Pdesfjt) (6a) ut−1∈C, (6b)

where, is a cost which measures some notion of similarity between the and . It should be noted that the chance constraint probability is not explicitly present in the above optimization. Rather, it is accommodated implicitly through the residual of . Lower the residual, higher the . The residual in turn can be controlled through the weight .

We solve (6a)-(6b) through a control sampling approach. We discretize the feasible set and evaluate the for each control sample from the feasible set. Subsequently, we choose the control input which results in minimum value of the cost (6a). The primary complexity of our control sampling approach depends on the form of or more precisely on the computational ease with which we can evaluate for a given . In the next couple of sections, we present two choices for it based on KL divergence and RKHS embedding.

### Iii-D GMM-KL Divergence (GMM-KLD) Based Approach

KL divergence is extensively used to measure how a given probability distribution is different from another. In other words, KL divergence is a valid choice for . The primary challenge in using KL divergence as a choice for is that we do not have access to the the probability densities associated with and . To address this problem, we approximate and at any time using GMMs. It should be noted that is parametrized in terms of control and changes for every .

At a given time step, , we first obtain samples from the desired distribution and subsequently fit a GMM over it. Then, for a given sampled control, we generate different samples of using the sample from and . We then fit GMM over it which now acts as an approximation of for the given sampled control. Using the GMM fit for and we can then compute the KL divergence. This process is repeated for all the sampled control input and the one which minimizes is chosen as the solution of (6a)-(6b).

### Iii-E RKHS based Approach

#### Iii-E1 Formulating Ldist in terms of Moment Matching

One of the vital building blocks of our approach based on RKHS embedding is the following theorem from [11].

###### Theorem 1.

, ,

where, refers to the order up to which the moments of and are similar. The above theorem suggests that the difference between two distributions can be bounded by a non-negative function , which decreases with increasing order of moment . The authors in [11] also show that this bound is particularly tight near the tail end of the distribution. Theorem 1 provides a way of ensuring local similarity between two distributions through moment matching. In other words, moment matching is a valid choice for

###### Remark 1.

For our purpose local similarity is sufficient since as we make the tail of and similar by matching higher order moments, we ensure that more and more of the mass of also gets shifted to the left of . This in turn leads to the satisfaction of chance constraints (4b) with a higher .

###### Remark 2.

The moment order can directly control the extent of local similarity and consequently acts as a surrogate for . Higher leads to higher (see Fig 4(a), 5(a)). Furthermore, only takes non-negative integer values. On the other hand, tuning GMM based solution is difficult as it relies on manipulating in (6a) to prioritize minimizing MMD over primary cost . However, unlike , the parameter does not have a statistical interpretation and thus it is difficult to ascertain whether a numerical change in will lead to any change in at all.

#### Iii-E2 Moment Matching through RKHS Embedding

To the best of our knowledge, there is no mapping that directly quantifies the similarity between the first moments of two given distributions. However, a workaround can be devised based on the concept of embedding distributions in Reproducing Kernel Hilbert Space (RKHS) and Maximum Mean Discrepancy (MMD) distance. The key ideas here are based on the results from [15], [16].

Let represent the RKHS embedding of , given by the following equations:

 μPfjt(ut−1)=p=n∑p=1q=n∑q=1αpβqk(fjt(wpt−1,ojξqt,ut−1),.) (7)
 μPdesfjt=p=nr∑p=1q=no∑q=1λpφqk(fjt(˜wpt−1,˜ojξqt,unomt−1),.), (8)

where, is a positive definite function called the kernel. To exploit Theorem 2, we use the polynomial kernel of order defined in (9) defined for some arbitrary vectors .

 k(x1,x2)=(1+xT1x2)d. (9)

Note that the in (9) is same as that used in Theorem 1. The constants play a vital role in the RKHS embedding, and we discuss them towards the end of this section. Now, consider the following theorem based on [16], ([18], pp-15) assuming that the RKHS embedding is constructed through polynomial kernel of order .

###### Theorem 2.

If , then moments of and up to order become similar.

That is, decreasing the residual of MMD distance becomes a way of matching the first moments of the distribution and . Theorem 2 allows us to make the following choice

 Ldist=∥μPfjt(ut−1)−μPdesfjt∥. (10)

The MMD (10) can be expanded in the following manner

 ∥μPfjt(ut−1)−μPdesfjt∥2=⟨μPfjt(ut−1),μPfjt(ut−1)⟩ −2⟨μPfjt(ut−1),μPdesfjt⟩+⟨μPdesfjt,μPdesfjt⟩ (11)

For a given , each of the terms in (11) is an inner product of two functions that are linear combinations of kernel functions and thus can be easily simplified using the so-called ”kernel trick”. These inner products give rise to large matrices, and hence easily parallelized using GPU.

### Iii-F Reduced Set Methods

Let and represent i.i.d samples of , respectively. Further, let and represent a subset (reduced set) of the i.i.d samples. It is implied that . Now, intuitively, a reduced set method would re-weight the importance of each sample from the reduced set such that it would retain as much information as possible from the original i.i.d samples. The weights associated with are computed through the following optimization and are then used to compute the RKHS embedding in (7). The same process can also be used to compute .

 αp=argminαp∥1Ni=N∑i=1k(^wit−1,.)−p=n∑p=1αpk(wpt−1,.)∥2,∑αp=1 (12)

### Iii-G Performance Guarantees

Both our GMM-KLD and RKHS based approaches work with only sample-level information without assuming any parametric form for the underlying distribution. Thus, the performance guarantees on collision avoidance depend on the following aspects. First, on how well are we modeling the distribution of our collision avoidance function (PVO) for a finite sample size. Second, on whether our modeling improves as the sample size increase: a property popularly known as consistency in estimation. Finally, on whether we can tune our model to produce diverse trajectories with different probability of avoidance. Remark

2 already addresses the third question. Moreover, the first two questions about GMM fit of distributions have already been established in the existing literature [6], [20]. Thus, in this subsection, we focus on the first two questions regarding our RKHS based approach.

To show the consistency of (7), we compute a ground truth embedding in the following manner:

 ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯μf(wt−1,ojξt,ut−1)=(1/l2)l∑i=1l∑j=1k(f(wt−1i,ojξtj,ut−1),.), (13)

where, is similar to except that the former is computed over a larger sample size . That is, . We can analyze the consistency by constructing the following error function from [15],[16] for a fixed value of .

 L=∥μf(wt−1,ojξt,ut−1)−¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯μf(wt−1,ojξt,ut−1)∥22, (14)

and analyzing its behavior for increasing value of . The results are summarized in Fig.2. As shown, for moment order in the polynomial kernel function, we get very low error for a sample size as low as 40.

## Iv Simulation and Validation

Additional results, realtime experiments and detailed derivation of the kernel matrices along with the implementation details are provided in the supplimentary material.

We used our RKHS based algorithm to simulate the navigation of quadrotor in the X-Y plane amongst dynamic obstacles. We approximate the motion model as a double integrator system, wherein we subject the acceleration control inputs to random perturbations. To showcase the potential of our algorithm, we used non-Gaussian error distributions for the positions, velocities, and accelerations of the robot and obstacle as applicable. Fig. 3 illustrates the error distributions used. 50 samples of and were used to evaluate both RKHS and GMM based approach. We also used a set of 50 control samples to search for an optimal solution of (6a)-(6b).

#### Iv-1 Evaluation of the RKHS based Approach for one obstacle benchmark

The solution process and results are summarized in the results shown in the Fig. 4. As described previously, the solution process starts with the construction of the desired distribution . Subsequently, we ensure that the distribution of is similar to (atleast near the tail end) by choosing an appropriate and degree of the polynomial kernel . This is highlighted in Fig. 4(c), which plots the distribution of 555Kernel density estimators are used to plot the distributions. for different degrees, , of the polynomial kernel. It is clearly shown that as increases, the distributions and become more alike (at least near the tail end). In this figure is highlighted in a grey shade, while the PDF of for are outlined in blue, orange and green respectively. The increase in the similarity between these two distributions can be directly correlated with the actual collision avoidance shown in Fig. 4(a). An increased value of results in lesser overlap between the samples of the robot and obstacle, thus resulting in higher value of . Fig. 4(b) shows that the control cost () and tracking error () increases at higher values of , thus making maneuvers more conservative. The top row of figure 4(a), shows progressively conservative avoidance with increase in , while the bottom row shows that the progress to the goal is slowed with increase in , due to such conservative maneuvers. The maneuver for , resembles closely the maneuver due to GMM-KLD.

#### Iv-2 Evaluation of the GMM-KL Divergence Approach

For the same one obstacle benchmark, we evaluate the GMM-KL Divergence approach according to section III-D. In short, we select the control, , that minimizes the cost function given in (4a) by choosing as KL divergence. The results are shown in Fig. 4(c). It is evident that matches the shape of , thus satisfying one of the basic objectives of minimizing a KL Divergence scheme. However, as mentioned in Remark 2, it is difficult to tune the GMM-KLD approach to produce diverse trajectories. For this specific result, we obtained a solution for which . Thus, the trajectories have a high safety factor but at the same time have higher tracking error and control cost.

### Iv-a Two obstacle benchmark

Here we extend the proposed schemes to a two obstacle benchmark, where similar trends as obtained in section IV-1 are observed. Fig. 5(a) shows that an increased value of results in higher value of but at the same time leads to higher tracking errors ( and control costs (refer Fig. 5(b)).

### Iv-B Comparative results based on Computational Time

The table II shows the computation time for RKHS and GMM-KLD based approaches on an Intel i7,4770 laptop. The GMM-KLD approach takes a computational time of 1.24s, while the RKHS based approach takes 0.17s to achieve a similar . The reduced computation time of the later stems primarily from two reasons. (i) Unlike GMM-KLD approach, RKHS based approach avoids the need to first fit a parametric form to the distribution of PVO and the desired distribution and then compute collision avoidance control. (ii) Furthermore, we exploit the kernel trick to reduce the MMD computation in it to just matrix multiplication, which can be efficiently parallelized on GPUs (NVIDIA GTX 1050 in our case). Fig. 5(c) shows the difference in computation time between the two approaches increases with the number of obstacles. The time to compute maneuvers grows faster for the GMM-KLD approach with an increase in the number of obstacles, so much so for four obstacles, the GMM approach is 2.5s slower than the RKHS approach. Whereas for one obstacle scenario GMM is 1s slower.

### Iv-C Experimental Results

We implemented the RKHS framework on a Bebop drone equipped with a monocular camera. A person walking with an April tag marker in front of him constitutes the moving object. We compute the distance, velocity, and bearing to the marker using the April Tag library. We obtain the state and velocity of the drone from the onboard odometry. The state, velocity, control, and perception/measurement noise distributions were computed through a non-parametric Pearson fit over experimental data obtained over multiple runs.

We performed 15 experimental runs to evaluate the RKHS method, which successfully avoided the obstacle on 75% of the 15 runs. Whereas naive deterministic maneuvers avoided the moving target, only 40% of the 15 runs. Fig. 6 shows the snapshots of the experimental runs.

## V Conclusions and Future Work

In this paper, we formulated a robust MPC/CCO as a problem of distribution matching. We illustrated two approaches that give tractable solutions to optimization problem with complex chance constraints like PVO under non-parametric uncertainty. We first proposed a baseline method that approximates the distribution of the chance constraints with a GMM model and then proceed to perform distribution matching using KL divergence. Our second method is built on the possibility of embedding distributions in the Reproducing Kernel Hilbert Space (RKHS). We evaluated both the GMM-KLD and the RKHS based approaches for quality of maneuvers and computational time. The RKHS based approach results in lower tracking errors and control costs than the GMM-KLD based approach. Another distinct advantage of the former is the massive reduction in computational time as compared to the latter. To the best of the author’s knowledge, this is a first such work that brings the benefits of RKHS embedding to the domain of robust MPC/CCO. This has enormous potential benefits for computationally efficient motion planning and control under uncertainty.

Our algorithm can be further enhanced along the following lines. First, the cost function of robust MPC/CCO is assumed to be deterministic, i.e., they do not contain state and motion uncertainty. A straightforward way of rectifying this would be to formulate stochastic cost as constraints using some slack variables. We are also looking at more complex applications like multi-agent navigation, reinforcement learning.

Limitations: Our formulations can fail due to two reasons. First, there can be a failure in the construction of the desired distribution if no feasible solution exists for the optimization problem proposed in (16a)-(16c). Second, if the control sampling is not dense enough, and we miss some out on some feasible control inputs. We can handle the latter by increasing the resolution of the discretization of control inputs, although this would lead to an increase in computation time.

Extensions to multi-agent scenario: Our formulation can be easily extended to multiple decision-making agents. Essentially, instead of building our chance-constrained optimization and its reformulations over the VO constraint (2), we would construct its distributed multi-agent variant called reciprocal velocity obstacle [19]. The construction of the desired distribution would change accordingly. The remaining steps of our GMM or RKHS based formulation remain the same.

## Constructing the desired distribution

We now describe how distributions , and can be computed. While exact computations may be intractable, we provide a simple way of constructing an approximate estimate of these distributions. The basic procedure is as follows. Given samples of we construct two sets , containing samples of and samples of respectively. For clarity of exposition, we choose , to respectively identify samples from set , . Now, assume that the following holds.

 fjt(˜wpt−1,˜ojξqt,unomt−1)≤0,∀˜wpt−1∈C˜wt−1,˜ojξqt∈C˜ojξt (15)

By comparing (5) and (15) it can be inferred that the sets , are in fact sample approximations of the distributions and respectively. Furthermore, a set containing samples of can be taken as the sample approximation of the desired distribution .

One major issue is which samples of and samples of should be chosen to construct sets , . In particular, we need to ensure that the chosen samples indeed satisfy the assumption (15). To this end, we follow the following process. We arbitrarily choose samples of and samples of and correspondingly obtain a suitable as a solution to the following optimization problem.

 unomt−1=argminJ(ut−1) (16a) fjt(˜wpt−1,˜ojξqt,unomt−1)≤0,∀p=1,2..nr,q=1,2..no (16b) ut−1∈C (16c)

Note that satisfaction of (16b) ensures that the assumption (15) holds. Few points are worth noting about the above optimization. First, it is a deterministic problem whose complexity primarily depends on the algebraic nature of . Second, the desired distribution can always be constructed if we have access to sets , . The construction of these two sets is guaranteed as long as we can obtain a feasible solution to (16a)-(16c). Although it is difficult to provide solution guarantees, in our simulations and experimental runs, we have observed that the solution could always be obtained if there existed a collision avoiding control considering only the mean of the uncertainty. Finally, (16a)-(16c) is precisely the so-called scenario approximation for the robust MPC (4a)-(4c). Conventionally, scenario approximation is solved with a large (typically each ) in order to obtain a solution that satisfy chance constraints (4b) with a high (). In contrast, we use (16a)-(16c) to estimate the desired distribution and thus for our purpose, a small sample size in the range of proves to be sufficient in practice.

Supplementary Material

The supplementary material has detailed discussions on

• Derivations for expressing MMD (11) in terms of kernel matrices

• Guarantees on safety

• Computational aspects of the proposed approach

• Additional comparisons with state of the art methods

The videos for all the simulations and the real time runs can be found at

Test

## Derivations: Expressing MMD (11) in terms of Kernel matrices

### Prerequisites

#### Rkhs

RKHS is a Hilbert space with a positive definite function called the Kernel. Let, x denote an observation in physical space (say Euclidean). It is possible to embed this observation in the RKHS by defining the following kernel based function whose first argument is always fixed at x.

 ϕ(x)=k(x,⋅) (17)

An attractive feature of RKHS is that it is endowed with an inner product, which in turn, can be used to model the distance between two functions in RKHS. Furthermore, the distance can be formulated in terms of the kernel function in the following manner.

 ⟨ϕ(xi),ϕ(xj)⟩=k(xi,xj) (18)

The equation (18) is called the ”kernel trick” and its strength lies in the fact that the inner product can be computed by only pointwise evaluation of the kernel function.

#### Distribution Embedding of PVO

Let be samples drawn from a distribution corresponding to the uncertainty of the robot’s state and control. Similarly, let be samples drawn from the distribution corresponding to the uncertainty in the state of the obstacle. It is important to note here that the parametric forms of these distributions need not be known. Both these distributions (of the robot and the obstacle) can be represented in the RKHS through a function called the kernel mean, which is described as

 μ[wt−1]=n∑p=1αpk(wpt−1,⋅) (19a) μ[ojξt]=n∑q=1βqk(ojξqt,⋅) (19b)

where is the weight associated with and is the weight associated with . For example, if the samples are , then .

Following [refer], equation (19a) can be used to embed functions of random variables like shown in equation (20), where is the VO constraint and are random variables with definitions taken from table I.

 μfjt(ut−1)=n∑p=1n∑q=1αpβqk(fjt(wpt−1,ojξqt,ut−1),⋅) (20)

Similarly, we can write the same for the samples for the desired distribution,

 nr∑p=1no∑q=1λpψqk(fjt(˜wpt−1,˜ojξqt,%unomt−1),⋅) (21)

where and are the constants obtained from a reduced method that is explained in the section III-F.

An important point to notice from above is that for given samples the kernel mean is dependent on variable . The rest of the material focuses on the detailed derivation of the used in our cost function to solve our robust MPC problem.

The proposed method has its foundations built by first posing the robust MPC as a moment matching problem (Theorem 1) and then describes a solution that is a workaround based on the concept of embedding distributions in RKHS and Maximum Mean Discrepancy (MMD). Further insights on how MMD can act as a surrogate to the moment matching problem are described in Theorem 2 of the paper. Theorem 2 suggests that if two distributions and , have their distributions embedded in RKHS for a polynomial kernel up to order , then decreasing the MMD distance becomes a way to match the first moments of these distributions.

Using these insights, an optimization problem given by (4a), (6a)-(6b) is proposed.

 argminρ1Ldist+ρ2J(