Consider the following optimization problem in terms of a scalar decision variable $x$.
where $c(x)$ is a user-defined cost function, $P(\cdot)$ denotes probability, and $f(x, w)$ is the constraint function, which depends on the decision variable $x$ and the uncertain parameters $w$. The dependence of $f$ on both $x$ and $w$ can be highly non-linear and non-convex. The inequality (1b) can be generalized to include any number of uncertain parameters and multiple chance constraints. Further, multiple optimization variables can also be accommodated. However, for easier exposition, we first restrict our analysis to the simple case described above; extensions to the more general case are straightforward and are discussed later. The set $\mathcal{C}$ represents the feasible space of $x$ and is assumed to be convex for simplicity. Optimizations such as (1a)-(1c) are called chance-constrained optimizations and are used extensively for decision making under uncertainty. In robotics and control applications, they form the backbone of robust Model Predictive Control (MPC) frameworks.
At an intuitive level, chance-constrained optimization can be interpreted as the problem of ensuring that a specified portion of the mass of the distribution of $f(x, w)$ lies to the left of $f = 0$ (refer to Fig. 1). For given uncertain parameters $w$, this distribution is parametrized by the decision variable $x$, which can therefore be used to manipulate the location of a specified portion of its mass. However, each choice of $x$ incurs a cost $c(x)$.
The chance constraint probability $\eta$ correlates directly with the amount of mass of the distribution lying to the left of $f = 0$: a larger mass amounts to a higher $\eta$.
Chance-constrained optimizations are known to be very difficult to solve. The complexity increases further when the uncertainty is non-parametric, that is, when the analytical functional form of the probability distribution of $w$ is not known. Chance constraints are easy to handle when $w$ is assumed to have a Gaussian distribution and the constraint function $f$ is affine with respect to $w$ for a given $x$. However, in general, optimization problems where chance constraints are defined over non-linear and non-convex functions and the underlying uncertainty cannot be represented in any parametric form are known to be computationally intractable. Thus, various approximations and reformulations have been proposed in the existing literature to tackle chance-constrained optimization problems.
|Symbol|Description|
|$\eta$|Chance constraint probability|
|$P_f$|Distribution of the chance constraints $f(x, w)$|
|$w^i$|$i$-th sample of the uncertain parameters $w$|
|$\widehat{w}$|Variant of the uncertain parameter $w$|
|$\mu_{P_f}$|Kernel mean of the distribution $P_f$|
|$\mu_{P_f^{des}}$|Kernel mean of the desired distribution $P_f^{des}$|
|$E[\cdot]$|Expectation of a function with respect to its random arguments|
|$\mathrm{Var}[\cdot]$|Variance of a function with respect to its random arguments|
A popular approximation called the scenario approach starts by drawing samples (or scenarios) $w^i$ of $w$ from their distribution and then replaces (1b) with deterministic constraints of the form $f(x, w^i) \le 0$, one per scenario. The scenario approach has a very interesting set of pros and cons. On the one hand, it is conceptually simple and applicable even when the parametric form of the distribution of the uncertain parameters is not known and only their samples are given. On the other hand, the naive implementation of the scenario approach is known to be overly conservative. To be precise, the cost increases with the number of scenarios, although the solution becomes more robust at the same time. Some existing works provide rejection-sampling algorithms to reduce the conservativeness of the scenario approach.
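As a concrete sketch, the scenario approach reduces to a very small program on a toy problem. The constraint $f(x, w) = w - x$ and the cost $c(x) = x$ are our own assumptions for illustration; under them, the scenario program amounts to taking the largest drawn scenario.

```python
import numpy as np

def scenario_solve(w_samples):
    """Scenario approach for:  min x  s.t.  f(x, w^i) = w^i - x <= 0
    for every drawn scenario w^i.  With this toy constraint the optimum
    is simply the largest scenario."""
    return float(np.max(w_samples))

rng = np.random.default_rng(0)
x_few = scenario_solve(rng.normal(size=10))        # few scenarios: cheap, less robust
x_many = scenario_solve(rng.normal(size=10_000))   # many scenarios: robust, conservative
```

With many scenarios the returned $x$ is driven by the extreme tail of the sample, which illustrates why naive scenario programs become conservative as the sample count grows.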
An alternate class of approaches relies on replacing chance constraints with a deterministic surrogate. For example, (2) represents the robust variant of the so-called sample average approximation (SAA), where $w^i$ represents the samples of $w$ and $\mathbb{I}(\cdot)$ represents an indicator function. The variable $\tilde{\eta}$ is similar but not necessarily the same as the chance constraint probability $\eta$. A strong advantage of SAA (2) is that it provides a very tight approximation, resulting in a low-cost solution. However, (2) represents an extremely difficult non-smooth and non-convex constraint; thus, the reformulated chance-constrained optimization itself becomes very difficult. Our experimentation has shown that it is possible to solve SAA-based reformulations of single-variable chance-constrained optimization with an exhaustive search. However, such an approach is unlikely to scale to problems with multiple decision variables.
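The exhaustive-search idea for a single decision variable can be sketched as follows. The toy constraint $f(x, w) = w - x$, the cost $c(x) = x$, and the grid resolution are assumptions for illustration; the code scans a grid for the lowest-cost $x$ whose empirical satisfaction fraction meets the threshold $\tilde{\eta}$.

```python
import numpy as np

def saa_solve(w_samples, eta=0.9, grid_pts=2001):
    """Exhaustive search for:  min x  s.t.
    (1/n) * sum_i I(f(x, w^i) <= 0) >= eta,  with f(x, w) = w - x.
    The empirical fraction is the non-smooth SAA surrogate of (1b)."""
    grid = np.linspace(w_samples.min(), w_samples.max(), grid_pts)
    # Fraction of satisfied sampled constraints at every grid point.
    frac = (w_samples[None, :] - grid[:, None] <= 0.0).mean(axis=1)
    return float(grid[frac >= eta].min())   # lowest-cost feasible point
```

Note that the feasible set defined by the indicator constraint is a union of intervals in general, which is exactly why gradient-based solvers struggle with (2) while a one-dimensional sweep does not.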
where $E[f(x, w)]$ and $\mathrm{Var}[f(x, w)]$ represent the mean and variance of $f(x, w)$, taken with respect to the random variable $w$. Using Cantelli's inequality, it can be shown that satisfaction of (3) ensures that the chance constraints are satisfied with the specified probability $\eta$. However, it should be noted that this bound can be rather loose. The attractive feature of (3) is that it is applicable to a wide class of chance constraints. However, its efficiency is predicated on how easy it is to compute analytical expressions for $E[f(x, w)]$ and $\mathrm{Var}[f(x, w)]$. For example, if $f$ is highly non-linear and/or the parametric form of $w$ is not known, then computing accurate analytical expressions for the mean and variance becomes very challenging. A workaround proposed in some existing works approximates these expressions through Monte Carlo sampling. However, we should note two key bottlenecks of such approaches. First, the sample complexity is poor; our experimentation shows that a very large number of samples is usually required to obtain a reasonable approximation. Second, for a given sample size, it is difficult to estimate how well the surrogate constraints are approximated. This, in turn, makes it difficult to accurately infer the feasibility of the original chance constraints.
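To make the surrogate concrete, the following sketch estimates the mean and variance by Monte Carlo and checks a Cantelli-style condition $E[f] + k\sqrt{\mathrm{Var}[f]} \le 0$ with $k = \sqrt{\eta/(1-\eta)}$, which guarantees $P(f \le 0) \ge k^2/(1+k^2) = \eta$. The toy constraint $f(x, w) = w - x$ is an assumption for illustration.

```python
import numpy as np

def cantelli_feasible(x, w_samples, eta=0.9):
    """Monte-Carlo check of the surrogate E[f] + k*sqrt(Var[f]) <= 0 for the
    toy constraint f(x, w) = w - x.  Choosing k = sqrt(eta / (1 - eta)) makes
    Cantelli's inequality guarantee P(f <= 0) >= k^2 / (1 + k^2) = eta."""
    f = w_samples - x
    k = np.sqrt(eta / (1.0 - eta))
    return bool(f.mean() + k * np.sqrt(f.var()) <= 0.0)
```

The looseness of the bound is visible here: for $\eta = 0.9$ the condition demands the mean sit three standard deviations left of zero, even for distributions whose actual $0.9$-quantile is much closer.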
In this paper, we present a novel approach built on the fact that any arbitrary distribution ($P_f$ in our case) can be embedded as a function (or a point; refer to Fig. 2) in a Reproducing Kernel Hilbert Space (RKHS). The embedded function is generally referred to as the kernel mean, and thus RKHS embedding is also known as Kernel Mean Embedding (KME) in the existing literature. A few key advantages of RKHS embedding are worth pointing out. First, the embedding can be achieved even when the parametric form of the underlying uncertainty ($w$ in our case) is not known. Second, the embedding only requires point-wise evaluations of $f$ and is thus not influenced by its algebraic complexity. Finally, it opens avenues for the use of established reduced set methods to achieve good sample complexity. Intuitively, reduced set methods provide a systematic way of choosing a subset of samples while retaining as much information as possible from the original sample set by re-weighting the importance of those samples.
In the current work, we build on the concept of RKHS embedding and put forward the following contributions:
We interpret chance-constrained optimization as a problem of matching higher-order moments of two given distributions. The two distributions in consideration are the distribution of the constraint function and a certain "desired distribution", which, we show, can be systematically constructed by borrowing notions from scenario approximation. Although conceptually simple, to the best of our knowledge, there are no other works based on this interpretation.
We further reformulate moment matching as the problem of minimizing the distance between the RKHS embeddings of the constraint function and the desired distribution. We use the so-called kernel trick to show that the complexity of the RKHS embedding based reformulation is comparable to solving a deterministic variant of the chance-constrained optimization (1a)-(1c), obtained by replacing (1b) with a single deterministic constraint of the form $f(x, w) \le 0$ for a fixed $w$. To be precise, if $f$ is polynomial in $x$, then the reformulated problem is also a polynomial optimization problem of comparable order.
We benchmark our formulation against the existing approaches based on two metrics: sample complexity and obtained optimal cost. In particular, we highlight the following results. First, we show that our formulation significantly outperforms scenario approximation on both metrics. Second, our formulation and the SAA approach based on surrogate constraints (2) result in similar optimal costs; however, our formulation leads to a simpler optimization problem and enjoys better sample complexity. Finally, our formulation also outperforms approaches based on surrogate constraints (3).
We apply our formulation to two challenging motion planning/control problems. The first problem involves navigating a mobile robot in dynamic and uncertain environments. Herein, we consider noise arising from both perception and ego-motion while formulating the chance constraints for ensuring probabilistic collision avoidance. Our second problem implements a stochastic variant of inverse-dynamics-based path tracking for manipulators. We assume that the manipulator has noiseless motion but noisy state estimation. Consequently, the manipulator should compute the necessary torque commands while accounting for the state-estimation uncertainty, so that the probability of exerting a torque that violates the specified bounds stays under a given threshold. This requirement can be naturally expressed in the form of chance constraints.
II Embedding Distributions in RKHS
An RKHS is a Hilbert space equipped with a positive definite function $k(\cdot, \cdot)$ called the kernel. Let $\mathbf{x}$ denote an observation in the physical space (say, Euclidean). This observation can be embedded in the RKHS by defining the kernel-based function $k(\mathbf{x}, \cdot)$, whose first argument is always fixed at $\mathbf{x}$.
An attractive feature of an RKHS is that it is endowed with an inner product, which in turn can be used to model the distance between two functions in the RKHS. Furthermore, this inner product can be computed in terms of the kernel function through the reproducing property $\langle k(\mathbf{x}, \cdot), k(\mathbf{x}', \cdot) \rangle = k(\mathbf{x}, \mathbf{x}')$.
Equation (5) is called the "kernel trick", and its strength lies in the fact that the inner product can be computed by only point-wise evaluations of the kernel function.
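A quick numerical check of the kernel trick, using the degree-2 polynomial kernel $k(x, y) = (1 + xy)^2$ on scalar inputs (our choice for illustration): here the explicit feature map is small enough to write down, and its inner product matches a single point-wise kernel evaluation.

```python
import numpy as np

def poly_kernel(x, y):
    """Degree-2 polynomial kernel on scalars."""
    return (1.0 + x * y) ** 2

def phi(x):
    """Explicit feature map of the same kernel:
    (1 + xy)^2 = 1 + 2xy + x^2 y^2
               = <(1, sqrt(2) x, x^2), (1, sqrt(2) y, y^2)>."""
    return np.array([1.0, np.sqrt(2.0) * x, x ** 2])

x, y = 0.7, -1.3
inner = float(phi(x) @ phi(y))   # inner product in explicit feature space
direct = poly_kernel(x, y)       # single point-wise kernel evaluation
```

For richer kernels (e.g., Gaussian) the feature space is infinite-dimensional, so the right-hand side of (5) is the only computable route.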
II-B Distribution Embedding
The projection to the RKHS can also be generalized to distributions. Let $w^1, w^2, \ldots, w^n$ be samples drawn from an unknown probability distribution $P_w$. This distribution can be represented in the RKHS through a function called the kernel mean, described in the following manner: $\mu_{P_w}(\cdot) = \sum_{i=1}^{n} \alpha_i k(w^i, \cdot)$,
where $\alpha_i$ is the weight associated with $w^i$. For example, if the samples are i.i.d., then $\alpha_i = \frac{1}{n}$. The estimator (6) is consistent, i.e., the estimate improves as the number of samples increases.
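A minimal sketch of the empirical kernel mean estimator (6), with a Gaussian kernel and uniform i.i.d. weights as the illustrative choices:

```python
import numpy as np

def kernel_mean(samples, kernel, weights=None):
    """Empirical kernel mean  mu(.) = sum_i alpha_i * k(w^i, .)  built from
    samples of an unknown distribution; alpha_i = 1/n for i.i.d. samples."""
    if weights is None:
        weights = np.full(len(samples), 1.0 / len(samples))
    return lambda z: float(np.sum(weights * kernel(samples, z)))

rbf = lambda a, b: np.exp(-0.5 * (a - b) ** 2)
mu = kernel_mean(np.array([0.0, 1.0]), rbf)   # embedding of a 2-sample set
```

The returned object is a function, matching the view of the embedding as a point in a function space: it can be evaluated anywhere, and inner products between two such embeddings reduce to kernel evaluations between their samples.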
II-C Maximum Mean Discrepancy (MMD)
Given two distributions $P$ and $Q$, the MMD refers to the distance between their RKHS embeddings $\mu_P$ and $\mu_Q$. That is:
An important thing to note from (8) is how the kernel trick allows us to express the MMD only in terms of point-wise evaluations of the kernel function.
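Concretely, the squared MMD between two sample sets can be computed purely from point-wise kernel evaluations, without ever forming the embeddings explicitly. The Gaussian kernel below is an illustrative choice.

```python
import numpy as np

def mmd_sq(X, Y, kernel):
    """||mu_P - mu_Q||^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)],
    estimated from samples of P and Q via the kernel trick alone."""
    Kxx = kernel(X[:, None], X[None, :]).mean()
    Kyy = kernel(Y[:, None], Y[None, :]).mean()
    Kxy = kernel(X[:, None], Y[None, :]).mean()
    return float(Kxx + Kyy - 2.0 * Kxy)

rbf = lambda a, b: np.exp(-0.5 * (a - b) ** 2)
```

Identical sample sets give zero distance, and the distance grows as the two sets drift apart, which is what makes the MMD usable as an optimization residual later in the paper.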
II-D Reduced Set Methods
Consider a vector described in terms of a weighted combination of basis functions. Reduced set methods are a class of algorithms that compute an optimal approximation of the vector using a much smaller number of basis functions. Interestingly, the same class of algorithms can be applied to improve the sample complexity of RKHS embedding as well. The process can be described as follows. Let $w^1, \ldots, w^n$ represent i.i.d. samples of $w$, and let $r^1, \ldots, r^{n_r}$ represent a subset (reduced set) of these i.i.d. samples, with $n_r \ll n$. Intuitively, a reduced set method re-weights the importance of each sample from the reduced set such that it retains as much information of the original i.i.d. samples as possible. The weights associated with the reduced-set samples are computed through the following optimization problems.
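The re-weighting step can be sketched as a least-squares problem: choose reduced-set weights so that the weighted kernel mean of the reduced set is as close as possible, in RKHS norm, to the kernel mean of the full set. The Gaussian kernel and the tiny sample sizes are illustrative assumptions.

```python
import numpy as np

def reduced_set_weights(full, reduced, kernel):
    """Minimize  || sum_j beta_j k(r^j, .) - (1/n) sum_i k(w^i, .) ||^2
    over beta.  Expanding the norm with the kernel trick and setting the
    gradient to zero yields the normal equations  K_rr beta = K_rf (1/n) 1."""
    K_rr = kernel(reduced[:, None], reduced[None, :])
    K_rf = kernel(reduced[:, None], full[None, :])
    rhs = K_rf.mean(axis=1)                      # K_rf @ (1/n) * ones
    beta, *_ = np.linalg.lstsq(K_rr, rhs, rcond=None)
    return beta

rbf = lambda a, b: np.exp(-0.5 * (a - b) ** 2)
full = np.array([0.0, 1.0, 2.0])
beta = reduced_set_weights(full, full, rbf)      # sanity case: reduced = full
```

Least squares (rather than a direct solve) is used because the reduced-set Gram matrix can be near-singular when samples nearly coincide.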
III Main Results
We assume that the uncertainty is non-parametric, which in our case means that the probability distribution function associated with $w$ is not known. Rather, we have access to its discrete samples. These samples could come from a simulator which mimics a very general distribution with arbitrary orders of moments.
We assume that the analytical form of the constraint function $f(x, w)$ is known.
III-A Algebraic Form of the Constraint Function
In this paper, we consider chance constraints defined over the following class of constraint functions,
where $g_i(w)$ is a generic, possibly non-linear, function of the uncertain parameters $w$, while $x^i$ represents a monomial of order $i$ in the decision variable. The definition (11) is very general and includes the well-known affine class of chance constraints as a special case. It can be seen that even if the uncertain parameters $w$ are Gaussian, the chance constraints defined over $f(x, w)$ may still be too complex to admit an analytical characterization of the distribution of $f(x, w)$.
Let $P_f$ represent the distribution of $f(x, w)$, parametrized in terms of $x$. Its RKHS embedding can be computed using (7) in the following manner:
III-B Desired Distribution
The notion of a desired distribution is derived from the observations made in Remark 1. To recap, we want to ensure that the distribution $P_f$ achieves an appropriate shape. To this end, the desired distribution acts as a benchmark for $P_f$; in other words, it is a distribution that $P_f$ should resemble as closely as possible for an appropriately chosen $x$. We formalize the notion of the desired distribution with the help of the following definitions:
Let $\widehat{w}$ be a random variable which represents the same entity as $w$ but belongs to some known distribution $\widehat{P}_w$, and let $P_f^{des}$ denote the distribution of $f(x, \widehat{w})$ induced by $\widehat{P}_w$. Then $P_f^{des}$ is called the desired distribution if the following holds:
Equation (14) suggests that if the uncertain parameters belong to the known distribution $\widehat{P}_w$, then the entire mass of the distribution $P_f^{des}$ can be manipulated to lie almost completely to the left of $f = 0$ by an appropriate choice of $x$. This setting represents an ideal case, because the uncertainties have been constructed precisely so that we can manipulate the distribution of the chance constraints while incurring only a nominal cost.
Constructing the Desired Distribution:
We now describe how the distribution $\widehat{P}_w$, and consequently $P_f^{des}$, can be constructed. While exact computation may be intractable, in this section we provide a simple way of constructing an approximate estimate of these distributions. The basic procedure is as follows.
Given samples of $w$, we construct two sets, each containing a subset of those samples. For clarity of exposition, we use separate indexing to identify samples from each set. Now, assume that the following holds.
By comparing (14) and (15), it can be inferred that the two sets are in fact sample approximations of the respective distributions. Furthermore, the set containing the induced samples of $f$ can be taken as a sample approximation of the desired distribution $P_f^{des}$.
One last piece of the puzzle remains: we still do not know which samples of $w$ should be chosen to construct the two sets. In particular, we need to ensure that assumption (15) holds for the chosen samples. To this end, we proceed as follows. We arbitrarily choose candidate samples and correspondingly obtain a suitable $x$ as a solution to the following optimization problem:
Note that satisfaction of (16b) ensures that assumption (15) holds. A few points are worth noting about the above optimization. First, it is a deterministic problem whose complexity primarily depends on the algebraic nature of $f$. Second, the desired distribution can always be constructed if we have access to the two sample sets, and the construction of these sets is guaranteed as long as we can obtain a feasible solution to (16a)-(16c). Third, the computational burden of solving the optimization problem can be significantly reduced by some clever sampling. For example, in our implementation, we compute the left-hand side of (16b) for different combinations of samples and then choose the set which leads to the least violation of the constraints (16b). Finally, (16a)-(16c) is precisely the so-called scenario approximation of the chance-constrained optimization (1a)-(1c). Conventionally, scenario approximation is solved with a large number of samples in order to obtain a solution that satisfies the chance constraints (1b) with a high $\eta$. In contrast, we use (16a)-(16c) only to estimate the desired distribution, and for this purpose a small sample size proves to be sufficient in practice.
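The "clever sampling" step above can be sketched as a brute-force enumeration over candidate subsets. The toy constraint $f(x, w) = w - x$ is assumed for illustration, and the score is the worst-case left-hand-side value within each subset, so the selected subset is the one that violates the scenario constraints the least at the chosen $x$.

```python
from itertools import combinations
import numpy as np

def least_violating_subset(x, w_samples, subset_size):
    """Enumerate all subsets of the given size and return the index set whose
    worst-case constraint value f(x, w) = w - x is smallest, i.e. the subset
    that violates the sampled constraints the least at this x."""
    best_idx, best_score = None, np.inf
    for idx in combinations(range(len(w_samples)), subset_size):
        score = max(w_samples[i] - x for i in idx)
        if score < best_score:
            best_idx, best_score = list(idx), score
    return best_idx, float(best_score)

idx, score = least_violating_subset(0.0, np.array([-1.0, -2.0, 3.0, 4.0]), 2)
```

Exhaustive enumeration is only for illustration; with many samples one would instead score a modest number of randomly drawn subsets and keep the best.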
The RKHS embeddings of these distributions can be obtained in the following manner:
where the weights are constants derived from the reduced set methods described in Section II-D.
III-C Chance-Constrained Optimization as a Moment Matching Problem
where $m$ refers to the order up to which the moments of $P_f$ and $P_f^{des}$ are similar. The above theorem suggests that the difference between the two distributions can be bounded by a non-negative function which decreases with increasing moment order $m$. It is also shown that this bound is particularly tight near the tail end of the distribution. Now, recalling that almost the entire mass of $P_f^{des}$ lies to the left of $f = 0$, it is clear that as we make the tails of $P_f$ and $P_f^{des}$ similar by matching higher-order moments, we ensure that more and more of the mass of $P_f$ gets shifted to the left of $f = 0$. This leads to the satisfaction of the chance constraints (1b) with a higher $\eta$. Theorem 1 lays the foundation for the following optimization problem, which can act as a substitute for the original chance-constrained optimization (1a)-(1c).
where the second term is a cost function that measures the similarity between the first $m$ moments of $P_f$ and $P_f^{des}$. A low value of this cost implies that the first $m$ moments of the two distributions are very similar.
Accommodating the Chance Constraint Probability $\eta$: Optimization (20a)-(20b) accommodates the chance constraint probability $\eta$ in an implicit manner. Thus, the process of obtaining solutions with different levels of robustness based on $\eta$ is more indirect and involved than in the original optimization (1a)-(1c). In (20a)-(20b), the similarity between the tails of $P_f$ and $P_f^{des}$ depends not only on the residual of the moment-matching cost but also on the moment order $m$ used to construct it. Fixing the weights and increasing $m$ increases the similarity near the tail end and thus leads to the satisfaction of the chance constraints with a higher $\eta$. A similar goal can be achieved by fixing $m$ and increasing the weights.
III-D Reformulating Distribution/Moment Matching through RKHS Embedding
The optimization (20a)-(20b) is still challenging to solve, as it is not clear how to derive a suitable analytical form for the moment-matching cost. To the best of our knowledge, there is no mapping that directly quantifies the similarity between the first $m$ moments of two given distributions. Here, we present a workaround based on the concept of RKHS embedding and the MMD distance. Our key idea is based on the following results from the literature.
If the MMD distance between the embeddings of $P_f$ and $P_f^{des}$, computed with a polynomial kernel of order $m$, is reduced to zero, then the moments of $P_f$ and $P_f^{des}$ up to order $m$ become similar.
That is, decreasing the residual of the MMD distance becomes a way of matching the first $m$ moments of the distributions $P_f$ and $P_f^{des}$. Theorem 2 suggests that the MMD distance can be used as a measure of similarity between the first $m$ moments of the two distributions. In other words, the MMD with a polynomial kernel can act as a surrogate for the moment-matching cost. Using this insight, we present the following optimization problem, which can act as a surrogate for (20a)-(20b).
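This surrogate can be exercised end-to-end on a scalar toy problem. Everything below is an illustrative assumption: the constraint $f(x, w) = w - x$, the desired-distribution samples (a sample set shifted well to the left of zero), and grid search in place of a proper solver. Minimizing the polynomial-kernel MMD then recovers the shift that aligns the first moments of the constraint samples with those of the desired samples.

```python
import numpy as np

def poly_mmd_sq(A, B, m=3):
    """Squared MMD with the polynomial kernel (1 + ab)^m -- a surrogate for
    matching the first m moments of the two sample sets."""
    k = lambda a, b: (1.0 + a * b) ** m
    return float(k(A[:, None], A[None, :]).mean()
                 + k(B[:, None], B[None, :]).mean()
                 - 2.0 * k(A[:, None], B[None, :]).mean())

rng = np.random.default_rng(3)
w = rng.normal(size=300)               # samples of the uncertain parameter
f_des = rng.normal(size=300) - 2.0     # desired samples: mass well left of 0
grid = np.linspace(-4.0, 4.0, 401)
# Toy constraint f(x, w) = w - x: pick the x whose constraint samples
# best match the desired distribution in MMD.
x_star = float(grid[np.argmin([poly_mmd_sq(w - x, f_des) for x in grid])])
```

Note that expanding $(1 + ab)^m$ shows why this works: the squared MMD decomposes into a weighted sum of squared differences between the raw moments of the two sample sets up to order $m$.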
III-E Simplification Based on the Kernel Trick
We now use the so-called "kernel trick" to obtain a simplified form of the optimization (21a)-(21b) and highlight that the computational complexity of solving (21a)-(21b) is comparable to that of solving a deterministic variant of the original chance-constrained optimization. For ease of exposition, we consider a specific instance of the constraint function definition (11).
Expanding the MMD cost, we get
Using the kernel trick (5), the first term reduces to the following expression
Following a similar process, the second term reduces to