## 1 Introduction

Robustification against uncertain events is at the core of modern optimization and control. From the classic S-lemma to recent advances in distributionally robust optimization (DRO), we have witnessed computational tools giving rise to new robustification designs.

The classic “worst-case” approach robustifies constraints against all realizations of the disturbance in a mathematical model, which often results in over-conservativeness. Consider the illustrative example of a sample-based approach to solving a constrained stochastic control problem in Figure 1. We sample different numbers of realizations of the uncertainty in the left and right panels, and then solve the control problem under those realized scenarios. Intuitively, one may expect the controller associated with the larger number of scenarios (left) to be more robust against constraint violation than that of (right). However, this robustness comes at the price of a conservative design, which may be reflected in high cost.

The central idea of modern data-driven robust optimization (e.g., [bertsimas2018data]), put in lay terms, is to use data samples to form an empirical understanding of the true distribution and to robustify only against this empirical understanding, instead of against the whole support. One concrete consequence relevant to our discussion, illustrated in Figure 1, is that fewer scenarios translate to fewer constraints, which in turn lead to reduced cost. Naturally, this is a *trade-off between optimality and feasibility*.

In this paper, we show that constraint removal can be formulated as an optimization problem that seeks a new distribution close to the empirical data distribution. Our contributions are: (1) we formulate constraint removal in stochastic programming and control as a tractable convex optimization problem with a reproducing kernel Hilbert space (RKHS) distance regularization or constraint, a formulation that is well motivated in theory and effective in numerical studies; (2) to our knowledge, this is the first use of the RKHS-embedding reduced-set method in stochastic optimization and scenario approaches to control, which implies a connection between stochastic control and probability-metric-constrained DRO.

*Notation.*
In this work, the symbol $\mathcal{H}$ often denotes a reproducing kernel Hilbert space (RKHS). We write $\xi \sim P$ to denote that the random variable or vector (RV) $\xi$ follows the distribution law $P$. By the empirical distribution of the data, we mean the linear combination of Dirac measures at the seen data points, $\hat{P} = \frac{1}{N}\sum_{i=1}^{N}\delta_{\xi_i}$, where $\{\xi_i\}_{i=1}^{N}$ is the data set.

## 2 Background & related work

### 2.1 Stochastic programming and scenario optimization for control

In this paper, the problem of interest is the (chance-constrained) stochastic programming (SP; also known as stochastic optimization) in the following canonical formulation.

$$
\min_{\theta} \; J(\theta) \quad \text{subject to} \quad \mathbb{P}\left\{ f(\theta, \xi) \le 0 \right\} \ge 1 - \epsilon. \tag{1}
$$

As $\xi$ is assumed to be an RV, program (1) may be intuitively understood as making decisions under uncertainty originating from $\xi$. We consider the following sample-based SP (a.k.a. scenario approach).

Suppose we have a set of $N$ realizations $\{\xi_i\}_{i=1}^{N}$ of $\xi$; we solve the sample-based program

$$
\min_{\theta} \; J(\theta) \quad \text{subject to} \quad f(\theta, \xi_i) \le 0, \quad i = 1, \dots, N. \tag{2}
$$

If $J$ and $f$ are convex in $\theta$ and measurable in $\xi$, it can be shown that this formulation is a convex approximation to the original SP (1).
As $N \to \infty$, the solution recovers that of the SP (1) with level $\epsilon = 0$.
However, with a large $N$, the solution to (2) is overly conservative: it aims to satisfy the constraints *almost everywhere* in the distribution of $\xi$.
Therefore, the sample size $N$ trades off conservativeness against constraint satisfaction. Extensive research (e.g., [calafiore2006scenario, dentcheva2000concavity, luedtke2010integer]) has focused on approaches that remove a subset of the sampled constraints to reduce the conservativeness of the solution.

Most relevant to this paper, [campi2011sampling] established guaranteed bounds on the constraint-satisfaction probability as a function of the number of removed constraints (out of the $N$ total). Our method is built upon their sampling-and-discarding framework. [campi2013random] used regularization to encourage sparsity in the decision variables, which differs from our sparsity in the RKHS expansion terms.

For readers who are interested in sample-based stochastic programming, good text references are given by Ch.5 of [shapiro2009lectures] and Ch.9 of [birge2011introduction].

### 2.2 Reproducing kernel Hilbert space (RKHS) embeddings

This section establishes necessary tools from kernel methods. It is by no means a comprehensive survey. For readers who are not familiar with RKHS embeddings, we refer to [zhu2019new] for an accessible introduction in the context of stochastic systems and [scholkopf2002learning, Muandet2017] for an extensive coverage.

A positive definite kernel is a real-valued bivariate symmetric function $k(\cdot, \cdot)$ such that for any $n \in \mathbb{N}$, any points $x_1, \dots, x_n$, and any coefficients $c_1, \dots, c_n \in \mathbb{R}$, we have $\sum_{i,j=1}^{n} c_i c_j\, k(x_i, x_j) \ge 0$.
One may intuitively think of $k(x, x')$ as a generalized similarity measure (inner product) between $x$ and $x'$ after mapping them into the feature space $\mathcal{H}$, i.e., $k(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathcal{H}}$.
We refer to $\phi$ as the *feature map* associated with the kernel $k$, and to $\mathcal{H}$ as the associated RKHS.
A canonical example is the Gaussian kernel $k(x, x') = \exp\!\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)$, where $\sigma$ is a bandwidth parameter.
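As a concrete illustration, the Gaussian kernel and its Gram matrix on a sample can be computed as follows (a minimal NumPy sketch; the data and the bandwidth $\sigma$ are arbitrary illustrative choices):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(d, d) / (2.0 * sigma**2)))

def gram_matrix(X, sigma=1.0):
    """Gram matrix K with K[i, j] = k(X[i], X[j]); symmetric and PSD."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))   # five 2-D sample points
K = gram_matrix(X)            # symmetric, positive semidefinite
```

The positive semidefiniteness of the Gram matrix is exactly the defining property of the kernel above.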

RKHS embedding, or *kernel mean embedding* (KME) [Smola07Hilbert], maps probability distributions to (deterministic) elements of a Hilbert space. Mathematically, the KME of a random variable $\xi \sim P$ is given by the function $\mu_P := \mathbb{E}_{\xi \sim P}\left[ k(\xi, \cdot) \right]$, which is a member of the RKHS $\mathcal{H}$. For example, the RKHS associated with the second-order polynomial kernel consists of quadratic functions, whose coefficients preserve the statistical mean and variance. Gaussian kernel embeddings, on the other hand, preserve richer information, up to moments of infinite order.
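A sample-based sketch of the KME for a scalar RV with the Gaussian kernel follows; the sampling distribution of the RV here is an arbitrary illustrative choice:

```python
import numpy as np

def empirical_kme(data, sigma=1.0):
    """Return mu_hat(.) = (1/N) sum_i k(xi_i, .), the sample-based KME."""
    data = np.asarray(data, dtype=float)
    def mu_hat(x):
        # <mu_hat, k(x, .)>_H, i.e., a Monte Carlo estimate of E[k(xi, x)]
        return float(np.mean(np.exp(-(data - x) ** 2 / (2.0 * sigma**2))))
    return mu_hat

rng = np.random.default_rng(1)
xi = rng.normal(size=200)     # samples of the RV
mu_hat = empirical_kme(xi)    # a deterministic element of the RKHS
val = mu_hat(0.0)             # large where the samples concentrate
```

Note that `mu_hat` is an ordinary deterministic function even though it encodes a distribution, which is what makes embeddings convenient to optimize over.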

*Reduced-set expansion method using RKHS embeddings.*
Given a data set $\{\xi_i\}_{i=1}^{N}$, the sample-based KME is given by $\hat{\mu} = \sum_{i=1}^{N} \alpha_i\, k(\xi_i, \cdot)$,
where one can simply choose $\alpha_i = \frac{1}{N}$. It has been shown that one may use fewer than the total number of data samples to represent the distribution; this is the idea of reduced-set approximation (cf. [scholkopf2002learning]). Mathematically, this method seeks an embedding with fewer expansion terms, $\tilde{\mu} = \sum_{i} \beta_i\, k(\xi_i, \cdot) \approx \hat{\mu}$ with sparse $\beta$,
where the approximation is in the sense of the RKHS distance. The reduced-set method forms the backbone of our approach.
We also note that there are other related approximation methods such as those of [chen2012super, bach2012equivalence].
Recently, [zhu2019new] considered recursive applications of reduced-set method to uncertainty in stochastic systems.

## 3 Method

### 3.1 Stochastic programming with reduced-set expansion of RKHS embeddings

We consider the sample-based formulation (2) of the stochastic programming problem. Our main idea is to perform constraint removal systematically using the aforementioned RKHS-embedding reduced-set method. Typically, constraint removal discards low-probability scenarios to reduce the conservativeness of the resulting solution. Given a set of realized scenarios $\{\xi_i\}_{i=1}^{N}$ and a positive definite kernel $k$, we formulate the optimization problem

$$
\min_{\beta} \; \sum_{i=1}^{N} w_i\, |\beta_i| \quad \text{subject to} \quad \Big\| \sum_{i=1}^{N} \beta_i\, k(\xi_i, \cdot) - \hat{\mu} \Big\|_{\mathcal{H}} \le \eta, \tag{3}
$$

where $w = (w_1, \dots, w_N)$ denotes the scaling vector for the $\ell_1$-penalty. It can often be set to reflect specific concerns, such as the distance of the states to the constraint. The KME expansion weights $\beta_i$ need not sum to one, and $\hat{\mu}$ denotes the empirical KME estimator of the distribution of $\xi$. We further write down the equivalent Lagrangian form

$$
\min_{\beta} \; \Big\| \sum_{i=1}^{N} \beta_i\, k(\xi_i, \cdot) - \hat{\mu} \Big\|_{\mathcal{H}}^2 + \lambda \sum_{i=1}^{N} w_i\, |\beta_i|. \tag{4}
$$

The resulting solution $\beta^\star$ is sparse due to the sparsity-inducing $\ell_1$ term. We then discard the points in the index set $I = \{ i : \beta_i^\star = 0 \}$. Finally, we re-solve the stochastic programming problem with the reduced-set scenarios

$$
\min_{\theta} \; J(\theta) \quad \text{subject to} \quad f(\theta, \xi_i) \le 0, \quad i \in \{1, \dots, N\} \setminus I. \tag{5}
$$

The *intuition* behind the optimization formulations (3) and (4) is to produce a subset of the data whose distribution is close to the empirical data distribution in the sense of the RKHS-embedding distance $\|\cdot\|_{\mathcal{H}}$. Meanwhile, the weighted $\ell_1$-penalty incentivizes the solution to become sparse. Therefore, the solution to (4) discards the “corner” cases while maintaining the statistical information.
We outline the algorithmic procedure in Algorithm 1.

**Algorithm 1** (Reduced-set constraint removal)
1. Sample $N$ realizations $\{\xi_i\}_{i=1}^{N}$ of $\xi$ and form the sample-based SP (2).
2. Form the empirical KME $\hat{\mu}$ of the quantity of interest (the scenarios $\xi_i$ themselves or a transformation thereof) and choose the scaling vector $w$.
3. Solve the reduced-set problem (4) for $\beta^\star$ and collect the discarded index set $I = \{ i : \beta_i^\star = 0 \}$.
4. Re-solve the reduced SP (5) with the remaining scenarios.
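The Lagrangian problem (4) can be handed to any off-the-shelf convex solver; as a dependency-free sketch, proximal gradient descent (ISTA) on the kernel-trick form of the objective looks as follows (the data, bandwidth, and penalty level $\lambda$ below are illustrative choices, not values from our experiments):

```python
import numpy as np

def reduced_set_weights(K, w, lam, iters=500):
    """ISTA for  min_b  b^T K b - 2 c^T b + lam * sum_i w_i |b_i|,
    where c = K @ alpha and alpha is the uniform empirical weight 1/N."""
    N = K.shape[0]
    c = K @ (np.ones(N) / N)
    L = 2.0 * np.linalg.eigvalsh(K)[-1] + 1e-12   # Lipschitz const. of smooth part
    b = np.ones(N) / N
    for _ in range(iters):
        z = b - (2.0 * (K @ b) - 2.0 * c) / L               # gradient step
        b = np.sign(z) * np.maximum(np.abs(z) - lam * w / L, 0.0)  # soft-threshold
    return b

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 2))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / 2.0)                              # Gaussian Gram matrix
beta = reduced_set_weights(K, w=np.ones(20), lam=0.2)
discard = np.where(np.abs(beta) < 1e-10)[0]        # index set I of removed scenarios
```

On a separable instance (e.g., $K = I$) the iteration reaches the weighted soft-thresholding of the uniform weights after one step, which is a quick way to sanity-check an implementation.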

*Remark.* In Step 2 of Algorithm 1, we may also use the reduced-set expansion of any transformation of the random variable $\xi$. This generalization, termed kernel probabilistic programming [Scholkopf2015], is often of interest, as in our numerical examples. Its statistical consistency is justified by [SimonGabriel2016]. See also Section 2.2 of [zhu2019new] for an accessible discussion of this point.
The following lemma shows that formulation (4) is computationally tractable.

*Lemma.* If $\mathcal{H}$ in problem (4) is the RKHS associated with a positive definite kernel, the objective of the optimization problem is convex.
*Proof sketch.*
To see this, we invoke the sample-based estimator of the RKHS distance.
Using the kernel trick (cf. [scholkopf2002learning]), the objective is simply

$$
\beta^\top K \beta - 2\, c^\top \beta + \mathrm{const} + \lambda \sum_{i=1}^{N} w_i\, |\beta_i|, \tag{7}
$$

where $c := K \alpha$ is a constant vector and $K$, with $K_{ij} = k(\xi_i, \xi_j)$, is the Gram matrix associated with the positive definite kernel, which implies $K \succeq 0$. Using the convexity of the $\ell_1$ term, the conclusion follows.
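The kernel-trick expansion in (7) can be checked numerically: the squared RKHS distance computed directly from the coefficient vector agrees with the quadratic form up to the constant (the data and candidate weights below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(10, 1))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / 2.0)                 # Gram matrix of the Gaussian kernel

N = len(X)
alpha = np.ones(N) / N                # empirical KME weights
beta = rng.uniform(size=N)            # arbitrary candidate reduced-set weights

# Direct squared RKHS distance || sum_i (beta_i - alpha_i) k(xi_i, .) ||_H^2
direct = (beta - alpha) @ K @ (beta - alpha)
# Kernel-trick expansion: beta^T K beta - 2 c^T beta + const, with c = K alpha
c = K @ alpha
expanded = beta @ K @ beta - 2.0 * c @ beta + alpha @ K @ alpha
```

Since $K \succeq 0$, the quadratic part is convex in $\beta$, which is the content of the lemma.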

*Remark.* (Relation to *distributionally robust optimization*, DRO)
We can equivalently write the constraint of program (3) in the form of the maximum mean discrepancy (MMD), $\mathrm{MMD}(Q, \hat{P}) := \| \mu_Q - \hat{\mu} \|_{\mathcal{H}} \le \eta$,
where $\hat{P}$ is the empirical distribution of the data samples and $Q$ the distribution induced by the reduced-set embedding. The distribution associated with the reduced-set embedding can then be viewed as an $\eta$-perturbation of the empirical distribution, i.e., $Q$ must lie within an *MMD-ambiguity set* $\{ Q : \mathrm{MMD}(Q, \hat{P}) \le \eta \}$.
Optimization problems with such constraints are often referred to as *generalized moment problems*.

### 3.2 Application to stochastic optimal control

Let us consider the following sample-based (scenario) formulation of stochastic optimal control problem (OCP).

$$
\begin{aligned}
\min_{u(\cdot)} \quad & \frac{1}{N} \sum_{i=1}^{N} \int_{0}^{T} \ell\big(x_i(t), u(t)\big)\, \mathrm{d}t \\
\text{subject to} \quad & \dot{x}_i(t) = f_{\mathrm{dyn}}\big(x_i(t), u(t)\big), \quad x_i(0) = \xi_i, \\
& h\big(x_i(t), u(t)\big) \le 0, \quad i = 1, \dots, N,
\end{aligned} \tag{8}
$$

where $\xi$ denotes the uncertain variable (here, the initial state) and $\xi_i$ its realizations. The uncertainty in the initial state is particularly relevant to MPC designs. After proper transcription and discretization, this OCP takes the same form as the sample-based SP (2) and is solvable by Algorithm 1.

*Remark.* For conciseness, we restrict the uncertainty to the initial state in OCP (8). Reduced-set RKHS embeddings of more general process disturbances have been discussed in [zhu2019new].

## 4 Numerical Experiments

### 4.1 Min-max robust regression

We first consider a synthetic stochastic programming problem given in the form of the following min-max robust regression. A similar example was considered in [campi2019scenario].

$$
\min_{\theta} \; \max_{i = 1, \dots, N} \; \big| x_i\, \theta - y_i \big|. \tag{9}
$$

For simplicity, we consider scalars $x_i$ and $y_i$ generated randomly from a noisy linear model, where $\theta^\star$ is the (unknown) true parameter drawn from a uniform distribution.

Given the computed solution to the full program (9), let us consider the residual quantity $|x\,\hat\theta - y|$ as the quantity of interest; it is an RV due to the uncertainty in $x$ and $y$. We now apply Algorithm 1 to find a reduced-set embedding of this quantity by solving the sparse optimization problem (3). In Step 3, when solving program (4), we used a softmax-type scaling factor $w$ to incentivize the removal of “corner” points (its temperature may be thought of as the “softness” parameter of this scaling). We then remove the constraints in the identified index set $I$ from the stochastic program and re-compute a solution. Following the discussion in the previous sections, this embedding captures the distributional information while discarding the rare scenarios. The results are illustrated in Figure 2. As we can see, scenarios associated with “corner” data points are not selected, causing the reduction in conservativeness.
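A minimal end-to-end instance of this experiment can be sketched as follows. The scenario program (9) is solved as a linear program and, purely for illustration, the discarding rule below drops the largest residuals rather than running the KME-based Algorithm 1; all data-generating constants are made up:

```python
import numpy as np
from scipy.optimize import linprog

def minmax_regression(x, y):
    """Solve min_theta max_i |x_i*theta - y_i| as an LP in (theta, t)."""
    n = len(x)
    A = np.block([[x[:, None], -np.ones((n, 1))],
                  [-x[:, None], -np.ones((n, 1))]])
    b = np.concatenate([y, -y])
    res = linprog(c=[0.0, 1.0], A_ub=A, b_ub=b,
                  bounds=[(None, None), (0.0, None)])
    return res.x[0], res.x[1]          # (theta, worst-case residual)

rng = np.random.default_rng(4)
theta_true = rng.uniform(0.5, 1.5)
x = rng.normal(size=50)
y = theta_true * x + 0.1 * rng.normal(size=50)

theta_full, t_full = minmax_regression(x, y)
resid = np.abs(x * theta_full - y)
keep = np.argsort(resid)[:-2]          # discard the two "corner" scenarios
theta_red, t_red = minmax_regression(x[keep], y[keep])
```

Removing constraints can only decrease the optimal worst-case residual, mirroring the optimality-feasibility trade-off of Section 1.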

Figure 3 illustrates the effect of constraint removal on the optimal objective value and constraint violation. See the caption for detailed description.

### 4.2 Stochastic control

We now consider the Van der Pol oscillator model

$$
\dot{x}_1 = \left(1 - x_2^2\right) x_1 - x_2 + u, \qquad \dot{x}_2 = x_1. \tag{10}
$$

The goal of the control design is to steer the system state to a certain level. This is formulated as the following OCP.

$$
\begin{aligned}
\min_{u(\cdot)} \quad & \frac{1}{N} \sum_{i=1}^{N} \int_{0}^{T} \ell\big(x_i(t), u(t)\big)\, \mathrm{d}t \\
\text{subject to} \quad & \text{dynamics (10) with } x_i(0) = \xi_i, \\
& x_{1,i}(t) \le \bar{x}_1, \quad i = 1, \dots, N.
\end{aligned} \tag{11}
$$

We sample i.i.d. realizations $\{\xi_i\}_{i=1}^{N}$ of the uncertain initial state. Because of the nonlinear dynamics, we cannot propagate the uncertainty in a tractable manner as in LQG without resorting to approximations. We use the sampled scenarios to form the OCP (8). The continuous-time dynamics is transcribed using multiple shooting with the CVodes integrator (interfaced via CasADi). We then solve the discretized OCP with IPOPT to obtain the optimal control. An example of the states associated with the solution is given in Figure 1 (left). The time horizon and the number of control steps are fixed throughout this experiment.

Let us consider as the quantity of interest the distance from the state position to the upper-bound constraint. This quantity reflects how close we are to being infeasible. It is random because the states are functions of the RV $\xi$. In Step 2 of Algorithm 1, the scaling factor $w$ is chosen to encourage the removal of close-to-constraint trajectories.
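Computing this quantity and a softmax-style scaling factor from simulated scenarios can be sketched as follows, assuming a standard controlled Van der Pol variant with a constant input; the bound $\bar{x}_1$, the softness parameter, and the sampling distribution are illustrative, not the values of the actual experiment:

```python
import numpy as np

def vdp_rk4(x0, u=0.0, T=3.0, n=300):
    """RK4 simulation of a controlled Van der Pol oscillator (constant input u)."""
    def f(x):
        return np.array([(1.0 - x[1] ** 2) * x[0] - x[1] + u, x[0]])
    h = T / n
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(n):
        k1 = f(x); k2 = f(x + 0.5 * h * k1)
        k3 = f(x + 0.5 * h * k2); k4 = f(x + h * k3)
        x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        traj.append(x.copy())
    return np.array(traj)

rng = np.random.default_rng(5)
x_bar = 2.0                                        # illustrative upper bound on x1
xi = rng.normal([0.5, 0.0], 0.05, size=(30, 2))    # sampled initial states
# Quantity of interest: worst-case distance of x1 to the bound, per scenario
d = np.array([x_bar - vdp_rk4(x0)[:, 0].max() for x0 in xi])

tau = 0.1                                          # "softness" parameter
w = np.exp(-d / tau)
w /= w.sum()    # scenarios closer to the bound receive larger removal weight
```

The exponential scaling concentrates the $\ell_1$ weight on near-constraint trajectories, so problem (4) preferentially zeroes out exactly those scenarios.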

We are now ready to apply Algorithm 1 to find the reduced-set embedding of this quantity, where each scenario contributes a vector comprising the distance at all time steps. Finally, we re-solve the resulting reduced-set SP-OCP. Figure 1 (right) illustrates the reduced number of scenarios.

After applying Algorithm 1, we obtain an “optimistic” controller. To evaluate its performance, we use large-sample Monte Carlo simulation to estimate the constraint violation probability, i.e., $\mathbb{P}\{ f(\theta, \xi) > 0 \}$ in the chance-constrained SP (1), as well as the expected cost over the large-sample simulation. We plot the state trajectories for different numbers of removed constraints in Figure 4. The trade-off between the two quantities is illustrated in Figure 5. The result makes intuitive sense: the more constraints we remove, the less conservative the solution, but the higher the violation probability.
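The evaluation step can be sketched generically; the toy constraint below, with a known violation probability, stands in for the OCP constraints and is purely illustrative:

```python
import numpy as np

def violation_probability(g, theta, xi_samples):
    """Monte Carlo estimate of P(g(theta, xi) > 0) over sampled scenarios."""
    return float(np.mean(g(theta, xi_samples) > 0.0))

# Toy check with a known answer: g(theta, xi) = xi - theta with xi ~ N(0, 1),
# so the true violation probability is P(xi > theta) = 1 - Phi(1) ~ 0.159
# for theta = 1.
rng = np.random.default_rng(6)
p_hat = violation_probability(lambda th, s: s - th, 1.0,
                              rng.normal(size=100_000))
```

With $10^5$ samples, the standard error of such an estimate is on the order of $10^{-3}$, which is sufficient to resolve the trade-off curves of Figure 5.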

## 5 Discussion

This paper proposed a distributional-distance-regularized optimization formulation for stochastic programming under the framework of sampling-and-discarding. We demonstrated effective conservativeness reduction in data-driven optimization and control tasks. Although we did not study the guaranteed bounds, all analysis in [campi2011sampling] applies to our case. However, as our approach produces new distributions that are close to the empirical distribution, the sample complexity is likely to be less conservative.

One particularly interesting aspect is the interpretation of perturbing the empirical data distribution within an ambiguity set in the sense of the RKHS distance (as remarked in Section 3.1). This is worthy of further investigation.

We owe the inspiration for the second numerical example to Joris Gillis. We also thank Wittawat Jitkrittum for his helpful feedback. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 798321, the German Federal Ministry for Economic Affairs and Energy (BMWi) via eco4wind (0324125B) and DyConPV (0324166B), and by DFG via Research Unit FOR 2401.
