A New Distribution-Free Concept for Representing, Comparing, and Propagating Uncertainty in Dynamical Systems with Kernel Probabilistic Programming

This work presents the concept of kernel mean embedding and kernel probabilistic programming in the context of stochastic systems. We propose formulations to represent, compare, and propagate uncertainties for fairly general stochastic dynamics in a distribution-free manner. The new tools enjoy sound theory rooted in functional analysis and wide applicability as demonstrated in distinct numerical examples. The implication of this new concept is a new mode of thinking about the statistical nature of uncertainty in dynamical systems.



There are no comments yet.


page 1

page 2

page 3

page 4


A Kernel Two-sample Test for Dynamical Systems

A kernel two-sample test is developed for deciding whether two dynamical...

Probabilistic Evolution of Stochastic Dynamical Systems: A Meso-scale Perspective

Stochastic dynamical systems arise naturally across nearly all areas of ...

Adaptive CVaR Optimization for Dynamical Systems with Path Space Stochastic Search

We present a general framework for optimizing the Conditional Value-at-R...

A Dynamical Systems Approach for Static Evaluation in Go

In the paper arguments are given why the concept of static evaluation ha...

Flow-driven spectral chaos (FSC) method for long-time integration of second-order stochastic dynamical systems

For decades, uncertainty quantification techniques based on the spectral...

A mathematical approach to resilience

In this paper, we evolve from sparsity, a key concept in robust statisti...

A note on tools for prediction under uncertainty and identifiability of SIR-like dynamical systems for epidemiology

We provide an overview of the methods that can be used for prediction un...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Classic stochastic control methods such as LQG hinge on the mathematical fact that the family of Gaussian distributions is closed under an affine transformation. This allows uncertainty to be propagated in a tractable manner under the Gaussianity assumption. Robust control methods, such as the classic tube model predictive control, also rely on the linearity of dynamics to propagate the polytopic uncertainty. However, when we move beyond those assumptions to the territories of nonlinear dynamics and non-Gaussian noise, uncertainty propagation becomes difficult or even intractable.

A central component in robust and stochastic control is how to represent

the system uncertainty. To this end, point estimate, ellipsoidal uncertainty set, Gaussian distribution, or randomized computer simulations have been used in various applications. Along with those, a few families of

uncertainty propagation methods have been proposed, e.g., generalized polynomial chaos approximation, Gaussian processes. The representation and propagation affect control as well as system identification. For example, a parameter estimation problem typically uses the likelihood function as its criterion for goodness-of-fit, which requires assuming the distribution family.

Consider an illustrative sketch in Figure 1. How can we measure the differences between system models? If the dynamical system of interest is deterministic (left), then we can simply compare the solutions of the systems in the sense of Euclidean distance.

Figure 1: Illustrative example of deterministic (left) and stochastic (right) dynamical systems. Notice the system in (right, red) is stochastic and bi-modal.

Now we consider Figure 1 (right), which illustrates the evolution of two stochastic dynamical systems (with arbitrary distribution). Due to the stochasticity, the solutions are distributions. Comparison in Euclidean distance is no longer feasible. How do we incorporate statistical information without imposing strong distributional assumptions? The question we address is how to represent, compare, and propagate the randomness in dynamical systems in a quantitative way.

This work proposes to embed uncertain state distributions into the reproducing kernel Hilbert spaces (RKHS), enabling quantification and algebraic operations therein. The main contributions of this work are the following.

  • We introduce the RKHS embedding method of kernel probabilistic programming in the context of uncertain dynamical systems. The resulting formulations of representation, comparison, and propagation methods are our main contributions.

  • Through distinct numerical examples, we demonstrate the flexible and distribution-free nature of our method as a unifying tool for dynamical systems.

  • We propose a recursive reduced set method to propagate the system uncertainty by iteratively solving an optimization problem. This forms fixed-size kernel expansions rather than allowing the number of uncertainty realizations to explode over time, as is commonly encountered in robust control [e.g., Scokaert and Mayne (1998); Tempo et al. (2012) Chapter 12; Campi et al. (2019)].

1.0.1 Notation:

In this work, the state of uncertain dynamical systems are denoted by , where is time and is the uncertainty in the system, e.g., a disturbance process. The Stieltjes integral

denotes the integration w.r.t. either a continuous or a discrete probability measure

. Symbol often denotes a nonempty set and a reproducing kernel Hilbert space (RKHS). We write

to denote that the random variable (RV)

follows the distribution law .

2 Background & related work

2.1 Uncertainty quantification in dynamical systems

Stochastic dynamical system refers to a system whose evolution is affected by non-deterministic disturbances. To illustrate our idea concretely, let us consider a simple example of stochastic ODE that was studied in Xiu and Karniadakis (2002):


where the parameter

is uncertain. This uncertainty may enter as (a) time-invariant, or, more generally (b) time-varying stochastic process. For a moment, let us consider case (a) and the parameter follows a certain distribution law

. This is hence an initial value problem (IVP) with uncertain parameter. The problem of quantifying the distribution of the solution to the IVP (1) is typically referred to as the uncertainty quantification in dynamical systems.

One somewhat trivial approach of quantifying the solution is the Monte Carlo sampling that samples  from the underlying distribution. Then we deterministically integrate the ODE with those realizations of uncertain parameter to obtain solutions . We may later extract the moment statistics of by its Monte Carlo estimation.

In numerical analysis, one representative thread of works in uncertainty quantification is the generalized polynomial chaos (gPC) method. It expands the solution of IVP (1) as a series in orthogonal polynomial basis functions ,

This expansion is mathematically elegant in that the basis ’s account for the stochasticity and the expansion coefficients ’s are deterministic. From this point onward, we can either follow Galerkin’s method propagate the coefficients through the dynamical systems as in Xiu and Karniadakis (2002), or, use the so-called stochastic collocation to numerically integrate the IVP (1) at certain collocation nodes  as in Xiu (2009).

Another well-developed methodology in Bayesian statistics is the Gaussian process (GP). Intuitively, GP generalizes the Gaussian distribution to the distribution of functions. For example, dynamics described by a difference equation can be modeled by a GP prior, i.e.,

where is a shorthand for

. Given data samples, it can be shown that the posterior predictive distribution for an unseen state is also Gaussian. This allows us to quantify the distribution of solution

. We refer to Rasmussen and Williams (2006) for an accessible introduction.

Mathematically speaking, gPC and GP are both surrogate function classes enabled by function approximation theory. This paper does not focus on analyzing the function approximation aspect. Instead, the embedding method we shall propose in Section 3 calls for a shift in our ways of thinking about statistical distributions.

2.2 Goodness-of-fit measure for system identification

System identification studies how to construct mathematical models of dynamical systems from observed data. While this paper does not directly propose a system identification algorithm, we show how the proposed concept can impact how we analyze and compare the models in terms of goodness-of-fit. To give readers a concrete example for this new concept, we revisit the parameter and variability estimation (PVE) proposed by Mohammadi et al. (2015) in Section 4.2

. This may be thought of as an alternative to the least square (LSQ) estimation, whose underlying assumption is that the system is following an additive Gaussian distribution with fixed variance,

In contrast, PVE is based on the robust optimization idea that the disturbance should lie within an ellipsoidal uncertainty set but not making any assumptions on the distribution family. The PVE method then formulates the identification of the uncertain ellipsoid as a semidefinite programming (SDP) problem. More details are provided in that paper.

It can be relatively difficult to compare an LSQ point estimate (or MLE in general) with e.g. PVE because they do not share the same likelihood . This paper proposes such a unifying framework of comparing different system models’ goodness-of-fit.

2.3 Reproducing kernel Hilbert space (RKHS) embedding

A kernel is a real-valued bivariate function . It is said to be positive definite if for any , , and . In addition, it is a reproducing kernel of an RKHS if (i) , and (ii) , . The latter is known as the reproducing property of . Choosing for some and applying the reproducing property yield the kernel trick


That is, we can view the kernel evaluation as a generalized similarity measure between and after mapping them into the feature space . We refer to defined above as a canonical feature map associated with the kernel . One of the most common kernels on is the Gaussian kernel


where is a bandwidth parameter.

An important application of kernel methods is in representing probability measures via kernel mean embedding (KME) [Smola et al. (2007)]. This line of work can be thought of as a systematic way of endowing unstructured data with representations in a Hilbert space to provide the ability to perform algebraic operations therein. Mathematically, we define the KME of a random variable as follows.

Definition 2.1

(Kernel mean embedding) Given random variable and kernel for which , we define the kernel mean embedding of as a function

This function is a member of the RKHS, associated with the kernel .

In the rest of the paper, we follow the convention in kernel machine learning to simply write the function

as to emphasize that it is an element of the RKHS . It has been shown that if belongs to a class of kernels known as characteristic kernels, then uniquely determines the distribution ; see, e.g., Fukumizu et al. (2004); Sriperumbudur et al. (2010).

To help readers get a concrete understanding, we outline common kernels and the information their KME preserve in Table 1. We then give examples of the explicit forms of the KMEs.

Linear Mean of distribution
Polynomial Up to -th moments
Gaussian All information
Exponential All information
Table 1: Common kernels and what statistical information their KME preserve. More examples can be found in Muandet et al. (2017)

2.3.1 Example (Second-order polynomial kernel embedding)

Suppose the kernel function in question is the polynomial kernel of order two , the KME is given by


This shows that the RKHS associated with this KME consists of quadratic functions whose coefficients preserve statistical information up to the second order (mean and variance), but not higher. In general, -th order polynomial kernel embeddings preserve information up to the -th order. A richer kernel embedding, e.g. Gaussian kernel embedding, may preserve information up to infinite order.

Remark Readers familiar with polynomial approximation may recognize that the integrand in (4) can be expanded in certain Wiener-Askey polynomial bases. However, our method differs from the philosophy of gPC expansion in that it does not seek to use finite-order truncation for approximating functions. Rather, we make use of the kernel trick (2) to represent similarity in data even in infinite dimensional feature space. This gives rise to the power of kernel machine learning.

2.3.2 Example (Exponential kernel embedding)

Given the exponential kernel , the KME is

This is the moment-generating function. Notably, replacing

by yields the Laplace transform.

The following result in Song (2008) gives the consistency of a sample-based estimator for .

Lemma 2.1

(Smola et al. (2007); Song (2008); estimator for KME) Let us denote by a random variable and its i.i.d samples. Then,


as with probability .

Furthermore, Schölkopf et al. (2015); Simon-Gabriel et al. (2016) proved the estimation consistency for KME of transformations of RVs in more general conditions. They term their approach kernel probabilistic programming (KPP).

Lemma 2.2

(KME consistency) Suppose is a continuous function, are continuous kernels on and . Under mild conditions [cf. Theorem 1 of Simon-Gabriel et al. (2016)] , the following is true.

Notably, we do not require samples to be i.i.d. This result equips us with algebraic tools to learn an embedding of directly from that of . In the rest of the paper, by KPP we mean the RKHS embedding method that performs algebraic operations on KMEs via transformations of random samples.

In the context of reinforcement learning, e.g., in

Nishiyama et al. (2012); Grünewälder et al. (2012); Boots et al. (2013), RKHS embeddings of conditional distributions, which is different from ours, were used to learn dynamics. We share the common thread of using RKHS embeddings while differ in a few important aspects, e.g., our use of KPP, numerical integration for continuous-time systems. See Song et al. (2013) for more details and references.

3 Approach

3.1 Representing uncertainty with RKHS embeddings

KPP introduced in the last section gives us a powerful tool to represent distributions without any parametric assumption. In the context of dynamical systems, if we view the evolution of the system uncertainty as transformations of random variables (RV), then KPP naturally becomes a tool to propagate system uncertainty. An important motivation of our methodology is that it shall be distribution-free, i.e., it shall not impose assumptions on the uncertainty distributions (e.g., Gaussianity).

In a nutshell, we represent the distribution of , the state of the dynamical systems (continuous or discrete time), by its KME and the corresponding sample-based estimator given by


where a simple choice is . As discussed in Section 2.3, KME with second-order polynomial kernel preserves (nominal state) and second (variance) order information commonly used in stochastic control. In this light, we may view our method as a generalization of Monte Carlo moment estimation.

3.2 Goodness-of-fit measure for uncertain system models

As suggested in Figure 1 (right), it may be difficult to quantitatively compare stochastic system models directly. For example, say we have identified an LSQ point estimate and another estimation described by a distribution in uncertain parameter based on two different system identification methods. We cannot simply compare the goodness-of-fit by comparing the likelihood objectives as they might differ for different identification methods. Furthermore, the parameter descriptions may also differ (e.g. point estimation vs. set description) In addition, can we be certain, in a quantitative manner, that the systems behave differently after the propagation through dynamics? 111We note the Kullback-Leibler (KL) divergence is not distribution-free: we need distribution functions to calculate its estimate.

We summon the strength of KME to endow almost arbitrary data types the meaning of distance through the Hilbert space embedding. This allows us to compare systems by performing statistical two-sample tests [cf. Gretton et al. (2012)] using the simulation samples.

Given the state distribution embeddings of two different systems computed as in (6), we may measure how different two stochastic systems are by straightforwardly computing their distance in the embedding Hilbert space, The above quantity is also known as a maximum mean discrepancy (MMD). Using kernel trick (2) and the estimator in (6), we obtain the following.

Lemma 3.1

(Sample-based estimator for RKHS distance; MMD) Given two sets of samples and from simulations of two dynamical systems, a sample-based estimator for is given by


where we omit time index for conciseness. More details can be found in Schölkopf et al. (2002).

We propose that the goodness-of-fit may again be straightforwardly measured by the RKHS distance where and are two state distributions under two uncertain parameter descriptions.

This is powerful in that it can compare arbitrary (unknown) uncertain systems. We demonstrate this flexibility in Section 4.2.

3.3 Uncertainty propagation via KPP

We have thus far discussed the use of KME as representation and goodness-of-fit measure for uncertainty in stochastic systems. In this section, we propose to use KPP for uncertainty propagation, which is at the core of many stochastic control algorithms.

We first present two different views of uncertainty propagation in systems. From a statistical standpoint, they correspond to the diagonal and U-statistics estimation. In particular, the U-statistic estimator is known to have lower variance than the diagonal estimator but to require more samples. We then show a novel recursive application of the so-called reduced set method in propagating stochastic dynamics forward.

Direct propagation via KPP:

The main idea of this algorithm is simple: sample a realization of the uncertainty, and evolve the system as it is deterministic. The steps are presented in Algorithm 1. By doing so, we view the deterministic evolution as algebraic operations performed on the uncertainty.

1:  Given: initial state , (uncertain) dynamical system .
2:  Output: KME estimate at time .
3:  Choose realization of the uncertain variable nodal set  either via collocation or sampling.
4:  Evolve the deterministic system forward, either via difference equation or numerical integration, obtain the states at time .
5:  Compute KME estimate by (6).
Algorithm 1 Direct KPP for uncertainty propagation

In Step 4, if the underlying system model is continuous-time, we rely on numerical integration to propagate the samples forward. Let us consider the integral of the dynamics function (deterministic or random) over the time period In practice, this integral is often intractable. Numerical integration is performed to approximate its value, where may denote the step size of an one-step numerical integration rule.

One immediate question is, how will the integration error affect the embedding estimate? I.e., is the following true?


By virtue of Lemma 2.2, we obtain the following result.

Lemma 3.2

(Consistency 222With a slight overload of terminology, we note the term consistent is used in both statistics and numerical analysis community. In both fields, they refer to the asymptotic convergence of statistical estimator and numerical integration respectively. of KPP estimation with numerical integration) Suppose is a continuous positive-definite kernel,  is chosen either via i.i.d sampling or collocation rules. The KPP estimator produced by a one-step numerical integration rule with step size in Algorithm 1 is consistent, i.e.,


The proof (given in the appendix) is a direct consequence of the consistency of numerical integration and that of the KPP estimator. Similar propagation methods are used in stochastic collocation in conjunction with gPC [Xiu (2009)]. In the above algorithm, the propagated samples are used to represent the distributions via the RKHS embeddings. In the next section, we shall see a non-trivial generalization of the above algorithm.

3.3.1 Recursive reduced set KPP for uncertainty propagation:

One nuance is encountered when considering more general descriptions of uncertainty other than the simple parameter uncertainty. For simplicity, let us restrict the discussion to discrete-time dynamics and assume that the uncertainty enters as discrete realizations of stochastic processes at each time . In this case, the system state at time is a function of all previous-step uncertainties,

From a statistical standpoint, this is a transformation of multiple RVs. To estimate the KME, one can either use the diagonal estimator which corresponds to the already-discussed Algorithm 1,

or the U-statistics estimator which delivers smaller variance


The downside of the U-statistics estimator is that it may involve exponentially many samples of the uncertain random variable (disturbances). To relieve this sample complexity while still capturing the statistical distribution, Schölkopf et al. (2015) proposed to use the reduced set method to compute multi-step transformations of RV with only a subset of samples. Intuitively, the reduced set method seeks to find a (small) set of expansion points and weights such that the expansion approximates a U-statistics estimation such as in (10).

We propose the recursive reduced set kernel probabilistic programming for uncertainty propagation in Algorithm 2.

1:  Given: initial state , (uncertain) dynamical system , desired size for reduced-set Output: Reduced set expansion for KME at time
2:  loop
3:     At time , given the reduced set expansion for embedding the current state distribution, sample realizations of the uncertain process
4:     Compute KME of next state with U-statistics
5:     Construct the reduced set for the next time step
by minimizing the following optimization criterion
6:  end loop
Algorithm 2 Recursive reduced set KPP for uncertainty propagation

Intuitively, at every time step, we look for a subset of all samples (of the U-statistics samples) to serve as new expansion points. One step of our recursion is similar to the basic idea of efficient quadrature rule [Gauss (1815)]. The optimization problem in Step 5 has two main tasks, finding the expansion points and weights simultaneously. There is a wide range of techniques for treating this problem (see Chapter 18 of Schölkopf et al. (2002)) In Section 4, we give an numerical example as a proof of concept for this procedure.

4 Numerical experiments

We present numerical examples that vary in their uncertain system descriptions and types of tasks, showcasing the flexibility of the proposed framework.

4.1 Uncertain ODE

Let us revisit the example of stochastic ODE in (1),

In this experiment, the uncertain is drawn from a mixture-of-Gaussian distribution. We then draw another from a unimodal Gaussian. Its mean and variance are chosen such that the first two moments match those of , i.e.,


Their distributions are illustrated in Figure 2 (top).

parameter valuetime

Figure 2: (top). Histogram approximating the uncertain parameter density for IVP example. (Red) the histogram represents the initial parameter , (Gray) the histogram represents

. The two distributions match up to the second moment. However, the GMM is skewed. (bottom). Two stochastic ODE states with parameter

(red) and (gray). The states are obtained as a result of forward Euler integration.

We wish to answer: can we quantify the different behaviors of those systems without imposing distribution assumptions? To this end, we apply Algorithm 1 to obtain two sets of propagated system states and . The state KME estimator and are computed using (6).

As illustrated in Figure 2 (bottom), the two moment-matched parameter distributions result in distinct system behaviors over time. To quantitatively compare the behaviors, we compute the RKHS-distance over time between those two embeddings using (7).

To understand how different kernels preserve statistical properties of the system, we also plotted the embedding distances associated with polynomial kernel of order in Figure 3 (bottom). We clearly observe that different polynomial kernels capture different orders of moment information. Notably, the two system states seem to have similar means and therefore the first order polynomial kernel does not differentiate the two systems. As we increase the order of the polynomial kernel, the difference becomes evident.


Figure 3: (top): RKHS distance over time between two embeddings associated with Gaussian kernel of different bandwidth. (bottom): Embedding distances associated with polynomial kernel of order .

We then show the Gaussian kernel embeddings in Figure 3 (top), where we vary the bandwidth parameter . It can be shown that the Gaussian kernel keeps track of statistical moments up to infinite order. If bandwidth is large, the kernel treats everything the same so RKHS distance is small. On the other hand, if bandwidth is small, the kernel treats everything as different. In the limiting case as the bandwidth , it can be shown the KME estimation reduces to the usual Monte Carlo estimation [cf. Section 3.3 in Schölkopf et al. (2015)].

4.2 Distribution-free goodness-of-fit for identification

To demonstrate the proposed technique is a good measure of goodness-of-fit for arbitrary uncertainty distributions, we apply it to PVE example in Mohammadi et al. (2015). Different from the uncertainty description in Section 4.1, this is typically a “worst-case” scenario where the model needs to account for the worst case realization of disturbances, described by ellipsoidal uncertainty sets.

In this example, the model to be identified is a time-varying autoregressive exogenous-input model (ARX) with variable parameter.

The uncertain time-varying parameters are . To generate the data, we follow the setup of Mohammadi et al. (2015) to choose the true uncertain parameters uniformly randomly333We do not use this information during our measuring goodness-of-fit. In addition, we note the original data generation procedure may be extended beyond uniform sampling as robust optimization only concerns the distribution support. within the true ellipsoidal uncertainty set . The identification was performed according to that paper. As a result, we obtain the estimated ellipsoid and the LSQ parameter , as illustrated in Figure 4 (left).


Figure 4: (left) plots the true uncertainty set (blue), estimated ellipsoidal uncertainty set (red), as well as the LSQ point estimate (black). (right) plots the result of evolving the system for 600 steps using true ellipsoid (blue), estimated variability ellipsoid using PVE (red), Gaussian distributed parameter variability from LSQ (black).

We evolve the system for 600 steps using the true model and those two different uncertain parameter models (PVE vs. Gaussian distributed parameter resulting from the LSQ assumption, where is estimated by the sample variance as in that paper). The resulting state trajectories are shown in Figure 4 (right). We apply Algorithm 1 and compute the RKHS distance between the embeddings of the estimated model and the true model. Figure 5 demonstrates that the PVE model (red) matches the true model better than the LSQ model (black) in the sense of RKHS metric. This comparison is performed under no distributional assumptions.

What is interesting is that, in the original paper, it is obviously LSQ did not deliver the parameter variability that fits the true generating distribution. However, one may ask, is the LSQ estimated parameter nonetheless able to deliver similar system behaviors? But there was no unified way to compare them. This paper provides an unifying framework to quantitatively answer this question.


Figure 5: We show goodness-of-fit using for estimated parameter variability using PVE (red), and for Gaussian-distributed parameter from LSQ estimation (black). They correspond to the system evolution over time in Figure 4 (right).

4.3 Recursive reduced set KPP for uncertainty propagation

Relying on the statistical consistency results in Section 2, uncertainty propagation can be performed straightforwardly using Algorithm 1. In this section, we demonstrate the use of Algorithm 2. As a proof of concept, let us consider a simple discrete-time stochastic system

We emphasize the distribution of could be made fairly arbitrary (and non-stationary)— the proposed propagation method does not place restrictions on this distribution. The system evolution is illustrated in Figure 6 (left).

timenumber of samplesalg. 2alg. 1

Figure 6: (left) Uncertainty propagation using reduced set method. The uncertainty is forward propagated for 10 steps, using the reduced set methods of size parameter 10. The red lines are Monte Carlo simulations of the system in question. The black dots are chosen reduced set to represent the state distribution at each time step. (right) Comparing approximation errors of the recursive reduced set algorithm (red, Alg. 2), , with that of the diagonal estimate of KME (blue, Alg. 1)). The dynamics are run for 10 steps. Each method is run 10 times to produce the error bar plot. The state is sampled at time step . In calculating the embedding , we use a Gaussian kernel with bandwidth .

At every time step, Algorithm 2 is applied to find an embedding of the current state , where denotes the reduced-set indices. Then the dynamics is propagated forward again. Note we use the naive reduced set method following Simon-Gabriel et al. (2016) which simply samples expansion points from the full set and then solves a quadratic minimization problem for coefficients as in Step 5. A more sophisticated method will be introduced in future work. As illustrated in Figure 6 (left), This procedure is repeated for 10 time steps. We compare the RKHS approximation error of the recursive reduced set method in Algorithm 2, against that of the diagonal estimate of Algorithm 1. While we do not know the true embedding , we approximate it with a large-sample estimate using 500 samples. The approximation error in RKHS metric are illustrated in Figure 6 (right). We observe a faster convergence in both mean and variance by the recursive reduced set method in Algorithm 2.

5 Discussion

This paper proposed to use kernel probabilistic programming as a unifying tool for treating uncertainties in dynamical systems. We demonstrated concrete numerical procedures of such applications. This new concept calls for a change in the mode of thinking. KPP characterizes uncertainties in systems. Thus propagating the embeddings is propagating the uncertainty in dynamics.

It is our on-going endeavor to investigate more sophisticated optimization procedures as well as mathematically rigorous analysis of convergence in Algorithm 2 with more general numerical integration. Another important direction is robust control design and state estimation using RKHS embeddings.

Compared with existing popular methods such as gPC or GP, RKHS embedding methods for dynamical systems are still in the early development. This paper serves as a call to action for their wider applications to robust and stochastic control.

We would like to thank Mario Zanon for providing the code to reproduce the PVE experiment.


  • B. Boots, G. Gordon, and A. Gretton (2013) Hilbert space embeddings of predictive state representations. arXiv preprint arXiv:1309.6819. Cited by: §2.3.2.
  • M. C. Campi, S. Garatti, and M. Prandini (2019) Scenario optimization for mpc. In Handbook of Model Predictive Control, pp. 445–463. Cited by: 3rd item.
  • K. Fukumizu, F. Bach, and M. Jordan (2004)

    Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces

    Journal of Machine Learning Research 5, pp. 73–99. Cited by: §2.3.
  • C. F. Gauss (1815) Methodus nova integralium valores per approximationem inveniendi. apvd Henricvm Dieterich. Cited by: §3.3.1.
  • A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola (2012) A kernel two-sample test. Journal of Machine Learning Research 13, pp. 723–773. Cited by: §3.2.
  • S. Grünewälder, G. Lever, L. Baldassarre, M. Pontil, and A. Gretton (2012) Modelling transition dynamics in MDPs with RKHS embeddings. In Proceedings of the 29th International Conference on Machine Learning, pp. 535–542. Cited by: §2.3.2.
  • A. Mohammadi, M. Diehl, and M. Zanon (2015) Estimation of uncertain ARX models with ellipsoidal parameter variability. 2015 European Control Conference, ECC 2015 (1), pp. 1766–1771. External Links: Document, ISBN 9783952426937 Cited by: §2.2, §4.2, §4.2.
  • K. Muandet, K. Fukumizu, B. Sriperumbudur, and B. Schölkopf (2017) Kernel mean embedding of distributions: a review and beyond. Foundations and Trends in Machine Learning 10 (1-2), pp. 1–141. Cited by: Table 1.
  • Y. Nishiyama, A. Boularias, A. Gretton, and K. Fukumizu (2012) Hilbert space embeddings of POMDPs. In

    Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence

    pp. 644–653. Cited by: §2.3.2.
  • C. E. Rasmussen and C. K. Williams (2006) Gaussian processes for machine learning. Gaussian Processes for Machine Learning, by CE Rasmussen and CKI Williams. ISBN-13 978-0-262-18253-9. Cited by: §2.1.
  • B. Schölkopf, K. Muandet, K. Fukumizu, S. Harmeling, and J. Peters (2015) Computing functions of random variables via reproducing kernel Hilbert space representations. Statistics and Computing 25 (4), pp. 755–766. External Links: Document, ISSN 1573-1375 Cited by: §2.3.2, §3.3.1, §4.1.
  • B. Schölkopf, A. J. Smola, F. Bach, et al. (2002)

    Learning with kernels: support vector machines, regularization, optimization, and beyond

    MIT press. Cited by: §3.2, §3.3.1.
  • P. O. Scokaert and D. Q. Mayne (1998) Min-max feedback model predictive control for constrained linear systems. IEEE Transactions on Automatic control 43 (8), pp. 1136–1142. Cited by: 3rd item.
  • C. J. Simon-Gabriel, A. Ścibior, I. Tolstikhin, and B. Schölkopf (2016) Consistent kernel mean estimation for functions of random variables. Advances in Neural Information Processing Systems 1 (i), pp. 1740–1748. External Links: ISSN 10495258 Cited by: §2.3.2, Lemma 2.2, §4.3.
  • A. J. Smola, A. Gretton, L. Song, and B. Schölkopf (2007) A Hilbert space embedding for distributions. In Proceedings of the 18th International Conference on Algorithmic Learning Theory (ALT), pp. 13–31. Cited by: §2.3, Lemma 2.1.
  • L. Song (2008) Learning via Hilbert space embedding of distributions. Ph.D. Thesis, The University of Sydney. Cited by: §2.3.2, Lemma 2.1.
  • L. Song, K. Fukumizu, and A. Gretton (2013) Kernel embeddings of conditional distributions: a unified kernel framework for nonparametric inference in graphical models.. IEEE Signal Processing Magazine 30 (4), pp. 98–111. Cited by: §2.3.2.
  • B. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf, and G. Lanckriet (2010) Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research 99, pp. 1517–1561. Cited by: §2.3.
  • R. Tempo, G. Calafiore, and F. Dabbene (2012) Randomized algorithms for analysis and control of uncertain systems: with applications. Springer Science & Business Media. Cited by: 3rd item.
  • D. Xiu and G. E. Karniadakis (2002) The wiener–askey polynomial chaos for stochastic differential equations. SIAM journal on scientific computing 24 (2), pp. 619–644. Cited by: §2.1, §2.1.
  • D. Xiu (2009) Fast numerical methods for stochastic computations: a review. Communications in computational physics 5 (2-4), pp. 242–272. Cited by: §2.1, §3.3.

Appendix A Proof of Lemma 3.2

We provide a proof for the consistency of Algorithm 1.


In the last inequality, The first term is due to the continuity of kernel . In the last line, the first term converges to due to the consistency of numerical integration whereas the second term converges due to the consistency of KPP in Proposition 3.2.

The convergence rate is . The first term is the result of p-th order one-step numerical integration rule (e.g. RK-p) with step size . The second term is triggered by The KPP estimator finite-sample convergence. is the sample size used by the KPP algorithm.