Metric on random dynamical systems with vector-valued reproducing kernel Hilbert spaces

The development of a metric on structural data-generating mechanisms is fundamental in machine learning and related fields. In this paper, we consider a general framework to construct metrics on random nonlinear dynamical systems, which are defined with the Perron-Frobenius operators in vector-valued reproducing kernel Hilbert spaces (vvRKHSs). Here, vvRKHSs are employed to design mathematically manageable metrics and also to introduce L^2(Ω)-valued kernels, which are necessary to handle the randomness in systems. Our metric is a natural extension of existing metrics for deterministic systems, and gives a specification of the kernel maximum mean discrepancy of random processes. Moreover, by considering the time-wise independence of random processes, we discuss the connection between our metric and kernel-based independence criteria such as the Hilbert-Schmidt independence criterion. We empirically illustrate our metric with synthetic data, and evaluate it in the context of the independence test for random processes.


1 Introduction

The development of a metric on data-generating mechanisms is fundamental in machine learning and related fields. This is because developing an algorithm for a given learning problem according to the type of data structure basically reduces to the design of an appropriate metric or kernel. In the context of dynamical systems, the majority of existing metrics have been developed with principal angles between appropriate subspaces, such as the column subspaces of observability matrices Martin00 ; DeCock-DeMoor02 ; VSV07 . On the other hand, several metrics on dynamical systems have been developed with transfer operators such as the Koopman operator and the Perron-Frobenius operator. Mezic et al. MB04 ; Mezic16 proposed metrics of dynamical systems in the context of ergodic theory via Koopman operators on L^2-spaces. Fujii et al. Fujii17 developed metrics with Koopman operators as a generalization of the ones with the Binet-Cauchy theorem proposed by Vishwanathan et al. VSV07 . Ishikawa et al. IFI+18 gave metrics on nonlinear dynamical systems with Perron-Frobenius operators in RKHSs, which generalize the classical ones with principal angles mentioned above.

However, the above existing metrics are basically defined for deterministic dynamical systems, and, to the best of our knowledge, little existing literature has addressed the design of metrics for random dynamical systems or stochastic processes. Vishwanathan et al. VSV07 mentioned their metrics for cases where systems include random noise by taking expectations over the randomness. Chwialkowski and Gretton CG14 developed non-parametric test statistics for random processes by extending the Hilbert-Schmidt independence criterion.

In this paper, we consider a general framework to construct metrics on random nonlinear dynamical systems, which are defined with the Perron-Frobenius operators in vector-valued reproducing kernel Hilbert spaces (vvRKHSs). Here, vvRKHSs are employed to design mathematically manageable metrics and also to introduce L^2(Ω)-valued RKHSs, which are necessary to handle the randomness in systems. We first define the Perron-Frobenius operators in vvRKHSs and construct a dynamical system in a canonical way. Based on these, we define a metric on random dynamical systems as a positive definite operator-valued kernel. Our metric is a natural extension of existing metrics for deterministic systems, and gives a specification of the kernel maximum mean discrepancy of random processes. Moreover, by considering the time-wise independence of random processes, we discuss the connection between our metric and kernel-based independence criteria such as the Hilbert-Schmidt independence criterion. We empirically illustrate our metric using synthetic data from noisy rotation dynamics on the unit disk in the complex plane, and evaluate it in the context of the independence test for random processes.

The remainder of this paper is organized as follows. In Section 2, we briefly review the notions necessary in this paper, such as the Perron-Frobenius operators in RKHSs and a positive definite kernel on random processes defined by means of the kernel mean embedding. In Section 3, we define the Perron-Frobenius operators in vvRKHSs and construct a dynamical system in a canonical way. Then, in Section 4, we give the definition of our metric for random dynamical systems. In Section 5, we describe the connection of our metric to the Hilbert-Schmidt independence criterion. Finally, we empirically investigate our metric using synthetic data in Section 6, and conclude the paper in Section 7. All proofs are given in Appendix A of the supplementary document.

2 Background

In this section, we briefly review the Perron-Frobenius operators in RKHSs in Subsection 2.1, and then describe a straightforward way of defining a metric for comparing two random processes with kernel mean embeddings in Subsection 2.2.

2.1 Perron-Frobenius Operators on RKHSs

Let be a state space and be a positive definite kernel on . For any , we denote by the function on defined by . By the Moore-Aronszajn theorem, there exists a unique Hilbert space composed of functions in such that for any , the function is contained in and the reproducing property holds, namely, for any , . The Gaussian kernel for is a typical example of a positive definite kernel, and is the one used in our empirical illustration (Section 6.1).

Let , , , or . We call a map a dynamical system if and for any and , . For , we define the Perron-Frobenius operator as a linear operator with a dense domain , by . In the same manner as Proposition 2.1 in IFI+18 , is the adjoint operator of the Koopman operator on , which is the linear operator allocating to . Note that, although the contents in IFI+18 are considered only for the discrete-time case, i.e., or , we here consider the general case . Ishikawa et al. IFI+18 defined a positive definite kernel using the Perron-Frobenius operators for comparing deterministic nonlinear dynamical systems, which generalizes many of the existing metrics for dynamical systems.

2.2 Comparison of Two Random Processes

Here we describe a straightforward method to define a metric for comparing two random processes. It gives a natural positive definite kernel on random processes by means of the kernel mean embedding.

Let be a probability space, where is a measurable space and is a probability measure. Let be a stochastic process with continuous paths (for simplicity, we only consider continuous paths in the case of or ). We define the law of by the push-forward measure on , and denote it by . A basic strategy to define a metric between stochastic processes is to define a metric between the laws of the stochastic processes by means of various types of metrics on probability measures, such as the kernel maximum mean discrepancy (KMMD), the Wasserstein distance, and the Kullback-Leibler (KL) divergence (cf. GS02 ).

Let be a positive definite kernel on . For a probability measure , we denote by the kernel mean embedding of , which is given by . Then for two stochastic processes and , we have

 ⟨μ_{L(X)}, μ_{L(Y)}⟩_{H_κ} = ∫_{Ω×Ω} κ(X(⋅,ω), Y(⋅,η)) dP(ω) dP(η).

If a positive definite kernel on is given, we naturally define a positive definite kernel on by . As a result, the KMMD for random processes is calculated using this inner product, just as in the case of random variables.
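For illustration, this KMMD can be estimated from finitely many sampled paths. The following sketch is ours, not part of the paper's implementation: a Gaussian kernel on flattened paths stands in for the kernel κ on path space, and all parameter values are illustrative.

```python
import numpy as np

def gauss_gram(A, B, sigma=5.0):
    """Gaussian kernel Gram matrix between rows of A and B (paths as vectors)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def mmd2(X, Y, sigma=5.0):
    """Biased estimate of the squared KMMD between the laws of two path
    samples; each row of X and Y is one sampled path X(., omega)."""
    return (gauss_gram(X, X, sigma).mean()
            + gauss_gram(Y, Y, sigma).mean()
            - 2 * gauss_gram(X, Y, sigma).mean())

rng = np.random.default_rng(0)
# two samples from the same law, and one from a shifted law
same = mmd2(rng.normal(0, 1, (100, 20)), rng.normal(0, 1, (100, 20)))
diff = mmd2(rng.normal(0, 1, (100, 20)), rng.normal(2, 1, (100, 20)))
```

Here `same` compares two samples from the same law and stays near zero, while `diff` compares laws with different means and is clearly larger.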

3 Big Dynamical Systems and Perron-Frobenius Operators in vvRKHS Associated with Random Dynamical Systems

In this section, we define a Perron-Frobenius operator in vvRKHSs, which is a natural generalization of the operator defined in IFI+18 , in Subsection 3.1. Vector-valued RKHSs are employed to introduce positive definite L²(Ω)-kernels, which are necessary to incorporate the effects of random variables. Then, we introduce the notion of random dynamical systems, and construct a dynamical system in a canonical way in Subsection 3.2. This construction is a natural generalization of the corresponding deterministic case (namely, the case where is a one-point set).

3.1 Perron-Frobenius Operators for RKHS of Positive Definite V-Kernels

Let be a set and be a Hilbert space. We denote by the space of bounded linear operators on . We define a positive definite -kernel as a map satisfying the following two conditions: (1) for any , ; and (2) for any , , and , . We note that a “positive definite -kernel” is equivalent to a “-valued kernel of positive type” in CVT06 . We define a linear map by for . We note that a positive definite -kernel is equivalent to the notion of positive definite kernel in Section 2.1.

For any positive definite -kernel, there uniquely exists a Hilbert space such that for any and , , and for any , (see Proposition 2.3 in CVT06 ). We call the vector-valued reproducing kernel Hilbert space associated with , or the -valued reproducing kernel Hilbert space associated with .

We note that in the case where is a trace class operator for each , is a positive definite kernel since the trace of is given by , where is an orthonormal basis of .

For any subset and any closed subspace , we define a closed subspace as the closure of .

Definition 3.1.

Let be a RKHS associated with a positive definite -kernel on . Let be a subset and let be a dynamical system. Let be a closed subspace. For , the -th Perron-Frobenius operator is a linear operator with the domain, such that for any .

We note that does not always exist, and its existence rather depends on the choice of the subspace .

We remark on the relation between existing operators in vvRKHSs and our operator. In the case where a dynamical system is deterministic and discrete-time ( or ) and is a positive definite (or )-kernel, coincides with the operator defined in IFI+18 . Let and . In FK19 , the authors define the Koopman operator for a discrete-time dynamical system by the linear operator for , where we put . As stated in the following proposition, their operator is given as the adjoint of the Perron-Frobenius operator:

Proposition 3.2.

We have , where is the Koopman operator defined in FK19 (we give a rigorous definition of the Koopman operator in the proof of this proposition).

3.2 Big Dynamical Systems and L2(Ω)-valued RKHSs Associated with Random Dynamical Systems

We fix a probability space . Here, we construct a big dynamical system and an L²(Ω)-valued RKHS for a given random dynamical system in a canonical way. This big dynamical system enables us to study random dynamical systems in terms of the theory of deterministic dynamical systems. We also need the L²(Ω)-valued RKHS to scrutinize the random effects in the systems via Perron-Frobenius operators and the kernel method. Finally, we describe how our Perron-Frobenius operators generalize the existing operators for random processes.

We fix a bounded positive definite kernel on and let be the corresponding RKHS.

Let , , , or , and be a state space. We fix a semi-group of measure preserving measurable maps on , namely, such that , , and for all . We note that, for , the Koopman operator induces a bounded and isometric operator on .

Definition 3.3.

Let be an open subset. A random dynamical system on with respect to is a measurable map

 Φ:T×Ω×M→M

such that and for any .

Random dynamical systems include many kinds of stochastic processes, and typically appear as solutions of stochastic differential equations. In the case where is a one-point set, a random dynamical system reduces to a deterministic dynamical system.

Example 3.4.

An auto-regressive (AR) model “” is given as a special case of a random dynamical system as follows. Let , , be the classical Wiener measure with , be the Wiener process, and . Fix , and for , set . Then is an i.i.d. sequence of random variables with a Gaussian distribution, and if we define , then the function

 Φ(n, ω, x) := A(A(⋯(Ax + v₀(ω)) + v₁(ω)) + ⋯) + v_{n−1}(ω)

is a random dynamical system with respect to . Therefore, the -th sample determined by the AR model “” is given by .
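The defining cocycle property of a random dynamical system, Φ(n+m, ω, x) = Φ(n, θᵐω, Φ(m, ω, x)), can be checked numerically for this AR example. The sketch below is our illustration; representing ω by an integer offset into a fixed noise path is a simplifying assumption, with the shift map θ advancing the noise sequence by one step.

```python
import numpy as np

def make_ar_rds(A, noise):
    """Phi(n, omega, x) for the AR(1) model x_{t+1} = A x_t + v_t(omega).

    `noise` is a precomputed array of noise samples v_0, v_1, ... standing in
    for the sample point omega; omega itself is an integer offset into this
    path, and the shift map theta corresponds to increasing the offset."""
    def phi(n, omega, x):
        for t in range(n):
            x = A * x + noise[omega + t]
        return x
    return phi

rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.1, size=100)
phi = make_ar_rds(0.8, noise)

# cocycle property: Phi(n+m, omega, x) == Phi(n, theta^m omega, Phi(m, omega, x))
x0, n, m = 1.0, 3, 4
lhs = phi(n + m, 0, x0)
rhs = phi(n, m, phi(m, 0, x0))  # theta^m shifts the noise path by m steps
```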

Let be the set of measurable maps such that the Koopman operator induces a bounded linear operator on . For each random dynamical system , we construct a dynamical system in in a canonical way, which is the same as the push-forward of the skew product (Arno98 , 1.1.8) of the random dynamical system.

Definition 3.5.

Let be a random dynamical system in with respect to . We define by , where .

Let be the positive definite kernel fixed at the beginning of this subsection. We define the -kernel on by

 (k(γ, γ′)f)(ω) := k(γ_X(ω), γ′_X(ω)) f(ω),

where and . We note that actually induces a bounded linear operator, since we are assuming that is a bounded function and that and are bounded operators. Then we obtain the vvRKHS and the Perron-Frobenius operators for as in Section 3.1. We define by , and regard as a closed subspace of .

We remark that our operator is a generalization of the Koopman operator defined in CZMM19 , namely, the Perron-Frobenius operator completely recovers the classical transfer operator:

Proposition 3.6.

Let be a subspace including constant functions. For any , we have

 ι*_k K^t_{φ,W} ι_k (k_x) = ∫_Ω k_{Φ(t,ω,x)} dP(ω).
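The right-hand side of Proposition 3.6 is the kernel mean embedding of the transition distribution, which can be approximated by Monte Carlo. The sketch below is ours; the linear dynamics and Gaussian noise are illustrative choices for which the integral has a closed form (a Gaussian convolution), against which the estimate can be checked.

```python
import numpy as np

def embed_transition(y, a, x, tau, n=200_000, sigma=1.0, seed=0):
    """Monte Carlo estimate of (∫_Ω k_{Phi(omega, x)} dP(omega))(y) for one
    step of x -> a*x + noise, noise ~ N(0, tau^2), with the Gaussian kernel
    k(y, z) = exp(-(y - z)^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    z = a * x + tau * rng.normal(size=n)          # samples of Phi(omega, x)
    return np.exp(-(y - z)**2 / (2 * sigma**2)).mean()

# Closed form: convolving the Gaussian kernel with N(0, tau^2) noise gives
# sqrt(sigma^2/(sigma^2+tau^2)) * exp(-(y - a*x)^2 / (2 (sigma^2 + tau^2))).
a, x, tau, sigma, y = 0.7, 1.0, 0.5, 1.0, 0.3
mc = embed_transition(y, a, x, tau, sigma=sigma)
exact = np.sqrt(sigma**2 / (sigma**2 + tau**2)) \
    * np.exp(-(y - a * x)**2 / (2 * (sigma**2 + tau**2)))
```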

4 Metrics on Random Dynamical Systems

In this section, we construct a metric to compare two random dynamical systems. First, we specify the rigorous definition of the domain on which our metric is defined, which we call triples of random dynamical systems with respect to . Then we define the metric on the triples of random dynamical systems. Our metric is given as a positive definite -kernel for some Hilbert space (specified later), namely, it is a linear operator on . When we evaluate this metric with a linear functional, for example the trace, it becomes a usual positive definite kernel. At the end of this section, we restrict ourselves to special situations. Then we see that our metric gives a generalization of the KMMD for random processes introduced in Section 2.2, and we define the metrics and , which we use in the empirical computations in Section 6.

Let be a Borel measure on , and let and be Hilbert spaces. We define a triple of a random dynamical system with respect to and as a triple , where is a random dynamical system on , and and are linear operators such that is a Hilbert-Schmidt operator for some , and for any , the function . We call and an initial value operator and an observable operator, respectively. Intuitively, the operator corresponds to an observable that gives an output at , and describes an initial condition for the data. We denote by the set of triples of random dynamical systems.

Now, we give the definition of our metric on random dynamical systems as follows:

Definition 4.1.

For , we fix a triple . For , , we define a Hilbert-Schmidt operator by

 K^{(m)}_k(D₁, D₂) := ⋀^m ∫_T (L₂ K^t_{φ₂,W₂} I₂)* L₁ K^t_{φ₁,W₁} I₁ dν(t), (1)

where is the -th exterior product (see Appendix A of the Supplementary of IFI+18 ).

Then we have the following theorem:

Theorem 4.2.

The kernel is a positive definite -kernel on for each .

We define by the positive definite kernel on . In the special case where the dynamical systems are deterministic rather than random, coincides with the positive definite kernel introduced in IFI+18 .

Let . In the case of , , , we define

 l^T_m((Φ₁, x¹), (Φ₂, x²)) := K^T_m((ι*_k, Φ₁, I₁), (ι*_k, Φ₂, I₂)),

where . By the formula (4) in IFI+18 , we have the following computational formula:

 l^T_m((Φ₁, x¹), (Φ₂, x²)) = ∫_{[0,T]^m × Ω^m × Ω^m} det( k(Φ₁(t_i, ω_i, x¹_i), Φ₂(t_j, η_j, x²_j)) )_{i,j=1,…,m} dt dω dη. (2)

In particular, our metric is a natural generalization of KMMD for an integral type kernel, which is given through the following theorem:

Theorem 4.3.

Let be an integral type kernel on , then we have

 l^T_1((Φ₁, x¹), (Φ₂, x²)) = ⟨ μ_{L(Φ₁(⋅,⋅,x¹₁))}, μ_{L(Φ₂(⋅,⋅,x²₁))} ⟩_{H_κ}.

This theorem implies that, for general , the metric is a reasonable generalization for two random dynamical systems in the context of the KMMD. However, the formula (2) requires heavy computation for higher orders. To alleviate this drawback of (2), we construct another metric as follows: let , and define

 l̃^T_m((Φ₁, x¹), (Φ₂, x²)) := K^T_m((id_{H_k}, Φ₁, I₁), (id_{H_k}, Φ₂, I₂)).

By the formula (4) in IFI+18 again, we have the following formula:

 l̃^T_m((Φ₁, x¹), (Φ₂, x²)) = ∫_{[0,T]^m × Ω^m} det( k(Φ₁(t_i, ω_i, x¹_i), Φ₂(t_j, ω_j, x²_j)) )_{i,j=1,…,m} dt dP(ω). (3)
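A direct Monte Carlo treatment of formula (3) for discrete time can be sketched as follows. This is our illustration, not the paper's code: the Gaussian kernel, the linear test systems, and all parameters are assumptions. Times and shared noise paths are sampled, and the determinant of the resulting Gram matrix is averaged.

```python
import numpy as np

def make_system(a, sigma):
    """Discrete-time random system x_{t+1} = a x_t + sigma * w_t; the
    noise path omega is passed in as an array of standard normals."""
    def phi(t, omega, x):
        for s in range(t):
            x = a * x + sigma * omega[s]
        return x
    return phi

def l_tilde(phi1, phi2, x1, x2, T, m, n_mc=500, sigma_k=1.0, seed=0):
    """Monte Carlo sketch of formula (3), discrete time: average, over
    uniformly sampled times t_1..t_m and one shared noise path per index,
    of det( k(phi1(t_i, omega_i, x1), phi2(t_j, omega_j, x2)) )_{ij}."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_mc):
        ts = rng.integers(0, T, size=m)
        omegas = rng.normal(size=(m, T))      # noise paths, shared by both systems
        z1 = np.array([phi1(t, w, x1) for t, w in zip(ts, omegas)])
        z2 = np.array([phi2(t, w, x2) for t, w in zip(ts, omegas)])
        G = np.exp(-(z1[:, None] - z2[None, :])**2 / (2 * sigma_k**2))
        total += np.linalg.det(G)
    return total / n_mc

phi_a = make_system(0.9, 0.0)   # noise scale 0: deterministic
phi_b = make_system(0.5, 0.0)
val_self = l_tilde(phi_a, phi_a, 1.0, 1.0, T=10, m=1)
val_cross = l_tilde(phi_a, phi_b, 1.0, 1.0, T=10, m=1)
```

For m = 1 and identical deterministic systems the Gram determinant is k(z, z) = 1 at every draw, so `val_self` is exactly 1, while `val_cross` between different systems is strictly smaller.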

5 Connection to Hilbert-Schmidt Independence Criteria

In this section, we discuss the relation between our positive definite kernel and the Hilbert-Schmidt independence criterion (HSIC). We first define an independence criterion for random dynamical systems based on the above context, and then give its estimator. Our independence criterion measures the pairwise independence of random processes. Thus, although our metric is constructed in the context of dynamical systems, it can be used to extract information about the independence of two random processes. This is one of the main reasons for introducing the L²(Ω)-valued RKHS above.

We first briefly review the HSIC GBSS05 . Let be two random variables, and let be a positive definite kernel on . Here, we assume that is universal. Also, we define the cross-covariance operator by

 C_k(X, Y) := ∫_Ω (k_{X(ω)} − μ_X) ⊗ (k_{Y(ω)} − μ_Y) dP(ω),

where are the kernel mean embeddings of the laws of and of , respectively, and we regard any element of as a bounded linear operator on , namely, for any , we define by . We note that, via this identification, is equal to the space of Hilbert-Schmidt operators; in particular, the cross-covariance operator is also a Hilbert-Schmidt operator. Straightforward computations show that , where . The universality of shows that if and only if and are independent. The HSIC is defined to be , the Hilbert-Schmidt norm of . The value can be estimated via the evaluation of kernel functions over samples (see (GBSS05 , Lemma 1)).
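For reference, the standard biased HSIC estimate of GBSS05 (Lemma 1) is tr(KHLH)/(n−1)², with K and L the Gram matrices of the two samples and H the centering matrix. A minimal sketch (the kernel bandwidth and test data here are our illustrative choices):

```python
import numpy as np

def rbf_gram(x, sigma=1.0):
    """Gaussian Gram matrix of a one-dimensional sample."""
    d2 = (x[:, None] - x[None, :])**2
    return np.exp(-d2 / (2 * sigma**2))

def hsic(x, y, sigma=1.0):
    """Biased HSIC estimate (Gretton et al. 2005): tr(KHLH) / (n-1)^2."""
    n = len(x)
    K, L = rbf_gram(x, sigma), rbf_gram(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1)**2

rng = np.random.default_rng(0)
x = rng.normal(size=300)
y_indep = rng.normal(size=300)            # independent of x
y_dep = x + 0.1 * rng.normal(size=300)    # strongly dependent on x
h_ind = hsic(x, y_indep)
h_dep = hsic(x, y_dep)
```

Dependent data yields a markedly larger value than independent data, which is the basis of the kernel independence test.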

Now, let us consider the independence of random dynamical systems in our context. For , let be a random dynamical system on with respect to . We fix , and let , be random processes. We set and . We assume that exists. We define . Set . We define the independence criterion for random dynamical systems by

 C_k((Φ₁, x¹, ν₁), (Φ₂, x², ν₂)) := tr( K^{(1)}_k(D₁, D₁) K^{(1)}_k(D₂, D₂) ). (4)

Then, we have the following relation:

Theorem 5.1.

We have

 C_k((Φ₁, x¹, ν₁), (Φ₂, x², ν₂)) = ∫_T ∫_T C_k(X_s, Y_t) dν₁(s) dν₂(t).

Next, we consider the estimation of . Let and be independent sample paths of and , respectively. Put or , or , , and . We define

 k̂^{(n)}_D(t; x, y) := k(x, y) − (1/n) Σ_{i=1}^n k(x, Z^{(i)}_t) − (1/n) Σ_{i=1}^n k(Z^{(i)}_t, y) + (1/n²) Σ_{i,j=1}^n k(Z^{(i)}_t, Z^{(j)}_t). (5)

Let be a matrix of size . By Theorem 5.1, and Theorem 1 in GBSS05 , we have an estimator of as

 Ĉ_k((Φ₁, x¹, ν₁), (Φ₂, x², ν₂)) := 1 / ((n−1)² #S₁ #S₂) Σ_{(s,t)∈S₁×S₂} tr( G^{(n)}_{k,D₁}(s) G^{(n)}_{k,D₂}(t) ), (6)

where and are finite samples according to and , respectively.
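Putting (5) and (6) together for finitely many discrete time points, the estimator can be sketched as follows. This is our illustration (the Gaussian kernel and random-walk test data are assumptions); the centered Gram matrix H K H realizes the empirically centered kernel of (5).

```python
import numpy as np

def centered_gram(z, sigma=1.0):
    """Centered Gram matrix H K H of the n cross-sectional samples Z_t^{(i)}
    at one time step, matching the empirical centering in (5)."""
    K = np.exp(-(z[:, None] - z[None, :])**2 / (2 * sigma**2))
    H = np.eye(len(z)) - np.ones((len(z), len(z))) / len(z)
    return H @ K @ H

def c_hat(paths_x, paths_y, sigma=1.0):
    """Estimator (6): average tr(G_X(s) G_Y(t)) over all time pairs.
    paths_* has shape (n_paths, n_times)."""
    n, T = paths_x.shape
    total = 0.0
    for s in range(T):
        Gx = centered_gram(paths_x[:, s], sigma)
        for t in range(T):
            Gy = centered_gram(paths_y[:, t], sigma)
            total += np.trace(Gx @ Gy)
    return total / ((n - 1)**2 * T * T)

rng = np.random.default_rng(0)
n, T = 40, 5
paths_x = np.cumsum(rng.normal(size=(n, T)), axis=1)       # random-walk paths
paths_y_dep = paths_x + 0.1 * rng.normal(size=(n, T))      # dependent on x
paths_y_ind = np.cumsum(rng.normal(size=(n, T)), axis=1)   # independent walks
v_dep = c_hat(paths_x, paths_y_dep)
v_ind = c_hat(paths_x, paths_y_ind)
```

As expected of an independence criterion, the dependent pair of processes scores substantially higher than the independent pair.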

6 Empirical Evaluations

We empirically illustrate how our metric behaves using synthetic data from noisy rotation dynamics on the unit disk in the complex plane in Subsection 6.1, and then evaluate it in the context of the independence test for random processes in Subsection 6.2. The code for generating the results is included in the supplementary material.

6.1 Illustrative Example with Noisy Rotation Dynamics

We used synthetic data from the noisy rotation dynamics on the unit disk in the complex plane, defined by a complex number and the variance of the noise, i.e., for with and (i.i.d.), . We prepared combinations of parameters with and . The graphs in Figure 2 show the deterministic case and 10 independent paths for the case with from the identical initial condition with , where lines of different colors show different sample paths. Then, we calculated the normalized variant of our metric defined by

 L^T_m((Φ₁, x¹), (Φ₂, x²)) = lim_{ε→+0} |l^T_m((Φ₁, x¹), (Φ₂, x²)) + ε|² / ( |l^T_m((Φ₁, x¹), (Φ₁, x¹)) + ε| ⋅ |l^T_m((Φ₂, x²), (Φ₂, x²)) + ε| ) (7)

with an empirical approximation of defined in (2). We also define by replacing with , whose empirical estimate is given by (3). Due to computational costs, we computed and here. The graphs in Figure 2 show numerical results for several cases. As can be seen in (b), if the number of samples for approximating in the definition of is rather small compared with the strength of the noise , it seems that captures the similarity only roughly and judges that all dynamics are different. However, this appears to improve in (c), where the number of samples is larger. Also, seems to give results similar to the deterministic case. As for (d), where the noise level is stronger, again seems to judge that all dynamics are different.
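The noisy rotation data described above can be generated with a few lines. This is a sketch: the specific rotation coefficient, complex Gaussian noise model, and initial condition below are illustrative, not the exact parameter grid of the experiments.

```python
import numpy as np

def noisy_rotation(c, sigma, x0, T, rng):
    """Simulate x_{t+1} = c x_t + e_t on the complex plane, with e_t complex
    Gaussian noise of scale sigma (parameters illustrative)."""
    x = np.empty(T + 1, dtype=complex)
    x[0] = x0
    for t in range(T):
        e = sigma * (rng.normal() + 1j * rng.normal())
        x[t + 1] = c * x[t] + e
    return x

rng = np.random.default_rng(0)
# deterministic rotation (sigma = 0) by angle 2*pi/7, started inside the disk
path = noisy_rotation(np.exp(2j * np.pi / 7), 0.0, 0.5 + 0j, 50, rng)
```

In the noiseless case with |c| = 1, the path stays on the circle of radius |x₀|, matching the deterministic panels of Figure 2.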

6.2 Independence between Two Time-series Data

We empirically evaluated the effectiveness of our metric as an independence criterion, i.e., . For this purpose, we first generated a pair of complex-valued time series with total time by

 x_{t+1} = 0.9 e^{2πi/3} x_t (1 − x_t) + (ε^X_t + i δ^X_t),  ε^X_t, δ^X_t ∼ 0.1⋅N(0, 1), and (8)
 y_{t+1} = 0.3 e^{2πi/3} y_t (1 − y_t) + (ε^Y_t + i δ^Y_t),  ε^Y_t, δ^Y_t ∼ 0.1⋅N(0, 1). (9)

We denote by and the generated sequences. Then, we created different data pairs as and . From the definitions, and are independent for , and correlated for . The graphs on the left-hand side of Figure 3 show 10 independently generated samples for and , where lines of different colors show different sample paths. The graph on the right-hand side of Figure 3 shows the calculated with these sample paths.

7 Conclusions

In this paper, we developed a general framework for constructing metrics on random nonlinear dynamical systems with the Perron-Frobenius operators in vvRKHSs. vvRKHSs were employed to design mathematically manageable metrics and also to introduce L²(Ω)-valued kernels, which are necessary to handle the randomness in systems. Our metric is a natural extension of the existing metrics for deterministic systems. Also, we described the connection of our metric to the Hilbert-Schmidt independence criterion. We empirically showed the effectiveness of our metric using an example of noisy rotation dynamics on the unit disk in the complex plane, and evaluated it in the context of the independence test for random processes.

References

• (1) L. Arnold. Random dynamical systems. Springer, 1998.
• (2) C. Carmeli, E. De Vito, and A. Toigo. Vector valued reproducing kernel Hilbert spaces of integrable functions and Mercer theorem. Analysis and Applications, 4(4):377–408, 2006.
• (3) K. Chwialkowski and A. Gretton. A kernel independence test for random processes. Proceedings of the 31st International Conference on Machine Learning, 32(2):1422–1430, 2014.
• (4) K. De Cock and B. De Moor. Subspace angles between ARMA models. Systems & Control Letters 46, pages 265–270, 2002.
• (5) N. Črnjarić-Žic, S. Maćešić, and I. Mezić. Koopman operator spectrum for random dynamical systems. arXiv:1711.03146, 2019.
• (6) K. Fujii, Y. Inaba, and Y. Kawahara. Koopman spectral kernels for comparing complex dynamics: Application to multiagent sport plays. In Proc. of the 2017 European Conf. on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD’17), pages 127–139. 2017.
• (7) K. Fujii and Y. Kawahara. Dynamic mode decomposition in vector-valued reproducing kernel hilbert spaces for extracting dynamical structure among observables. Neural Networks, 2019.
• (8) A. L. Gibbs and F.E. Su. On choosing and bounding probability metrics. International Statistical Review, 70(3):419–435, 2002.
• (9) A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf. Measuring statistical dependence with Hilbert-Schmidt norms. 16th Intern. Conf. on Algorithmic Learning Theory, pages 63–77, 2005.
• (10) I. Ishikawa, K. Fujii, M. Ikeda, Y. Hashimoto, and Y. Kawahara. Metric on nonlinear dynamical systems with Perron-Frobenius operators. In Advances in Neural Information Processing Systems 31, pages 911–919. 2018.
• (11) R.J. Martin. A metric for ARMA processes. IEEE Trans. Signal Process. 48, pages 1164–1170, 2000.
• (12) I. Mezic. Comparison of dynamics of dissipative finite-time systems using Koopman operator methods. IFAC-PapersOnLine 49-18, pages 454–461, 2016.
• (13) I. Mezic and A. Banaszuk. Comparison of systems with complex behavior. Physica D, 197:101–133, 2004.
• (14) S.V.N. Vishwanathan, A.J. Smola, and R. Vidal. Binet-Cauchy kernels on dynamical systems and its application to the analysis of dynamic scenes. Int'l J. of Computer Vision, 73(1):95–119, 2007.

Appendix A Proofs

a.1 Proposition 3.2

Set . The definition of the Koopman operator for vvRKHS is given as follows: is a linear operator with domain such that for any ,

 K_k h = h ∘ f.

Then we see that for any , , and ,

 ⟨K¹_{f,C} k_x v, h⟩_{H_k} = ⟨h(f(x)), v⟩ = ⟨k_x v, K_k h⟩_{H_k}.

Thus we see that .

a.2 Proposition 3.6

For any and , we claim that . In fact, denote by the right-hand side of this claim. Then, by straightforward computations, we have

 ⟨k_x, ι*_k(k_γ v)⟩_{H_k} = ⟨k_x ∘ γ_X, v⟩_{L²(Ω)} = ∫_Ω k(x, γ_X(ω)) v̄(ω) dP(ω) = ⟨k_x, α⟩_H, (10)

which proves the claim. Since , for any and , we have . By combining this with (10), we see that .

a.3 Theorem 4.2

We denote by the space of -integrable -valued functions with respect to the measure , where is any Hilbert space. Let . Let . Then we see that the adjoint operator of is given by

 R*_{D_i} h = ∫_T Q_i(t)* h(t) dν(t).

In fact, for any , the identities hold:

 ⟨R*_{D_i} h, v⟩_{H_in} = ∫_T ⟨h(t), Q_i(t) v⟩_{H_ob} dν(t).

Therefore, we see that

 K^{(1)}_k(D₁, D₂) = R*_{D₂} R_{D₁}.

For general , let . Then we see that . Therefore, we have