1 Introduction
The development of a metric on data-generating mechanisms is fundamental in machine learning and related fields, since designing an algorithm for a given learning problem and data structure essentially reduces to the design of an appropriate metric or kernel. In the context of dynamical systems, the majority of existing metrics have been developed with principal angles between appropriate subspaces, such as the column subspaces of observability matrices Martin00 ; DeCockDeMoor02 ; VSV07 . On the other hand, several metrics on dynamical systems have been developed with transfer operators such as the Koopman operator and the Perron-Frobenius operator. Mezic et al. MB04 ; Mezic16 proposed metrics of dynamical systems in the context of ergodic theory via Koopman operators on spaces. Fujii et al. Fujii17 developed metrics with Koopman operators that generalize the ones based on the Binet-Cauchy theorem proposed by Vishwanathan et al. VSV07 . Ishikawa et al. IFI+18 gave metrics on nonlinear dynamical systems with Perron-Frobenius operators in RKHSs, which generalize the classical ones with principal angles mentioned above.
However, the existing metrics above are basically defined for deterministic dynamical systems, and, to the best of our knowledge, little existing literature has addressed the design of metrics for random dynamical systems or stochastic processes. Vishwanathan et al. VSV07 mentioned that their metrics can handle systems with random noise by taking expectations over the randomness. Chwialkowski and Gretton CG14 developed nonparametric test statistics for random processes by extending the Hilbert-Schmidt independence criterion.
In this paper, we consider a general framework to construct metrics on random nonlinear dynamical systems, which are defined with the Perron-Frobenius operators in vector-valued reproducing kernel Hilbert spaces (vvRKHSs). Here, vvRKHSs are employed to design mathematically manageable metrics and also to introduce valued RKHSs, which are necessary to handle the randomness in systems. We first define the Perron-Frobenius operators in vvRKHSs and construct a dynamical system in a canonical way. Based on these, we define a metric on random dynamical systems as a positive definite valued kernel. Our metric is a natural extension of existing metrics for deterministic systems, and gives a specification of the kernel maximum mean discrepancy of random processes. Moreover, by considering the time-wise independence of random processes, we discuss the connection between our metric and kernel-based independence criteria such as the Hilbert-Schmidt independence criterion. We empirically illustrate our metric using synthetic data from noisy rotation dynamics in the unit disk in the complex plane, and evaluate it in the context of the independence test for random processes.
The remainder of this paper is organized as follows. In Section 2, we briefly review the notions necessary in this paper, such as the Perron-Frobenius operators in RKHSs and a positive definite kernel on random processes by means of the kernel mean embedding. In Section 3, we define the Perron-Frobenius operators in vvRKHSs and construct a dynamical system in a canonical way. Then, in Section 4, we give the definition of our metric for random dynamical systems. In Section 5, we describe the connection of our metric to the Hilbert-Schmidt independence criterion. Finally, we empirically investigate our metric using synthetic data in Section 6, and conclude the paper in Section 7. All proofs are given in Appendix A of the Supplementary document.
2 Background
In this section, we briefly review the Perron-Frobenius operators in RKHSs in Subsection 2.1, and then describe a straightforward way of defining a metric for comparing two random processes with kernel mean embeddings in Subsection 2.2.
2.1 Perron-Frobenius Operators on RKHSs
Let be a state space and be a positive definite kernel on . For any , we denote by the function on defined by . By the Moore-Aronszajn theorem, there exists a unique Hilbert space composed of functions in such that for any , the function is contained in and the reproducing property holds, namely, for any , . The Gaussian kernel for is a typical example of a positive definite kernel, and it is used in our empirical illustration (Section 6.1).
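Since the kernel symbols are abstract here, a small numerical sketch may help: positive definiteness means that every Gram matrix of the Gaussian kernel is symmetric positive semi-definite. The bandwidth `sigma` and the sample points below are illustrative choices, not quantities from the paper.

```python
import numpy as np

def gauss_kernel(x, y, sigma=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    d = np.atleast_1d(x) - np.atleast_1d(y)
    return float(np.exp(-np.dot(d, d) / (2.0 * sigma**2)))

# Positive definiteness: the Gram matrix of any finite point set is
# symmetric and has nonnegative eigenvalues.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))
G = np.array([[gauss_kernel(a, b) for b in X] for a in X])
eigs = np.linalg.eigvalsh(G)
```

By the Moore-Aronszajn theorem quoted above, such a kernel determines a unique RKHS; the numerical check only illustrates the defining positivity condition.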
Let , , , or . We call a map a dynamical system if and for any and , . For , we define the Perron-Frobenius operator as a linear operator with a dense domain, , by . In the same manner as Proposition 2.1 in IFI+18 , is the adjoint operator of the Koopman operator on , which is the linear operator allocating to . Note that, although the contents in IFI+18 are considered only for the discrete-time case, i.e., or , we here consider the general case . Ishikawa et al. IFI+18 define a positive definite kernel using the Perron-Frobenius operators for comparing deterministic nonlinear dynamical systems, which generalizes many of the existing metrics for dynamical systems.
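To make the operators above concrete, recall that the Koopman operator acts on an observable simply by composition with the dynamics, and the Perron-Frobenius operator is its adjoint. A minimal sketch of the composition action (the map `f` and observable `g` are illustrative choices, not from the paper):

```python
import numpy as np

def koopman(f):
    """Koopman operator of a map f: sends an observable g to g o f."""
    return lambda g: (lambda x: g(f(x)))

# Illustrative dynamics and observable (hypothetical, for demonstration only).
f = lambda x: 4.0 * x * (1.0 - x)   # logistic map on [0, 1]
g = lambda x: x**2                  # observable
Kg = koopman(f)(g)                  # (K_f g)(x) = g(f(x))
```

The adjoint relation defining the Perron-Frobenius operator is then taken with respect to the RKHS inner product, as in Proposition 2.1 of IFI+18 .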
2.2 Comparison of Two Random Processes
Here we describe a straightforward method to define a metric for comparing two random processes. It gives a natural positive definite kernel on random processes by means of the kernel mean embedding.
Let be a probability space, where is a measurable space and is a probability measure. Let be a stochastic process with continuous paths (for simplicity, we only consider continuous paths in the case of or ). We define the law of as the pushforward measure on , and denote it by . A basic strategy to define a metric between stochastic processes is to define a metric between the laws of the stochastic processes by means of various metrics on probability measures, such as the kernel maximum mean discrepancy (KMMD), the Wasserstein distance, and the Kullback-Leibler (KL) divergence (cf. GS02 ). Let be a positive definite kernel on . For a probability measure , we denote by the kernel mean embedding of , which is given by . Then for two stochastic processes and , we have
If a positive definite kernel on is given, we naturally define a positive definite kernel on by
. As a result, KMMD for random processes is calculated using this inner product, just as in the case of random variables.
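Concretely, the squared KMMD between two laws can be estimated from samples with the standard biased (V-statistic) estimator built from three Gram matrices. The Gaussian bandwidth and the two sample laws below are illustrative assumptions:

```python
import numpy as np

def gram(X, Y, sigma=1.0):
    """Gaussian Gram matrix between sample sets X (m x d) and Y (n x d)."""
    d2 = ((X[:, None, :] - Y[None, :, :])**2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def mmd2(X, Y, sigma=1.0):
    """Biased (V-statistic) estimate of the squared kernel MMD."""
    return (gram(X, X, sigma).mean()
            - 2.0 * gram(X, Y, sigma).mean()
            + gram(Y, Y, sigma).mean())

rng = np.random.default_rng(1)
A = rng.normal(0.0, 1.0, size=(200, 1))
B = rng.normal(0.0, 1.0, size=(200, 1))   # same law as A
C = rng.normal(3.0, 1.0, size=(200, 1))   # shifted law
```

As expected, the estimate is near zero for samples from the same law and large for samples from different laws; the same computation applies to embedded random processes once a kernel on path space is fixed.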
3 Big Dynamical Systems and Perron-Frobenius Operators in vvRKHSs Associated with Random Dynamical Systems
In this section, we define a Perron-Frobenius operator in vvRKHSs, which is a natural generalization of the operator defined in IFI+18 , in Subsection 3.1. Vector-valued RKHSs are employed to introduce positive definite kernels, which are necessary to incorporate the effects of random variables. Then, we introduce the notion of random dynamical systems and construct a dynamical system in a canonical way in Subsection 3.2. This construction is a natural generalization of the corresponding deterministic case (namely, the case where is a one-point set).
3.1 Perron-Frobenius Operators for RKHSs of Positive Definite Kernels
Let be a set and be a Hilbert space. We denote by the space of bounded linear operators in . We define a positive definite kernel as a map satisfying the following two conditions: (1) for any , , and (2) for any , , and , . We note that “positive definite kernel” is equivalent to “valued kernel of positive type” in CVT06 . We define a linear map by for . We note that a positive definite kernel is the notion equivalent to the positive definite kernel in Section 2.1.
For any positive definite kernel, it is well known that there uniquely exists a Hilbert space in such that for any and , , and for any , (see Proposition 2.3 in CVT06 ). We call the vectorvalued reproducing kernel Hilbert space associated with or the valued reproducing kernel Hilbert space associated with .
We note that in the case where is a trace class operator for each , is a positive definite kernel since the trace of is given by , where is an orthonormal basis of .
For any subset and any closed subspace , we define a closed subspace as the closure of .
Definition 3.1.
Let be a RKHS associated with a positive definite kernel on . Let be a subset and let be a dynamical system. Let be a closed subspace. For , the th Perron-Frobenius operator is a linear operator with the domain such that for any .
We note that does not always exist, and its existence rather depends on the choice of the subspace .
We remark on the relation between existing operators in vvRKHSs and our operator. In the case where a dynamical system is deterministic and discrete-time ( or ) and is a positive definite (or )kernel, is the same as the operator defined in IFI+18 . Let and . In FK19 , Fujii and Kawahara define the Koopman operator for a discrete-time dynamical system as the linear operator for , where we put . As stated in the following proposition, their operator is given as the adjoint of the Perron-Frobenius operator:
Proposition 3.2.
We have , where is the Koopman operator defined in FK19 (we give a rigorous definition of the Koopman operator in the proof of this proposition).
3.2 Big Dynamical Systems and valued RKHSs Associated with Random Dynamical Systems
We fix a probability space . Here, we construct a big dynamical system and a valued RKHS for a given random dynamical system in a canonical way. This big dynamical system enables us to study random dynamical systems in terms of the theory of deterministic dynamical systems. We also need the valued RKHS to scrutinize the random effects in the systems via Perron-Frobenius operators and the kernel method. Finally, we describe how our Perron-Frobenius operators generalize the existing operators for random processes.
We fix a bounded positive definite kernel on and let be the corresponding RKHS.
Let , , , or , and be a state space. We fix a semigroup of measure-preserving measurable maps on , namely, such that , , and for all . We note that, for , the Koopman operator induces a bounded and isometric operator on .
Definition 3.3.
Let be an open subset. A random dynamical system on with respect to is a measurable map
such that and for any .
Random dynamical systems include many kinds of stochastic processes, and typically appear as solutions of stochastic differential equations. In the case where is a one-point set, a random dynamical system reduces to a deterministic dynamical system.
Example 3.4.
An autoregressive (AR) model “” is given as a special case of a random dynamical system as follows. Let , , be the classical Wiener measure with , be the Wiener process, and . Fix , and for , set . Then is an i.i.d. sequence of random variables with a Gaussian distribution, and if we define , then the function is a random dynamical system with respect to . Therefore, the th sample determined by the AR model “” is given by .
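A minimal simulation of an AR(1) model of this kind, with the noise realised as i.i.d. Gaussian increments of a Wiener process, can be sketched as follows (the coefficient `a`, noise scale `sigma`, and horizon `T` are illustrative choices, not values from the example):

```python
import numpy as np

def ar1_path(a, x0, T, sigma=1.0, seed=0):
    """Simulate x_{t+1} = a * x_t + eps_t, with eps_t ~ N(0, sigma^2) i.i.d.
    (realised here as increments of a discretised Wiener process)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, sigma, size=T)   # Wiener increments
    x = np.empty(T + 1)
    x[0] = x0
    for t in range(T):
        x[t + 1] = a * x[t] + w[t]
    return x

path = ar1_path(a=0.8, x0=1.0, T=100)
```

Each call with a fresh seed corresponds to one sample of the random dynamical system driven by the shift on Wiener space.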
Let be the set of measurable maps such that the Koopman operator induces a bounded linear operator on . For each random dynamical system , we construct a dynamical system in in a canonical way, which is the same as the pushforward of the skew product (Arno98 , 1.1.8) of the random dynamical system.
Definition 3.5.
Let be a random dynamical system in with respect to . We define by , where .
Let be the positive definite kernel fixed at the beginning of this subsection. We define the kernel on by
where and . We note that actually induces a bounded linear operator since we are assuming that is a bounded function, and and are bounded operators. We then obtain the vvRKHS and the Perron-Frobenius operators for in Section 3.1. We define by , and regard as a closed subspace of
We remark that our operator is a generalization of the Koopman operator defined in CZMM19 ; namely, the Perron-Frobenius operator completely recovers the classical transfer operator:
Proposition 3.6.
Let be a subspace including constant functions. For any , we have
4 Metrics on Random Dynamical Systems
In this section, we construct a metric to compare two random dynamical systems. First, we specify the rigorous definition of the domain on which our metric is defined, which we call triples of random dynamical systems with respect to . Then we define the metric on the triples of random dynamical systems. Our metric is given as a positive definite kernel for some Hilbert space (specified later); namely, it is a linear operator on . When we evaluate this metric with a linear functional, for example the trace, it becomes a usual positive definite kernel. At the end of this section, we restrict ourselves to special situations. There we see that our metric gives a generalization of the KMMD for random processes introduced in Section 2.2, and define metrics and , which we use in the empirical computations in Section 6.
Let be a Borel measure on , and let and be Hilbert spaces. We define a triple of a random dynamical system with respect to and as a triple , where is a random dynamical system on , and and are linear operators such that is a Hilbert-Schmidt operator for some , and for any , the function . We call and an initial value operator and an observable operator, respectively. Intuitively, the operator corresponds to an observable that gives an output at , and describes an initial condition of the data. We denote by the set of triples of random dynamical systems.
Now, we give the definition of our metric on random dynamical systems as follows:
Definition 4.1.
For , we fix a triple . For , , we define a HilbertSchmidt operator by
(1) 
where is the th exterior product (see Appendix A of the Supplementary material of IFI+18 ).
Then we have the following theorem:
Theorem 4.2.
The kernel is a positive definite kernel on for each .
We define by the positive definite kernel on . Then, in the special case where the random dynamical systems are not random but deterministic, is the positive definite kernel introduced in IFI+18 .
Let . In the case of , , , we define
where . By formula (4) in IFI+18 , we have the following computation formula:
(2) 
In particular, our metric is a natural generalization of the KMMD for an integral-type kernel, as shown by the following theorem:
Theorem 4.3.
Let be an integral-type kernel on ; then we have
This theorem implies that, for general , the metric is a reasonable generalization to two random dynamical systems in the context of the KMMD. However, formula (2) requires heavy computation for higher . To remedy this drawback of (2), we construct another metric as follows: let , and define
By formula (4) in IFI+18 again, we have the following formula:
(3) 
5 Connection to HilbertSchmidt Independence Criteria
In this section, we discuss the relation between our positive definite kernel and the Hilbert-Schmidt independence criterion (HSIC). We first define an independence criterion for random dynamical systems based on the above context, and then give its estimator. Our independence criterion measures the pairwise independence of random processes. Thus, although our metric is constructed in the context of dynamical systems, it can be used to extract information about the independence of two random processes. This is one of the main reasons for introducing the RKHS above.
We first briefly review the HSIC GBSS05 . Let be two random variables, and let be a positive definite kernel on . Here, we assume that is universal. Also, we define the cross-covariance operator by
where are the kernel mean embeddings of the laws of and of , respectively, and we regard any element of as a bounded linear operator on ; namely, for any , we define by . We note that, via this identification, is equal to the space of Hilbert-Schmidt operators; in particular, the cross-covariance operator is also a Hilbert-Schmidt operator. Straightforward computations show that , where . The universality of shows that if and only if and are independent. The HSIC is defined to be , the Hilbert-Schmidt norm of . The value can be estimated via the evaluation of kernel functions over samples (see (GBSS05 , Lemma 1)).
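The sample estimator of Lemma 1 in GBSS05 can be sketched with Gaussian kernels and the centering matrix; the bandwidth and the two test laws below are illustrative assumptions:

```python
import numpy as np

def gram_gauss(X, sigma=1.0):
    """Gaussian Gram matrix of a sample set X (m x d)."""
    d2 = ((X[:, None, :] - X[None, :, :])**2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

def hsic(X, Y, sigma=1.0):
    """Empirical HSIC (GBSS05, Lemma 1): (m-1)^{-2} tr(K H L H)."""
    m = X.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m   # centering matrix
    K = gram_gauss(X, sigma)
    L = gram_gauss(Y, sigma)
    return np.trace(K @ H @ L @ H) / (m - 1)**2

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 1))
Y_ind = rng.normal(size=(300, 1))             # independent of X
Y_dep = X + 0.1 * rng.normal(size=(300, 1))   # strongly dependent on X
```

The estimate is markedly larger for the dependent pair, which is the behaviour the independence test below relies on.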
Now, let us consider the independence of random dynamical systems in our context. For , let be a random dynamical system on with respect to . We fix , and let , be random processes. We set and . We assume that exists. We define . Set . We define the independence criterion for random dynamical systems by
(4) 
Then, we have the following relation:
Theorem 5.1.
We have
6 Empirical Evaluations
We empirically illustrate how our metric behaves using synthetic data from noisy rotation dynamics on the unit disk in the complex plane in Subsection 6.1, and then evaluate it in the context of the independence test for random processes in Subsection 6.2. The code for generating the results is included in the supplementary material.
6.1 Illustrative Example with Noisy Rotation Dynamics
We used synthetic data from the noisy rotation dynamics on the unit disk in the complex plane defined by a complex number and the variance of the noise, i.e., for with and (i.i.d.), . We prepared combinations of parameters with and . The graphs in Figure 2 show the deterministic case and 10 independent paths for the case with from the identical initial condition with , where the lines of different colors show different sample paths. Then, we calculated the normalized variant of our metric defined by
(7) 
with an empirical approximation of defined in (2). We also define by replacing with , whose empirical estimate is given by (3). For reasons of computational cost, we computed and here. The graphs in Figure 2 show numerical results for several cases. As can be seen in (b), if the number of samples, , for approximating in the definition of is rather small compared with the strength of the noise , then seems to capture the similarity only roughly and judges all of the dynamics to be different. However, this appears to improve in (c), where the number of samples is larger. Also, seems to give results similar to the deterministic case. As for (d), where the noise level is stronger, again seems to judge all of the dynamics to be different.
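For concreteness, a sketch of one way to generate such noisy rotation data follows. Since the exact noise model and boundary handling are not fully specified here, the form below (rotation by a fixed angle `theta`, additive complex Gaussian noise of scale `sigma`, projection back into the unit disk) is an assumption, not necessarily the paper's exact construction:

```python
import numpy as np

def noisy_rotation(theta, sigma, x0, T, seed=0):
    """Assumed noisy rotation on the unit disk: rotate, add complex
    Gaussian noise, then project back into the disk."""
    rng = np.random.default_rng(seed)
    x = np.empty(T + 1, dtype=complex)
    x[0] = x0
    for t in range(T):
        eps = sigma * (rng.normal() + 1j * rng.normal())
        z = np.exp(1j * theta) * x[t] + eps
        x[t + 1] = z / max(1.0, abs(z))   # keep the state inside the disk
    return x

path = noisy_rotation(theta=np.pi / 8, sigma=0.05, x0=0.5 + 0j, T=50)
```

Repeated calls with different seeds yield the independent sample paths plotted in Figure 2.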
6.2 Independence between Two Timeseries Data
We empirically evaluated the effectiveness of our metric as an independence criterion, i.e., . For this purpose, we first generated a pair of complex-valued time-series data with total time by
(8)  
(9) 
We denote by and the generated sequences. Then, we created different data pairs as and . From the definitions, and are independent for , and correlated for . The graphs on the left-hand side of Figure 3 show 10 independently generated samples for and , where the lines of different colors show different sample paths. The graph on the right-hand side of Figure 3 shows the calculated with these sample paths.
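The shifted data pairs used here can be formed as in the following sketch; the input sequences are placeholders, since the generators (8)-(9) are not reproduced in this text:

```python
import numpy as np

def shifted_pair(x, y, j):
    """Pair x_t with y_{t+j}; in Section 6.2, whether the resulting pair is
    independent or correlated depends on the shift j (per the construction
    with generators (8)-(9), not reproduced here)."""
    x, y = np.asarray(x), np.asarray(y)
    if j == 0:
        return x, y
    return x[:-j], y[j:]

# Placeholder sequences just to exercise the pairing logic.
x = np.arange(10.0)
y = 2.0 * np.arange(10.0)
a, b = shifted_pair(x, y, 3)
```

The independence statistic of Section 5 is then evaluated on each shifted pair.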
7 Conclusions
In this paper, we developed a general framework for constructing metrics on random nonlinear dynamical systems with the Perron-Frobenius operators in vvRKHSs. vvRKHSs were employed to design mathematically manageable metrics and also to introduce valued kernels, which are necessary to handle the randomness in systems. Our metric is a natural extension of the existing metrics for deterministic systems. Also, we described the connection of our metric to the Hilbert-Schmidt independence criterion. We empirically showed the effectiveness of our metric using an example of noisy rotation dynamics in the unit disk in the complex plane, and evaluated it in the context of the independence test for random processes.
References
 (1) L. Arnold. Random dynamical systems. Springer, 1998.
 (2) C. Carmeli, E. De Vito, and A. Toigo. Vector valued reproducing kernel Hilbert spaces of integrable functions and Mercer theorem. Analysis and Applications, 4(4):377–408, 2006.
 (3) K. Chwialkowski and A. Gretton. A kernel independence test for random processes. In Proceedings of the 31st International Conference on Machine Learning, 32(2):1422–1430, 2014.
 (4) K. De Cock and B. De Moor. Subspace angles between ARMA models. Systems & Control Letters 46, pages 265–270, 2002.
 (5) N. Črnjarić-Žic, S. Maćešić, and I. Mezić. Koopman operator spectrum for random dynamical systems. arXiv:1711.03146, 2019.
 (6) K. Fujii, Y. Inaba, and Y. Kawahara. Koopman spectral kernels for comparing complex dynamics: Application to multiagent sport plays. In Proc. of the 2017 European Conf. on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECMLPKDD’17), pages 127–139. 2017.
 (7) K. Fujii and Y. Kawahara. Dynamic mode decomposition in vector-valued reproducing kernel Hilbert spaces for extracting dynamical structure among observables. Neural Networks, 2019.
 (8) A. L. Gibbs and F.E. Su. On choosing and bounding probability metrics. International Statistical Review, 70(3):419–435, 2002.
 (9) A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf. Measuring statistical dependence with Hilbert-Schmidt norms. In Proc. of the 16th Intern. Conf. on Algorithmic Learning Theory, pages 63–77, 2005.
 (10) I. Ishikawa, K. Fujii, M. Ikeda, Y. Hashimoto, and Y. Kawahara. Metric on nonlinear dynamical systems with Perron-Frobenius operators. In Advances in Neural Information Processing Systems 31, pages 911–919, 2018.
 (11) R. J. Martin. A metric for ARMA processes. IEEE Trans. Signal Process., 48:1164–1170, 2000.
 (12) I. Mezic. Comparison of dynamics of dissipative finite-time systems using Koopman operator methods. IFAC-PapersOnLine, 49(18):454–461, 2016.
 (13) I. Mezic and A. Banaszuk. Comparison of systems with complex behavior. Physica D, 197:101–133, 2004.

 (14) S. V. N. Vishwanathan, A. J. Smola, and R. Vidal. Binet-Cauchy kernels on dynamical systems and its application to the analysis of dynamic scenes. Int’l J. of Computer Vision, 73(1):95–119, 2007.
Appendix A Proofs
A.1 Proposition 3.2
Set . The definition of the Koopman operator for vvRKHS is given as follows: is a linear operator with domain such that for any ,
Then we see that for any , , and ,
Thus we see that .
A.2 Proposition 3.6
For any and , we claim that . In fact, denote by the right-hand side of this claim. Then, by straightforward computations, we have
(10) 
which proves the claim. Since , for any and , we have . By combining this with (10), we see that .
A.3 Theorem 4.2
We denote by the space of integrable valued functions with respect to the measure , where is any Hilbert space. Let . Let . Then we see that the adjoint operator of is given by
In fact, for any , the identities hold:
Therefore, we see that
For general , let . Then we see that . Therefore, we have , and for