Classification and recognition have been among the main focuses of machine learning research for the past decades. When dealing with structured data other than vector-valued data, developing an algorithm for this problem basically reduces to designing an appropriate metric or kernel for the type of structure at hand. However, not much of the existing literature has addressed the design of metrics in the context of dynamical systems. To the best of our knowledge, the metric for ARMA models based on comparing their cepstrum coefficients Martin00 is one of the first works to address this problem. De Cock and De Moor extended this to linear state-space models by considering the subspace angles between the observability subspaces DeCock-DeMoor02 . Meanwhile, Vishwanathan et al. developed a family of kernels for dynamical systems based on the Binet-Cauchy theorem VSV07 . Chaudhry and Vidal extended this to incorporate invariance to initial conditions CV13 .
As mentioned in some of the above literature, the existing metrics for dynamical systems are defined via principal angles between appropriate subspaces, such as the column subspaces of observability matrices. However, they are basically restricted to linear dynamical systems, although Vishwanathan et al. mentioned an extension with reproducing kernels for some specific metrics VSV07 . Recently, Fujii et al. discussed a more general extension of these metrics to nonlinear systems with the Koopman operator Fujii17 . Mezic et al. proposed metrics for dynamical systems in the context of ergodic theory via Koopman operators Mezic16 ; MB04 . The Koopman operator, also known as the composition operator, is a linear operator on observables of a nonlinear dynamical system Koopman31 . Thus, by analyzing this operator instead of the nonlinear dynamics directly, one can more easily extract properties of the dynamics. In particular, spectral analysis of the Koopman operator, together with its empirical procedure called dynamic mode decomposition (DMD), has attracted attention in a variety of fields of science and engineering RMB+09 ; BSV+15 ; PE15 ; BJOK16 .
In this paper, we develop a general metric for nonlinear dynamical systems, which includes the existing fundamental metrics for dynamical systems mentioned above as special cases. This metric is defined with Perron-Frobenius operators in reproducing kernel Hilbert spaces (RKHSs), which are shown to be essentially equivalent to Koopman operators, and allows us to compare pairs of datasets that are supposed to be generated from nonlinear systems. We also describe the estimation of our metric from finite data. We empirically illustrate our metric using an example of rotation dynamics on the unit disk in the complex plane, and evaluate its performance with real-world time-series data.
The remainder of this paper is organized as follows. In Section 2, we briefly review the definition of the Koopman operator, especially the one defined in RKHSs. In Section 3, we give the definition of our metric for comparing nonlinear dynamical systems (NLDSs) with Koopman operators and then describe the estimation of the metric from finite data. In Section 4, we describe the relation of our metric to existing ones. In Section 5, we empirically illustrate our metric with synthetic data and evaluate its performance with real-world data. Finally, we conclude the paper in Section 6.
2 Perron-Frobenius operator in RKHS
Consider a discrete-time nonlinear dynamical system $x_{t+1} = f(x_t)$ with time index $t \in \mathbb{Z}_{\geq 0}$, defined on a state space $\mathcal{M}$ (i.e., $x_t \in \mathcal{M}$), where $x_t$ is the state vector and $f \colon \mathcal{M} \to \mathcal{M}$ is a (possibly nonlinear) state-transition function. Then, the Koopman operator (also known as the composition operator), denoted here by $\mathcal{K}$, is a linear operator in a function space $\mathcal{F}$ defined by the rule
$$ (\mathcal{K} g)(x) = (g \circ f)(x), $$
where $g$ is an element of $\mathcal{F}$. The domain of the Koopman operator is $\{ g \in \mathcal{F} : g \circ f \in \mathcal{F} \}$, where $g \circ f$ denotes the composition of $g$ with $f$ Koopman31 . The choice of $\mathcal{F}$ depends on the problem considered; in this paper, we take $\mathcal{F}$ to be an RKHS. A function $g$ is referred to as an observable. We see that $\mathcal{K}$ acts linearly on the function $g$, even though the dynamics defined by $f$ may be nonlinear. In recent years, spectral decomposition methods for this operator have attracted attention in a variety of scientific and engineering fields because they can give a global modal description of a nonlinear dynamical system from data. In particular, a family of estimation algorithms, called dynamic mode decomposition (DMD), has been successfully applied to many real-world problems, such as image processing Kutz16b , neuroscience BJOK16 , and system control Proctor16 . In the machine learning community, several algorithmic improvements have been investigated via a formulation with reproducing kernels Kawahara16 and in a Bayesian framework Takeishi17b .
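As a concrete illustration of the DMD procedure mentioned above, the following is a minimal sketch of exact DMD (assuming numpy; the function name and interface are ours, not from the works cited):

```python
import numpy as np

def exact_dmd(X, Y, r=None):
    """Exact DMD: given snapshot matrices X = [x_0 ... x_{T-1}] and
    Y = [x_1 ... x_T] (columns are states), estimate eigenvalues and
    modes of the best-fit linear operator A with Y ~ A X."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    if r is not None:  # optional truncation rank
        U, s, Vh = U[:, :r], s[:r], Vh[:r]
    # Project A = Y X^+ onto the leading POD subspace.
    A_tilde = U.conj().T @ Y @ Vh.conj().T @ np.diag(1.0 / s)
    eigvals, W = np.linalg.eig(A_tilde)
    modes = Y @ Vh.conj().T @ np.diag(1.0 / s) @ W  # exact DMD modes
    return eigvals, modes
```

For a linear system, the recovered eigenvalues coincide with those of the true transition matrix, which serves as a quick sanity check of the implementation.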
Now, let $\mathcal{H}_k$ be the RKHS endowed with an inner product $\langle \cdot, \cdot \rangle_k$ and a positive definite kernel $k(x, y)$ (or $k$), where $\mathcal{X}$ is a set; $\mathcal{H}_k$ is a function space on $\mathcal{X}$. The corresponding feature map is denoted by $\phi \colon \mathcal{X} \to \mathcal{H}_k$. Also, assume $\mathcal{M} \subset \mathcal{X}$, and define the closed subspace $H_{\mathcal{M}} \subset \mathcal{H}_k$ as the closure of the vector space generated by $\phi(x)$ for $x \in \mathcal{M}$, i.e., $H_{\mathcal{M}} := \overline{\operatorname{span}\{ \phi(x) : x \in \mathcal{M} \}}$. Then, the Perron-Frobenius operator in the RKHS associated with $f$ (see Kawahara16 ; note that it is called the Koopman operator on the feature map in that literature), denoted by $K_f$, is defined as a linear operator with dense domain $\operatorname{span}\{ \phi(x) : x \in \mathcal{M} \}$ satisfying, for all $x \in \mathcal{M}$,
$$ K_f \phi(x) = \phi(f(x)). $$
Since $K_f$ is densely defined, its adjoint operator $K_f^*$ exists. In the following proposition, we see that $K_f^*$ is essentially the same as the Koopman operator $\mathcal{K}$.
Let $\mathcal{H}_{k|_{\mathcal{M}}}$ be the RKHS associated with the positive definite kernel defined by the restriction of $k$ to $\mathcal{M} \times \mathcal{M}$, which is a function space on $\mathcal{M}$. Let $\iota \colon H_{\mathcal{M}} \to \mathcal{H}_{k|_{\mathcal{M}}}$ be the linear isomorphism defined via the restriction of functions from $\mathcal{X}$ to $\mathcal{M}$. Then, we have
$$ K_f^* = \iota^{-1} \mathcal{K}\, \iota, $$
where $^*$ denotes the Hermitian adjoint.
3 Metric on NLDSs with Perron-Frobenius Operators in RKHSs
We propose a general metric for the comparison of nonlinear dynamical systems, which is defined with Perron-Frobenius operators in RKHSs. Intuitively, the metric compares the behaviors of dynamical systems over infinite time. To ensure convergence, we consider ratios of kernel values, namely angles, instead of directly introducing exponentially decaying terms. We first give the definition in Subsection 3.1 and then derive an estimator of the metric from finite data in Subsection 3.2.
Let $\mathcal{H}_{\mathrm{ob}}$ be a Hilbert space and $\mathcal{M} \subset \mathcal{X}$ a subset. Let $h \colon \mathcal{M} \to \mathcal{H}_{\mathrm{ob}}$ be a map, often called an observable. We define the observable operator for $h$ as a linear operator $L_h$ such that $L_h \phi(x) = h(x)$. We give two examples here. First, in the case of $\mathcal{H}_{\mathrm{ob}} = \mathbb{C}$ and $h = g|_{\mathcal{M}}$ for some $g \in \mathcal{H}_k$, the observable operator maps $\phi(x)$ to the observed value $g(x)$. This situation appears, for example, in the context of DMD, where observed data are obtained as values of functions in the RKHS. Secondly, in the case of $\mathcal{H}_{\mathrm{ob}} = \mathcal{H}_k$ and $h = \phi|_{\mathcal{M}}$, the observable operator is the inclusion $H_{\mathcal{M}} \hookrightarrow \mathcal{H}_k$. This situation appears when we can observe the state space $\mathcal{M}$ and try to obtain more detailed information by sending the observed data to the RKHS via the feature map.
Let $\mathcal{H}_{\mathrm{in}}$ be a Hilbert space; we refer to $\mathcal{H}_{\mathrm{in}}$ as an initial value space. We call a linear operator $I \colon \mathcal{H}_{\mathrm{in}} \to H_{\mathcal{M}}$ an initial value operator if $I$ is bounded. Initial value operators can be regarded as expressions of initial values in terms of linear operators. In fact, consider the case $\mathcal{H}_{\mathrm{in}} = \mathbb{C}^r$ with initial values $x_1, \dots, x_r \in \mathcal{M}$, and let $I$ be the initial value operator defined by $I e_i := \phi(x_i)$ for the standard basis $e_1, \dots, e_r$. Let $K_f$ be the Perron-Frobenius operator associated with a dynamical system $f$. Then, for any positive integer $t$, we have $K_f^t I e_i = \phi(f^t(x_i))$, and $K_f^t I$ is a linear operator encoding the information at time $t$ of the orbits of the dynamical system with initial values $x_1, \dots, x_r$.
Now, we define triples of dynamical systems. A triple of a dynamical system with respect to an initial value space $\mathcal{H}_{\mathrm{in}}$ and an observable space $\mathcal{H}_{\mathrm{ob}}$ is a triple $(f, h, I)$, where the first component is a dynamical system $f$ on a subset $\mathcal{M} \subset \mathcal{X}$ ($\mathcal{M}$ depends on the triple) with Perron-Frobenius operator $K_f$, the second component is an observable $h$ with observable operator $L_h$, and the third component is an initial value operator $I$ on $\mathcal{H}_{\mathrm{in}}$, such that for any $t \geq 0$, the composition $L_h K_f^t I$ is well-defined and a Hilbert-Schmidt operator. We denote by $\mathcal{T}(\mathcal{H}_{\mathrm{in}}, \mathcal{H}_{\mathrm{ob}})$ the set of triples of dynamical systems with respect to $\mathcal{H}_{\mathrm{in}}$ and $\mathcal{H}_{\mathrm{ob}}$.
For two triples $D_1$ and $D_2$, and for $T, m \geq 1$, we first define
where the symbol $\wedge^m$ denotes the $m$-th exterior product (see Appendix A). We note that, since the initial value operator is bounded, the operator being compared extends uniquely to a bounded linear operator on the closure of its domain.
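The inner product of $m$-th exterior products can be computed concretely via the Binet-Cauchy identity, $\langle v_1 \wedge \cdots \wedge v_m,\, w_1 \wedge \cdots \wedge w_m \rangle = \det [\langle v_i, w_j \rangle]_{i,j}$. A small numpy sketch (our own helper, for illustration only):

```python
import numpy as np

def exterior_inner(V, W):
    """Inner product <v_1 ^ ... ^ v_m, w_1 ^ ... ^ w_m> of m-th exterior
    products, computed as det([<v_i, w_j>]_{i,j}) (Binet-Cauchy identity).
    V and W are (m, d) arrays whose rows are the vectors v_i and w_i."""
    return np.linalg.det(V @ W.conj().T)
```

For $m = 2$ in $\mathbb{R}^3$ this reduces to the classical Lagrange identity $\langle v_1 \times v_2, w_1 \times w_2 \rangle = \det [\langle v_i, w_j \rangle]$.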
The function is a positive definite kernel on .
See Appendix B. ∎
Next, for positive number , we define with by
We remark that for , is a non-negative increasing sequence. Now, we denote by the Banach space of bounded sequences of complex numbers, and define by
Moreover, we introduce Banach limits for elements of $\ell^\infty$. A Banach limit is a bounded linear functional $\mathcal{L} \colon \ell^\infty \to \mathbb{C}$ that maps the constant sequence $(1, 1, \dots)$ to $1$, is shift-invariant, and is positive, i.e., $\mathcal{L}(x) \geq 0$ for any non-negative real sequence $x = (x_n)_n$, namely one with $x_n \geq 0$ for all $n$. We remark that if $x$ converges to a complex number $\alpha$, then $\mathcal{L}(x) = \alpha$ for any Banach limit $\mathcal{L}$. Banach limits were first introduced by Banach Banach95 , and their existence is proved via the Hahn-Banach theorem. In general, the Banach limit is not unique.
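For intuition: the sequence $(0, 1, 0, 1, \dots)$ is bounded but has no ordinary limit, yet shift invariance forces every Banach limit to assign it the value $1/2$ (since $\mathcal{L}(x) + \mathcal{L}(\text{shifted } x) = \mathcal{L}(1, 1, \dots) = 1$), which its Cesàro means also approach. A small numerical illustration (assuming numpy):

```python
import numpy as np

def cesaro_means(x):
    """Running Cesàro means (1/n) * sum_{i<n} x_i of a sequence x."""
    x = np.asarray(x, dtype=float)
    return np.cumsum(x) / np.arange(1, len(x) + 1)

# The alternating 0/1 sequence does not converge, but its Cesàro means
# converge to 1/2 -- the value every Banach limit must assign to it.
x = np.arange(10000) % 2
```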
For an integer and a Banach limit , a positive definite kernel is defined by
We remark that the positive definiteness of this kernel follows from Proposition 3.1 and the properties of the Banach limit. We simply drop $\mathcal{L}$ from the notation if the underlying sequence converges, since the value is then independent of the choice of $\mathcal{L}$.
In general, a Banach limit is hard to compute. However, under some assumptions and a suitable choice of $\mathcal{L}$, we prove that it is computable in Proposition 3.6 below. Thus, we obtain an estimation formula for our kernel (see Sucheston64 , Sucheston67 , FL06 for other results on the estimation of Banach limits). In the following proposition, we show that we can construct a pseudo-metric from the positive definite kernel:
Let be a Banach limit. For , is a pseudo-metric on .
See Appendix C. ∎
Although we defined the metric with RKHSs, it can be defined in a more general situation: for each system, take a closed subspace of a Hilbert space, linear operators playing the roles of the Perron-Frobenius and observable operators, and a bounded initial value operator; the metric between the resulting triples is then defined in the same manner.
3.2 Estimation from finite data
Now we derive a formula to compute the above metric from finite data, which allows us to compare several time-series data generated from dynamical systems just by evaluating the values of kernel functions. First, we discuss computability and then state the formula for computation. In this section, the initial value space is of finite dimension: $\mathcal{H}_{\mathrm{in}} = \mathbb{C}^r$, with initial values $x_1, \dots, x_r \in \mathcal{M}$. We define a linear operator $I \colon \mathbb{C}^r \to H_{\mathcal{M}}$ by $I e_i := \phi(x_i)$ for the standard basis $e_1, \dots, e_r$. We note that any such linear operator is an initial value operator, and we have $L_h K_f^t I e_i = h(f^t(x_i))$.
Let . We call admissible if there exist eigenvectors of with and for all such that and each is expressed as with , where .
The triple is semi-stable if is admissible and .
Then, we have the following asymptotic properties of .
Let . If and are semi-stable, then the sequence converges and the limit is equal to for any Banach limit . Similarly, let be the Cesàro operator, namely, is defined to be . If and are admissible, then converges and the limit is equal to for any Banach limit with .
See Appendix D. ∎
We note that the existence of a Banach limit satisfying this condition is proved in (SS10, , Theorem 4). The admissibility and semi-stability conditions hold in many cases, for example, in our illustrative example (Section 5.1).
Now, we derive an estimation formula of the above metric from finite time-series data. To this end, we first need the following lemma:
Let . Then we have the following formula:
See Appendix E. ∎
For , we consider time-series sequences in an observable space (), which are supposed to be generated from dynamical system on and observed via . That is, we consider, for , , and ,
Assume for , the triple is in . Then, from Lemma 3.7, we have
In the case of and , we see that . Therefore, by Proposition 3.6, if the triples are semi-stable or admissible, then we can compute a convergent estimator of the metric just by evaluating the values of kernel functions.
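To make the structure of such an estimator concrete, the following sketch compares two collections of $m$ time series by forming $m \times m$ matrices of time-averaged kernel evaluations and taking a normalized ratio of determinants. This is an illustration of the ratio-of-determinants structure only, with our own helper names; it is not the paper's exact estimation formula:

```python
import numpy as np

def kernel_sum(k, X, Y):
    """Entry-wise building block: (1/T) * sum_t k(x_t^i, y_t^j) for two
    collections of m time series X[i] = (x_t^i)_t and Y[j] = (y_t^j)_t."""
    m = len(X)
    G = np.empty((m, m))
    for i in range(m):
        for j in range(m):
            G[i, j] = np.mean([k(x, y) for x, y in zip(X[i], Y[j])])
    return G

def angle_kernel(k, X1, X2):
    """Normalized (angle-like) comparison of two m-tuples of time series:
    |det G12|^2 / (det G11 * det G22), a sketch of the ratio-of-determinants
    structure; by the Cauchy-Schwarz inequality for exterior products it
    lies in [0, 1], with 1 attained for identical inputs."""
    G11 = kernel_sum(k, X1, X1)
    G22 = kernel_sum(k, X2, X2)
    G12 = kernel_sum(k, X1, X2)
    return np.abs(np.linalg.det(G12)) ** 2 / (
        np.linalg.det(G11) * np.linalg.det(G22))
```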
4 Relation to Existing Metrics on Dynamical Systems
In this section, we show that our metric covers the existing metrics defined in previous works Martin00 ; DeCock-DeMoor02 ; VSV07 . That is, we describe the relation to the metric via subspace angles and Martin's metric in Subsection 4.1, and the relation to the Binet-Cauchy metric on dynamical systems in Subsection 4.2, as special cases of our metric.
4.1 Relation to metric via principal angles and Martin’s metric
In this subsection, we show that, in a certain situation, our metric recovers the metric (Definition 2 in Martin00 ) for the ARMA models introduced by Martin Martin00 and De Cock and De Moor DeCock-DeMoor02 . Moreover, our formula generalizes theirs to the non-stable case; that is, we do not need to assume that the eigenvalues have modulus strictly smaller than 1.
We here consider two linear dynamical systems. That is, in Eqs. (3), let the state-transition and observation maps be linear for each system, which we respectively denote by and . Then, De Cock and De Moor proposed to compare these two models by using the subspace angles as
where is the -th subspace angle between the column spaces of the extended observability matrices for . Meanwhile, Martin defined a distance on AR models via cepstrum coefficients, which was later shown to be equivalent to the distance (5) DeCock-DeMoor02 .
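A numerical sketch of this comparison (assuming numpy; a finite-horizon truncation of the extended observability matrices, with our own function names):

```python
import numpy as np

def observability_matrix(A, C, n):
    """Extended observability matrix [C; CA; ...; CA^{n-1}]."""
    blocks = []
    M = np.eye(A.shape[0])
    for _ in range(n):
        blocks.append(C @ M)
        M = A @ M
    return np.vstack(blocks)

def subspace_angle_distance(A1, C1, A2, C2, n=50):
    """De Cock--De Moor-style distance -ln prod_i cos^2(theta_i), where
    theta_i are the principal angles between the column spaces of the
    extended observability matrices (finite-horizon approximation)."""
    Q1, _ = np.linalg.qr(observability_matrix(A1, C1, n))
    Q2, _ = np.linalg.qr(observability_matrix(A2, C2, n))
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    return -2.0 * np.sum(np.log(cosines))
```

The principal-angle cosines are obtained as singular values of the product of orthonormal bases, a standard construction.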
Now, we regard . The positive definite kernel here is the usual inner product of and the associated RKHS is canonically isomorphic to . Let and . Note that for , , and for any linear maps and , and .
Then we have the following theorem:
The sequence converges. In the case that the systems are observable and stable, this limit is essentially equal to (5).
See Appendix F. ∎
Therefore, we can define a metric between linear dynamical systems with and by .
Moreover, the value captures an important characteristic of the behavior of dynamical systems. We illustrate this in the situation where the state-space models come from AR models. We will see that the metric behaves sensitively on the unit circle, giving a reasonable generalization of Martin's metric Martin00 to the non-stable case.
For , we consider an observable AR model:
where for . Let , and let be the companion matrix for . Let be the roots of the equation . For simplicity, we assume these roots are distinct complex numbers. Then, we define
As a result, if , , and , we have
and, otherwise, . The details of the derivation are given in Appendix G.
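As a quick check of the companion-matrix construction used above, the eigenvalues of the companion matrix of an AR model coincide with the roots of its characteristic equation (a sketch assuming numpy; the AR(2) example with roots 0.7 and 0.8 is our own toy choice):

```python
import numpy as np

def companion(a):
    """Companion matrix of the AR model x_t = a_1 x_{t-1} + ... + a_p x_{t-p};
    its eigenvalues are the roots of z^p - a_1 z^{p-1} - ... - a_p = 0."""
    p = len(a)
    M = np.zeros((p, p))
    M[0, :] = a
    M[1:, :-1] = np.eye(p - 1)
    return M
```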
Through this metric, we can observe a kind of "phase transition" of linear dynamical systems on the unit circle: the metric behaves sensitively when eigenvalues lie on it. We note that, in the stable case, formula (7) is essentially equivalent to the distance (5) (see Theorem 4 in DeCock-DeMoor02 ).
4.2 Relation to the Binet-Cauchy metric on dynamical systems
Here, we discuss the relation between our metric and the Binet-Cauchy kernels on dynamical systems defined by Vishwanathan et al. (VSV07, , Section 5). Let us consider two linear dynamical systems as in Subsection 4.1. In (VSV07, , Section 5), two kernels measuring the distance between two systems are given (for simplicity, we disregard the expectations over variables here): the trace kernel and the determinant kernel, which are respectively defined by
where is a positive number chosen to make the limits convergent, and and are initial state vectors, which affect the kernel values through the evolution of the observation sequences. Vishwanathan et al. discussed removing the effect of initial values by taking expectations over them under some assumed distributions.
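As an illustration, a truncated version of the discounted trace kernel can be computed directly from the model matrices (a sketch assuming numpy; names are ours, and we keep the initial states fixed rather than taking expectations):

```python
import numpy as np

def trace_kernel(A1, C1, x1, A2, C2, x2, lam=0.9, T=200):
    """Discounted trace kernel sum_t lam^t <y_t^1, y_t^2> for the output
    sequences y_t^i = C_i A_i^t x_i, truncated at T terms."""
    val, v1, v2 = 0.0, x1.copy(), x2.copy()
    for t in range(T):
        val += lam ** t * float((C1 @ v1) @ (C2 @ v2))
        v1, v2 = A1 @ v1, A2 @ v2
    return val
```

For scalar systems the geometric series gives a closed form, which is convenient for testing.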
These kernels can be described in terms of our notation as follows (see also Remark 3.3). That is, let us regard . For , we define , and . Then these are described as
Note that introducing the exponential discounting is one way to construct a mathematically valid kernel for comparing dynamical systems. However, in certain situations, this method does not work effectively. In fact, consider three dynamical systems: fix a small positive number and take linear systems whose transition coefficients differ only by that amount, with a common initial value. It would be natural to regard these dynamical systems as different from each other even when the difference is almost zero. However, if we compute the kernel defined via the exponential discounting, these dynamical systems are judged to be similar or almost the same. Instead of introducing such an exponential discounting, our idea for constructing a mathematically valid kernel is to consider the limit of the ratio of kernels defined via finite series of the orbits of dynamical systems. As a consequence, we do not need to introduce the exponential discounting, which enables one to deal with a wide range of dynamical systems and to capture differences between systems effectively. In fact, in the above example, our kernel judges these dynamical systems to be completely different, i.e., the kernel value for each pair among them is zero.
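The following toy computation illustrates this point with hypothetical scalar systems $x_{t+1} = a x_t$, $a \in \{1,\ 1-\epsilon,\ 1+\epsilon\}$, with our own parameter choices $\epsilon = 0.005$, discount $\lambda = 0.9$, and $x_0 = 1$: the normalized discounted kernel rates all pairs as nearly identical, even though the three systems are qualitatively different (neutral, decaying, growing).

```python
import numpy as np

def normalized_discounted_kernel(a1, a2, lam=0.9, T=500):
    """Cosine similarity induced by the truncated discounted kernel
    k(a, b) = sum_t lam^t (a*b)^t for scalar systems x_{t+1} = a x_t
    observed with y_t = x_t and x_0 = 1."""
    def k(a, b):
        t = np.arange(T)
        return np.sum(lam ** t * (a * b) ** t)
    return k(a1, a2) / np.sqrt(k(a1, a1) * k(a2, a2))
```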
5 Empirical Evaluations
We empirically illustrate how our metric works with synthetic data from rotation dynamics on the unit disk in the complex plane in Subsection 5.1, and then evaluate the discriminative performance of our metric with real-world time-series data in Subsection 5.2.
5.1 Illustrative example: Rotation on the unit disk
We use the rotation dynamics on the unit disk in the complex plane, since we can compute the analytic solution of our metric for these dynamics. Here, we regard the unit disk $\mathbb{D} \subset \mathbb{C}$ as the state space and let $k(z, w) = (1 - z \bar{w})^{-1}$ be the Szegö kernel for $z, w \in \mathbb{D}$. The corresponding RKHS $\mathcal{H}_k$ is the space of holomorphic functions $f(z) = \sum_{n=0}^{\infty} a_n z^n$ on $\mathbb{D}$ such that $\sum_{n=0}^{\infty} |a_n|^2 < \infty$. For $f(z) = \sum_n a_n z^n$ and $g(z) = \sum_n b_n z^n$, the inner product is defined by $\langle f, g \rangle := \sum_{n=0}^{\infty} a_n \overline{b_n}$. Let and .
For $\alpha \in \mathbb{C}$ with $|\alpha| \leq 1$, let $f_\alpha(z) := \alpha z$. We denote by $K_\alpha$ the Koopman operator for the RKHS (the Perron-Frobenius operator of Section 2) defined by $f_\alpha$. We note that, since $K_\alpha$ is the adjoint of the composition operator defined by $f_\alpha$, $K_\alpha$ is bounded by the Littlewood subordination theorem. Now, we define and . Then we define and .
For we have
where is a positive scalar value described in Appendix I. From the above, we see that the metric depends on the initial values of and , but can discriminate the dynamics independently of them.
Next, we show empirical results with Eq. (3.2) from finite data for this example. (The Matlab code is available at https://github.com/keisuke198619/metricNLDS ) For , we consider , where . And for , we consider and . The graphs in Figure 3 show the dynamics on the unit disk with and . For simplicity, all of the initial values were set so that .
We computed our metric using the Szegö kernel, using the radial basis function (Gaussian) kernel (Figure 3), and using the comparable previous metric (Figure 3) Fujii17 . For the Gaussian kernel, the kernel width was set to the median of the pairwise distances of the data. The last metric, called the Koopman spectral kernel Fujii17 , generalizes the kernel defined by Vishwanathan et al. VSV07 to nonlinear dynamical systems and outperformed it. Among the kernels proposed there, we used the Koopman kernel of principal angles between the subspaces of the estimated Koopman modes, which showed the best discriminative performance Fujii17 .
The discriminative performance shown in Figure 3 converged to the analytic solution as longer series were considered (Figure 3), compared with shorter ones (Figure 3). As suggested by the theoretical results, although one setting of our metric did not discriminate between the dynamics converging to the origin while rotating and the dynamics converging linearly, the setting in Figure 3 did. Our metric with the Gaussian kernel in Figure 3 achieved almost perfect discrimination, whereas the variant in Figure 3 and the previous metric in Figure 3 did not. We also examined the case of small initial values (Figure 3), where the initial magnitudes were small for all the dynamics; our metric (Figure 3) discriminated the two dynamics, whereas the remaining metrics did not (Figure 3).
5.2 Real-world time-series data
In this section, we evaluated our algorithm for discrimination using dynamical properties of time-series datasets from various real-world domains. We used the UCR time series classification archive as a source of open real-world data Chen15 . It should be noted that our algorithm primarily targets deterministic dynamics; therefore, we selected examples that apparently have small noise and derive from some underlying dynamics (for random dynamical systems, see, e.g., Mezic05 ; Williams15 ; Takeishi17a ). From this viewpoint, we selected the two Sony AIBO robot surface datasets (sensor data), the star light curve dataset (sensor data), and the computers dataset (device data). We used the estimator of Proposition 3.6, because we confirmed that the data satisfy the semi-stable condition in Definition 3.5 using the approximation defined in Kawahara16 .
We compared the discriminative performance based on embeddings of the distance matrices computed by the proposed metric and by the conventional Koopman spectral kernel used above. For clear visualization, we randomly selected 20 sequences per label from the validation data, because our algorithms do not learn any hyper-parameters from training data. All of these data are one-dimensional time series, but for comparison we used time-delay coordinates to create two-dimensional augmented time-series matrices. Note that it would be difficult to apply the basic estimation methods of Koopman modes that assume high-dimensional data, such as DMD and its variants. In addition, we evaluated the classification error using a k-nearest neighbor classifier for simplicity. We used 40 sequences per label and computed the average 10-fold cross-validation error (over 10 random trials).
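For reference, classification error from a precomputed distance matrix can be computed with a simple nearest-neighbor rule; below is a leave-one-out variant in plain numpy (a simplification of the 10-fold protocol above; names are ours):

```python
import numpy as np

def knn_error(D, labels, k=1):
    """Leave-one-out k-nearest-neighbor classification error from a
    precomputed pairwise distance matrix D (n x n) and labels (n,)."""
    n = len(labels)
    errors = 0
    for i in range(n):
        order = np.argsort(D[i])
        neighbors = [j for j in order if j != i][:k]  # exclude the point itself
        votes = np.bincount([labels[j] for j in neighbors])
        if np.argmax(votes) != labels[i]:
            errors += 1
    return errors / n
```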
Figure 4 shows examples of the embeddings of the distance matrices using t-SNE Maaten08 for the four time-series datasets. In the Sony AIBO robot surface datasets, our metrics in Figure 4a,b,e,f (classification errors: 0.025, 0.038, 0.213, and 0.150) had better discriminative performance than the conventional kernel in Figure 4i,j (0.100 and 0.275). This tendency was also observed in the star light curve dataset in Figure 4c,g,k (0.150, 0.150, and 0.217), where one class (circle) was perfectly discriminated, but the distinction between the remaining two classes was less obvious. In the computers dataset, the metrics in Figure 4h,l (0.450 and 0.450) show slightly better discrimination than that in Figure 4d (0.500).
In this paper, we developed a general metric for comparing nonlinear dynamical systems, defined with Perron-Frobenius operators in RKHSs. We showed that our metric includes Martin's metric and the Binet-Cauchy kernels for dynamical systems as special cases. We also described the estimation of our metric from finite data. Finally, we empirically demonstrated the effectiveness of our metric using an example of rotation dynamics on the unit disk in the complex plane and real-world time-series data.
Several directions related to this work remain to be investigated. For example, it would be interesting to examine the discriminative properties of the metric in more detail with specific algorithms. Also, it would be important to develop models for prediction or dimensionality reduction of nonlinear time-series data based on the mathematical scheme developed in this paper.
- (1) S. Banach. Théorie des opérations linéaires. Chelsea Publishing Co., 1995.
- (2) E. Berger, M. Sastuba, D. Vogt, B. Jung, and H.B. Amor. Estimation of perturbations in robotic behavior using dynamic mode decomposition. Advanced Robotics, 29(5):331–343, 2015.
- (3) B.W. Brunton, J.A. Johnson, J.G. Ojemann, and J