learning-kis
Demo implementation of Learning Koopman Invariant Subspaces for Dynamic Mode Decomposition
view repo
Spectral decomposition of the Koopman operator is attracting attention as a tool for the analysis of nonlinear dynamical systems. Dynamic mode decomposition is a popular numerical algorithm for Koopman spectral analysis; however, we often need to prepare nonlinear observables manually according to the underlying dynamics, which is not always possible since we may not have any a priori knowledge about them. In this paper, we propose a fully data-driven method for Koopman spectral analysis based on the principle of learning Koopman invariant subspaces from observed data. To this end, we propose minimization of the residual sum of squares of linear least-squares regression to estimate a set of functions that transforms data into a form in which the linear regression fits well. We introduce an implementation with neural networks and evaluate performance empirically using nonlinear dynamical systems and applications.
READ FULL TEXT VIEW PDFDemo implementation of Learning Koopman Invariant Subspaces for Dynamic Mode Decomposition
A variety of time-series data are generated from nonlinear dynamical systems, in which a state evolves according to a nonlinear map or differential equation. In summarization, regression, or classification of such time-series data, precise analysis of the underlying dynamical systems provides valuable information to generate appropriate features and to select an appropriate computation method. In applied mathematics and physics, the analysis of nonlinear dynamical systems has received significant interest because a wide range of complex phenomena, such as fluid flows and neural signals, can be described in terms of nonlinear dynamics. A classical but popular view of dynamical systems is based on state space models, wherein the behavior of the trajectories of a vector in state space is discussed (see, e.g.,
[1]). Time-series modeling based on a state space is also common in machine learning. However, when the dynamics are highly nonlinear, analysis based on state space models becomes challenging compared to the case of linear dynamics.
Recently, there is growing interest in operator-theoretic approaches for the analysis of dynamical systems. Operator-theoretic approaches are based on the Perron–Frobenius operator [2] or its adjoint, i.e., the Koopman operator (composition operator) [3, 4]. The Koopman operator defines the evolution of observation functions (observables) in a function space rather than state vectors in a state space. Based on the Koopman operator, the analysis of nonlinear dynamical systems can be lifted to a linear (but infinite-dimensional) regime. Consequently, we can consider modal decomposition, with which the global characteristics of nonlinear dynamics can be inspected [4, 5]. Such modal decomposition has been intensively used for scientific purposes to understand complex phenomena (e.g., [6, 7, 8, 9]) and also for engineering tasks, such as signal processing and machine learning. In fact, modal decomposition based on the Koopman operator has been utilized in various engineering tasks, including robotic control [10], image processing [11], and nonlinear system identification [12].
One of the most popular algorithms for modal decomposition based on the Koopman operator is dynamic mode decomposition (DMD) [6, 7, 13]. An important premise of DMD is that the target dataset is generated from a set of observables that spans a function space invariant to the Koopman operator (referred to as Koopman invariant subspace). However, when only the original state vectors are available as the dataset, we must prepare appropriate observables manually according to the underlying nonlinear dynamics. Several methods have been proposed to utilize such observables, including the use of basis functions [14] and reproducing kernels [15]. Note that these methods work well only if appropriate basis functions or kernels are prepared; however, it is not always possible to prepare such functions if we have no a priori knowledge about the underlying dynamics.
In this paper, we propose a fully data-driven method for modal decomposition via the Koopman operator based on the principle of learning Koopman invariant subspaces (LKIS) from scratch using observed data. To this end, we estimate a set of parametric functions by minimizing the residual sum of squares (RSS) of linear least-squares regression, so that the estimated set of functions transforms the original data into a form in which the linear regression fits well. In addition to the principle of LKIS, an implementation using neural networks is described. Moreover, we introduce empirical performance of DMD based on the LKIS framework with several nonlinear dynamical systems and applications, which proves the feasibility of LKIS-based DMD as a fully data-driven method for modal decomposition via the Koopman operator.
We focus on a (possibly nonlinear) discrete-time autonomous dynamical system
(1) |
where denotes the state space and
represents the associated probability space. In dynamical system (
1), Koopman operator [4, 5] is defined as an infinite-dimensional linear operator that acts on observables (or ), i.e.,(2) |
with which the analysis of nonlinear dynamics (1) can be lifted to a linear (but infinite-dimensional) regime. Since
is linear, let us consider a set of eigenfunctions
ofwith eigenvalues
, i.e., for , where and . Further, suppose that can be expressed as a linear combination of those infinite number of eigenfunctions, i.e., with a set of coefficients . By repeatedly applying to both sides of this equation, we obtain the following modal decomposition:(3) |
Here, the value of is decomposed into a sum of Koopman modes , each of which evolves over time with its frequency and decay rate respectively given by and , since is a complex value. The Koopman modes and their eigenvalues can be investigated to understand the dominant characteristics of complex phenomena that follow nonlinear dynamics. The above discussion can also be applied straightforwardly to continuous-time dynamical systems [4, 5].
Modal decomposition based on
, often referred to as Koopman spectral analysis, has been receiving attention in nonlinear physics and applied mathematics. In addition, it is a useful tool for engineering tasks including machine learning and pattern recognition; the spectra (eigenvalues) of
can be used as features of dynamical systems, the eigenfunctions are a useful representation of time-series for various tasks, such as regression and visualization, and itself can be used for prediction and optimal control. Several methods have been proposed to compute modal decomposition based on , such as generalized Laplace analysis [5, 16], the Ulam–Galerkin method [17], and DMD [6, 7, 13]. DMD, which is reviewed in more detail in the next subsection, has received significant attention and been utilized in various data analysis scenarios (e.g., [6, 7, 8, 9]).Note that the Koopman operator and modal decomposition based on it can be extended to random dynamical systems actuated by process noise [4, 14, 18]. In addition, Proctor et al. [19, 20] discussed Koopman analysis of systems with control signals. In this paper, we primarily target autonomous deterministic dynamics (e.g., Eq. (1)) for the sake of presentation clarity.
Let us review DMD, an algorithm for Koopman spectral analysis (further details are in the appendix). Consider a set of observables and let be a vector-valued observable. In addition, define two matrices generated by , and , i.e.,
(4) |
where is the number of snapshots in the dataset. The core functionality of DMD algorithms is computing the eigendecomposition of matrix [21, 13], where is the Moore–Penrose pseudoinverse of
. The eigenvectors of
are referred to as dynamic modes, and they coincide with the Koopman modes if the corresponding eigenfunctions of are in [21]. Alternatively (but nearly equivalently), the condition under which DMD works as a numerical realization of Koopman spectral analysis can be described as follows.Rather than calculating the infinite-dimensional directly, we can consider the restriction of to a finite-dimensional subspace. Assume the observables are elements of . The Koopman invariant subspace is defined as s.t. . If is spanned by a finite number of functions, then the restriction of to , which we denote , becomes a finite-dimensional linear operator. In the sequel, we assume the existence of such . If spans , then DMD’s matrix coincides with asymptotically, wherein is the realization of with regard to the frame (or basis) . For modal decomposition (3), the (vector-valued) Koopman modes are given by and the values of the eigenfunctions are obtained by , where and are the right- and left-eigenvectors of normalized such that [21, 14], and denotes the conjugate transpose of .
Here, an important problem in the practice of DMD arises, i.e., we often have no access to that spans a Koopman invariant subspace . In this case, for nonlinear dynamics, we must manually prepare adequate observables. Several researchers have addressed this issue; Williams et al. [14] leveraged a dictionary of predefined basis functions to transform original data, and Kawahara [15] defined Koopman spectral analysis in a reproducing kernel Hilbert space. Brunton et al. [22] proposed the use of observables selected in a data-driven manner [23] from a function dictionary. Note that, for these methods, we must select an appropriate function dictionary or kernel function according to the target dynamics. However, if we have no a priori knowledge about them, which is often the case, such existing methods do not have to be applied successfully to nonlinear dynamics.
In this paper, we propose a method to learn a set of observables that spans a Koopman invariant subspace , given a sequence of measurements as the dataset. In the following, we summarize desirable properties for such observables, upon which the proposed method is constructed.
Consider a set of square-integrable observables , and define a vector-valued observable . In addition, define a linear operator whose matrix form is given as . Then, if and only if spans a Koopman invariant subspace.
If , then for any ,
where denotes the -element of ; thus, is a Koopman invariant subspace. On the other hand, if spans a Koopman invariant subspace, there exists a linear operator such that ; thus, . Therefore, an instance of the matrix form of is obtained in the form of . ∎
According to Theorem 1, we should obtain that makes zero. However, such problems cannot be solved with finite data because is a function. Thus, we give the corresponding empirical risk minimization problem based on the assumption of ergodicity of and the convergence property of the empirical matrix as follows.
For dynamical system (1), the time-average and space-average of a function (or ) coincide in for almost all , i.e.,
Since is the minimum-norm solution of the linear least-squares regression from the columns of to those of , we constitute the learning problem to estimate a set of function that transforms the original data into a form in which the linear least-squares regression fits well. In particular, we minimize RSS, which measures the discrepancy between the data and the estimated regression model (i.e., linear least-squares in this case). We define the RSS loss as follows:
(5) |
which becomes zero when
spans a Koopman invariant subspace. If we implement a smooth parametric model on
, the local minima of can be found using gradient descent. We adopt that achieves a local minimum of as a set of observables that spans (approximately) a Koopman invariant subspace.In the previous subsection, we have presented an important part of the principle of LKIS, i.e., minimization of the RSS of linear least-squares regression. Note that, to define RSS loss (5), we need access to a sequence of the original states, i.e., , as a dataset. In practice, however, we cannot necessarily observe full states due to limited memory and sensor capabilities. In this case, only transformed (and possibly degenerated) measurements are available, which we denote with a measurement function . To define RSS loss (5) given only degenerated measurements, we must reconstruct the original states from the actual observations .
Here, we utilize delay-coordinate embedding, which has been widely used for state space reconstruction in the analysis of nonlinear dynamics. Consider a univariate time-series , which is a sequence of degenerated measurements . According to the well-known Taken’s theorem [25, 26], a faithful representation of that preserves the structure of the state space can be obtained by with some lag parameter and embedding dimension if is greater than . For a multivariate time-series, embedding with non-uniform lags provides better reconstruction [27]. For example, when we have a two-dimensional time-series , an embedding with non-uniform lags is similar to with each value of and . Several methods have been proposed for selection of and [27, 28, 29]; however, appropriate values may depend on the given application (attractor inspection, prediction, etc.).
In this paper, we propose to surrogate the parameter selection of the delay-coordinate embedding by learning a linear delay embedder from data. Formally, we learn embedder such that
(6) |
where , , and
is a hyperparameter of maximum lag. We estimate weight
as well as the parameters of by minimizing RSS loss (5), which is now defined using instead of . Learning from data yields an embedding that is suitable for learning a Koopman invariant subspace. Moreover, we can impose L1 regularization on weight to make it highly interpretable if necessary according to the given application.Simple minimization of may yield trivial , such as constant values. We should impose some constraints to prevent such trivial solutions. In the proposed framework, modal decomposition is first obtained in terms of learned observables ; thus, the values of must be back-projected to the space of the original measurements
to obtain a physically meaningful representation of the dynamic modes. Therefore, we modify the loss function by employing an additional term such that the original measurements
can be reconstructed from the values of by a reconstructor , i.e., . Such term is given as follows:(7) |
and, if is a smooth parametric model, this term can also be reduced using gradient descent. Finally, the objective function to be minimized becomes
(8) |
where is a parameter that controls the balance between and .
In Sections 3.1–3.3, we introduced the main concepts for the LKIS framework, i.e., RSS loss minimization, learning the linear delay embedder, and reconstruction of the original measurements. Here, we demonstrate an implementation of the LKIS framework using neural networks.
Figure 1 shows a schematic diagram of the implementation of the framework. We model and
using multi-layer perceptrons (MLPs) with a parametric ReLU activation function
[30]. Here, the sizes of the hidden layer of MLPs are defined by the arithmetic means of the sizes of the input and output layers of the MLPs. Thus, the remaining tunable hyperparameters are (maximum delay of ), (dimensionality of ), and (dimensionality of ). To obtain with dimensionality much greater than that of the original measurements, we found that it was useful to set even when full-state measurements (e.g., ) were available.After estimating the parameters of , , and , DMD can be performed normally by using the values of the learned , defining the data matrices in Eq. (4), and computing the eigendecomposition of ; the dynamic modes are obtained by , and the values of the eigenfunctions are obtained by , where and are the right- and left-eigenvectors of . See Section 2.2 for details.
In the numerical experiments described in Sections 5 and 6
, we performed optimization using first-order gradient descent. To stabilize optimization, batch normalization
[31] was imposed on the inputs of hidden layers. Note that, since RSS loss function (5) is notdecomposable with regard to data points, convergence of stochastic gradient descent (SGD) cannot be shown straightforwardly. However, we empirically found that the non-decomposable RSS loss was often reduced successfully, even with mini-batch SGD. Let us show an example; the full-batch RSS loss (denoted
) under the updates of the mini-batch SGD are plotted in the rightmost panel of Figure 13. Here, decreases rapidly and remains small. For SGD on non-decomposable losses, Kar et al. [32] provided guarantees for some cases; however, examining the behavior of more general non-decomposable losses under mini-batch updates remains an open problem.The proposed framework is motivated by the operator-theoretic view of nonlinear dynamical systems. In contrast, learning a generative (state-space) model for nonlinear dynamical systems directly has been actively studied in machine learning and optimal control communities, on which we mention a few examples. A classical but popular method for learning nonlinear dynamical systems is using an expectation-maximization algorithm with Bayesian filtering/smoothing (see, e.g.,
[33]). Recently, using approximate Bayesian inference with the variational autoencoder (VAE) technique
[34] to learn generative dynamical models has been actively researched. Chung et al. [35]proposed a recurrent neural network with random latent variables, Gao
et al. [36] utilized VAE-based inference for neural population models, and Johnson et al. [37] and Krishnan et al. [38] developed inference methods for structured models based on inference with a VAE. In addition, Karl et al. [39] proposed a method to obtain a more consistent estimation of nonlinear state space models. Moreover, Watter et al. [40] proposed a similar approach in the context of optimal control. Since generative models are intrinsically aware of process and observation noises, incorporating methodologies developed in such studies to the operator-theoretic perspective is an important open challenge to explicitly deal with uncertainty.We would like to mention some studies closely related to our method. After the first submission of this manuscript (in May 2017), several similar approaches to learning data transform for Koopman analysis have been proposed [41, 42, 43, 44, 45]. The relationships and relative advantages of these methods should be elaborated in the future.
In this section, we provide numerical examples of DMD based on the LKIS framework (LKIS-DMD) implemented using neural networks. We conducted experiments on three typical nonlinear dynamical systems: a fixed-point attractor, a limit-cycle attractor, and a system with multiple basins of attraction. We show the results of comparisons with other recent DMD algorithms, i.e., Hankel DMD [46, 47], extended DMD [14], and DMD with reproducing kernels [15]. The detailed setups of the experiments discussed in this section and the next section are described in the appendix.
Consider a two-dimensional nonlinear map on :
(9) |
which has a stable equilibrium at the origin if . The Koopman eigenvalues of system (9) include and , and the corresponding eigenfunctions are and , respectively. is also an eigenvalue with corresponding eigenfunction . A minimal Koopman invariant subspace of system (9) is , and the eigenvalues of the Koopman operator restricted to such subspace include , and . We generated a dataset using system (9) with and and applied LKIS-DMD (), linear Hankel DMD [46, 47] (delay 2), and DMD with basis expansion by , which corresponds to extended DMD [14] with a right and minimal observable dictionary. The estimated Koopman eigenvalues are shown in Figure 7, wherein LKIS-DMD successfully identifies the eigenvalues of the target invariant subspace. In Figure 7, we show eigenvalues estimated using data contaminated with white Gaussian observation noise (). The eigenvalues estimated by LKIS-DMD coincide with the true values even with the observation noise, whereas the results of DMD with basis expansion (i.e., extended DMD) are directly affected by the observation noise.
We generated data from the limit cycle of the FitzHugh–Nagumo equation
(10) |
where , , , and . Since trajectories in a limit-cycle are periodic, the (discrete-time) Koopman eigenvalues should lie near the unit circle. Figure 13 shows the eigenvalues estimated by LKIS-DMD (), linear Hankel DMD [46, 47] (delay 8), and DMDs with reproducing kernels [15] (polynomial kernel of degree 4 and RBF kernel of width 1). The eigenvalues produced by LKIS-DMD agree well with those produced by kernel DMDs, whereas linear Hankel DMD produces eigenvalues that would correspond to rapidly decaying modes.
Consider the unforced Duffing equation
(11) |
where , , and . States following (11) evolve toward or depending on which basin of attraction the initial value belongs to unless the initial state is on the stable manifold of the saddle. Generally, a Koopman eigenfunction whose continuous-time eigenvalue is zero takes a constant value in each basin of attraction [14]; thus, the contour plot of such an eigenfunction shows the boundary of the basins of attraction. We generated 1,000 episodes of time-series starting at different initial values uniformly sampled from . The left plot in Figure 17 shows the continuous-time Koopman eigenvalues estimated by LKIS-DMD (), all of which correspond to decaying modes (i.e., negative real parts) and agree with the property of the data. The center plot in Figure 17 shows the true basins of attraction of (11), and the right plot shows the estimated values of the eigenfunction corresponding to the eigenvalue of the smallest magnitude. The surface of the estimated eigenfunction agrees qualitatively with the true boundary of the basins of attractions, which indicates that LKIS-DMD successfully identifies the Koopman eigenfunction.
The numerical experiments in the previous section demonstrated the feasibility of the proposed method as a fully data-driven method for Koopman spectral analysis. Here, we introduce practical applications of LKIS-DMD.
Prediction of a chaotic time-series has received significant interest in nonlinear physics. We would like to perform the prediction of a chaotic time-series using DMD, since DMD can be naturally utilized for prediction as follows. Since is decomposed as and is obtained by where is a left-eigenvalue of , the next step of can be described in terms of the current step, i.e., . In addition, in the case of LKIS-DMD, the values of must be back-projected to using the learned . We generated two types of univariate time-series by extracting the series of the Lorenz attractor [48] and the Rossler attractor [49]. We simulated 25,000 steps for each attractor and used the first 10,000 steps for training, the next 5,000 steps for validation, and the last 10,000 steps for testing prediction accuracy. We examined the prediction accuracy of LKIS-DMD, a simple LSTM network, and linear Hankel DMD [46, 47], all of whose hyperparameters were tuned using the validation set. The prediction accuracy of every method and an example of the predicted series on the test set by LKIS-DMD are shown in Figure 27. As can be seen, the proposed LKIS-DMD achieves the smallest root-mean-square (RMS) errors in the 30-step prediction.
One of the most popular applications of DMD is the investigation of the global characteristics of dynamics by inspecting the spatial distribution of the dynamic modes. In addition to the spatial distribution, we can investigate the temporal profiles of mode activations by examining the values of corresponding eigenfunctions. For example, assume there is an eigenfunction that corresponds to a discrete-time eigenvalue whose magnitude is considerably smaller than one. Such a small eigenvalue indicates a rapidly decaying (i.e., unstable) mode; thus, we can detect occurrences of unstable phenomena by observing the values of . We applied LKIS-DMD () to a time-series generated by a far-infrared laser, which was obtained from the Santa Fe Time Series Competition Data [50]. We investigated the values of eigenfunction corresponding to the eigenvalue of the smallest magnitude. The original time-series and values of obtained by LKIS-DMD are shown in Figure 27. As can be seen, the activations of
coincide with sudden decays of the pulsation amplitudes. For comparison, we applied the novelty/change-point detection technique using one-class support vector machine (OC-SVM)
[51] and direct density-ratio estimation by relative unconstrained least-squares importance fitting (RuLSIF) [52]. We computed AUC, defining the sudden decays of the amplitudes as the points to be detected, which were 0.924, 0.799, and 0.803 for LKIS, OC-SVM, and RuLSIF, respectively.In this paper, we have proposed a framework for learning Koopman invariant subspaces, which is a fully data-driven numerical algorithm for Koopman spectral analysis. In contrast to existing approaches, the proposed method learns (approximately) a Koopman invariant subspace entirely from the available data based on the minimization of RSS loss. We have shown empirical results for several typical nonlinear dynamics and application examples.
We have also introduced an implementation using multi-layer perceptrons; however, one possible drawback of such an implementation is the local optima of the objective function, which makes it difficult to assess the adequacy of the obtained results. Rather than using neural networks, the observables to be learned could be modeled by a sparse combination of basis functions as in [23] but still utilizing optimization based on RSS loss. Another possible future research direction could be incorporating approximate Bayesian inference methods, such as VAE [34]. The proposed framework is based on a discriminative viewpoint, but inference methodologies for generative models could be used to modify the proposed framework to explicitly consider uncertainty in data.
This work was supported by JSPS KAKENHI Grant No. JP15J09172, JP26280086, JP16H01548, and JP26289320.
Delving deep into rectifiers: Surpassing human-level performance on imagenet classification.
InProceedings of the 2015 IEEE International Conference on Computer Vision
, pages 1026–1034, 2015.Proceedings of the 31st AAAI Conference on Artificial Intelligence
, pages 2101–2109, 2017.Deep variational Bayes filters: Unsupervised learning of state space models from raw data.
In Proceedings of the 5th International Conference on Learning Representations, 2017.Dynamic mode decomposition (DMD) was originally invented as a tool for inspecting fluid flows [6, 7], and it has been utilized in various fields other than fluid dynamics. An output of DMD coincides with Koopman spectral analysis if we have that spans a Koopman invariant subspace. The popular algorithm of DMD [21]
, which is based on the singular value decomposition (SVD) of a data matrix, is defined as follows.
Given a sequence of , define data matrices and .
Calculate the compact SVD of as , where is the rank of .
Compute a matrix .
Calculate eigendecomposition of , i.e., compute and such that .
In addition, calculate left-eigenvectors of .
Back-project the eigenvectors to the original space by and .
Normalize and such that , where if and otherwise.
Return dynamic modes and corresponding eigenvalues . In addition, return the values of corresponding eigenfunctions by .
The eigenvalues computed using the algorithm above are “discrete-time” ones, in the sense that they represent frequencies and decay rates in terms of discrete-time dynamical systems. The “continuous-time” counterparts can be computed easily by , where is the time interval in the discrete-time setting.
Note that the definition above presumes the access to an appropriate observable . In contrast, in the proposed method, the above-mentioned algorithm is run after applying the proposed framework to learn from observed data.
In this appendix section, the configurations of the numerical examples and applications, which were omitted in the main text, are described.
In each experiment, parameter was fixed at . Note that the quality of the results was not sensitive to the values of . We modeled and with multi-layer perceptrons by setting the number of hidden nodes (denoted ) as the arithmetic means of the input and output sizes, i.e., for and for , where , , and . Therefore, the remaining hyperparameters to be tuned were (maximum lag), , and . However, unless otherwise noted, we fixed by . Consequently, the independent hyperparameters were and .
One must not subtract the mean from the original data because subtracting something from the data may change the spectra of the underlying dynamical systems (see, e.g., [53]). If the absolute values of the data were too large, we simply divided the data by the maximum absolute value for each series.
In optimization, we found that the adaptive learning rate by SMORMS3 [54] achieved fast convergence compared to a fixed learning rate and other adaptation techniques. The maximum learning rate of SMORMS3 was selected from to in each experiment according to the amount of data. In some cases, optimization was performed in two stages: the parameters of , , and were updated in the first stage, and, in the second stage, the parameters of and were fixed and only was updated. This two-stage optimization was particularly useful for the application of prediction, where a precise reconstruction of the original measurements was necessary. Moreover, when the original states of the dynamical system were available and used without delay (i.e., and ), parameter
of the linear embedder was fixed to be an identity matrix (i.e., no embedder was used). Also, we set the mini-batch size from 100 to 500 because smaller mini-batches often led to an unstable computation of pseudo-inverse.
In the experiment using the fixed-point attractor, the data were generated with four initial values: , , , and , with the length of each episode being
. In the case of noisy dataset, the standard deviation of the observation noise was set to
. In both experiments (with and without observation noise), we set and to cover the minimal three-dimensional Koopman invariant subspace.The data were generated using MATLAB’s ode45 function [55], which was run with time-step and initial value for 2,000 steps. The hyperparameters of LKIS-DMD, linear Hankel DMD, and kernel DMDs were set such that they produced 16 eigenvalues, i.e., and for LKIS-DMD, and POD modes whose singular value was less than were disposed in kernel DMDs ( for the polynomial kernel and for the RBF kernel).
The data were generated using the settings provided in the literature [14]
; 1,000 initial values were drawn from the uniform distribution on
and each initial value was proceeded in time for 11 steps with . We used MATLAB’s ode45 function for numerical integration. For LKIS-DMD, we set and . Note that the values of the estimated eigenfunction were evaluated and plotted in consideration of each data point.The data were generated from the Lorenz attractor [48] (parameters , , and ) and the Rossler attractor [49] (parameters , , and ). We generated 25,000 steps for each attractor and divided them into training, validation, and test sets. For all methods, the delay dimension was fixed at , i.e.,
for LKIS-DMD and linear Hankel DMD, and backpropagation was truncated to length
to learn the LSTM network. We tuned of LKIS-DMD and the dimensionality of LSTM’s hidden state (denoted ) according to the 30-step prediction accuracies obtained using the validation set. Here, we obtained and for the Lorenz data and and for the Rossler data.In this experiment, LSTM was applied because it had been utilized for various nonlinear time-series, and Hankel DMD was used because it had been successfully utilized for analysis of chaotic systems [56].
The dataset was obtained from the Santa Fe Time Series Competition Data [50]. Note that the author’s [50] original web page was not available on the date of submission of this manuscript (May 2017); however, the dataset itself was still available online. The length of delay (or sliding window) was fixed to for all methods applied in this experiment. In addition, no intensive tuning of the other hyperparameters was conduct because the purpose was qualitative. The default settings of libsvm [57] were used for the one-class SVM (except for ). For the density-ratio estimation by RuLSIF, the default values of the implementation by the authors of [52] were used.
In this experiment, OC-SVM was applied because it was a kind of de facto standard for novelty/change-point detection, and RuLSIF ws used because it had achieved the best performance among methods based on density-ratio estimation [52].
Comments
There are no comments yet.