Learning Koopman Invariant Subspaces for Dynamic Mode Decomposition

Spectral decomposition of the Koopman operator is attracting attention as a tool for the analysis of nonlinear dynamical systems. Dynamic mode decomposition is a popular numerical algorithm for Koopman spectral analysis; however, we often need to prepare nonlinear observables manually according to the underlying dynamics, which is not always possible since we may not have any a priori knowledge about them. In this paper, we propose a fully data-driven method for Koopman spectral analysis based on the principle of learning Koopman invariant subspaces from observed data. To this end, we propose minimization of the residual sum of squares of linear least-squares regression to estimate a set of functions that transforms data into a form in which the linear regression fits well. We introduce an implementation with neural networks and evaluate performance empirically using nonlinear dynamical systems and applications.



There are no comments yet.


page 1

page 2

page 3

page 4


Neural Dynamic Mode Decomposition for End-to-End Modeling of Nonlinear Dynamics

Koopman spectral analysis has attracted attention for understanding nonl...

Application of Gaussian Process Regression to Koopman Mode Decomposition for Noisy Dynamic Data

Koopman Mode Decomposition (KMD) is a technique of nonlinear time-series...

A dynamic mode decomposition extension for the forecasting of parametric dynamical systems

Dynamic mode decomposition (DMD) has recently become a popular tool for ...

Identification of linear time-invariant systems with Dynamic Mode Decomposition

Dynamic mode decomposition (DMD) is a popular data-driven framework to e...

Supervised Learning for Dynamical System Learning

Recently there has been substantial interest in spectral methods for lea...

Geometric Considerations of a Good Dictionary for Koopman Analysis of Dynamical Systems

Representation of a dynamical system in terms of simplifying modes is a ...

Mode Decomposition for Homogeneous Symmetric Operators

Finding latent structures in data is drawing increasing attention in div...

Code Repositories


Demo implementation of Learning Koopman Invariant Subspaces for Dynamic Mode Decomposition

view repo
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

A variety of time-series data are generated from nonlinear dynamical systems, in which a state evolves according to a nonlinear map or differential equation. In summarization, regression, or classification of such time-series data, precise analysis of the underlying dynamical systems provides valuable information to generate appropriate features and to select an appropriate computation method. In applied mathematics and physics, the analysis of nonlinear dynamical systems has received significant interest because a wide range of complex phenomena, such as fluid flows and neural signals, can be described in terms of nonlinear dynamics. A classical but popular view of dynamical systems is based on state space models, wherein the behavior of the trajectories of a vector in state space is discussed (see, e.g.,


). Time-series modeling based on a state space is also common in machine learning. However, when the dynamics are highly nonlinear, analysis based on state space models becomes challenging compared to the case of linear dynamics.

Recently, there is growing interest in operator-theoretic approaches for the analysis of dynamical systems. Operator-theoretic approaches are based on the Perron–Frobenius operator [2] or its adjoint, i.e., the Koopman operator (composition operator) [3, 4]. The Koopman operator defines the evolution of observation functions (observables) in a function space rather than state vectors in a state space. Based on the Koopman operator, the analysis of nonlinear dynamical systems can be lifted to a linear (but infinite-dimensional) regime. Consequently, we can consider modal decomposition, with which the global characteristics of nonlinear dynamics can be inspected [4, 5]. Such modal decomposition has been intensively used for scientific purposes to understand complex phenomena (e.g., [6, 7, 8, 9]) and also for engineering tasks, such as signal processing and machine learning. In fact, modal decomposition based on the Koopman operator has been utilized in various engineering tasks, including robotic control [10], image processing [11], and nonlinear system identification [12].

One of the most popular algorithms for modal decomposition based on the Koopman operator is dynamic mode decomposition (DMD) [6, 7, 13]. An important premise of DMD is that the target dataset is generated from a set of observables that spans a function space invariant to the Koopman operator (referred to as Koopman invariant subspace). However, when only the original state vectors are available as the dataset, we must prepare appropriate observables manually according to the underlying nonlinear dynamics. Several methods have been proposed to utilize such observables, including the use of basis functions [14] and reproducing kernels [15]. Note that these methods work well only if appropriate basis functions or kernels are prepared; however, it is not always possible to prepare such functions if we have no a priori knowledge about the underlying dynamics.

In this paper, we propose a fully data-driven method for modal decomposition via the Koopman operator based on the principle of learning Koopman invariant subspaces (LKIS) from scratch using observed data. To this end, we estimate a set of parametric functions by minimizing the residual sum of squares (RSS) of linear least-squares regression, so that the estimated set of functions transforms the original data into a form in which the linear regression fits well. In addition to the principle of LKIS, an implementation using neural networks is described. Moreover, we introduce empirical performance of DMD based on the LKIS framework with several nonlinear dynamical systems and applications, which proves the feasibility of LKIS-based DMD as a fully data-driven method for modal decomposition via the Koopman operator.

2 Background

2.1 Koopman spectral analysis

We focus on a (possibly nonlinear) discrete-time autonomous dynamical system


where denotes the state space and

represents the associated probability space. In dynamical system (

1), Koopman operator [4, 5] is defined as an infinite-dimensional linear operator that acts on observables (or ), i.e.,


with which the analysis of nonlinear dynamics (1) can be lifted to a linear (but infinite-dimensional) regime. Since

is linear, let us consider a set of eigenfunctions


with eigenvalues

, i.e., for , where and . Further, suppose that can be expressed as a linear combination of those infinite number of eigenfunctions, i.e., with a set of coefficients . By repeatedly applying to both sides of this equation, we obtain the following modal decomposition:


Here, the value of is decomposed into a sum of Koopman modes , each of which evolves over time with its frequency and decay rate respectively given by and , since is a complex value. The Koopman modes and their eigenvalues can be investigated to understand the dominant characteristics of complex phenomena that follow nonlinear dynamics. The above discussion can also be applied straightforwardly to continuous-time dynamical systems [4, 5].

Modal decomposition based on

, often referred to as Koopman spectral analysis, has been receiving attention in nonlinear physics and applied mathematics. In addition, it is a useful tool for engineering tasks including machine learning and pattern recognition; the spectra (eigenvalues) of

can be used as features of dynamical systems, the eigenfunctions are a useful representation of time-series for various tasks, such as regression and visualization, and itself can be used for prediction and optimal control. Several methods have been proposed to compute modal decomposition based on , such as generalized Laplace analysis [5, 16], the Ulam–Galerkin method [17], and DMD [6, 7, 13]. DMD, which is reviewed in more detail in the next subsection, has received significant attention and been utilized in various data analysis scenarios (e.g., [6, 7, 8, 9]).

Note that the Koopman operator and modal decomposition based on it can be extended to random dynamical systems actuated by process noise [4, 14, 18]. In addition, Proctor et al. [19, 20] discussed Koopman analysis of systems with control signals. In this paper, we primarily target autonomous deterministic dynamics (e.g., Eq. (1)) for the sake of presentation clarity.

2.2 Dynamic mode decomposition and Koopman invariant subspace

Let us review DMD, an algorithm for Koopman spectral analysis (further details are in the appendix). Consider a set of observables and let be a vector-valued observable. In addition, define two matrices generated by , and , i.e.,


where is the number of snapshots in the dataset. The core functionality of DMD algorithms is computing the eigendecomposition of matrix [21, 13], where is the Moore–Penrose pseudoinverse of

. The eigenvectors of

are referred to as dynamic modes, and they coincide with the Koopman modes if the corresponding eigenfunctions of are in [21]. Alternatively (but nearly equivalently), the condition under which DMD works as a numerical realization of Koopman spectral analysis can be described as follows.

Rather than calculating the infinite-dimensional directly, we can consider the restriction of to a finite-dimensional subspace. Assume the observables are elements of . The Koopman invariant subspace is defined as s.t. . If is spanned by a finite number of functions, then the restriction of to , which we denote , becomes a finite-dimensional linear operator. In the sequel, we assume the existence of such . If spans , then DMD’s matrix coincides with asymptotically, wherein is the realization of with regard to the frame (or basis) . For modal decomposition (3), the (vector-valued) Koopman modes are given by and the values of the eigenfunctions are obtained by , where and are the right- and left-eigenvectors of normalized such that [21, 14], and denotes the conjugate transpose of .

Here, an important problem in the practice of DMD arises, i.e., we often have no access to that spans a Koopman invariant subspace . In this case, for nonlinear dynamics, we must manually prepare adequate observables. Several researchers have addressed this issue; Williams et al. [14] leveraged a dictionary of predefined basis functions to transform original data, and Kawahara [15] defined Koopman spectral analysis in a reproducing kernel Hilbert space. Brunton et al. [22] proposed the use of observables selected in a data-driven manner [23] from a function dictionary. Note that, for these methods, we must select an appropriate function dictionary or kernel function according to the target dynamics. However, if we have no a priori knowledge about them, which is often the case, such existing methods do not have to be applied successfully to nonlinear dynamics.

3 Learning Koopman invariant subspaces

3.1 Minimizing residual sum of squares of linear least-squares regression

In this paper, we propose a method to learn a set of observables that spans a Koopman invariant subspace , given a sequence of measurements as the dataset. In the following, we summarize desirable properties for such observables, upon which the proposed method is constructed.

Theorem 1.

Consider a set of square-integrable observables , and define a vector-valued observable . In addition, define a linear operator whose matrix form is given as . Then, if and only if spans a Koopman invariant subspace.


If , then for any ,

where denotes the -element of ; thus, is a Koopman invariant subspace. On the other hand, if spans a Koopman invariant subspace, there exists a linear operator such that ; thus, . Therefore, an instance of the matrix form of is obtained in the form of . ∎

According to Theorem 1, we should obtain that makes zero. However, such problems cannot be solved with finite data because is a function. Thus, we give the corresponding empirical risk minimization problem based on the assumption of ergodicity of and the convergence property of the empirical matrix as follows.

Assumption 1.

For dynamical system (1), the time-average and space-average of a function (or ) coincide in for almost all , i.e.,

Theorem 2.

Define and by Eq. (4) and suppose that Assumption 1 holds. If all modes are sufficiently excited in the data (i.e., ), then matrix almost surely converges to the matrix form of linear operator in .


From Assumption 1, and respectively converge to and for almost all . In addition, since the rank of is always , converges to in [24]. Consequently, in , almost surely converges to , which is the matrix form of linear operator . ∎

Since is the minimum-norm solution of the linear least-squares regression from the columns of to those of , we constitute the learning problem to estimate a set of function that transforms the original data into a form in which the linear least-squares regression fits well. In particular, we minimize RSS, which measures the discrepancy between the data and the estimated regression model (i.e., linear least-squares in this case). We define the RSS loss as follows:


which becomes zero when

spans a Koopman invariant subspace. If we implement a smooth parametric model on

, the local minima of can be found using gradient descent. We adopt that achieves a local minimum of as a set of observables that spans (approximately) a Koopman invariant subspace.

3.2 Linear delay embedder for state space reconstruction

In the previous subsection, we have presented an important part of the principle of LKIS, i.e., minimization of the RSS of linear least-squares regression. Note that, to define RSS loss (5), we need access to a sequence of the original states, i.e., , as a dataset. In practice, however, we cannot necessarily observe full states due to limited memory and sensor capabilities. In this case, only transformed (and possibly degenerated) measurements are available, which we denote with a measurement function . To define RSS loss (5) given only degenerated measurements, we must reconstruct the original states from the actual observations .

Here, we utilize delay-coordinate embedding, which has been widely used for state space reconstruction in the analysis of nonlinear dynamics. Consider a univariate time-series , which is a sequence of degenerated measurements . According to the well-known Taken’s theorem [25, 26], a faithful representation of that preserves the structure of the state space can be obtained by with some lag parameter and embedding dimension if is greater than . For a multivariate time-series, embedding with non-uniform lags provides better reconstruction [27]. For example, when we have a two-dimensional time-series , an embedding with non-uniform lags is similar to with each value of and . Several methods have been proposed for selection of and [27, 28, 29]; however, appropriate values may depend on the given application (attractor inspection, prediction, etc.).

In this paper, we propose to surrogate the parameter selection of the delay-coordinate embedding by learning a linear delay embedder from data. Formally, we learn embedder such that


where , , and

is a hyperparameter of maximum lag. We estimate weight

as well as the parameters of by minimizing RSS loss (5), which is now defined using instead of . Learning from data yields an embedding that is suitable for learning a Koopman invariant subspace. Moreover, we can impose L1 regularization on weight to make it highly interpretable if necessary according to the given application.

3.3 Reconstruction of original measurements

Simple minimization of may yield trivial , such as constant values. We should impose some constraints to prevent such trivial solutions. In the proposed framework, modal decomposition is first obtained in terms of learned observables ; thus, the values of must be back-projected to the space of the original measurements

to obtain a physically meaningful representation of the dynamic modes. Therefore, we modify the loss function by employing an additional term such that the original measurements

can be reconstructed from the values of by a reconstructor , i.e., . Such term is given as follows:


and, if is a smooth parametric model, this term can also be reduced using gradient descent. Finally, the objective function to be minimized becomes


where is a parameter that controls the balance between and .

3.4 Implementation using neural networks

Figure 1: An instance of LKIS framework, in which and are implemented by MLPs.

In Sections 3.13.3, we introduced the main concepts for the LKIS framework, i.e., RSS loss minimization, learning the linear delay embedder, and reconstruction of the original measurements. Here, we demonstrate an implementation of the LKIS framework using neural networks.

Figure 1 shows a schematic diagram of the implementation of the framework. We model and

using multi-layer perceptrons (MLPs) with a parametric ReLU activation function

[30]. Here, the sizes of the hidden layer of MLPs are defined by the arithmetic means of the sizes of the input and output layers of the MLPs. Thus, the remaining tunable hyperparameters are (maximum delay of ), (dimensionality of ), and (dimensionality of ). To obtain with dimensionality much greater than that of the original measurements, we found that it was useful to set even when full-state measurements (e.g., ) were available.

After estimating the parameters of , , and , DMD can be performed normally by using the values of the learned , defining the data matrices in Eq. (4), and computing the eigendecomposition of ; the dynamic modes are obtained by , and the values of the eigenfunctions are obtained by , where and are the right- and left-eigenvectors of . See Section 2.2 for details.

In the numerical experiments described in Sections 5 and 6

, we performed optimization using first-order gradient descent. To stabilize optimization, batch normalization

[31] was imposed on the inputs of hidden layers. Note that, since RSS loss function (5) is not

decomposable with regard to data points, convergence of stochastic gradient descent (SGD) cannot be shown straightforwardly. However, we empirically found that the non-decomposable RSS loss was often reduced successfully, even with mini-batch SGD. Let us show an example; the full-batch RSS loss (denoted

) under the updates of the mini-batch SGD are plotted in the rightmost panel of Figure 13. Here, decreases rapidly and remains small. For SGD on non-decomposable losses, Kar et al. [32] provided guarantees for some cases; however, examining the behavior of more general non-decomposable losses under mini-batch updates remains an open problem.

4 Related work

The proposed framework is motivated by the operator-theoretic view of nonlinear dynamical systems. In contrast, learning a generative (state-space) model for nonlinear dynamical systems directly has been actively studied in machine learning and optimal control communities, on which we mention a few examples. A classical but popular method for learning nonlinear dynamical systems is using an expectation-maximization algorithm with Bayesian filtering/smoothing (see, e.g.,


). Recently, using approximate Bayesian inference with the variational autoencoder (VAE) technique

[34] to learn generative dynamical models has been actively researched. Chung et al. [35]

proposed a recurrent neural network with random latent variables, Gao 

et al. [36] utilized VAE-based inference for neural population models, and Johnson et al. [37] and Krishnan et al. [38] developed inference methods for structured models based on inference with a VAE. In addition, Karl et al. [39] proposed a method to obtain a more consistent estimation of nonlinear state space models. Moreover, Watter et al. [40] proposed a similar approach in the context of optimal control. Since generative models are intrinsically aware of process and observation noises, incorporating methodologies developed in such studies to the operator-theoretic perspective is an important open challenge to explicitly deal with uncertainty.

We would like to mention some studies closely related to our method. After the first submission of this manuscript (in May 2017), several similar approaches to learning data transform for Koopman analysis have been proposed [41, 42, 43, 44, 45]. The relationships and relative advantages of these methods should be elaborated in the future.

5 Numerical examples

In this section, we provide numerical examples of DMD based on the LKIS framework (LKIS-DMD) implemented using neural networks. We conducted experiments on three typical nonlinear dynamical systems: a fixed-point attractor, a limit-cycle attractor, and a system with multiple basins of attraction. We show the results of comparisons with other recent DMD algorithms, i.e., Hankel DMD [46, 47], extended DMD [14], and DMD with reproducing kernels [15]. The detailed setups of the experiments discussed in this section and the next section are described in the appendix.

Fixed-point attractor

Consider a two-dimensional nonlinear map on :


which has a stable equilibrium at the origin if . The Koopman eigenvalues of system (9) include and , and the corresponding eigenfunctions are and , respectively. is also an eigenvalue with corresponding eigenfunction . A minimal Koopman invariant subspace of system (9) is , and the eigenvalues of the Koopman operator restricted to such subspace include , and . We generated a dataset using system (9) with and and applied LKIS-DMD (), linear Hankel DMD [46, 47] (delay 2), and DMD with basis expansion by , which corresponds to extended DMD [14] with a right and minimal observable dictionary. The estimated Koopman eigenvalues are shown in Figure 7, wherein LKIS-DMD successfully identifies the eigenvalues of the target invariant subspace. In Figure 7, we show eigenvalues estimated using data contaminated with white Gaussian observation noise (). The eigenvalues estimated by LKIS-DMD coincide with the true values even with the observation noise, whereas the results of DMD with basis expansion (i.e., extended DMD) are directly affected by the observation noise.

(a) (b)
Figure 4: (left) Data generated from system (9) and (right) the estimated Koopman eigenvalues. While linear Hankel DMD produces an inconsistent eigenvalue, LKIS-DMD successfully identifies , , , and .
(a) (b)
Figure 7: (left) Data generated from system (9) and white Gaussian observation noise and (right) the estimated Koopman eigenvalues. LKIS-DMD successfully identifies the eigenvalues even with the observation noise.
Limit-cycle attractor

We generated data from the limit cycle of the FitzHugh–Nagumo equation


where , , , and . Since trajectories in a limit-cycle are periodic, the (discrete-time) Koopman eigenvalues should lie near the unit circle. Figure 13 shows the eigenvalues estimated by LKIS-DMD (), linear Hankel DMD [46, 47] (delay 8), and DMDs with reproducing kernels [15] (polynomial kernel of degree 4 and RBF kernel of width 1). The eigenvalues produced by LKIS-DMD agree well with those produced by kernel DMDs, whereas linear Hankel DMD produces eigenvalues that would correspond to rapidly decaying modes.

Figure 13: The left four panels show the estimated Koopman eigenvalues on the limit-cycle of the FitzHugh-Nagumo equation by LKIS-DMD, linear Hankel DMD, and kernel DMDs with polynomial and RBF kernels. The hyperparameters of each DMD are set to produce 16 eigenvalues. The rightmost plot shows the full-batch (size 2,000) loss under mini-batch (size 200) SGD updates along iterations. Non-decomposable part decreases rapidly and remains small, even by SGD.
Multiple basins of attraction

Consider the unforced Duffing equation


where , , and . States following (11) evolve toward or depending on which basin of attraction the initial value belongs to unless the initial state is on the stable manifold of the saddle. Generally, a Koopman eigenfunction whose continuous-time eigenvalue is zero takes a constant value in each basin of attraction [14]; thus, the contour plot of such an eigenfunction shows the boundary of the basins of attraction. We generated 1,000 episodes of time-series starting at different initial values uniformly sampled from . The left plot in Figure 17 shows the continuous-time Koopman eigenvalues estimated by LKIS-DMD (), all of which correspond to decaying modes (i.e., negative real parts) and agree with the property of the data. The center plot in Figure 17 shows the true basins of attraction of (11), and the right plot shows the estimated values of the eigenfunction corresponding to the eigenvalue of the smallest magnitude. The surface of the estimated eigenfunction agrees qualitatively with the true boundary of the basins of attractions, which indicates that LKIS-DMD successfully identifies the Koopman eigenfunction.

Figure 17: (left) The continuous-time Koopman eigenvalues estimated by LKIS-DMD on the Duffing equation. (center) The true basins of attraction of the Duffing equation, wherein points in the blue region evolve toward and points in the red region evolve toward . Note that the stable manifold of the saddle point is not drawn precisely. (right) The values of the Koopman eigenfunction with a nearly zero eigenvalue computed by LKIS-DMD, whose level sets should correspond to the basins of attraction. There is rough agreement between the true boundary of the basins of attraction and the numerically computed boundary. The right two plots are best viewed in color.

6 Applications

The numerical experiments in the previous section demonstrated the feasibility of the proposed method as a fully data-driven method for Koopman spectral analysis. Here, we introduce practical applications of LKIS-DMD.

Chaotic time-series prediction

Prediction of a chaotic time-series has received significant interest in nonlinear physics. We would like to perform the prediction of a chaotic time-series using DMD, since DMD can be naturally utilized for prediction as follows. Since is decomposed as and is obtained by where is a left-eigenvalue of , the next step of can be described in terms of the current step, i.e., . In addition, in the case of LKIS-DMD, the values of must be back-projected to using the learned . We generated two types of univariate time-series by extracting the series of the Lorenz attractor [48] and the Rossler attractor [49]. We simulated 25,000 steps for each attractor and used the first 10,000 steps for training, the next 5,000 steps for validation, and the last 10,000 steps for testing prediction accuracy. We examined the prediction accuracy of LKIS-DMD, a simple LSTM network, and linear Hankel DMD [46, 47], all of whose hyperparameters were tuned using the validation set. The prediction accuracy of every method and an example of the predicted series on the test set by LKIS-DMD are shown in Figure 27. As can be seen, the proposed LKIS-DMD achieves the smallest root-mean-square (RMS) errors in the 30-step prediction.

Unstable phenomena detection

One of the most popular applications of DMD is the investigation of the global characteristics of dynamics by inspecting the spatial distribution of the dynamic modes. In addition to the spatial distribution, we can investigate the temporal profiles of mode activations by examining the values of corresponding eigenfunctions. For example, assume there is an eigenfunction that corresponds to a discrete-time eigenvalue whose magnitude is considerably smaller than one. Such a small eigenvalue indicates a rapidly decaying (i.e., unstable) mode; thus, we can detect occurrences of unstable phenomena by observing the values of . We applied LKIS-DMD () to a time-series generated by a far-infrared laser, which was obtained from the Santa Fe Time Series Competition Data [50]. We investigated the values of eigenfunction corresponding to the eigenvalue of the smallest magnitude. The original time-series and values of obtained by LKIS-DMD are shown in Figure 27. As can be seen, the activations of

coincide with sudden decays of the pulsation amplitudes. For comparison, we applied the novelty/change-point detection technique using one-class support vector machine (OC-SVM)

[51] and direct density-ratio estimation by relative unconstrained least-squares importance fitting (RuLSIF) [52]. We computed AUC, defining the sudden decays of the amplitudes as the points to be detected, which were 0.924, 0.799, and 0.803 for LKIS, OC-SVM, and RuLSIF, respectively.

(a) (b) (c) (d)
Figure 22: The left plot shows RMS errors from 1- to 30-step predictions, and the right plot shows a part of the 30-step prediction obtained by LKIS-DMD on (upper) the Lorenz- series and (lower) the Rossler- series.
(a) (b) (c) (d)
Figure 27: The top plot shows the raw time-series obtained by a far-infrared laser [50]. The other plots show the results of unstable phenomena detection, wherein the peaks should correspond to the occurrences of unstable phenomena.

7 Conclusion

In this paper, we have proposed a framework for learning Koopman invariant subspaces, which is a fully data-driven numerical algorithm for Koopman spectral analysis. In contrast to existing approaches, the proposed method learns (approximately) a Koopman invariant subspace entirely from the available data based on the minimization of RSS loss. We have shown empirical results for several typical nonlinear dynamics and application examples.

We have also introduced an implementation using multi-layer perceptrons; however, one possible drawback of such an implementation is the local optima of the objective function, which makes it difficult to assess the adequacy of the obtained results. Rather than using neural networks, the observables to be learned could be modeled by a sparse combination of basis functions as in [23] but still utilizing optimization based on RSS loss. Another possible future research direction could be incorporating approximate Bayesian inference methods, such as VAE [34]. The proposed framework is based on a discriminative viewpoint, but inference methodologies for generative models could be used to modify the proposed framework to explicitly consider uncertainty in data.


This work was supported by JSPS KAKENHI Grant No. JP15J09172, JP26280086, JP16H01548, and JP26289320.


  • Hirsch et al. [2013] Morris W. Hirsch, Stephen Smale, and Robert L. Devaney. Differential Equations, Dynamical Systems, and an Introduction to Chaos. Academic Press, 3rd edition, 2013.
  • Lasota and Mackey [1994] Andrzej Lasota and Michael C. Mackey. Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics. Springer, 2nd edition, 1994.
  • Koopman [1931] B. O. Koopman. Hamiltonian systems and transformation in Hilbert space. Proceedings of the National Academy of Sciences of the United States of America, 17(5):315–318, 1931.
  • Mezić [2005] Igor Mezić. Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dynamics, 41(1-3):309–325, 2005.
  • Budišić et al. [2012] Marko Budišić, Ryan Mohr, and Igor Mezić. Applied Koopmanism. Chaos, 22:047510, 2012.
  • Rowley et al. [2009] Clarence W. Rowley, Igor Mezić, Shervin Bagheri, Philipp Schlatter, and Dan S. Henningson. Spectral analysis of nonlinear flows. Journal of Fluid Mechanics, 641:115–127, 2009.
  • Schmid [2010] Peter J. Schmid. Dynamic mode decomposition of numerical and experimental data. Journal of Fluid Mechanics, 656:5–28, 2010.
  • Proctor and Eckhoff [2015] Joshua L. Proctor and Philipo A. Eckhoff. Discovering dynamic patterns from infectious disease data using dynamic mode decomposition. International Health, 7(2):139–145, 2015.
  • Brunton et al. [2016a] Bingni W. Brunton, Lise A. Johnson, Jeffrey G. Ojemann, and J. Nathan Kutz. Extracting spatial-temporal coherent patterns in large-scale neural recordings using dynamic mode decomposition. Journal of Neuroscience Methods, 258:1–15, 2016a.
  • Berger et al. [2015] Erik Berger, Mark Sastuba, David Vogt, Bernhard Jung, and Heni Ben Amor. Estimation of perturbations in robotic behavior using dynamic mode decomposition. Advanced Robotics, 29(5):331–343, 2015.
  • Kutz et al. [2016a] J. Nathan Kutz, Xing Fu, and Steven L. Brunton. Multiresolution dynamic mode decomposition. SIAM Journal on Applied Dynamical Systems, 15(2):713–735, 2016a.
  • Mauroy and Goncalves [2016] Alexandre Mauroy and Jorge Goncalves. Linear identification of nonlinear systems: A lifting technique based on the Koopman operator. In Proceedings of the 2016 IEEE 55th Conference on Decision and Control, pages 6500–6505, 2016.
  • Kutz et al. [2016b] J. Nathan Kutz, Steven L. Brunton, Bingni W. Brunton, and Joshua L. Proctor. Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems. SIAM, 2016b.
  • Williams et al. [2015] Matthew O. Williams, Ioannis G. Kevrekidis, and Clarence W. Rowley. A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition. Journal of Nonlinear Science, 25(6):1307–1346, 2015.
  • Kawahara [2016] Yoshinobu Kawahara. Dynamic mode decomposition with reproducing kernels for Koopman spectral analysis. In Advances in Neural Information Processing Systems, volume 29, pages 911–919, 2016.
  • Mezić [2013] Igor Mezić. Analysis of fluid flows via spectral properties of the Koopman operator. Annual Review of Fluid Mechanics, 45:357–378, 2013.
  • Froyland et al. [2014] Gary Froyland, Georg A. Gottwald, and Andy Hammerlindl. A computational method to extract macroscopic variables and their dynamics in multiscale systems. SIAM Journal on Applied Dynamical Systems, 13(4):1816–1846, 2014.
  • Takeishi et al. [2017] Naoya Takeishi, Yoshinobu Kawahara, and Takehisa Yairi. Subspace dynamic mode decomposition for stochastic Koopman analysis. Physical Review E, 96(3):033310, 2017.
  • Proctor et al. [2016a] Joshua L. Proctor, Steven L. Brunton, and J. Nathan Kutz. Dynamic mode decomposition with control. SIAM Journal on Applied Dynamical Systems, 15(1):142–161, 2016a.
  • Proctor et al. [2016b] Joshua L. Proctor, Steven L. Brunton, and J. Nathan Kutz. Generalizing Koopman theory to allow for inputs and control. arXiv:1602.07647, 2016b.
  • Tu et al. [2014] Jonathan H. Tu, Clarence W. Rowley, Dirk M. Luchtenburg, Steven L. Brunton, and J. Nathan Kutz. On dynamic mode decomposition: Theory and applications. Journal of Computational Dynamics, 1(2):391–421, 2014.
  • Brunton et al. [2016b] Steven L. Brunton, Bingni W. Brunton, Joshua L. Proctor, and J. Nathan Kutz. Koopman invariant subspaces and finite linear representations of nonlinear dynamical systems for control. PLoS ONE, 11(2):e0150171, 2016b.
  • Brunton et al. [2016c] Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences of the United States of America, 113(15):3932–3937, 2016c.
  • Rakočević [1997] V. Rakočević. On continuity of the Moore–Penrose and Drazin inverses. Matematički Vesnik, 49(3-4):163–172, 1997.
  • Takens [1981] Floris Takens. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence, Warwick 1980, volume 898 of Lecture Notes in Mathematics, pages 366–381, 1981.
  • Sauer et al. [1991] Tim Sauer, James A. Yorke, and Martin Casdagli. Embedology. Journal of Statistical Physics, 65(3-4):579–616, 1991.
  • Garcia and Almeida [2005] Sara P. Garcia and Jonas S. Almeida. Multivariate phase space reconstruction by nearest neighbor embedding with different time delays. Physical Review E, 72(2):027205, 2005.
  • Hirata et al. [2006] Yoshito Hirata, Hideyuki Suzuki, and Kazuyuki Aihara. Reconstructing state spaces from multivariate data using variable delays. Physical Review E, 74(2):026202, 2006.
  • Vlachos and Kugiumtzis [2010] Ioannis Vlachos and Dimitris Kugiumtzis. Nonuniform state-space reconstruction and coupling detection. Physical Review E, 82(1):016207, 2010.
  • He et al. [2015] K. He, X. Zhang, S. Ren, and J. Sun.

    Delving deep into rectifiers: Surpassing human-level performance on imagenet classification.


    Proceedings of the 2015 IEEE International Conference on Computer Vision

    , pages 1026–1034, 2015.
  • Ioffe and Szegedy [2015] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 448–456, 2015.
  • Kar et al. [2014] Purushottam Kar, Harikrishna Narasimhan, and Prateek Jain. Online and stochastic gradient methods for non-decomposable loss functions. In Advances in Neural Information Processing Systems, volume 27, pages 694–702, 2014.
  • Ghahramani and Roweis [1999] Z. Ghahramani and S. T. Roweis. Learning nonlinear dynamical systems using an EM algorithm. In Advances in Neural Information Processing Systems, volume 11, pages 431–437, 1999.
  • Kingma and Welling [2014] D. P. Kingma and M. Welling. Stochastic gradient VB and the variational auto-encoder. In Proceedings of the 2nd International Conference on Learning Representations, 2014.
  • Chung et al. [2015] Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron C. Courville, and Yoshua Bengio. A recurrent latent variable model for sequential data. In Advances in Neural Information Processing Systems, volume 28, pages 2980–2988, 2015.
  • Gao et al. [2016] Yuanjun Gao, Evan W. Archer, Liam Paninski, and John P. Cunningham. Linear dynamical neural population models through nonlinear embeddings. In Advances in Neural Information Processing Systems, volume 29, pages 163–171, 2016.
  • Johnson et al. [2016] Matthew Johnson, David K. Duvenaud, Alex Wiltschko, Ryan P. Adams, and Sandeep R. Datta. Composing graphical models with neural networks for structured representations and fast inference. In Advances in Neural Information Processing Systems, volume 29, pages 2946–2954, 2016.
  • Krishnan et al. [2017] Rahul G. Krishnan, Uri Shalit, and David Sontag. Structured inference networks for nonlinear state space models. In

    Proceedings of the 31st AAAI Conference on Artificial Intelligence

    , pages 2101–2109, 2017.
  • Karl et al. [2017] Maximilian Karl, Maximilian Soelch, Justin Bayer, and Patrick van der Smagt.

    Deep variational Bayes filters: Unsupervised learning of state space models from raw data.

    In Proceedings of the 5th International Conference on Learning Representations, 2017.
  • Watter et al. [2015] Manuel Watter, Jost Springenberg, Joschka Boedecker, and Martin Riedmiller. Embed to control: A locally linear latent dynamics model for control from raw images. In Advances in Neural Information Processing Systems, volume 28, pages 2746–2754, 2015.
  • Li et al. [2017] Qianxiao Li, Felix Dietrich, Erik M. Bollt, and Ioannis G. Kevrekidis. Extended dynamic mode decomposition with dictionary learning: A data-driven adaptive spectral decomposition of the Koopman operator. Chaos, 27:103111, 2017.
  • Yeung et al. [2017] Enoch Yeung, Soumya Kundu, and Nathan Hodas. Learning deep neural network representations for Koopman operators of nonlinear dynamical systems. arXiv:1708.06850, 2017.
  • Mardt et al. [2017] Andreas Mardt, Luca Pasquali, Hao Wu, and Frank Noé. VAMPnets: Deep learning of molecular kinetics. arXiv:1710.06012, 2017.
  • Otto and Rowley [2017] Samuel E. Otto and Clarence W. Rowley. Linearly-recurrent autoencoder networks for learning dynamics. arXiv:1712.01378, 2017.
  • Lusch et al. [2017] Bethany Lusch, J. Nathan Kutz, and Steven L. Brunton. Deep learning for universal linear embeddings of nonlinear dynamics. arXiv:1712.09707, 2017.
  • Arbabi and Mezić [2017] Hassan Arbabi and Igor Mezić. Ergodic theory, dynamic mode decomposition and computation of spectral properties of the Koopman operator. SIAM Journal on Applied Dynamical Systems, 16(4):2096–2126, 2017.
  • Susuki and Mezić [2015] Yoshihiko Susuki and Igor Mezić. A Prony approximation of Koopman mode decomposition. In Proceedings of the 2015 IEEE 54th Conference on Decision and Control, pages 7022–7027, 2015.
  • Lorenz [1963] Edward N. Lorenz. Deterministic nonperiodic flow. Journal of the Atmospheric Sciences, 20(2):130–141, 1963.
  • Rössler [1976] O. E. Rössler. An equation for continuous chaos. Physical Letters, 57A(5):397–398, 1976.
  • Weigend and Gershenfeld [1993] Andreas S. Weigend and Neil A. Gershenfeld, editors. Time Series Prediction: Forecasting The Future And Understanding The Past. Santa Fe Institute Series. Westview Press, 1993.
  • Canu and Smola [2006] Stéphane Canu and Alex Smola. Kernel methods and the exponential family. Neurocomputing, 69(7-9):714–720, 2006.
  • Liu et al. [2013] Song Liu, Makoto Yamada, Nigel Collier, and Masashi Sugiyama. Change-point detection in time-series data by relative density-ratio estimation. Neural Networks, 43:72–83, 2013.
  • Chen et al. [2012] Kevin K. Chen, Jonathan H. Tu, and Clarence W. Rowley. Variants of dynamic mode decomposition: Boundary condition, Koopman, and Fourier analyses. Journal of Nonlinear Science, 22(6):887–915, 2012.
  • Funk [2015] Simon Funk. RMSprop loses to SMORMS3 - beware the epsilon! available online, 2015. Retrieved 29 April-2017.
  • Shampine and Reichelt [1997] Lawrence F. Shampine and Mark W. Reichelt. The MATLAB ODE suite. SIAM Journal on Scientific Computing, 18(1):1–22, 1997.
  • Brunton et al. [2017] Steven L. Brunton, Bingni W. Brunton, Joshua L. Proctor, Eurika Kaiser, and J. Nathan Kutz. Chaos as an intermittently forced linear system. Nature Communications, 8:19, 2017.
  • Chang and Lin [2011] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3):27, 2011.

Appendix A Algorithm of dynamic mode decomposition

Dynamic mode decomposition (DMD) was originally invented as a tool for inspecting fluid flows [6, 7], and it has been utilized in various fields other than fluid dynamics. An output of DMD coincides with Koopman spectral analysis if we have that spans a Koopman invariant subspace. The popular algorithm of DMD [21]

, which is based on the singular value decomposition (SVD) of a data matrix, is defined as follows.

Algorithm (Dmd [21]).
  1. Given a sequence of , define data matrices and .

  2. Calculate the compact SVD of as , where is the rank of .

  3. Compute a matrix .

  4. Calculate eigendecomposition of , i.e., compute and such that .

  5. In addition, calculate left-eigenvectors of .

  6. Back-project the eigenvectors to the original space by and .

  7. Normalize and such that , where if and otherwise.

  8. Return dynamic modes and corresponding eigenvalues . In addition, return the values of corresponding eigenfunctions by .

The eigenvalues computed using the algorithm above are “discrete-time” ones, in the sense that they represent frequencies and decay rates in terms of discrete-time dynamical systems. The “continuous-time” counterparts can be computed easily by , where is the time interval in the discrete-time setting.

Note that the definition above presumes the access to an appropriate observable . In contrast, in the proposed method, the above-mentioned algorithm is run after applying the proposed framework to learn from observed data.

Appendix B Detailed experimental setup

In this appendix section, the configurations of the numerical examples and applications, which were omitted in the main text, are described.

b.1 General settings


In each experiment, parameter was fixed at . Note that the quality of the results was not sensitive to the values of . We modeled and with multi-layer perceptrons by setting the number of hidden nodes (denoted ) as the arithmetic means of the input and output sizes, i.e., for and for , where , , and . Therefore, the remaining hyperparameters to be tuned were (maximum lag), , and . However, unless otherwise noted, we fixed by . Consequently, the independent hyperparameters were and .


One must not subtract the mean from the original data because subtracting something from the data may change the spectra of the underlying dynamical systems (see, e.g., [53]). If the absolute values of the data were too large, we simply divided the data by the maximum absolute value for each series.


In optimization, we found that the adaptive learning rate by SMORMS3 [54] achieved fast convergence compared to a fixed learning rate and other adaptation techniques. The maximum learning rate of SMORMS3 was selected from to in each experiment according to the amount of data. In some cases, optimization was performed in two stages: the parameters of , , and were updated in the first stage, and, in the second stage, the parameters of and were fixed and only was updated. This two-stage optimization was particularly useful for the application of prediction, where a precise reconstruction of the original measurements was necessary. Moreover, when the original states of the dynamical system were available and used without delay (i.e., and ), parameter

of the linear embedder was fixed to be an identity matrix (i.e., no embedder was used). Also, we set the mini-batch size from 100 to 500 because smaller mini-batches often led to an unstable computation of pseudo-inverse.

b.2 Fixed-point attractor experiment

In the experiment using the fixed-point attractor, the data were generated with four initial values: , , , and , with the length of each episode being

. In the case of noisy dataset, the standard deviation of the observation noise was set to

. In both experiments (with and without observation noise), we set and to cover the minimal three-dimensional Koopman invariant subspace.

b.3 Limit-cycle attractor experiment

The data were generated using MATLAB’s ode45 function [55], which was run with time-step and initial value for 2,000 steps. The hyperparameters of LKIS-DMD, linear Hankel DMD, and kernel DMDs were set such that they produced 16 eigenvalues, i.e., and for LKIS-DMD, and POD modes whose singular value was less than were disposed in kernel DMDs ( for the polynomial kernel and for the RBF kernel).

b.4 Multiple basins of attraction experiment

The data were generated using the settings provided in the literature [14]

; 1,000 initial values were drawn from the uniform distribution on

and each initial value was proceeded in time for 11 steps with . We used MATLAB’s ode45 function for numerical integration. For LKIS-DMD, we set and . Note that the values of the estimated eigenfunction were evaluated and plotted in consideration of each data point.

b.5 Chaotic time-series prediction experiment

The data were generated from the Lorenz attractor [48] (parameters , , and ) and the Rossler attractor [49] (parameters , , and ). We generated 25,000 steps for each attractor and divided them into training, validation, and test sets. For all methods, the delay dimension was fixed at , i.e.,

for LKIS-DMD and linear Hankel DMD, and backpropagation was truncated to length

to learn the LSTM network. We tuned of LKIS-DMD and the dimensionality of LSTM’s hidden state (denoted ) according to the 30-step prediction accuracies obtained using the validation set. Here, we obtained and for the Lorenz data and and for the Rossler data.

In this experiment, LSTM was applied because it had been utilized for various nonlinear time-series, and Hankel DMD was used because it had been successfully utilized for analysis of chaotic systems [56].

b.6 Unstable phenomena detection experiment

The dataset was obtained from the Santa Fe Time Series Competition Data [50]. Note that the author’s [50] original web page was not available on the date of submission of this manuscript (May 2017); however, the dataset itself was still available online. The length of delay (or sliding window) was fixed to for all methods applied in this experiment. In addition, no intensive tuning of the other hyperparameters was conduct because the purpose was qualitative. The default settings of libsvm [57] were used for the one-class SVM (except for ). For the density-ratio estimation by RuLSIF, the default values of the implementation by the authors of [52] were used.

In this experiment, OC-SVM was applied because it was a kind of de facto standard for novelty/change-point detection, and RuLSIF ws used because it had achieved the best performance among methods based on density-ratio estimation [52].