1 Introduction
The Dynamic Mode Decomposition (DMD) algorithm was invented by P. Schmid as a method for extracting dynamic information from temporal measurements of a multivariate fluid flow vector
(Schmid, 2010). The dynamic modes extracted are the non-orthogonal eigenvectors of a non-normal matrix that best linearizes the one-step evolution of the measured vector. Schmid showed that the dynamic modes recovered by DMD correspond to the globally stable modes in the flow (Schmid, 2010). The non-orthogonality of the recovered dynamic modes reveals spatial structure in the temporal evolution of the measured fluid flows in a way that other second-order, spatial-correlation-based methods, such as the Proper Orthogonal Decomposition (POD), do not (Kerschen et al., 2005). This spurred follow-on work on other applications and extensions of DMD for understanding dynamical systems from measurements.
1.1 Previous work on DMD and the analysis of dynamical systems
After the initial work proposing DMD, the first major analyses of the algorithm drew connections between the DMD modes and the eigenfunctions of the Koopman operator from dynamical systems theory. Rowley et al. and Mezić showed that under certain conditions, the DMD modes approximate the eigenfunctions of the Koopman operator for a given system
(Rowley et al., 2009; Mezić, 2013). Related work in Bagheri (2013) studied the Koopman operator directly, analyzed its spectrum, and compared it against the spectrum of the matrix decomposed in DMD. The work in Rowley et al. (2009) also explained how the linear DMD modes can elucidate the structure in the temporal evolution of nonlinear fluid flows. The work in Črnjarić-Žic et al. (2017) provided a further analysis of the Koopman operator and more connections to DMD. More recently, Lusch et al. have shown how deep learning can be combined with DMD to extract modes for a nonlinearly evolving dynamical system
(Lusch et al., 2018). There have been several extensions of DMD. The authors in Chen et al. (2012) developed a method to improve the robustness of DMD to noise. Jovanović et al. proposed a sparsity-inducing formulation of DMD that allowed fewer dynamic modes to better capture the dynamical system (Jovanović et al., 2014). Tu et al. developed a DMD variant that takes into account systematic measurement errors and measurement noise (Tu et al., 2014); this framework was extended in Hemati et al. (2017). A Bayesian, probabilistic variant of DMD was developed in Takeishi et al. (2017), where a Gibbs sampler for the modes and a sparsity-inducing prior were proposed. Another recent extension of DMD is an online (or streaming) version of DMD (Zhang et al., 2017).
Additionally, there have been applications of DMD to other domains besides computational fluid mechanics. The work in Bai et al. (2017) applied DMD to compressed sensing settings. A related work applied DMD to model the background in a streaming video (Pendergrass et al., 2017). The authors in Mann and Kutz (2016) applied DMD to finance, by using the predicted modes and temporal variations to forecast future market trends. The authors in Berger et al. (2015)
brought DMD to the field of robotics, and used DMD to estimate perturbations in the motion of a robot. DMD has also been applied to power systems analysis, where it has been used to analyze transients in large power grids
(Barocio et al., 2015). There are many more applications and extensions, and we point the interested reader to the recent book by Kutz et al. (2016).

1.2 Our main finding: DMD unmixes lag-1 (or higher-lag) uncorrelated time series
Suppose that we are given multivariate observations modeled as
(1) 
where is a nonsingular mixing matrix, and is the latent vector of signals (or sources). The matrix has unit-norm columns and is related to by
(2) 
Setting entries of the diagonal matrix as ensures that as in (1).
In what follows, we will adopt the following notational convention: we shall use boldface to denote vectors such as . Matrices, such as , will be denoted by non-boldface uppercase letters; and scalars, such as , will be denoted by lowercase symbols.
We assume, without loss of generality, that
(3) 
The lag covariance matrix of is defined as
(4) 
where is a nonnegative integer.
If we are able to form a reliable estimate of the mixing matrix from the multivariate observations then, via Eq. (1), we can unmix the latent signals by computing . Inferring and computing will also similarly unmix the signals. Inferring the mixing matrix and unmixing the signals (or sources) is referred to as blind source separation (Choi et al., 2005).
Our key finding is that when the lag-1 covariance matrix in (4) is diagonal, corresponding to the setting where the latent signals are lag-1 uncorrelated weakly stationary time series, and there are sufficiently many samples of , then the DMD algorithm in (23) produces a non-normal matrix whose non-orthogonal eigenvectors are reliably good estimates of in (1). In other words, DMD unmixes lag-1 uncorrelated signals and weakly stationary time series. Theorems 3 and 3.2 summarize our findings in this regard by stating the problem in terms of the estimation of the matrix .
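This finding can be illustrated with a short numerical sketch (ours, not drawn from the paper's experiments; the AR(1) coefficients and the mixing matrix below are illustrative choices): two independent Gaussian AR(1) sources are mixed by a non-orthogonal matrix with unit-norm columns, and the eigenvectors of the DMD matrix are compared against the mixing columns.

```python
import numpy as np

rng = np.random.default_rng(0)
T, phis = 20000, (0.9, -0.5)

# Two independent Gaussian AR(1) sources: lag-1 cross-uncorrelated,
# with distinct lag-1 autocorrelations (the AR coefficients).
S = np.zeros((2, T))
for i, phi in enumerate(phis):
    e = rng.standard_normal(T)
    for t in range(1, T):
        S[i, t] = phi * S[i, t - 1] + e[t]

A = np.array([[1.0, 0.8],
              [0.3, 1.0]])          # illustrative non-orthogonal mixing matrix
A /= np.linalg.norm(A, axis=0)      # unit-norm columns
X = A @ S                           # observed mixtures

# DMD: eigendecompose the least-squares one-step propagator X2 X1^+.
X1, X2 = X[:, :-1], X[:, 1:]
eigvals, V = np.linalg.eig(X2 @ np.linalg.pinv(X1))
V = np.real(V)
V /= np.linalg.norm(V, axis=0)

# Each mixing column should align with some DMD eigenvector (up to sign).
align = np.abs(A.T @ V)             # |cosine| between columns
print(np.round(align.max(axis=1), 2))
```

Up to sampling error, the printed alignments should be close to 1, and the DMD eigenvalues should approximate the lag-1 autocorrelations (here, the AR coefficients) of the latent sources.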
Our findings reveal that a straightforward extension of DMD, described in Section 4 and (37), allows DMD to unmix lag uncorrelated signals and time series. This brings up the possibility of using a higher lag to unmix signals that might exhibit a more favorable correlation at larger lag than at a lag of one. Indeed, in Figure 4 we provide one such example where DMD provides a better estimate of than does DMD.
Our main contribution, which builds on our previous work in Prasadan and Nadakuditi (2018), is the analysis of the unmixing performance of DMD and DMD (introduced in Section 4), when unmixing deterministic signals and random, weakly stationary time series in the finite sample regime and in the setting where there is randomly missing data in the observations .
1.3 New insight: DMD can unmix ergodic time series that ICA cannot
Independent Component Analysis (ICA) is a classical algorithm for blind source separation (Lee, 1998; Mitsui et al., 2017) that is often used for the cocktail party problem of unmixing mixed audio signals. Our analysis reveals that DMD can be successfully applied to this problem as well because independent audio sources are well modeled as one-lag (or higher-lag) uncorrelated (see Figure 8).
It is known that ICA fails when more than one of the independent, latent signals is normally distributed. A consequence of this is that ICA will fail to unmix mixed independent, ergodic time series with Gaussian marginal distributions. Our analysis reveals that DMD will succeed in this setting, even as ICA fails; see Figure 1 for an illustration where ICA fails to unmix two mixed, independent Gaussian AR(1) processes while DMD succeeds. Thus, DMD can and should be used by practitioners to reanalyze multivariate time series data for which the use of ICA has not revealed any insights.

1.4 New insight: DMD can unmix mixed Fourier series that PCA/SVD cannot
The eigenwalker model, described in Troje (2002b), is a linear model for human motion. The model is a linear combination of vectors, via
(5) 
The vectors are the modes or principal components of the motion, and each has a sinusoidal temporal variation. We generate our model as follows:
for to , where
This model has been decomposed with ICA, and used for video motion editing and analysis (Shapiro et al., 2006). Here, we apply PCA (the SVD), and compare it to DMD. In Figure 2, we display the results of unmixing with the SVD and with DMD. We observe that DMD successfully unmixes the cosines, while the SVD fails: note that unless the are orthogonal, there is no hope of a successful unmixing. Moreover, the estimation of from the SVD fails, as we find that
which has a squared error of , while the estimate from DMD has a squared error of , where the error is computed according to (29).
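To make the comparison concrete, here is a small sketch in the spirit of Figure 2 (the frequencies, phases, and mixing weights are invented for illustration, not the eigenwalker data): cosine sources are mixed non-orthogonally, and the best alignment of each mixing column with the SVD modes and with the DMD modes is compared.

```python
import numpy as np

t = np.arange(4096)
freqs = np.array([0.2, 0.9, 1.7])            # distinct frequencies (rad/sample)
phases = np.array([[0.3], [1.1], [2.0]])
S = np.cos(np.outer(freqs, t) + phases)      # latent cosine sources

A = np.array([[1.0, 0.9, 0.2],
              [0.1, 1.0, 0.8],
              [0.4, 0.2, 1.0]])              # non-orthogonal mixing
A /= np.linalg.norm(A, axis=0)
X = A @ S

# PCA/SVD: left singular vectors are forced to be orthogonal,
# so they cannot all match the non-orthogonal columns of A.
U = np.linalg.svd(X, full_matrices=False)[0]

# DMD: eigenvectors are generically non-orthogonal.
w, V = np.linalg.eig(X[:, 1:] @ np.linalg.pinv(X[:, :-1]))
V = np.real(V)
V /= np.linalg.norm(V, axis=0)

svd_align = np.abs(A.T @ U).max(axis=1)      # best |cosine| per mixing column
dmd_align = np.abs(A.T @ V).max(axis=1)
print(np.round(svd_align, 2), np.round(dmd_align, 2))
print(np.round(np.sort(np.real(w)), 2))      # DMD eigenvalues ~ cos(frequency)
```

The DMD alignments come out near 1 while the SVD alignments do not, and the DMD eigenvalues approximate the cosines of the source frequencies, consistent with Corollary 3.1.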
1.5 Connection with other algorithms for time series blind source separation
Let
be the singular value decomposition (SVD) of
. Then, we have that and (6) 
Given and , we can compute the whitened vector
(7) 
whose covariance matrix is given by . Then from (1) and (6) we have that
(8) 
where the mixing matrix
is an orthogonal matrix because the
and matrices, which correspond to the left and right singular vector matrices of in (1), are orthogonal. Equation (8) reveals that we can solve the blind source separation problem and unmix from observations of if we can infer the orthogonal mixing matrix from data. To that end, we note that
(9) 
Equation (9) reveals that when the latent signals are lag-1 uncorrelated, i.e., is a diagonal matrix, then the lag-1 covariance matrix of the whitened vector will be diagonalized by the orthogonal matrix . The sample lag-1 covariance matrix computed from finite data will, in general, not be symmetric, and so we might infer from the eigenvectors of its symmetric part: this leads to the AMUSE (Algorithm for Multiple Unknown Signals Extraction) method (Tong et al., 1990).
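A schematic rendering of these AMUSE steps, as we paraphrase them (a sketch with illustrative AR(1) sources and mixing weights, not the reference implementation of Tong et al.):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 20000

# Two independent AR(1) sources: lag-1 cross-uncorrelated,
# with distinct lag-1 autocorrelations.
S = np.zeros((2, T))
for i, phi in enumerate((0.8, -0.6)):
    e = rng.standard_normal(T)
    for t in range(1, T):
        S[i, t] = phi * S[i, t - 1] + e[t]
A = np.array([[1.0, 0.7],
              [0.2, 1.0]])               # illustrative mixing matrix
X = A @ S

# AMUSE: whiten, then eigendecompose the symmetrized lag-1 covariance.
C0 = X @ X.T / T
d, E = np.linalg.eigh(C0)
W = E @ np.diag(d ** -0.5) @ E.T         # whitening matrix
Z = W @ X                                # whitened data
C1 = Z[:, 1:] @ Z[:, :-1].T / (T - 1)    # sample lag-1 covariance
C1s = (C1 + C1.T) / 2                    # symmetric part
_, Q = np.linalg.eigh(C1s)               # orthogonal unmixing rotation
Shat = Q.T @ Z                           # recovered sources (up to order/sign/scale)

# Correlation between each recovered row and its best-matching source.
corr = np.abs(np.corrcoef(np.vstack([Shat, S]))[:2, 2:])
print(np.round(corr.max(axis=1), 2))
```

Note how the orthogonal rotation is only recovered after explicit whitening; DMD, by contrast, operates on the raw mixtures and estimates the non-orthogonal mixing directly.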
A deeper inspection of (9) reveals that if are second-order, weakly stationary time series that are uncorrelated for multiple values of (corresponding to multiple lags), then we can infer (which, incidentally, corresponds to the polar part of the polar decomposition of the mixing matrix in (1)) by posing it as a joint diagonalization of for lags corresponding to . This is the basis of the second-order blind identification (SOBI) method (Belouchrani et al., 1997), where the joint diagonalization problem is addressed by finding the orthogonal matrix that minimizes the sums of squares of the off-diagonal entries of . Numerically, this problem is solvable via the JADE method (Cardoso and Souloumiac, 1993; Miettinen et al., 2017, 2016).
Miettinen et al. analyze the performance of a symmetric variant of the SOBI method in Miettinen et al. (2016), and the problem of determining the number of latent signals that are distinct from white noise in Matilainen et al. (2018). Their results for the performance are asymptotic and distributional. That is, the limiting distribution of the estimated matrix is computed, when the input signals are realizations of some time series, with zero mean and diagonal autocorrelations at every lag . As will be seen in what follows, these assumptions are very similar to those that we impose on DMD. Our analysis for the missing data setting is new and has no counterpart in the SOBI or AMUSE performance analysis literature.

In Table 1, we summarize the various algorithms for unmixing of stationary time series. Table 1 brings into sharp focus the manner in which DMD and DMD are similar to and different from the AMUSE and SOBI algorithms. All algorithms diagonalize a matrix; SOBI and AMUSE estimate orthogonal matrices while DMD and DMD estimate non-orthogonal matrices. The SOBI and AMUSE algorithms diagonalize cross-covariance matrices formed from whitened time series data, while DMD and DMD work on the time series data directly. Thus, SOBI and AMUSE explicitly whiten the data, while DMD implicitly whitens the data. SOBI and DMD exhibit similar performance (see Figure 7); a more detailed theoretical study comparing their performance in the noisy setting is warranted.
Algorithm  Key Matrix  Fit for Key Matrix  Numerical Method
DMD  , non-orthogonal  Non-symmetric Eig.
DMD  , non-orthogonal  Non-symmetric Eig.
AMUSE  , orthogonal  Eig. of Symmetric part
SOBI  ,  , orthogonal  Joint Diagonalization
A contribution of this work is a non-asymptotic, finite-sample performance analysis of the DMD and DMD algorithms in the setting where the mixed deterministic signals or ergodic time series are approximately (or exactly) uncorrelated at a lag of one or higher.
1.6 Organization
The remainder of this paper is organized as follows. In Section 2, we introduce the time series data matrix model and describe the DMD algorithm in Section 2.2. We provide a DMD performance guarantee for unmixing deterministic signals in Section 3; a corollary of that result in Section 3.1 explains why DMD is particularly apt for unmixing multivariate mixtures of Fourier series such as the “eigenwalker” model. We extend our analysis to stationary, ergodic time series data in Section 3.2. We describe and analyze the performance of the higher-lag extension of DMD, which we call DMD, in Section 4 and extend its analysis to unmixing ergodic time series in Section 5. We analyze the setting where the time series data matrix has randomly missing data in Section 6. We validate our theoretical results with numerical simulations in Section 7. In Section 8, we describe how a time series matrix can be factorized using DMD to obtain a Dynamic Mode Factorization (DMF) involving the product of the DMD estimate of the (column-wise normalized) mixing matrix and the coordinates, which represent the unmixed latent signals. We show how DMF can be applied to the cocktail party problem in Choi et al. (2005) in Section 8 and how unmixing the latent series via DMF can help improve time series change point detection in Section 8.2. We offer some concluding remarks in Section 9. The proofs of our results are deferred to the appendices.
2 Model and Setup
Suppose that, at time , we are given a dimensional time series vector
where an individual entry , for , of is modeled as
(10) 
and is the entry of a dimensional vector . Each is the entry of an dimensional vector , and the are samples of a time series. Equation (10) can be succinctly written in vector form as
(11) 
where the matrix is defined as
We are given samples corresponding to uniformly spaced time instances . In what follows, without loss of generality, we assume that . Let be the matrix defined as
(12) 
We define the matrix with columns as
(13) 
Consequently, we have that
(14) 
where is the “latent time series” matrix given by (13). Equation (14) reveals that the multivariate time series matrix is a linear combination of rows of the latent time series matrix.
Suppose that for ,
(15) 
and the matrices
(16) 
Then, from (14), and from the definition of and , it can be shown that
(17) 
where, for ,
(18) 
We will define
(19) 
and assume that, without loss of generality, the and hence the , , , and are ordered so that
(20) 
Note that by construction, in (17), the columns of the matrices and have unit norm. In what follows, we assume that and have linearly independent columns, that , that the columns of have zero mean, and that the columns of are canonically non-random and non-orthogonal. Our goal in what follows is to estimate the columns of the matrices and .
2.1 Technical Assumptions
We will require the following set of technical assumptions on the data.

Assume that is fixed, with
(21a) 
Assume that the are linearly independent, so that is a finite quantity:
(21b) Here, denotes the singular value of . Essentially, the conditioning of the is independent of and . Moreover, the are canonically non-random and non-orthogonal.

Assume that
(21c) i.e., that the limit of the ratio is finite.

Assume that columns of (the ) each have zero mean (the sum of each column is zero), and that they are linearly independent. Moreover, assume that there exists an such that
(21d) I.e., the are not too sparse.
2.2 Dynamic Mode Decomposition (DMD)
From (11), we see that the columns of represent a multivariate time series. We first partition the matrix into two matrices
(22a)  
and  
(22b) 
We then compute the matrix via the solution of the optimization problem
(23) 
The minimum norm solution to (23) is given by
(24) 
where the superscript denotes the Moore-Penrose pseudoinverse. Note that will be a non-symmetric matrix with a rank of at most because , from which and are derived, has rank from the construction in (17). Let
(25) 
be its eigenvalue decomposition. In (
25), is a diagonal matrix, where the , ordered as , are the, possibly complex, eigenvalues of and is a matrix of, generically non-orthogonal, unit-norm eigenvectors, denoted by .

3 Performance Guarantee
We now establish a recovery condition for the setting where the in (13) are deterministic. To that end, for as in (17), a central object that governs when we are able to successfully recover is the lag covariance matrix defined as
(26) 
Note that we can succinctly express as where is the matrix formed by taking the identity matrix and circularly right shifting the columns by one. We are now ready to state the DMD recovery condition.
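The circular-shift expression can be sanity-checked numerically (a quick sketch under the circular convention described above; the dimensions are arbitrary): the shift-matrix product equals the sum of lag-1 outer products plus the single wraparound term.

```python
import numpy as np

rng = np.random.default_rng(3)
n, T = 3, 500
S = rng.standard_normal((n, T))          # arbitrary latent time series matrix

P = np.roll(np.eye(T), 1, axis=1)        # identity with columns circularly right-shifted
C_shift = S @ P @ S.T / T                # shift-matrix form of the lag-1 covariance

# Direct form: lag-1 outer products plus the circular wraparound term.
C_direct = (S[:, :-1] @ S[:, 1:].T + np.outer(S[:, -1], S[:, 0])) / T
print(np.allclose(C_shift, C_direct))    # True
```

For long series the wraparound term is negligible, which is why the circular and truncated conventions lead to the same asymptotic conclusions.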
In the following result and in all subsequent results, there is an ambiguity or mismatch between the ordering of the , , , and with that of the and . Formally, there exists a permutation that reorders the and to correspond to the and other quantities, such that the error is minimal. In the statement of our results, without loss of generality, we will assume that , i.e., that it is the identity permutation.
For as in (17) and defined as in (26), suppose that the conditions in (21) hold. Further suppose that
(27a)  
Moreover, assume that for we have that  
(27b)  
for some such that  
(27c) 
Let  
(28a) 
a) Then, we have that
(29a)  
where is given by  
(29b)  
b) Moreover, for each , we have that  
(29c) 
3.1 Application of Theorem 3: DMD Unmixes Multivariate Mixed Fourier Series
Consider the setting where in (10) is modeled as
(30) 
The is thus a linear mixture of Fourier series. This model frequently comes up in many applications such as the eigenwalker model for human motion: Troje (2002a, Equations (1) and (2)), Troje (2002b) and Unuma et al. (1995, Equations (1) and (2)).
This model fits into the framework of Theorem 3 via an application of Corollary 3.1 below. This implies that the DMD modes will correctly correspond to the non-orthogonal mixing modes. Using the SVD in this setting would recover orthogonal modes that would be linear combinations of the latent non-orthogonal dynamic modes.
[Mixtures of Cosines] Assume that the are given by (30). Then we have that
(31a)  
where  
(31b)  
and that for each , we have that  
(31c) 
Corollary 3.1 explains why DMD successfully unmixes the eigenwalker data in Figure 2. In that setting, PCA/SVD does not succeed because it returns an orthogonal matrix as an estimate of the non-orthogonal mixing matrix. The ability of DMD to reliably unmix non-orthogonally mixed multivariate Fourier series, and the fact that the eigenvalues are cosines of the frequencies, provides some context for the statement that DMD is a spectral algorithm whose eigenspectra reveal information about Fourier spectra (Rowley et al., 2009).
3.2 An Extension of Theorem 3: Stationary, Ergodic Time Series
We now consider the setting where are elements of a stationary, ergodic time series and the , thus formed. Consider the matrix
(32) 
When is diagonal, then DMD asymptotically unmixes the time series, as expressed in the Theorem below. We will require the assumptions from (21), with the following updates:

Assume that the , , , and are ordered so that
(33a) where
(33b) 
Assume that
(33c) i.e., that the limit of the ratio is finite.
[Stationary, Ergodic Time Series] Let the be as described above, and let be as defined in (32). Assume that , , and . Suppose that the conditions in (33) hold, in addition to conditions (1, 2, 4) from (21).
a) For some and , we have that
(34a)  
Then for ,  
(34b)  
with probability at least  
(34c) 
If the are samples from a stationary, ergodic ARMA process, we may simplify the results of Theorem 3.2 slightly. [ARMA Processes] Assume that the are samples from an ARMA process. Then (34a) may be replaced with
(35) 
The iterated-logarithm rate in our error bounds, and the accompanying probability, are consequences of the classical time series results in HongZhi et al. (1982). Here, we have stated a result that is similar in spirit to that for SOBI, given in Miettinen et al. (2016). Our result says that time series that are uncorrelated at lags and can be unmixed, provided that they are not sparse. The result for SOBI requires uncorrelatedness at all integral lags, and states an asymptotic distributional result; our result relies on looser assumptions, and is a finite sample guarantee. It should be noted that, at the expense of using a single lag, our result is slightly weaker than the convergence described in Miettinen et al. (2016, Theorem 1).
4 DMD
We have previously described and analyzed the DMD algorithm at a lag of . That is, we let and differ by one timestep. The proof of Theorem 3 reveals that DMD, in this formulation, unmixes signals that are uncorrelated at a lag of one timestep, but have non-vanishing autocorrelations at this lag. However, we may easily imagine and construct signals that have non-vanishing autocorrelations or autocorrelation functions that do not peak at a lag of , or signals that are uncorrelated at later lags. This is the basis for DMD, which we describe next.
From (11), we recall that the columns of represent a multivariate time series. We first partition the matrix into two matrices:
(36a)  
and  
(36b) 
We then compute the matrix via the solution of the optimization problem
(37) 
The minimum norm solution to (37) is given by
(38) 
Note that will be a non-symmetric matrix with a rank of at most because , from which and are derived, has rank from the construction in (17). Once again, let
(39) 
be its eigenvalue decomposition. In (39), is a diagonal matrix, where are the (possibly complex) eigenvalues of and is a matrix of, generically non-orthogonal, unit-norm eigenvectors that are denoted by .
In what follows, we will refer to the computation of (38) and the subsequent decomposition (39) as the lag DMD algorithm.
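A minimal sketch of the lag-k variant (our illustration: the frequencies and mixing weights are invented, and `dmd_lag` is a hypothetical helper name, not notation from this paper): for cosine sources, the lag-k regression coefficient of each source is its lag-k autocorrelation, so the lag-3 DMD eigenvalues approximate the cosines of three times the frequencies.

```python
import numpy as np

def dmd_lag(X, k):
    """Lag-k DMD: regress X[:, k:] on X[:, :-k], then eigendecompose."""
    Ahat = X[:, k:] @ np.linalg.pinv(X[:, :-k])
    return np.linalg.eig(Ahat)

t = np.arange(8192)
freqs = np.array([0.3, 1.1])
S = np.cos(np.outer(freqs, t) + np.array([[0.5], [1.7]]))   # cosine sources
A = np.array([[1.0, 0.8],
              [0.3, 1.0]])                                  # non-orthogonal mixing
A /= np.linalg.norm(A, axis=0)                              # unit-norm columns
X = A @ S

# Lag-3 DMD: eigenvalues ~ cos(3 * frequency); eigenvectors ~ mixing columns.
w, V = dmd_lag(X, 3)
V = np.real(V)
V /= np.linalg.norm(V, axis=0)
print(np.round(np.sort(np.real(w)), 3))
```

The same eigenvectors are recovered as at lag 1; only the eigenvalues change with the lag, which is what allows a practitioner to pick the lag at which the latent signals are best separated.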
Similarly to the standard DMD algorithm, our lag DMD algorithm depends on a lag cross-covariance matrix. Let the lag covariance matrix be defined as
(40) 
Note that we can succinctly express as where is the matrix formed by taking the identity matrix and circularly right shifting the columns by one. After defining the following updates to the assumptions in (21), we are ready to state the DMD recovery condition.

Assume that is fixed, with
(41a) 
Assume that
(41b) and for large . I.e., is small relative to .
[lag DMD] For as in (17) and defined as in (40), suppose that the conditions in (41) hold along with conditions (2, 3, 4) from (21). Further suppose that
(42a)  
Moreover, assume that for we have that  
(42b)  
for some such that  
(42c) 
a) Then, assuming that is given by (28a), we have that
(43a)  
where is given by  
(43b)  
b) Moreover, for each , we have that  
(43c) 
4.1 Extensions of Theorem 4: Stationary, Ergodic Time Series
In Theorem 3.2, we provided a result for DMD applied to a stationary, ergodic time series. This result generalizes very simply to higher lags. Once again, we must define the expectation of the lag matrix:
(44) 
[Stationary, Ergodic Time Series at Lag ] Suppose that the conditions in (33) hold, in addition to conditions (2, 4) from (21). Moreover, let satisfy the conditions in (41) in addition to
(45a)  
for some value of . 
Let the be as described above, and let be as defined in (44). Assume that , , and . Then, we have that
a) For some and , we have that
(45b) 
Then,
(45c) 
with probability at least
(45d) 
If the are samples from a stationary, ergodic ARMA process, we may simplify the results of the preceding theorem slightly. [ARMA Processes at Lag ] Assume that the are samples from an ARMA process. Then (45a) may be replaced with
(46) 
for some , and (45b) may be replaced with
(47) 
These results for stationary time series can be thought of as a finite-sample analogue of the SOBI results given in Miettinen et al. (2016). Our assumptions are similar, in that we also assume that the latent time series have a diagonal autocorrelation for and . We differ here in that we allow for , but require non-sparsity of the , as well as requiring only two lags at which the time series are uncorrelated. As before, at the expense of using a single lag, our result is slightly weaker than the convergence described in Miettinen et al. (2016, Theorem 1).
5 Estimating the temporal behavior:
We now establish a recovery condition for deterministic .
[Extending the bounds to ] Assume that the conditions of Theorem 4 hold for a lag with a bound for the squared estimation error of the . Moreover, assume that
Then, given an estimate of the top left eigenvectors of , denoted by the rows of the matrix , let be formed by normalizing the columns of . The columns of are denoted by , and let
Then, we have that
(48) 
This result translates the results for the mixing matrix to the estimation of the signals . For the practitioner intending to estimate the latent signals instead of the mixing matrix, this final result has greater utility.
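A sketch of this recipe for a practitioner (illustrative AR(1) parameters; `Bhat` denotes our column-normalized eigenvector estimate of the mixing matrix): estimate the mixing matrix with DMD, then apply its pseudoinverse to the observations to recover the latent signals.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 20000

# Two independent AR(1) sources with distinct lag-1 autocorrelations.
S = np.zeros((2, T))
for i, phi in enumerate((0.85, -0.4)):
    e = rng.standard_normal(T)
    for t in range(1, T):
        S[i, t] = phi * S[i, t - 1] + e[t]
A = np.array([[1.0, 0.7],
              [0.4, 1.0]])
A /= np.linalg.norm(A, axis=0)          # unit-norm columns
X = A @ S

# DMD estimate of the mixing matrix, then invert it to unmix.
w, V = np.linalg.eig(X[:, 1:] @ np.linalg.pinv(X[:, :-1]))
Bhat = np.real(V) / np.linalg.norm(np.real(V), axis=0)
Shat = np.linalg.pinv(Bhat) @ X         # recovered signals, up to order/sign/scale

# Correlation between each recovered row and its best-matching source.
corr = np.abs(np.corrcoef(np.vstack([Shat, S]))[:2, 2:])
print(np.round(corr.max(axis=1), 2))
```

The recovered rows match the latent signals up to the usual permutation, sign, and scale ambiguities of blind source separation.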
5.1 Applications of Theorem 5: Cosines
[Cosines] Assume that the are given by (30). Then we have that
(49a)  
where  
(49b) 
6 Missing Data Analysis
We now consider the randomly missing data setting. We assume that the data is modeled as
(50) 
where is a masking matrix, whose entries are drawn uniformly at random:
(51) 
The notation represents the Hadamard or elementwise matrix product.
6.1 The tSVD-DMD algorithm
A natural, and perhaps the simplest, choice to ‘fill in’ the missing entries in is to use a low-rank approximation, also known as a truncated SVD (Davenport and Romberg, 2016; Eckart and Young, 1936). That is, given , we compute the SVD
and then the rank truncation
(52) 
where the columns of and are the and , respectively, and the are the nonzero entries of . In what follows, , , and will denote the singular vectors and values of . We assume that the number of sources is known a priori.
After ‘filling in’ the missing entries of and computing , we may apply the DMD algorithm to . If has columns
we may define
(53a)  
and  
(53b) 
We have dropped the dependence for clarity. Then, we may define
(54) 
and take an eigenvalue decomposition:
(55) 
For the sake of naming consistency, we will refer to this procedure as the tSVD-DMD algorithm.
6.2 Assumptions
We now provide a DMD recovery performance guarantee. Before stating the result, we require some definitions and further conditions. In addition to the previous assumptions about , the , the relative values of , , , and , and the linear independence of the , we require the following conditions that augment (21). For clarity and conciseness in what follows, we define the constant
(56a)  
and the quantities  