# Time Series Source Separation using Dynamic Mode Decomposition

The dynamic mode decomposition (DMD) extracted dynamic modes are the non-orthogonal eigenvectors of the matrix that best approximates the one-step temporal evolution of the multivariate samples. In the context of dynamic system analysis, the extracted dynamic modes are a generalization of global stability modes. We apply DMD to a data matrix whose rows are linearly independent, additive mixtures of latent time series. We show that when the latent time series are uncorrelated at a lag of one time-step then, in the large sample limit, the recovered dynamic modes will approximate, up to a column-wise normalization, the columns of the mixing matrix. Thus, DMD is a time series blind source separation algorithm in disguise, but is different from closely related second order algorithms such as SOBI and AMUSE. All can unmix mixed ergodic Gaussian time series in a way that ICA fundamentally cannot. We use our insights on single lag DMD to develop a higher-lag extension, analyze the finite sample performance with and without randomly missing data, and identify settings where the higher lag variant can outperform the conventional single lag variant. We validate our results with numerical simulations, and highlight how DMD can be used in change point detection.

## Authors

• 2 publications
• 12 publications
07/20/2020

### Time Series Source Separation with Slow Flows

In this paper, we show that slow feature analysis (SFA), a common time s...
01/14/2018

### On the number of signals in multivariate time series

We assume a second-order source separation model where the observed mult...
04/03/2009

### Performing Nonlinear Blind Source Separation with Signal Invariants

Given a time series of multicomponent measurements x(t), the usual objec...
02/26/2018

### Data-Driven Source Separation Based on Simplex Analysis

Blind source separation (BSS) is addressed, using a novel data-driven ap...
02/25/2018

### Time Series Analysis via Matrix Estimation

We consider the task of interpolating and forecasting a time series in t...
11/13/2021

### Nyström Regularization for Time Series Forecasting

This paper focuses on learning rate analysis of Nyström regularization w...
06/28/2014

### Convex Analysis of Mixtures for Separating Non-negative Well-grounded Sources

Blind Source Separation (BSS) has proven to be a powerful tool for the a...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

The Dynamic Mode Decomposition (DMD) algorithm was invented by P. Schmid as a method for extracting dynamic information from temporal measurements of a multivariate fluid flow vector

(Schmid, 2010). The dynamic modes extracted are the non-orthogonal eigenvectors of a non-normal matrix that best linearizes the one-step evolution of the measured vector.

Schmid showed that the dynamic modes recovered by DMD correspond to the globally stable modes in the flow (Schmid, 2010). The non-orthogonality of the recovered dynamic modes reveals spatial structure in the temporal evolution of the measured fluid flows in a way that other second order spatial correlation based methods, such as the Proper Orthogonal Decomposition (POD), do not (Kerschen et al., 2005). This spurred follow-on work on other applications and extensions of DMD to understanding dynamical systems from measurements.

### 1.1 Previous work on DMD and the analysis of dynamical systems

After the initial work proposing DMD, the first major analyses of the algorithm drew connections between the DMD modes and the eigenfunctions of the Koopman operator from dynamical system theory. Rowley et al. and Mezić et al. showed that under certain conditions, the DMD modes approximate the eigenfunctions of the Koopman operator for a given system

(Rowley et al., 2009; Mezić, 2013). Related work in Bagheri (2013) studied the Koopman operator directly, analyzed its spectrum, and compared it against the spectrum of the matrix decomposed in DMD. The work in Rowley et al. (2009) also explained how the linear DMD modes can elucidate the structure in the temporal evolution in nonlinear fluid flows. The work in Črnjarić-Žic et al. (2017)

provided a further analysis of the Koopman operator and more connections to DMD. More recently, Lusch et al. have shown how deep learning can be combined with DMD to extract modes for a non-linearly evolving dynamical system

(Lusch et al., 2018).

There have been several extensions of DMD. The authors in Chen et al. (2012) developed a method to improve the robusness of DMD to noise. Jovanovic et al. proposed a sparsity-inducing formulation of DMD that allowed fewer dynamic modes to better capture the dynamical system (Jovanović et al., 2014). Tu et al. developed a DMD variant that takes into account systematic measurement errors and measurement noise (Tu et al., 2014); this framework was extended in Hemati et al. (2017). A Bayesian, probabilistic variant of DMD was developed in Takeishi et al. (2017), where a Gibbs sampler for the modes and a sparsity-inducing prior were proposed. Another recent extension of DMD includes an online (or streaming) version of DMD (Zhang et al., 2017).

Additionally, there have been applications of DMD to other domains besides computational fluid mechanics. The work in Bai et al. (2017) applied DMD to compressed sensing settings. A related work applied DMD to model the background in a streaming video (Pendergrass et al., 2017). The authors in Mann and Kutz (2016) applied DMD to finance, by using the predicted modes and temporal variations to forecast future market trends. The authors in Berger et al. (2015)

brought DMD to the field of robotics, and used DMD to estimate perturbations in the motion of a robot. DMD has also been applied to power systems analysis, where it has been used to analyze transients in large power grids

(Barocio et al., 2015). There are many more applications and extensions, and we point the interested reader to the recent book by Kutz et al. (2016).

### 1.2 Our main finding: DMD unmixes lag-1 (or higher lag) uncorrelated time series

Suppose that we are given multivariate observations modeled as

 xt=Hst=QDst, (1)

where is a non-singular mixing matrix, and is the latent vector of signals (or sources). The matrix has unit-norm columns and is related to by

 (2)

Setting entries of the diagonal matrix as ensures that as in (1).

In what follows, we will adopt the following notational convention: we shall use boldface to denote vectors such as . Matrices, such as , will be denoted by non-boldface upper-case letters; and scalars, such as , will be denoted by lower-case symbols.

We assume, without loss of generality, that

 E[st]=0p and E[stsTt]=Ip. (3)

The lag- covariance matrix of is defined as

 E[Lτ]=E[stsTt+τ]=E[st+τsTt], (4)

where is a non-negative integer.

If we are able to form a reliable estimate of the mixing matrix from the multivariate observations then, via Eq. (1), we can unmix the latent signals by computing . Inferring and computing will also similarly unmix the signals. Inferring the mixing matrix and unmixing the signals (or sources) is referred to as blind source separation (Choi et al., 2005).

Our key finding is that when the lag-1 covariance matrix in (4) is diagonal, corresponding to the setting where the latent signals are lag-1 uncorrelated weakly stationary time series, and there are sufficiently many samples of , then the DMD algorithm in (23) produces a non-normal matrix whose non-orthogonal eigenvectors are reliably good estimates of in (1). In other words, DMD unmixes lag-1 uncorrelated signals and weakly stationary time series. Theorems 3 and 3.2 summarize our findings in this regard by stating the problem in terms of the estimation of the matrix .

Our findings reveal that a straightforward extension of DMD, described in Section 4 and (37), allows -DMD to unmix lag uncorrelated signals and time series. This brings up the possibility of using a higher lag to unmix signals that might exhibit a more favorable correlation at larger lag than at a lag of one. Indeed, in Figure 4 we provide one such example where -DMD provides a better estimate of than does -DMD.

Our main contribution, which builds on our previous work in Prasadan and Nadakuditi (2018), is the analysis of the unmixing performance of DMD and -DMD (introduced in Section 4), when unmixing deterministic signals and random, weakly stationary time series in the finite sample regime and in the setting where there is randomly missing data in the observations .

### 1.3 New insight: DMD can unmix ergodic time series that ICA cannot

Independent Component Analysis (ICA) is a classical algorithm for blind source separation (Lee, 1998; Mitsui et al., 2017) that is often used for the cocktail party problem of unmixing mixed audio signals. Our analysis reveals that DMD can be succesfully applied to this problem as well because independent audio sources are well modeled as one-lag (or higher lag) uncorrelated (see Figure 8).

It is known that ICA fails when more then one of the independent, latent signals is normally distributed. A consequence of this is that ICA will fail to unmix mixed independent, ergodic time series with Gaussian marginal distributions. Our analysis reveals that DMD will succeed in this setting, even as ICA fails; see Figure

1 for an illustration where ICA fails to unmix two mixed, independent Gaussian AR(1) processes while DMD succeeds. Thus, DMD can and should be used by practitioners to re-analyze multivariate time series data for which the use of ICA has not revealed any insights.

### 1.4 New insight: DMD can unmix mixed Fourier series that PCA/SVD cannot

The eigenwalker model, described in Troje (2002b), is a linear model for human motion. The model is a linear combination of vectors, via

 xt=k∑i=1qicos(ωit+ϕi). (5)

The vectors are the modes or principal components of the motion, and each has a sinusoidal temporal variation. We generate our model as follows:

 xt=q1cos(2t)+q2cos(t/4),

for to , where

 Q=[q1q2]=⎡⎢ ⎢⎣1/32/√52/31/√52/30⎤⎥ ⎥⎦.

This model has been decomposed with ICA, and used for video motion editing and analysis (Shapiro et al., 2006). Here, we apply PCA (the SVD), and compare it to DMD. In Figure 2, we display the results of unmixing with the SVD and with DMD. We observe that DMD successfully unmixes the cosines, while the SVD fails: note that unless the are orthogonal, there is no hope of a successful unmixing. Moreover, the estimation of of from the SVD fails, as we find that

 ˆQSVD=⎡⎢⎣−0.6868950.624695−0.623497−0.243983−0.373399−0.741774⎤⎥⎦,

which has a squared error of , while the estimate from DMD has a squared error of , where the error is computed according to (29).

### 1.5 Connection with other algorithms for time series blind source separation

Let

be the singular value decomposition (SVD) of

. Then, we have that and

 Σxx=E[xtxTt]=HHT=UΣ2UT. (6)

Given and , we can compute the whitened vector

 wt=Σ−1/2xxxt, (7)

whose covariance matrix is given by . Then from (1) and (6) we have that

 wt=(UVT)st, (8)

where the mixing matrix

is an orthogonal matrix because the

and matrices, which correspond to the left and right singular vector matrices of in (1) are orthogonal.

Equation (8) reveals that we can solve the blind source separation problem and unmix from observations of if we can infer the orthogonal mixing matrix from data. To that end, we note that

 E[wtwTt+τ]=(UVT)E[stsTt+τ](UVT)T=(UVT)E[Lτ](UVT)T. (9)

Equation (9) reveals that when the latent signals are lag-1 uncorrelated, i.e., is a diagonal matrix, then the lag-1 covariance matrix of the whitened vector will be diagonalized by the orthogonal matrix . The sample lag-1 covariance matrix computed from finite data will, in general, not be symmetric and so we might infer from the eigenvectors of the symmetric part: this leads to the AMUSE (Algorithm for Multiple Unknown Signals Extraction) method (Tong et al., 1990).

A deeper inspection of (9) reveals that if are second order, weakly stationary time series that are uncorrelated for multiple values of (corresponding to multiple lags), then we can infer (which, incidentally corresponds to the polar part of the polar decomposition of the mixing matrix in (1)) by posing it as joint-diagonalization of for lags corresponding to . This is the basis of the second-order blind identification (SOBI) method (Belouchrani et al., 1997) where the joint diagonlization problem is addressed by finding the orthogonal matrix that minimizes the sums-of-squares of the off-diagonal entries of . Numerically, this problem is solvable via the JADE method (Cardoso and Souloumiac, 1993; Miettinen et al., 2017, 2016).

Miettinen et al analyze the performance of a symmetric variant of the SOBI method in Miettinen et al. (2016)

and the problem of determining the number of latent signals that are distinct from white noise in

Matilainen et al. (2018). Their results for the performance are asymptotic and distributional. That is, the limiting distribution of the estimated matrix is computed, when the input signals are realizations of some time series, with zero mean and diagonal autocorrelations at every lag . As will be seen in what follows, these assumptions are very similar to those that we impose on DMD. Our analysis for the missing data ssetting is new and has no counter-part in the SOBI or AMUSE performance analysis literature.

In Table 1, we summarize the various algorithms for unmixing of stationary time series. Table 1 brings into sharp focus the manner in which DMD and -DMD are similar to and different from the AMUSE and SOBI algorithms. All algorithms diagonalize a matrix; SOBI and AMUSE estimate orthogonal matrices while DMD and -DMD estimate non-orthogonal matrices. The SOBI and AMUSE algorithms diagonalize cross-covariance matrices formed from whitened time series data while DMD and -DMD works on the time series data directly. Thus SOBI and AMUSE explicity whiten the data while DMD implicitly whitens the data. SOBI and DMD exhibit similar performance (see Figure. 7) – a more detailed theoretical study comparing their performance in the noisy setting is warranted.

A contribution of this is a non-asymptotic finite sample performance analysis for the DMD and -DMD algorithm in the setting where the mixed deterministic signals or erdodic time series are approximately (or exactly) one- or higher lag approximately uncorrelated.

### 1.6 Organization

The remainder of this paper is organized as follows. In Section 2, we introduce the time series data matrix model and describe the DMD algorithm in Section 2.2. We provide a DMD performance guarantee for unmixing deterministic signals in Section 3; a corollary of that result in Section 3.1 explains why DMD is particularly apt for unmixing multivariate mixtures of Fourier series such as the “eigen-walker” model. We extend our analysis to stationary, ergodic time series data in Section 3.2. We describe and analyze the performance of the higher lag extension of DMD, which we call -DMD, in Section 4 and extend its analysis to unmixing ergodic time series in Section 5. We analyze the setting where the time series data matrix has randomly missing data in Section 6. We validate our theoretical results with numerical simulations in Section 7. In Section 8, we describe how a time series matrix can be factorized using DMD to obtain a Dynamic Mode Factorization (DMF) involving the product of the DMD estimate of the (column-wise normalized) mixing matrix and the coordinates, which represent the unmixed latent signals. We show how DMF can be applied to the cocktail party problem in Choi et al. (2005) in Section 8 and how unmixing the latent series via DMF can help improve time series change point detection in Section 8.2. We offer some concluding remarks in Section 9. The proofs of our results are deferred to the appendices.

## 2 Model and Setup

Suppose that, at time , we are given a dimensional time series vector

where an individual entry , for , of is modeled as

 xjt=k∑i=1bijcit, (10)

and is the entry of a dimensional vector . Each is the entry of an dimensional vector , and the are samples of a time series. Equation (10) can be succinctly written in vector form as

 xt=k∑i=1bicit=B⎡⎢ ⎢⎣c1t⋮ckt⎤⎥ ⎥⎦, (11)

where the matrix is defined as

 B=[b1⋯bk].

We are given samples corresponding to uniformly spaced time instances . In what follows, without loss of generality, we assume that . Let be the matrix defined as

 X=[x1⋯xn]. (12)

We define the matrix with columns as

 CT=⎧⎪ ⎪⎨⎪ ⎪⎩⎡⎢ ⎢⎣c1t⋯⋮⋯ckt⎤⎥ ⎥⎦⎫⎪ ⎪⎬⎪ ⎪⎭nt=1. (13)

Consequently, we have that

 X=BCT, (14)

where is the “latent time series” matrix given by (13). Equation (14) reveals that the multivariate time series matrix is a linear combination of rows of the latent time series matrix.

Suppose that for ,

 qi=bi∥bi∥2 and si=ci∥ci∥2, (15)

and the matrices

 Q=[q1⋯qk] and S=[s1⋯sk]. (16)

Then, from (14), and from the definition of and , it can be shown that

 X=QDST (17)

where, for ,

 D=diag(…,∥bi∥2⋅∥ci∥2,…). (18)

We will define

 di=∥bi∥2⋅∥ci∥2, (19)

and assume that, without loss of generality, the and hence the , , , and are ordered so that

 d1≥d2≥…≥dk>0. (20)

Note that by construction, in (17), the columns of the matrices and have unit norm. In what follows, we assume that and have linearly independent columns, that , that the columns of have zero mean, and that the columns of are canonically non-random and non-orthogonal. Our goal in what follows is to estimate the columns of the matrices and .

### 2.1 Technical Assumptions

We will require the following set of technical assumptions on the data.

1. Assume that is fixed, with

 k≤min{p,n−1} (21a)
2. Assume that the are linearly independent, so that is a finite quantity:

 1≤σ1(Q)σk(Q)<∞. (21b)

Here, denotes the singular value of . Essentially, the conditioning of the is independent of and . Moreover, the are canonically non-random and non-orthogonal.

3. Assume that

 limn→∞d1dk↛∞, (21c)

i.e., that the limit of the ratio is finite.

4. Assume that columns of (the ) each have zero mean (the sum of each column is zero), and that they are linearly independent. Moreover, assume that there exists an such that

 maxi,j|Sij|≤O(1nα). (21d)

I.e., the are not too sparse.

### 2.2 Dynamic Mode Decomposition (DMD)

From (11), we see that the columns of represent a multivariate time series. We first partition the matrix into two matrices

 X(0)=[x1x2⋯xn−1], (22a) and X(1)=[x2x3⋯xn]. (22b)

We then compute the matrix via the solution of the optimization problem

 ˆA=argminA∈Rp×p∥∥X(1)−AX(0)∥∥F. (23)

The minimum norm solution to (23) is given by

 ˆA=X(1)X+(0), (24)

where the superscript denotes the Moore-Penrose pseudoinverse. Note that will be a non-symmetric matrix with a rank of at most because , from which and are derived, has rank from the construction in (17). Let

 ˆA=ˆQˆΛˆQ+, (25)

be its eigenvalue decomposition. In (

25), is a diagonal matrix, where the , ordered as , are the, possibly complex, eigenvalues of and is a matrix of, generically non-orthogonal, unit-norm eigenvectors, denoted by .

In what follows, we will refer to the computation of (24) and the subsequent decomposition (25) as the DMD algorithm.

## 3 Performance Guarantee

We now establish a recovery condition for the setting where the setting where in (13) are deterministic. To that end, for as in (17), a central object that governs when we are able to successfully recover is the lag- covariance matrix defined as

 Lij=n∑l=1Si,lSj,[l+1] mod n. (26)

Note that we can succinctly express as where is the matrix formed by taking the identity matrix and circularly right shifting the columns by one. We are now ready to state the DMD recovery condition.

In the following result and in all subsequent results, there is an ambiguity or mismatch between the ordering of the , , , and with that of the and . Formally, there exists a permutation that reorders the and to correspond to the and other quantities, such that the error is minimal. In the statement of our results, without loss of generality, we will assume that , i.e., that it is the identity permutation.

For as in (17) and defined as in (26), suppose that the conditions in (21) hold. Further suppose that

 limn→∞|Lii|↛0. (27a) Moreover, assume that for i≠j we have that |Lij|≤O(f(n)) and ∣∣sTisj∣∣≤O(f(n)) (27b) for some f(n) such that limn→∞f(n)=0. (27c)
 Let pi=sign(ˆqTiqi). (28a)

a) Then, we have that

 k∑i=1∥ˆqi−piqi∥22≤O([d1dk]2⋅k7δ2L⋅[f2(n)+n−2α]), (29a) where δL is given by δL=mini≠j∣∣Lii−Ljj∣∣. (29b) b) Moreover, for each Lii, we have that |Lii−λi|2≤O([d1dk]2⋅k6⋅[f2(n)+n−2α]). (29c)

### 3.1 Application of Theorem 3: DMD Unmixes Multivariate Mixed Fourier Series

Consider the setting where in (10) is modeled as

 cit=cos(ωit+ϕi). (30)

The is thus a linear mixture of Fourier series. This model frequently comes up in many applications such as the eigenwalker model for human motion: Troje (2002a, Equations (1) and (2)), Troje (2002b) and Unuma et al. (1995, Equations (1) and (2)).

This model fits into the framework of Theorem 3 via an application of Corollary 3.1 below. This implies the the DMD modes will correctly correspond to the non-orthogonal mixing modes. Using the SVD in this setting would recover orthogonal modes that would be linear combinations of the latent non-orthogonal dynamic modes.

[Mixtures of Cosines] Assume that the are given by (30). Then we have that

 k∑i=1∥ˆqi−piqi∥22≤O([d1dk]2⋅k7δ4L⋅1n), (31a) where δL=mini≠j∣∣cosωi−cosωj∣∣, (31b) and that for each ωi, we have that |cosωi−λi|2≤O([d1dk]2⋅k6n). (31c)

Corollary 3.1 explains why DMD successfully unmixes the eigenwalker data in Figure 2. In that setting, PCA/SVD does not succeed because it returns an orthogonal matix as an estimate of the non-orthogonal mixing matrix. The ability of DMD to reliably unmix non-orthogonally mixed multivariate Fourier series, and the fact that the eigenvalues are cosines of the frequencies, provides some context for the statement that DMD is a spectral algorithm where the eigen-spectra reveal information on Fourier spectra (Rowley et al., 2009).

### 3.2 An Extension of Theorem 3: Stationary, Ergodic Time Series

We now consider the setting where are elements of a stationary, ergodic time series and the , thus formed. Consider the matrix

 ELij=n∑l=1E[Si,lSj,[l+1] mod n]. (32)

When is diagonal, then DMD asymptotically unmixes the time series, as expressed in the Theorem below. We will require the assumptions from (21), with the following updates:

1. Assume that the , , , and are ordered so that

 Ed1≥Ed2≥…≥Edk>0, (33a)

where

 Edi=∥bi∥2⋅E∥ci∥2. (33b)
2. Assume that

 limn→∞Ed1Edk↛∞, (33c)

i.e., that the limit of the ratio is finite.

[Stationary, Ergodic Time Series] Let the be as described above, and let be as defined in (32). Assume that , , and . Suppose that the conditions in (33) hold, in addition to conditions (1, 2, 4) from (21).

a) For some and , we have that

 f(n)≤o((logn)2/r(loglogn)(1+ϵ)2/rn−1/2). (34a) Then for i≠j, ∣∣Lij∣∣≤O(f(n)) and ∣∣sTisj∣∣≤O(f(n)) (34b) with probability at least 1−O([logn(loglogn)1+ϵ]−1). (34c)

b) Then we have that

 |di−Edi|≤f(n)[1+o(1)] for i=1,…,k,

with probability at least (34c).

c) For given by (28a), we have that

 k∑i=1∥ˆqi−piqi∥22≤O([Ed1Edk]2⋅k7δ2L⋅[f2(n)+n−2α]), (34d)

where is given by

 δL=mini≠j∣∣ELii−ELjj∣∣, (34e)

with probability at least (34c).

d) Moreover, for each , we have that

 |ELii−λi|2≤O([Ed1Edk]2⋅k6⋅[f2(n)+n−2α]), (34f)

with probability (34c).

If the are samples from a stationary, ergodic ARMA process, we may simplify the results of Theorem 3.2 slightly. [ARMA Processes] Assume that the are samples from an ARMA process. Then (34a) may be replaced with

 f(n)≤o((loglogn/n)1/2). (35)

The iterated logarithmic rate in our error bounds and accompanying probability, are consequences of the classical time series results in Hong-Zhi et al. (1982). Here, we have stated a result that is similar in spirit to that for SOBI, given in Miettinen et al. (2016). Our result says that time series that are uncorrelated at lags and can be unmixed, provided that they are not sparse. The result for SOBI requires uncorrelatedness at all integral lags, and states an asymptotic distributional result; our result relies on looser assumptions, and is a finite sample guarantee. It should be noted that at the expense of using a single lag, our result is slightly weaker than the convergence described in Miettinen et al. (2016, Theorem 1).

## 4 τ−Dmd

We have previously described and analyzed the DMD algorithm at a lag of . That is, we let and differ by one time-step. The proof of Theorem (3) reveals that DMD, in this formulation, unmixes signals that are uncorrelated at a lag of one time-step, but have non-vanishing autocorrelations at this lag. However, we may easily imagine and construct signals that have non-vanishing autocorrelations or autocorrelation functions that do not peak at a lag of , or signals that are uncorrelated at later lags. This is the basis for -DMD which we describe next.

From (11), we recall that the columns of represent a multivariate time series. We first partition the matrix into two matrices:

 Xτ(0)=[x1x2⋯xn−τ], (36a) and Xτ(1)=[x1+τx2+τ⋯xn]. (36b)

We then compute the matrix via the solution of the optimization problem

 ˆAτ=argminA∈Rp×p∥∥Xτ(1)−AXτ(0)∥∥F. (37)

The minimum norm solution to (37) is given by

 ˆAτ=Xτ(1)(Xτ(0))+. (38)

Note that will be a non-symmetric matrix with a rank of at most because , from which and are derived, has rank from the construction in (17). Once again, let

 ˆAτ=ˆQˆΛˆQ+, (39)

be its eigenvalue decomposition. In (55), is a diagonal matrix, where are the (possibly complex) eigenvalues of and is a matrix of, generically non-orthogonal, unit-norm eigenvectors that are denoted by .

In what follows, we will refer to the computation of (38) and the subsequent decomposition (55) as the -lag DMD algorithm.

Similarly to the standard DMD algorithm, our -lag DMD algorithm depends on a lag- cross covariance matrix. Let the lag- covariance matrix defined as

 [Lτ]ij=n∑l=1Si,lSj,[l+τ] mod n. (40)

Note that we can succinctly express as where is the matrix formed by taking the identity matrix and circularly right shifting the columns by one. After defining the following updates to the assumptions in (21, we are ready to state the DMD recovery condition.

1. Assume that is fixed, with

 k≤min{p,n−τ} (41a)
2. Assume that

 τn−2α↛∞ (41b)

and for large . I.e., is small relative to .

[-lag DMD] For as in (17) and defined as in (40), suppose that the conditions in (41) hold along with conditions (2, 3, 4) from (21). Further suppose that

 limn→∞∣∣[Lτ]ii∣∣↛0. (42a) Moreover, assume that for i≠j we have that ∣∣[Lτ]ij∣∣≤O(f(n)) and ∣∣sTisj∣∣≤O(f(n)) (42b) for some f(n) such that limn→∞f(n)=0. (42c)

a) Then, assuming that is given by (28a), we have that

 k∑i=1∥ˆqi−piqi∥22≤O([d1dk]2⋅k7δ2L⋅[f2(n)+τn−2α]), (43a) where δL is given by (43b) b) Moreover, for each [Lτ]ii, we have that ∣∣[Lτ]ii−λi∣∣2≤O([d1dk]2⋅k6⋅[f2(n)+τn−2α]). (43c)

### 4.1 Extensions of Theorem 4: Stationary, Ergodic Time Series

In Theorem 3.2, we provided a result for DMD applied to a stationary, ergodic time series. This result generalizes very simply to higher lags. Once again, we must define the expectation of the lag matrix:

 E[Lτ]ij=n∑l=1E[Si,lSj,[l+τ] mod n]. (44)

[Stationary, Ergodic Time Series at Lag ] Suppose that the conditions in (33) hold, in addition to conditions (2, 4) from (21). Moreover, let satisfy the conditions in (41) in addition to

 1≤τ≤nr2(r−2), (45a) for some value of r≥4.

Let the be as described above, and let be as defined in (44). Assume that , , and . Then, we have that

a) For some and , we have that

 f(n)≤o((logn)2/r(loglogn)(1+ϵ)2/rn−1/2). (45b)

Then,

 ∣∣[Lτ]ij∣∣≤O(f(n)) and ∣∣sTisj∣∣≤O(f(n)) (45c)

with probability at least

 1−O([logn(loglogn)1+ϵ]−1). (45d)

b) Then we have that

 |di−Edi|≤f(n)[1+o(1)] for i=1,…,k,

with probability (45d).

c) For given by (28a), we have that

 k∑i=1∥ˆqi−piqi∥22≤O([Ed1Edk]2⋅k7δ2L⋅[f2(n)+τn−2α]), (45e)

where is given by

 δL=mini≠j∣∣E[Lτ]ii−[Lτ]jj∣∣, (45f)

with probability (45d).

d) Moreover, for each , we have that

 ∣∣E[Lτ]ii−λi∣∣2≤O([Ed1Edk]2⋅k6⋅[f2(n)+τn−2α]), (45g)

with probability (45d).

If the are samples from a stationary, ergodic ARMA process, we may simplify the results of Theorem 44 slightly. [ARMA Processes at Lag ] Assume that the are samples from an ARMA process. Then (45a) may be replaced with

 1≤τ≤[logn]a, (46)

for some , and (45b) may be replaced with

 f(n)≤o((loglogn/n)1/2). (47)

These results for stationary time series can be thought of as a finite-sample analogue of the SOBI results given in Miettinen et al. (2016). Our assumptions are similar, in that we also assume that the latent time series have a diagonal autocorrelation for and . We differ here in that we allow for , but require non-sparsity of the , as well as requiring only two lags at which the time series are uncorrelated. As before, at the expense of using a single lag, our result is slightly weaker than the convergence described in Miettinen et al. (2016, Theorem 1).

## 5 Estimating the temporal behavior: S

We now establish a recovery condition for deterministic .

[Extending the bounds to ] Assume that the conditions of Theorem 4 hold for a lag with a bound for the squared estimation error of the . Moreover, assume that

 kd21ϵ2d,v

Then, given an estimate of the top left eigenvectors of , denoted by the rows of the matrix , let be formed by normalizing the columns of . The columns of are denoted by , and let

 pi=sign(sTiˆsi).

Then, we have that

 k∑i=1∥ˆsi−pisi∥22≤O(k[d1dk]2ϵ2d,v). (48)

This result translates the results for the mixing matrix to the estimation of the signals . For the practitioner intending to estimate the latent signals instead of the mixing matrix, this final result has a greater utility.

### 5.1 Applications of Theorem 5: Cosines

As we did for Theorem 3, we may restate Theorem 5 for the cosine model.

[Cosines] Assume that the are given by (30). Then we have that

 k∑i=1∥ˆsi−pisi∥22≤O([d1dk]4⋅k8δ4L⋅1n), (49a) where δL=mini≠j∣∣cosωi−cosωj∣∣. (49b)

## 6 Missing Data Analysis

We now consider the randomly missing data setting. We assume that the data is modeled as

 ˜X=X⊙M=(QDST)⊙M, (50)

where is a masking matrix, whose entries are drawn uniformly at random:

 Mi,j={1 with probability q,0 with probability 1−q. (51)

The notation represents the Hadamard or element-wise matrix product.

### 6.1 The tSVD-DMD algorithm

A natural, and perhaps the simplest, choice to ‘fill-in’ the missing entries in is to use a low-rank approximation, also known as a truncated SVD (Davenport and Romberg, 2016; Eckart and Young, 1936). That is, given , we compute the SVD

 ˜X=ˆUˆΣˆVT,

and then the rank- truncation

 ˆXk=k∑i=1ˆσiˆuiˆvTk, (52)

where the columns of and are the and , respectively, and the are the non-zero entries of . In what follows, , , and will denote the singular vectors and values of . We assume that the number of sources is known apriori.

After ‘filling-in’ the missing entries of and computing , we may apply the -DMD algorithm to . If has columns

 ˆXk=[ˆx1ˆx2⋯ˆxn],

we may define

 ˆXτ(0)=[ˆx1ˆx2⋯ˆxn−τ], (53a) and ˆXτ(1)=[ˆx1+τˆx2+τ⋯ˆxn]. (53b)

We have dropped the -dependence for clarity. Then, we may define

 ˜Aτ=ˆXτ(1)(ˆXτ(0))+, (54)

and take an eigenvalue decomposition:

 ˜Aτ=ˆQˆΛˆQ+. (55)

For the sake of naming consistency, we will refer to this procedure as the tSVD-DMD algorithm.

### 6.2 Assumptions

We now provide a DMD recovery performance guarantee. Before stating the result, we require some definitions and further conditions. In addition to the previous assumptions about , the , the relative values of , , , and , and the linear independence of the , we require the following conditions that augment (21). For clarity and conciseness in what follows, we define the constant

 γ=n2αp2βd21k2, (56a) and the quantities g(n,p,k,q)=O(4√q(1−q)d1k×max{n1/4−