# Randomized Dynamic Mode Decomposition

This paper presents a randomized algorithm for computing the near-optimal low-rank dynamic mode decomposition (DMD). Randomized algorithms are emerging techniques to compute low-rank matrix approximations. They are able to ease the computational challenges arising in the area of big data. The idea is to derive from the high-dimensional input matrix a smaller matrix, which is then used to efficiently compute the dynamic modes and eigenvalues. The algorithm is presented in a modular probabilistic framework, and the approximation quality can be controlled via oversampling, and power iterations.

## Authors

• 16 publications
• 38 publications
• 48 publications
08/06/2016

### Randomized Matrix Decompositions using R

Matrix decompositions are fundamental tools in the area of applied mathe...
08/20/2021

### State-Of-The-Art Algorithms For Low-Rank Dynamic Mode Decomposition

This technical note reviews sate-of-the-art algorithms for linear approx...
05/04/2021

### Deterministic matrix sketches for low-rank compression of high-dimensional simulation data

Matrices arising in scientific applications frequently admit linear low-...
09/22/2021

### Randomized Projection Learning Method forDynamic Mode Decomposition

A data-driven analysis method known as dynamic mode decomposition (DMD) ...
12/11/2015

### Randomized Low-Rank Dynamic Mode Decomposition for Motion Detection

This paper introduces a fast algorithm for randomized computation of a l...
10/15/2018

### Compressed Randomized UTV Decompositions for Low-Rank Approximations and Big Data Applications

Low-rank matrix approximations play a fundamental role in numerical line...
09/23/2020

### Fast and stable randomized low-rank matrix approximation

Randomized SVD has become an extremely successful approach for efficient...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Extracting dominant coherent structures and modal expansions from high-dimensional data is a cornerstone of computational science and engineering [42]. The dynamic mode decomposition (DMD) is a leading data-driven algorithm to extract spatiotemporal coherent structures from high-dimensional data sets [40, 46, 26]. DMD originated in the fluid dynamics community [40, 37], where the identification of coherent structures, or modes

, is often an important step in building models for prediction, estimation, and control

[4, 42]. In contrast to the classic proper orthogonal decomposition (POD) [1, 24]

, which orders modes based on how much energy or variance of the flow they capture, DMD identifies spatially correlated modes that oscillate at a fixed frequency in time, possibly with an exponentially growing or decaying envelope. Thus, DMD combines the advantageous features of POD in space and the Fourier transform in time

[26].

With rapidly increasing volumes of measurement data from simulations and experiments, modal extraction algorithms such as DMD may become prohibitively expensive, especially for online or real-time analysis. Even though DMD is based on an efficient singular value decomposition (SVD), computations scale with the dimension of the measurements, rather than with the intrinsic dimension of the data. With increasingly vast measurements, it is often the case that the intrinsic rank of the data does not increase appreciably, even as the dimension of the ambient measurements grows. Indeed, many high-dimensional systems exhibit low-dimensional patterns and coherent structures, which is one of the driving perspectives in both POD and DMD analysis. In this work, we develop a computationally effective strategy to compute the DMD using randomized linear algebra, which scales with the intrinsic rank of the dynamics, rather than with the measurement dimension.

More concretely, we embed the DMD in a probabilistic framework by following the seminal work of Halko et al. [19]. The main concept is depicted in Figure 1. Numerical experiments show that the randomized algorithm achieves considerable speedups over previously proposed randomized algorithms for computing the DMD such as the compressed DMD algorithm [5]. Further, the approximation error can be controlled via oversampling and additional power iterations. This allows the user to choose the optimal trade-off between computational time and accuracy. Importantly, we demonstrate the ability to handle data which are too big to fit into fast memory by using a blocked matrix scheme.

### 1.1 Related Work

Efforts to address the computational challenges of DMD can be traced back to the original paper by Schmid [40]. It was recognized that DMD could be computed in a lower dimensional space and then used to reconstruct the dominant modes and dynamics in the original high-dimensional space. This philosophy has been adopted in several strategies to determine the low-dimensional subspace, as shown in Fig. 2. Since then, efficient parallelized algorithms have been proposed for computations at scale [39].

In [40], the high-dimensional data are projected onto the linear span of the associated POD modes, enabling an inexpensive low-dimensional DMD. The high-dimensional DMD modes have the same temporal dynamics as the low-dimensional modes and are obtained via lifting with the POD modes. However, computing the POD modes may be expensive and strategies have been developed to alleviate this cost. An algorithm utilizing the randomized singular value decomposition (rSVD) has been presented by [12] and [2]. While this approach is reliable and robust to noise, only the computation of the SVD is accelerated, so that subsequent computational steps involved in the DMD algorithm remain expensive. The paper by Bistrian and Navon [2] has other advantages, including identifying the most influential DMD modes and guarantees that the low-order model satisfies the boundary conditions of the full model.

As an alternative to using POD modes, a low-dimensional approximation of the data can be obtained by random projections [5]. This is justified by the Johnson-Lindenstrauss lemma [25, 13] which provides statistical guarantees on the preservation of the distances between the data when projected into the low-dimensional space. This approach avoids the cost of computing POD modes and has favorable statistical error bounds.

In contrast with these projection-based approaches, alternative strategies for reducing the dimension of the data involve computations with subsampled snapshots. In this case, the reduction of the computational cost comes from the fact that only a subset of the data is used to derive the DMD and the full data is never processed by the algorithm. DMD is then determined from a sketch-sampled database. Variants of this approach differ in the sampling strategy; for example, [5] and [11], develop the compressed dynamic mode decomposition, which involves forming a small data matrix by randomly projecting the high-dimensional row-space of a larger data matrix. This approach has been successful in computational fluid dynamics and video processing applications, where the data have relatively low noise. However, this is a suboptimal strategy, which leads to a large variance in the performance due to poor statistical guarantees in the resulting sketched data matrix. Clustering techniques can be employed to select a relevant subset of the data [18]. Further alternatives derive from algebraic considerations, such as the maximization of the volume of the submatrix extracted from the data (e.g., Q-DEIM [10, 30, 29]) or rely on energy-based arguments such as leverage-scores and length-squared sampling [14, 28, 8].

### 1.2 Contribution of the Present Work

This work presents a randomized DMD algorithm that enables the accurate and efficient extraction of spatiotemporal coherent structures from high-dimensional data. The algorithm scales with the intrinsic rank of the dynamics, which are assumed to be low-dimensional, rather than the ambient measurement dimension. We demonstrate the effectiveness of this algorithm on two data sets of increasing complexity, namely the canonical fluid flow past a circular cylinder and the high-dimensional sea-surface temperature dataset. Moreover, we show that is possible to control the accuracy of the algorithm via oversampling and power iterations.

The remainder of the manuscript is organized as follows. Section 2 presents notation and background concepts used throughout the paper. Section 3 outlines the probabilistic framework used to compute the randomized DMD, which is presented in Sec. 4. Section 5 provides numerical results to demonstrate the performance of the randomized DMD algorithm. Final remarks and an outlook are given in Sec. 6.

## 2 Technical Preliminaries

We now establish notation and provide a brief overview of the singular value decomposition (SVD) and the dynamic mode decomposition (DMD).

### 2.1 Notation

Vectors in and are denoted as bold lower case letters . Both real and complex matrices are denoted by bold capitals , and its entry at the -th row and -th column is denoted as . The Hermitian transpose of a matrix is denoted as . The spectral or operator norm of a matrix is defined as the largest singular value of the matrix , i.e., the square root of the largest eigenvalue of the positive-semidefinite matrix :

 ∥X∥2=√λmax(X∗X)=σmax(X).

The Frobenius norm of a matrix is the positive square root of the sum of the absolute squares of its elements, which is equal to the positive square root of the trace of

 ∥X∥F= ⎷n∑i=1m∑j=1|X(i,j)|2=√trace(X∗X).

The relative reconstruction error is , where is an approximation to the matrix . The column space (range) of is denoted and the row space is .

### 2.2 The Singular Value Decomposition

Given an matrix , the ‘economic’ singular value decomposition (SVD) produces the factorization

 X=UΣV∗=r∑iuiσiv∗i, (1)

where and are orthonormal, is diagonal, and . The left singular vectors in provide a basis for the column space of , and the right singular vectors in form a basis for the domain of . contains the corresponding non-negative singular values . Often, only the dominant singular vectors and values are of interest, resulting in the low-rank SVD:

 Xk=UkΣkV∗k=[u1,…,uk]diag(σ1,…,σk)[v1,…,vk]∗=k∑iuiσiv∗i.
##### The Moore-Penrose Pseudoinverse

Given the singular value decomposition , the pseudoinverse is computed as

 X†:=VΣ−1U∗=r∑iviσ−1iu∗i, (2)

where is here understood as

We use the Moore-Penrose pseudoinverse in the following to provide a least squares solution to a system of linear equations.

### 2.3 Dynamic Mode Decomposition

DMD is a dimensionality reduction technique, originally introduced in the field of fluid dynamics [40, 37, 26]. The method extracts spatiotemporal coherent structures from an ordered time series of snapshots , separated in time by a constant step

. Specifically, the aim is to find the eigenvectors and eigenvalues of the time-independent linear operator

that best approximates the map from a given snapshot to the subsequent snapshot as

 xj+1≈Axj. (3)

Following [46], the deterministic DMD algorithm proceeds by first separating the snapshot sequence into two overlapping sets of data

 (4)

and are called the left and right snapshot sequences. Equation (3) can then be reformulated in matrix notation as

 XR≈AXL. (5)

Many variants of the DMD have been proposed since its introduction, as discussed in [26]. Here, we discuss the exact DMD formulation of Tu et al. [46].

#### 2.3.1 Non-Projected DMD

In order to find an estimate for the linear map , the following least-squares problem can be formulated

 ˆA=argminA∥XR−AXL∥2F. (6)

The estimator for the best-fit linear map is given in closed-form as

 ˆA:=XRX†L, (7)

where denotes the pseudoinverse from Eq. (2). The DMD modes are the leading eigenvectors of . If the data are high-dimensional (i.e., is large), the linear map in Eq. (7) may be intractable to evaluate and analyze directly. Instead, a rank-reduced representation of the linear map is determined by projecting onto the first modes.

#### 2.3.2 Projected DMD

We can define the projected map onto the class of rank- linear operators by pre- and post-multiplying Eq. (7) with :

 (8)

where is the projected map. If the projection is made onto the dominant left singular vectors of (i.e., POD modes) then and

 ˜A:=U∗kXRVkΣ−1k, (9)

since . Low-dimensional projection is a form of spectral filtering which has the positive effect of dampening the influence of noise [22, 16]. This effect is illustrated in Figure 3

, which shows the non-projected linear map in absence and presence of white noise. The projected linear map avoids the amplification of noise and acts as a hard-threshold regularizer.

The difficulty is to choose the rank providing a good trade-off between suppressing the noise and retaining the useful information. In practice, the optimal hard-threshold [15]

provides a good heuristic to determine the rank

.

Once the linear map is approximated, its eigendecomposition is computed

 ˜A˜W=˜WΛ, (10)

where the columns of are eigenvectors, and is a diagonal matrix containing the corresponding eigenvalues . Finally, we may recover the high-dimensional DMD modes, which are eigenvectors of , following the approach by Schmid [40]:

 Wk=Uk˜Wk, (11)

where . The matrix has the same dominant eigenvalues as , as they are related via a similarity transform. The dominant eigenvectors of can also be recovered from those of . Alternatively, we may recover DMD modes using the exact DMD [46]

 Wk=XRVkΣ−1k˜Wk. (12)

In both the projected and the non-projected formulation, a singular value decomposition (SVD) is involved. This is computationally demanding, and the resources required to compute DMD can be tremendous for large data matrices.

### 2.4 A Note on Regularization

The projected algorithm described above relies on the truncated singular value decomposition (TSVD), also referred to as pseudo-inverse filter, to solve the unconstrained least-squares problem in Eq. (6). Indeed, Figure 3 illustrates that the low-rank approximation introduces an effective regularization effect. This warrants an extended discussion on regularization, following the work by Hansen [20, 21, 22, 23].

In DMD, we often experience a linear system of equations where or are ill-conditioned, i.e., the ratio between the largest and smallest singular value is large, so that the pseudo-inverse may magnify small errors. Regularization becomes crucial in this situation to compute an accurate estimate of , since a small perturbation in or may result in a large perturbation in the solution. Tikhonov-Phillips regularization [44, 33]

, also known as ridge regression in the statistical literature, is one of the most popular regularization techniques. The regularized least squares problem becomes

 ˆA=argminA∥∥XR−AXL∥2F+λ2∥A∥∥2F. (13)

More concisely, this can be expressed as the following augmented least-squares problem

 ˆA=argminA∥∥∥[XR0]−A[XLλI]∥∥∥2F. (14)

The estimator for the map takes advantage of the regularized inverse

 X†λ:=[X∗X+λ2I]−1X∗=VΣ+λU∗withΣ+λ:=diag(σ21σ21+λ2,...,σ2rσ2r+λ2) (15)

where the additional regularization improves the conditioning of the problem. This form of regularization can also be seen as a smooth filter that attenuates the parts of the solution corresponding to the small singular values. Specifically, the filter takes the form [20]

 fi=σ2iσ2i+λ2,i=1,2,...,r. (16)

The Tikhonov-Phillips regularization scheme is closely related to the TSVD and the Wiener filter. Indeed, the TSVD is known to be a method for regularizing ill-posed linear least-squares problems, which often produces very similar results [47]. More concretely, regularization via the TSVD can be seen as a hard-threshold filter, which takes the form

 fi={1,σi≥σk0,σi<σk. (17)

Here, controls how much smoothing (or low-pass filtering) is introduced, i.e., increasing includes more terms in the SVD expansion; thus components with higher frequencies are included. See [22] for further details. As a consequence of the regularization effect introduced by TSVD, the DMD algorithm requires a careful choice of the target-rank to compute a meaningful low-rank DMD approximation. Indeed, the solutions (i.e., the computed DMD modes and eigenvalues) may be sensitive to the amount of regularization which is controlled via . In practice, however, the TSVD approach yields good approximations, particularly if has a well-determined numerical rank.

## 3 Randomized Methods for Linear Algebra

With rapidly increasing volumes of measurement data from simulations and experiments, deterministic modal extraction algorithms may become prohibitively expensive. Fortunately, a wide range of applications produce data which feature low-rank structure, i.e., the rank of the data matrix, given by the number of independent columns or rows, is much smaller than the ambient dimension of the data space. In other words, the data feature a large amount of redundant information. In this case, randomized methods allow one to efficiently produce familiar decompositions from linear algebra. These so-called randomized numerical methods have the potential to transform computational linear algebra, providing accurate matrix decompositions at a fraction of the cost of deterministic methods. Indeed, over the past two decades, probabilistic algorithms have been prominent for computing low-rank matrix approximations, as described in a number of excellent surveys [19, 28, 8, 3]. The idea is to use randomness as a computational strategy to find a smaller representation, often denoted as sketch. This smaller matrix sketch can be used to compute an approximate low-rank factorization for the high-dimensional data matrix .

Broadly, these techniques can be divided into random sampling and random projection-based approaches. The former class of techniques carefully samples and rescales some ‘interesting’ rows or columns to form a sketch. For instance, we can sample rows by relying on energy-based arguments such as leverage-scores and length-squared sampling [14, 28, 8]. The second class of random projection techniques relies on favorable properties of random matrices and forms the sketch as a randomly weighted linear combination of the columns or rows. Computationally efficient strategies to form such a sketch include the subsampled randomized Hadamard transform (SRHT) [38, 45] and the CountSketch [49]. Choosing between the different sampling and random projection strategies depends on the particular application. In general, sampling is more computational efficient, while random projections preserve more of the information in the data.

### 3.1 Probabilistic Framework

One of the most effective off-the-self methods to form a sketch is the probabilistic framework proposed in the seminal work by Halko et al. [19]:

• Stage A: Given a desired target-rank , find a near-optimal orthonormal basis for the range of the input matrix .

• Stage B: Given the near-optimal basis , project the input matrix onto the low-dimensional space, resulting in . This smaller matrix can then be used to compute a near-optimal low-rank approximation.

#### 3.1.1 Stage A: Computing a Near-Optimal Basis

The first stage is used to approximate the range of the input matrix. Given a target rank , the aim is to compute a near-optimal basis for the input matrix such that

 X≈QQ∗X. (18)

Specifically, the range of the high-dimensional input matrix is sampled using the concept of random projections. Thus a basis is efficiently computed as

 Y=XΩ, (19)

where

denotes a random test matrix drawn from the normal Gaussian distribution. However, the cost of dense matrix multiplications can be prohibitive. The time complexity is order

. To improve this scaling, more sophisticated random test matrices, such as the subsampled randomized Hadamard transform (SRHT), have been proposed [38, 45]. Indeed, the time complexity can be reduced to by using a structured random test matrix to sample the range of the input matrix.

The orthonormal basis

is obtained via the QR-decomposition

.

#### 3.1.2 Stage B: Computing a Low-Dimensional Matrix

We now derive a smaller matrix from the high-dimensional input matrix . Specifically, given the near-optimal basis , the matrix is projected onto the low-dimensional space

 B=Q∗X, (20)

which yields the smaller matrix . This process preserves the geometric structure in a Euclidean sense, so that angles between vectors and their lengths are preserved. It follows that

 X≈QB. (21)

#### 3.1.3 Oversampling

In theory, if the matrix has exact rank , the sampled matrix

spans a basis for the column space, with high probability. In practice, however, it is common that the truncated singular values

are non-zero. Thus, it helps to construct a slightly larger test matrix in order to obtain an improved basis. Thus, we compute using an test matrix instead, where . Thus, the oversampling parameter denotes the number of additional samples. In most situations small values are sufficient to obtain a good basis that is comparable to the best possible basis [19].

#### 3.1.4 Power Iteration Scheme

A second strategy to improve the performance is the concept of power iterations [36, 17]. In particular, a slowly decaying singular value spectrum of the input matrix can seriously affect the quality of the approximated basis matrix . Thus, the method of power iterations is used to pre-process the input matrix in order to promote a faster decaying spectrum. The sampling matrix is obtained as

 Y=((XX∗)qX)Ω (22)

where is an integer specifying the number of power iterations. With , one has . Hence, for , the pre-processed matrix has a relatively fast decay of singular values compared to the input matrix . The drawback of this method is that additional passes over the input matrix are required. However, as few as power iterations can considerably improve the approximation quality, even when the singular values of the input matrix decay slowly.

Algorithm 1 describes this overall procedure for the randomized QB decomposition.

###### Remark 1

A direct implementation of the power iteration scheme, as outlined in Section 3.1.4, is numerically unstable due to round-off errors. Instead, the sampling matrix is orthogonalized between each computational step to improve the stability [19].

###### Remark 2

As default values for the oversampling and power iteration parameter, we suggest , and .

#### 3.1.5 Theoretical Performance

Both the concept of oversampling and the power iteration scheme provide control over the quality of the low-rank approximation. The average-case error behavior is described in the probabilistic framework as [31]:

 EμQ∥X−QQ∗X∥F≤[1+√kp−1+e√lp⋅√min(m,n)−k]12q+1σk+1(X),

with , the expectation operator over the probability measure of and the -th element of the decreasing sequence of singular values of . Here it is assumed that . Thus, both oversampling and the computation of additional power iterations drive the approximation error down.

#### 3.1.6 Computational Considerations

The steps involved in computing the approximate basis and the low-dimensional matrix are simple to implement, and embarrassingly parallelizable. Thus, randomized algorithms can benefit from modern computational architectures, and they are particularly suitable for GPU-accelerated computing. Another favorable computational aspect is that only two passes over the input matrix are required in order to obtain the low-dimensional matrix. Pass-efficiency is a crucial aspect when dealing with massive data matrices which are too large to fit into fast memory, since reading data from the hard disk is prohibitively slow and often constitutes the actual bottleneck.

### 3.2 Blocked Randomized Algorithm

When dealing with massive fluid flows that are too large to read into fast memory, the extension to sequential, distributed, and parallel computing might be inevitable [39]. In particular, it might be necessary to distribute the data across processors which have no access to a shared memory to exchange information. To address this limitation, Martinsson and Voronin [32] proposed a blocked scheme to compute the QB decomposition on smaller blocks of the data. The basic idea is that a given high-dimensional sequence of snapshots is subdivided into smaller blocks along the rows. The submatrices can then be processed in independent streams of calculations. Here,

is assumed to be a power of two, and zero padding can be used in order to divide the data into blocks of the same size. This scheme constructs the smaller matrix

in a hierarchical fashion, for more details see [48].

In the following, we describe a modified and simple scheme that is well-suited for practical applications such as computing the dynamic mode decomposition for large-scale fluid flows. Unlike [32, 48], which discuss a fixed-precision approximation, we focus on a fixed-rank approximation problem. Further, our motivation is that it is often unnecessary to use the full hierarchical scheme if the desired target rank is relatively small. By small we mean that we can fit a matrix of dimension into fast memory. To be more precise, suppose again that we are given an input matrix with rows and columns. We partition along the rows into blocks of dimension . To illustrate the scheme we set so that

 X=⎡⎢ ⎢ ⎢⎣X1X2X3X4⎤⎥ ⎥ ⎥⎦. (23)

Next, we approximate each block by a fixed low-rank approximation, using the QB decomposition described in Algorithm 1, which yields

 X≈⎡⎢ ⎢ ⎢ ⎢⎣Q1B1Q2B2Q3B3Q4B4⎤⎥ ⎥ ⎥ ⎥⎦=diag(Q1,Q2,Q3,Q4)⎡⎢ ⎢ ⎢⎣B1B2B3B4⎤⎥ ⎥ ⎥⎦. (24)

We can then collect all matrices and stack them together as

 K=⎡⎢ ⎢ ⎢⎣B1B2B3B4⎤⎥ ⎥ ⎥⎦. (25)

Subsequently, we compute the QB decomposition of and obtain

 K≈ˆQB. (26)

The small matrix can then be used to compute the randomized dynamic mode decomposition as described below. The basis matrix can be formed as

 Q=diag(Q1,Q2,Q3,Q4)ˆQ. (27)

In practice, we choose the target-rank for the approximation slightly larger than , i.e., in order to provide a better approximation for the range of the input matrix.

## 4 Randomized Dynamic Mode Decomposition

In many cases, even with increased measurement resolution, the data may have dominant coherent structures that define a low-dimensional attractor [24]; in fact, the presence of these structures was the original motivation for methods such as DMD. Hence, it seems natural to use randomized methods for linear algebra to accelerate the computation of the approximate low-rank DMD.

### 4.1 Problem Formulation Revisited

We start our discussion by formulating the following least-squares problem to find an estimate for the projected linear map

 ˆ˜A:=argmin˜A∥XR−P˜AP−1XL∥2F, (28)

with . Recall that , where denotes the desired target rank, and is the oversampling parameter.

Alternatively, we can formulate the problem in terms of the projected snapshot sequences and :

 ˆ˜A:=argmin˜A∥P−1XR−˜AP−1XL∥2F=argminAB∥BR−ABBL∥2F, (29)

where . Here we make the assumption that .

The question remains how to construct in order to quickly compute and . The dominant left singular vectors of provide a good choice; however, the computational demands to compute the deterministic SVD may be prohibitive for large data sets. Next, we discuss randomized methods to compute DMD.

### 4.2 Compressed Dynamic Mode Decomposition

Brunton et al. [5] proposed one of the first algorithms using the idea of matrix sketching to find small matrices and . The algorithm proceeds by forming a small number of random linear combinations of the rows of the left and right snapshot matrices to form the following representation

 BL=SXL,BR=SXR. (30)

with a random test matrix. As discussed in Section 3,

can be constructed by drawing its entries from the standard normal distribution. While using a Gaussian random test matrix has beneficial theoretical properties, the dense matrix multiplication

can be expensive for large dense matrices. Alternatively, can be chosen to be a random and rescaled subset

of the identity matrix, i.e.,

is formed by sampling and rescaling rows from with probability . Thus, is very sparse and is not required to be explicitly constructed and stored. More concretely, we sample the th row of with probability and, if sampled, the row is rescaled by the factor

in order to yield an unbiased estimator

[9]. A naive approach is to sample rows with uniform probability . Uniform random sampling may work if the information is evenly distributed across the input data, so that dominant structures are not spatially localized. Indeed, it was demonstrated in [5] that the dominant DMD modes and eigenvalues for some low-rank fluid flows can be obtained from a massively under-sampled sketch. However, the variance can be large and the performance may be poor in the presence of white noise. While not explored by [5], the performance can be improved using more sophisticated sampling techniques, such as leverage score sampling [7, 27, 9]. Better performance is expected when using structured random test matrices such as the subsampled randomized Hadamard transform (SRHT) [38, 45] or the CountSketch [49]. Both of these methods enable efficient matrix multiplications, yet they are more computationally demanding than random sampling.

The shortcoming of the algorithm in [5] is that depends on the ambient dimension of the measurement space of the input matrix. Thus, even if the data matrix is low-rank, a large may be required to compute the approximate DMD. Next, we discuss a randomized algorithm which depends on the intrinsic rank of the input matrix.

### 4.3 An Improved Randomized Scheme

In the following, we present a novel algorithm to compute the dynamic mode decomposition using the probabilistic framework above. The proposed algorithm is simple to implement and can fully benefit from modern computational architectures, e.g., parallelization and multi-threading.

Given a sequence of snapshots , we first compute the near-optimal basis . Then, we project the data onto the low-dimensional space, so that we obtain the low-dimensional sequence of snapshots

 b0,b1,...,bm:=Q∗x0,Q∗x1,...,Q∗xm∈Rl.

Next, we separate the sequence into two overlapping matrices and

 (31)

This leads to the following projected least-squares problem

 (32)

Using the pseudoinverse, the estimator for the linear map is defined as

 ˆAB:=BRB†L=BRVΣ−1˜U∗, (33)

where and are the truncated left and right singular vectors of . The diagonal matrix contains the corresponding singular values. If , then , and no truncation is needed. Next, is projected onto the left singular vectors equationparentequation

 ˜AB =˜U∗ˆAB˜U (34a) =˜U∗BRVΣ−1. (34b)

The DMD modes, containing the spatial information, are then obtained by computing the eigendecomposition of

 ˜AB˜WB=˜WBΛ, (35)

where the columns of are eigenvectors , and is a diagonal matrix containing the corresponding eigenvalues . The high-dimensional DMD modes may be recovered as

 W=QBRVΣ−1˜WB. (36)

The computational steps for a practical implementation are sketched in Algorithm 2.

#### 4.3.1 Derivation

In the following we outline the derivation of Eq. (36). Recalling Eq. (7), the estimated transfer operator is defined as

 ˆA:=XRX†L. (37)

Random projections of the data matrix result in an approximate orthonormal basis for the range of , as described in Sec. 3. The sampling strategy is assumed to be efficient enough so that and . Equation (37) then becomes

 ˆA≈(QQ∗XR)(QQ∗XL)†. (38)

Letting and , the projected transfer operator estimate is defined as in Eq. (33), . Substituting this into Eq. (38) leads to

 ˆA≈QˆABQ∗. (39)

Letting be the SVD of , and left- and right-projecting onto the dominant left singular vectors , yields

 ˜AB:=˜U∗ˆAB˜U. (40)

The eigendecomposition is given by , as in Eq. (35).

Let . It is simple to verify that this is an eigenvector of :

 ˆABˆWB = BRB†LBRVΣ−1˜WB, = BRVΣ−1˜ABˆWB, = BRVΣ−1˜WBΛ, = ˆWBΛ.

Substituting the eigendecomposition of in Eq. (39) and right-multiplying by leads to

 ˆAQˆWB≈QˆWBΛ, (41)

so that identification with verifies the claim in Eq. (36) on the eigendecomposition of :

 W≈QBRVΣ−1˜WB,˜Λ≈Λ.

## 5 Numerical Results

In the following we present numerical results demonstrating the computational performance of the randomized dynamic mode decomposition (rDMD). All computations are performed on a laptop with Intel Core i7-7700K CPU Quad-Core 4.20GHz and 32GB fast memory. The underlying numerical linear algebra routines are accelerated using the Intel Math Kernel Library (MKL).

### 5.1 Fluid Flow Behind a Cylinder

As a canonical example, we consider the fluid flow behind a cylinder at Reynolds number . The data consist of a sequence of snapshots of fluid vorticity fields on a grid111Data for this example may be downloaded at dmdbook.com/DATA.zip., computed using the immersed boundary projection method [43, 6]. The flow features a periodically shedding wake structure and the resulting dataset is low-rank. While this data set poses no computational challenge, it demonstrates the accuracy and quality of the randomized approximation on an example that builds intuition. Flattening and concatenating the snapshots horizontally yields a matrix of dimension , i.e., the columns are the flattened snapshots .

We compute the low-rank DMD approximation using as the desired target rank. Figure 3(a) shows the DMD eigenvalues. The proposed randomized DMD algorithm (with and ), and the compressed DMD algorithm (with ) faithfully capture the eigenvalues. Overall, the randomized algorithm leads to a fold speedup compared to the deterministic DMD algorithm. Further, if the singular value spectrum is slowly decaying, the approximation accuracy can be improved by computing additional power iterations . To further contextualize the results, Fig. 5 shows the leading six DMD modes in absence of noise. The randomized algorithm faithfully reveals the coherent structures, while requiring considerably less computational resources.

Next, the analysis is repeated in presence of additive white noise with a signal-to-noise (SNR) ratio of . Figure 3(b) shows distinct performance of the different algorithms. The deterministic algorithm performs most accurate, capturing the first eleven eigenvalues. The randomized DMD algorithm reliably captures the first nine eigenvalues, while the compressed algorithm only accurately captures seven of the eigenvalues. This is, despite the fact that the compressed DMD algorithm uses a large sketched snapshot sequence of dimension , i.e., randomly selected rows out of the rows of the flow dataset are used. For comparison the randomized DMD algorithm provides a more satisfactory approximation using and additional power iterations, i.e., the sketched snapshot sequence is only of dimension . The results show that the randomized DMD algorithm yields a more accurate approximation, while allowing for higher compression rates. This is because the approximation quality of the randomized DMD algorithm depends on the intrinsic rank of the data and not on the ambient dimensions on the measurement space.

### 5.2 Sea Surface Data

We now compute the dynamic mode decomposition on the high-resolution sea surface temperature (SST) data. The SST data are widely studied in climate science for climate monitoring and prediction, providing an improved understanding of the interactions between the ocean and the atmosphere [35, 34, 41]

. Specifically, the daily SST measurements are constructed by combining infrared satellite data with observations provided by ships and buoys. In order to account and compensate for platform differences and sensor errors, a bias-adjusted methodology is used to combine the measurements from the different sources. Finally, the spatially complete SST map is produced via interpolation. A comprehensive discussion of the data is provided in

[34].

The data are provided by the National Oceanic and Atmospheric Administration (NOAA) via their web site at https://www.esrl.noaa.gov/psd/. Data are available for the years from to with a temporal resolution of 1 day and a spatial grid resolution of . In total, the data consists of temporal snapshots which measure the daily temperature at spatial grid points. Since we omit data over land, the ambient dimension reduces to spatial measurements in our analysis. Concatenating the reduced data yield a GB data matrix of dimension , which is sufficiently large to test scaling.

#### 5.2.1 Aggregated Data

The full data set outstrips the available fast memory required to compute the deterministic DMD. Thus, we perform the analysis on an aggregated data set first, in order to compare the randomized and deterministic DMD algorithms. Therefore, we compute the weekly mean temperature, which reduces the number of temporal snapshots to . Figure 5(c) shows the corresponding eigenvalues. Unlike the eigenvalues obtained via the compressed DMD algorithm, the randomized DMD algorithm provides an accurate approximation for the dominant eigenvalues. Next, Fig. 5(b) and 5(b) show the extracted DMD modes. Indeed, the randomized modes faithfully capture the dominant coherent structure in the data.

The computational performance is contextualized in Figure 7, which shows the computational times and the relative error for varying target ranks. Here, we show the performance of the randomized DMD algorithm using the standard QB scheme and the blocked QB scheme. We see that splitting the input matrix into blocks slightly boosts the computational performance, while maintaining the same accuracy.

Often the memory requirements are more important than the absolute computational time. Figure 8 shows a profile of the memory usage vs runtime. The blocked randomized DMD algorithm requires only a fraction of the memory, compared to the deterministic algorithm, to compute the approximate low-rank dynamic mode decomposition. This can be crucial to carry-out the computations on mobile platforms or to scale the computations to very large applications.

#### 5.2.2 Full Data

The blocked randomized DMD algorithm allows us to extract the DMD modes from the high-dimensional data set. While the full data-set does not fit into fast memory, it is only required that we can access some of the rows in a sequential manner. Computing the approximation using blocks takes about seconds. The resulting modes are shown in Fig. 9. Here, the leading modes are similar to the modes extracted from the aggregated data set. However, subsequent modes may provide further insights which are not revealed by the aggregated data set.

## 6 Conclusion

Randomness as a computational strategy has recently been shown capable of efficiently solving many standard problems in linear algebra. The need for highly efficient algorithms becomes increasingly important in the area of ‘big data’. Here, we have proposed a novel randomized algorithm for computing the low-rank dynamic mode decomposition. Specifically, we have shown that DMD can be embedded in the probabilistic framework formulated by [19]. This framework not only enables computations at scales far larger than what was previously possible, but it is also modular, flexible, and the error can be controlled. Hence, it can be also utilized as a framework to embed other innovations around the dynamic mode decomposition, for instance, see [26] for an overview.

The numerical results show that randomized dynamic mode decomposition (rDMD) has computational advantages over previously suggested probabilistic algorithms for computing the dominant dynamic modes and eigenvalues. More importantly, we showed that the algorithm can be executed using a blocked scheme which is memory efficient. This aspect is crucial in order to efficiently deal with massive data which are too big to fit into fast memory. Thus, we believe that the randomized DMD framework will provide a powerful and scalable architecture for extracting dominant spatiotemporal coherent structures and dynamics from increasingly large-scale data, for example from epidemiology, neuroscience, and fluid mechanics.

## Acknowledgments

The authors would like to acknowledge generous funding support from the Army Research Office (ARO W911NF-17-1-0118), Air Force Office of Scientific Research (AFOSR FA9550-16-1-0650), and Defense Advanced Research Projects Agency (DARPA- PA-18-01-FP-125). LM further acknowledges the support of the French Agence Nationale pour la Recherche (ANR) and Direction Générale de l’Armement (DGA) via the FlowCon project (ANR-17-ASTR-0022).

## References

• [1] G. Berkooz, P. Holmes, and J. L. Lumley, The proper orthogonal decomposition in the analysis of turbulent flows, Annual Review of Fluid Mechanics, 23 (1993), pp. 539–575.
• [2] D. Bistrian and I. Navon, Randomized dynamic mode decomposition for non-intrusive reduced order modelling, International Journal for Numerical Methods in Engineering, (2016), pp. 1–22, https://doi.org/10.1002/nme.5499.
• [3] S. L. Brunton and J. N. Kutz,

Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control

, Cambridge University Press, 2018.
• [4] S. L. Brunton and B. R. Noack, Closed-loop turbulence control: Progress and challenges, Applied Mechanics Reviews, 67 (2015), pp. 050801–1–050801–48.
• [5] S. L. Brunton, J. L. Proctor, J. H. Tu, and J. N. Kutz, Compressed sensing and dynamic mode decomposition, Journal of Computational Dynamics, 2 (2015), pp. 165–191.
• [6] T. Colonius and K. Taira, A fast immersed boundary method using a nullspace approach and multi-domain far-field boundary conditions, Comp. Meth. App. Mech. Eng., 197 (2008), pp. 2131–2146.
• [7] P. Drineas, M. Magdon-Ismail, M. W. Mahoney, and D. P. Woodruff, Fast approximation of matrix coherence and statistical leverage, Journal of Machine Learning Research, 13 (2012), pp. 3475–3506.
• [8] P. Drineas and M. W. Mahoney, Randnla: Randomized numerical linear algebra, Communications of the ACM, 59 (2016), pp. 80–90, https://doi.org/10.1145/2842602.
• [9] P. Drineas and M. W. Mahoney, Lectures on randomized numerical linear algebra, arXiv preprint arXiv:1712.08880, (2017).
• [10] Z. Drmac̆ and S. Gugercin, A new selection operator for the discrete empirical interpolation method—improved a priori error bound and extensions, SIAM J. Sci. Comput., 38 (2016), pp. A631–A648.
• [11] N. B. Erichson, S. L. Brunton, and J. N. Kutz, Compressed dynamic mode decomposition for background modeling, Journal of Real-Time Image Processing, (2016), pp. 1–14, https://doi.org/10.1007/s11554-016-0655-2.
• [12] N. B. Erichson and C. Donovan, Randomized low-rank dynamic mode decomposition for motion detection

, Computer Vision and Image Understanding, 146 (2016), pp. 40–50,

https://doi.org/10.1016/j.cviu.2016.02.005.
• [13] J. E. Fowler,

Compressive-projection principal component analysis

, IEEE Transactions on Image Processing, 18 (2009), pp. 2230–2242, https://doi.org/10.1109/TIP.2009.2025089.
• [14] A. Frieze, R. Kannan, and S. Vempala, Fast Monte-Carlo algorithms for finding low-rank approximations, Journal of the ACM, 51 (2004), pp. 1025–1041.
• [15] M. Gavish and D. L. Donoho, The optimal hard threshold for singular values is , IEEE Transactions on Information Theory, 60 (2014), pp. 5040–5053.
• [16] I. F. Gorodnitsky and B. D. Rao, Analysis of error produced by truncated SVD and Tikhonov regularization methods, in Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers, IEEE, 1994, pp. 25–29.
• [17] M. Gu, Subspace iteration randomization and singular value problems, SIAM Journal on Scientific Computing, 37 (2015), pp. A1139–A1173, https://doi.org/10.1137/130938700.
• [18] F. Guéniat, L. Mathelin, and L. Pastur, A dynamic mode decomposition approach for large and arbitrarily sampled systems, Physics of Fluids, 27 (2015), p. 025113.
• [19] N. Halko, P. G. Martinsson, and J. A. Tropp, Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM Review, 53 (2011), pp. 217–288, https://doi.org/10.1137/090771806.
• [20] P. C. Hansen, The truncatedsvd as a method for regularization, BIT Numerical Mathematics, 27 (1987), pp. 534–553.
• [21] P. C. Hansen, Truncated singular value decomposition solutions to discrete ill-posed problems with ill-determined numerical rank, SIAM Journal on Scientific and Statistical Computing, 11 (1990), pp. 503–518.
• [22] P. C. Hansen, J. G. Nagy, and D. P. O’leary, Deblurring images: matrices, spectra, and filtering, vol. 3, Siam, 2006.
• [23] P. C. Hansen, T. Sekii, and H. Shibahashi, The modified truncated svd method for regularization in general form, SIAM Journal on Scientific and Statistical Computing, 13 (1992), pp. 1142–1150.
• [24] P. J. Holmes, J. L. Lumley, G. Berkooz, and C. W. Rowley, Turbulence, coherent structures, dynamical systems and symmetry, Cambridge Monographs in Mechanics, Cambridge University Press, Cambridge, England, 2nd ed., 2012.
• [25] W. B. Johnson and J. Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space, Contemporary mathematics, 26 (1984), pp. 189–206, https://doi.org/10.1090/conm/026/737400.
• [26] J. N. Kutz, S. L. Brunton, B. W. Brunton, and J. L. Proctor, Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems, SIAM, 2016.
• [27] P. Ma, M. W. Mahoney, and B. Yu, A statistical perspective on algorithmic leveraging, The Journal of Machine Learning Research, 16 (2015), pp. 861–911.
• [28] M. W. Mahoney, Randomized algorithms for matrices and data, Foundations and Trends in Machine Learning, 3 (2011), pp. 123–224, https://doi.org/10.1561/2200000035.
• [29] K. Manohar, B. W. Brunton, J. N. Kutz, and S. L. Brunton, Data-driven sparse sensor placement, IEEE Control Systems Magazine, 38 (2018), pp. 63–86.
• [30] K. Manohar, E. Kaiser, S. L. Brunton, and J. N. Kutz, Optimized sampling for multiscale dynamics, arXiv preprint arXiv:1712.05085, (2017).
• [31] P.-G. Martinsson, Randomized methods for matrix computations, arXiv preprint arXiv:1607.01649, (2016).
• [32] P.-G. Martinsson and S. Voronin, A randomized blocked algorithm for efficiently computing rank-revealing factorizations of matrices, SIAM Journal on Scientific Computing, 38 (2016), pp. S485–S507.
• [33] D. L. Phillips, A technique for the numerical solution of certain integral equations of the first kind, Journal of the ACM (JACM), 9 (1962), pp. 84–97.
• [34] R. W. Reynolds, N. A. Rayner, T. M. Smith, D. C. Stokes, and W. Wang, An improved in situ and satellite SST analysis for climate, Journal of climate, 15 (2002), pp. 1609–1625.
• [35] R. W. Reynolds, T. M. Smith, C. Liu, D. B. Chelton, K. S. Casey, and M. G. Schlax, Daily high-resolution-blended analyses for sea surface temperature, Journal of Climate, 20 (2007), pp. 5473–5496.
• [36] V. Rokhlin, A. Szlam, and M. Tygert, A randomized algorithm for principal component analysis, SIAM Journal on Matrix Analysis and Applications, 31 (2009), pp. 1100–1124.
• [37] C. W. Rowley, I. Mezić, S. Bagheri, P. Schlatter, and D. Henningson, Spectral analysis of nonlinear flows, J. Fluid Mech., 645 (2009), pp. 115–127.
• [38] T. Sarlos, Improved approximation algorithms for large matrices via random projections, in Foundations of Computer Science. 47th Annual IEEE Symposium on, 2006, pp. 143–152.
• [39] T. Sayadi and P. J. Schmid, Parallel data-driven decomposition algorithm for large-scale datasets: with application to transitional boundary layers, Theoretical and Computational Fluid Dynamics, (2016), pp. 1–14.
• [40] P. J. Schmid, Dynamic mode decomposition of numerical and experimental data, Journal of Fluid Mechanics, 656 (2010), pp. 5–28.
• [41] T. M. Smith and R. W. Reynolds, A global merged land–air–sea surface temperature reconstruction based on historical observations (1880–1997), Journal of Climate, 18 (2005), pp. 2021–2036.
• [42] K. Taira, S. L. Brunton, S. Dawson, C. W. Rowley, T. Colonius, B. J. McKeon, O. T. Schmidt, S. Gordeyev, V. Theofilis, and L. S. Ukeiley, Modal analysis of fluid flows: An overview, AIAA Journal, 55 (2017), pp. 4013–4041.
• [43] K. Taira and T. Colonius, The immersed boundary method: A projection approach, J. Comp. Phys., 225 (2007), pp. 2118–2137.
• [44] A. N. Tihonov, Solution of incorrectly formulated problems and the regularization method, Soviet Math., 4 (1963), pp. 1035–1038.
• [45] J. A. Tropp, Improved analysis of the subsampled randomized hadamard transform, Advances in Adaptive Data Analysis, 3 (2011), pp. 115–126.
• [46] J. H. Tu, C. W. Rowley, D. M. Luchtenburg, S. L. Brunton, and J. N. Kutz, On dynamic mode decomposition: theory and applications, Journal of Computational Dynamics, 1 (2014), pp. 391–421.
• [47] J. Varah, A practical examination of some numerical methods for linear discrete ill-posed problems, Siam Review, 21 (1979), pp. 100–111.
• [48] S. Voronin and P.-G. Martinsson, RSVDPACK: Subroutines for computing partial singular value decompositions via randomized sampling on single core, multi core, and GPU architectures, 2015, https://arxiv.org/abs/1502.05366.
• [49] D. P. Woodruff et al., Sketching as a tool for numerical linear algebra, Foundations and Trends in Theoretical Computer Science, 10 (2014), pp. 1–157.