1 Introduction
In various applications, it can be difficult (sometimes even impossible) to derive models from first principles. However, the inputoutput response of a system and data of the inner state may still be available; we refer to such a setup as a graybox. For example, a natural process whose underlying action is not well understood can be considered a graybox since we may only be able to measure its behavior. In applications, such as manufacturing and design, it may be necessary to model a thirdparty provided subcomponent whose exact or full specifications may not be obtainable due to it containing proprietary information. In order to gain insight into such natural or technical processes, and derive models that simulate and/or predict their behaviors, it is often beneficial and perhaps necessary to create models using measured or generated data. The discipline of system identification investigates methods for the task of obtaining such models from data. One class of methods for datadriven identification is dynamic mode decomposition (DMD), which also provides a modal analysis of the resulting systems. In this work, we investigate variants of DMD for the class of inputoutput systems and compare data sampling strategies.
DMD has its roots in the modal decomposition of the Koopman operator [15, 26], which has been recently rediscovered for the spectral analysis of fluid dynamics [19]. The basic DMD method was introduced in [27], and various extensions have been added, such as Exact DMD [28] or Extended DMD [9]. Furthermore, DMD can also be used as a tool for model order reduction: [25] proposed using DMD for flow analysis and control while DMD has also been combined with Galerkinprojection techniques to model nonlinear systems [1]. For a comprehensive survey of DMD and its variants, see [16].
Our work here builds upon two specific variants of DMD. The first is Dynamic Mode Decomposition with Control (DMDc) [23], and thus by relation, also Koopman with Inputs and Control (KIC) [24]. The second is InputOutput Dynamic Mode Decomposition (ioDMD) [3, 4], which itself is closely related to the Direct Numerical Algorithm for SubSpace State System Identification (N4SID) [31]. DMDc extends DMD to scenarios where the input of the discrete system approximation is given by a functional while ioDMD additionally handles the case when the system’s output is specified and also a functional.
To generically identify a system without prior knowledge on the relevant input functions, techniques such as persistent excitation [6] have been well known for some time now. We propose an extension to the ioDMD method of [3] with a new excitation variant related to the cross Gramian matrix [11]. Additionally, since DMDbased identification does not guarantee that its resulting models are stable, we propose a postprocessing procedure to stabilize ioDMDderived models, by employing the nonsmooth constrained optimization method of [10] to solve a corresponding stabilization problem.
This document is structured as follows. In Section 2, an overview of the prerequisite dynamic mode decomposition method and its relevant variants is given. Section 3 describes the new excitation strategy while our optimizationbased stabilization procedure is discussed in Section 4. Finally, two numerical experiments are conducted in Section 5.
2 Dynamic Mode Decomposition
Consider the timeinvariant ordinary differential equation (ODE):
(1) 
with state
and vector field
. Furthermore, consider for now that (1) is sampled at uniform intervals for times . The most basic version of DMD [27, 16] aims to approximate (1) by constructing a discretetime linear system(2) 
with a linear operator , such that if , then should also hold, for all .
Starting at an initial state , the sequence defined by (2) corresponds to a trajectory of the state vectors . The corresponding data matrix is simply the inorder concatenation of these state vectors, that is,
By forming the following two partial trajectories:
where is just the data matrix with its last data point removed while is just with its first data point removed, the idea behind (plain) DMD [27] is to then solve the linear system of equations:
which is in fact just (2), where and have been replaced by and , respectively. A bestfit solution to this problem is given by the pseudoinverse:
which is also the solution to minimizing the leastsquares error in the Frobenius norm ():
The DMD modes of (1
) are given by the eigenvectors of the matrix
:Beyond just using a single uniformlysampled time series, DMD has also been generalized to a method called Exact DMD [28], which can additionally support the concatenation of multiple different time series and/or nonuniform time steppings. This generalization of DMD is made possible by reducing the requirements of and to the lesser condition that their columns need only be composed in pairs of data such that is fulfilled.
2.1 Practical Computation
We now give a highlevel algorithmic description of DMD identification. The pseudoinverse of the data matrix can be computed using the (rankrevealing) singular value decomposition (SVD),
, as follows:However, inverting tiny but nonzero singular values in the computed diagonal matrix poses a numerical problem, as these small singular values may be inaccurate. Applying could overamplify components of , in particular, the less important ones. To counteract this effect, computing the pseudoinverse via the SVD is done in practice by truncating any singular values that are smaller than some fixed , which is equivalent to adding a regularization term to the leastsquares solution for :
for some value of the regularization parameter . Following [28], the algorithm for the DMDbased computation of the system matrix , given a statespace trajectory and a lower bound for discarding tiny singular values, is as follows:

Sample the statespace trajectory and form data matrix .

Assemble the shifted partitions and .

Compute the SVD of .

Truncate : if ; 0 otherwise.

Identify the approximate discrete system matrix:
In this DMD variant, the order (dimension) of the computed matrix is equal to the number of retained (nontruncated) singular values, but this truncation is done solely for numerical accuracy; the intention is to keep as many components of the dynamics as possible. In contrast, model order reduction typically aims to significantly reduce the system down to just a small set of essential dynamics, and accomplishing this goal will be the focus of Section 2.4.
2.2 Dynamic Mode Decomposition with Control
Considering systems whose vector field depends not just on the state but also on an input function :
(3) 
has led to a DMD variant called Dynamic Mode Decomposition with Control (DMDc) [23]. We focus on a specific DMDc variant [23, Sec. 3.3] that aims to approximate (3) by a linear discretetime control system:
(4) 
where (called the input operator) must also be identified in addition to and input is a discretelysampled version of the continuous input function for some sampling times given by . In addition to forming and as in plain DMD (Section 2), an analogous construction of matrices for input data is also done. An inorder concatenation of the series of input data vectors is done to obtain matrix while the partial data matrix is simply without its last column (the last input sample):
Then, and can be obtained by computing the approximate solutions to the linear system of equations given by:
which is (4) with replaced by , by , and by , and has solution:
As mentioned in Section 1, DMDc is actually a special case of the KIC method [24]. For KIC, the state of the system is also augmented with the discretized input , which leads the resulting augmented system to have additional operators:
where and . Of course, these two additional operators must now also be identified along with matrices and . However, if one assumes that the input is known and no identification of the associated operators is required, then and are just zero matrices, and it is thus clear that KIC is a generalization of DMDc.
2.3 InputOutput Dynamic Mode Decomposition
Extending the underlying system once more to now also include an output function , with an associated output functional that also depends on the state and the input , yields the following system:
(5) 
Databased modeling of systems of the form given by (5) has given rise to a class of DMD methods called InputOutput Dynamic Mode Decomposition (ioDMD) [3]. Similar to previously discussed DMD variants, the ioDMD method also approximates the original system by a discretetime linear system, but now with input and output:
(6) 
where and are the output and feedthrough operators, respectively. Since this approximation includes output data, the discrete output instances are also correspondingly arranged into a matrix by inorder concatenation while simply omits the last column (output sample) of :
Matrices , , , and are then approximated by solving:
(7) 
which is (6) but where , , , and have been replaced by , , and , respectively, and has the solution:
(8) 
This procedure is equivalent to the Direct N4SID algorithm [30, 14, Ch. 6.6] mentioned in Section 1.
Note that all the DMD variants discussed so far identify the original continuoustime systems by linear discretetime models. However, one can create a continuoustime model given by , that is a firstorder approximation to the discretetime model obtained by ioDMD, using for example, the standard firstorder Euler method:
The output and feedthrough operators for the continuoustime model remain the same as in the discretetime model produced by ioDMD, that is, , . Finally, it is important to note that (io)DMD derived models are generally only valid for the time horizon over which the data has been gathered.
2.4 Reduced Order DMD
To accelerate the computation of ioDMDderived models, we follow [7, 4] and reduce the order of the possibly highdimensional state trajectories using a projectionbased approach. The data matrices and are compressed using a truncated (Galerkin) projection , , :
The order of the identified system is thus determined by the rank of the projection.
A popular method to compute such a truncating projection is proper orthogonal decomposition (POD) [12], which is practically obtained as the left singular vectors (POD modes) of the data matrix :
The number of resulting computed POD modes depends on the desired projection error , which is a consequence of the SchmidtEckhardYoungMirsky Lemma (see for example [5]).
Note again that this data compression scheme has a very different motivation compared to that of the aforementioned dimensionalityreduction done when computing the pseudoinverse via the truncated SVD (discussed in Section 2.1). The latter truncation, based on the magnitude of the singular values, is done for reasons of numerical accuracy when computing the pseudoinverse (and applying it in subsequent computations). The projectionbased truncation discussed here, using the sum of the singular values squared, allows for the possibility of a much less onerous computational burden, as the state space of the models can often be greatly reduced by discarding all but a handful of essential modes in order to obtain a desired approximation error. Other projectionbased datadriven model reduction techniques for the compression of the state trajectory are similarly applicable, for example empirical balanced truncation [17].
3 Training Data and Generic Identification
DMD is a datadriven method, hence, the source of the data used for the system identification needs to be considered. Usually it is possible to identify an inputoutput system (for provided discrete input, state, and output functions) so that the identified system produces similar outputs given the input used for the identification. To identify a model from data, the associated system needs to produce outputs approximating the data source, not only for specific data sets, but for a whole class of relevant input functions and perhaps even arbitrarily admissible ones. For such generic linear system identification, the matrices , , , have to be computed in such a manner that (a) does not require a specific data set and (b) allows behaviors of the system being modeled to be predicted for a given initial condition and input function. This can be accomplished, for example, by persistent excitation (PE) [6, 23], which utilizes signals such as step functions or Gaussian noise as training input data. Here, we propose a related approach that also exploits random (Gaussian) sampling, yet is based on the cross operator.
3.1 Cross Excitation
The cross operator [13] is a tool for balancingtype model reduction and encodes the inputoutput coherence of an associated system, which for linear timeinvariant systems is the socalled cross Gramian matrix [11]. This operator is defined as the composition of the controllability operator with the observability operator :
Thus, for a square system with the same input and output space dimension, maps a given initial state via the observability operator to an output function, and is in turn passed to the controllability operator as an input function resulting in a final state^{1}^{1}1This is also illustrated in [13, Fig. 1]:
To generate trajectory data, we modify the cross operator by replacing the controllability operator with the inputtostate map [5, Ch. 4.3]. This yields an operator :
which maps, as before, an initial state to an output function, but then maps this output (as an input) function to a state trajectory (instead of to a final state). Compared to PE, the cross(operatorbased) excitation (CE) is a twostage procedure using perturbations of the initial state to generate the excitation, as opposed to perturbing the input directly.
The cross excitation is related to indirect identification of closedloop systems [29, 22], which is also a twostage process. First, an intermediary system (openloop) system is identified, which then is used in the second step to generate a signal that acts as an excitation for the identification of the actual closedloop system. A distinct difference of CE compared to the indirect closedloop identification approach is that the latter exclusively uses inputoutput data while the former also uses state trajectory data in addition to the inputoutput data.
4 Stabilization
As DMD and its variants are timedomain methods (ioDMD included), it is typically desired to preserve stability in the (reduced) identified discretetime systems. However, models derived by ioDMD are not guaranteed to be stable. To enforce stability, an additional postprocessing step is required. For example, [2] proposed stabilizing models derived using PetrovGalerkin projections by solving a sequence of semidefinite programs. In this paper, we take a much more direct approach.
A square matrix is discretetime stable if its spectral radius is less than one, that is, , where
Although the spectral radius is nonconvex, it is a continuous function with respect to the matrix and furthermore, it is continuously differentiable almost everywhere
(in the mathematical sense). In other words, the set of points where the spectral radius is nonsmooth only has measure 0 and so it holds that points chosen randomly will, with probability 1, be outside of this set. Hence, despite the nonsmoothness of the spectral radius, it is still possible to attain a wealth of information from its gradient, since it is defined on all but a subset of measure 0 in the full space. Thus if the matrix
from the ioDMDderived model, that is, from (8), is not stable, we could consider employing a gradientbased optimization method to stabilize it, while hopefully ensuring that the resulting stabilized version of the ioDMD solution still models the original largescale system. In order to meet these two objectives, we consider solving the following constrained optimization problem:(9) 
where , , , , and is a margin for the stability tolerance. As the unstabilized ioDMD model is already a solution to (7), solving (9) should promote solutions that are still close to the original ioDMD model while simultaneously enforcing the requirement that these models must now be stable, due to the presence of the stability radius in the inequality constraint. Furthermore, the unstabilized ioDMD model should make a good point to start the optimization method at.
There are however some difficulties with solving (9) iteratively via optimization techniques. The first is that the objective function of (9) is typically underdetermined in DMD settings, which can adversely impact a method’s usual rate of convergence, as minimizers are no longer locally unique. However, as our goal is mainly to stabilize the ioDMD model, without changing its other properties too much, we do not necessarily need to solve (9) exactly. A few iterations may be all that is needed to accomplish this task.
As an alternative, one could consider solving
in lieu of (9), where , , , and are the matrices produced by ioDMD. On the upside, this helps to avoid the problem of underdeterminedness arising in (9) and encourages that a stable solution close to the original ioDMDderived system is found. However, this modified objective no longer takes any observed data into account. We did evaluate this alternate optimization problem in our experiments, but the performance of the models it produced was sometimes worse. As such, we will only report results for our experiments done using (9) in Section 5.
The second issue for trying to solve (9
) is that the spectral radius can be a rather difficult function to optimize, relatively speaking. First, despite being continuously differentiable almost everywhere, the spectral radius is still a nonsmooth function, specifically at matrices which have multiple eigenvalues that attain the maximum modulus, that is, the value of the spectral radius. Typically, minimizers of the spectral radius will be at such matrices (for example, see the plots of spectral configurations post optimization in
[10, Section 4.1 and Appendix 9.1]), so optimizing the spectral radius often means that an optimization method must try to converge to a nonsmooth minimizer, a difficult prospect. Worse still is that the spectral radius is also nonlocally Lipschitz at matrices where these multiple eigenvalues attaining the value of the spectral radius coincide (see [8]). Many of the available continuous optimization methods are designed under the assumption that functions they optimize are smooth or if not, at least locally Lipschitz. If the functions being optimized do not meet these criteria, these methods typically break down. Furthermore, the nonconvexity of the spectral radius means that optimization codes may get stuck in local minima that may or may not provide sufficiently acceptable solutions.Although the inclusion of the spectral radius constraint makes (9) a nonsmooth, nonconvex optimization problem, with a nonlocallyLipschitz constraint function, again we do not necessarily need to solve it exactly (though this will remain to be seen in our experiments). Furthermore, while much of the literature on nonsmooth optimization has historically focused on unconstrained problems, there has also been recent interest in addressing problems with nonsmooth constraints. For example, a new algorithm combining quasiNewton BFGS (BroydenFletcherGoldfarbShanno) updating and SQP (sequential quadratic programming)^{2}^{2}2 For a good introductory reference on many of the optimization techniques referred to in this paper, see [21]. was recently proposed for general nonsmooth, nonconvex, constrained optimization problems [10], where no special knowledge or structure is assumed about the objective or constraint functions. Particularly relevant to our approach here, this BFGSSQP method was evaluated on 100 different spectral radius constrained optimization problems, with promising results relative to other solvers [10, Section 6]. This indicates that it may also be a good solver for our nonsmooth constrained optimization problem and thus we propose using it to solve (9). Specifically, we use GRANSO: GRadientbased Algorithm NonSmooth Optimization [20], an opensource MATLAB code implementing the aforementioned BFGSSQP method of [10].
5 Numerical Results
We implemented our new ioDMD variant using both PE and CE sampling strategies (presented in Section 3.1) to collect observations of the original system’s behaviors. Furthermore, our software can also optionally stabilize the resulting ioDMDderived models by using GRANSO [10] on our associated nonsmooth constrained optimization problem.
As discussed in Section 4, (9) is an underdetermined optimization problem in DMD settings. Since such problems are in a sense quite flat, the norm of the gradient merely being small can be a poor measure of when to terminate; correspondingly, we tightened GRANSO’s default termination tolerance of to (i.e. opts.opt_tol = 1e8). Relatedly, convergence can also be slow so the choice of a starting point can also be critical. As our goal, specified by (9), is to stabilize a model while minimizing the tradeoff in any increased approximation error (that may occur due to stabilization), we simply used the unstable ioDMDderived models as starting points for GRANSO and used GRANSO’s custom termination feature to halt optimization once a model had been found that was both stable (for (9) we used ) and had an objective value that was less than 1000 times the objective function evaluated at the original ioDMDderived unstable model. We found that this loose limit on how much the objective function was allowed to increase was more than adequate to retain good output errors. We ran GRANSO using its fullmemory BFGS mode (its default behavior and the recommended choice for nonsmooth problems) and kept all other GRANSO options at their default values as well.
The numerical experiments were implemented in the Matlab language and were run under MATLAB R2017a on a workstation computer with an Intel Core i76700 (4 Cores @ 3.4 GHz) and 8 GB memory.
5.1 Excitation and Stabilization Evaluation
We demonstrate the effects of different types of excitation used for the ioDMDbased system identification by a numerical example. Specifically, for a given target data set, we identify a discretetime linear system first using the target data itself, second by persistent excitation (PE) and third by utilizing the herein proposed cross excitation (CE) from Section 3.1.
The considered inputoutput system is based on the transport equation, with the left boundary of the domain selected as input and the right boundary as output:
The partial differential equation is discretized in space using the firstorder finitedifference upwind scheme, with a spatial resolution of
and . The resulting ODE inputoutput system is then given by:(10) 
The target data is given by discrete input, state and output functions from a simulation, which is performed by a first order implicit RungeKutta method. We used a timestep width of and a time horizon of , with a Gaussianbell type input function .
For the comparison, a discretetime linear system was first identified from the snapshots generated by a simulation of (10) excited by input
, to obtain a baseline for the modeling performance of ioDMD. The PE variant was sampled using zeromean, unitcovariance Gaussian noise and a unit step function input was used for the ioDMDbased identification. Finally, our CE variant had the initial state sampled from a unit Gaussian distribution and a (componentwise) shifted initial state
was tested. All methods were tested with ioDMD only and then also with stabilization.5.1.1 ioDMD without Stabilization
Fig. 1 depicts the relative output error for simulations of systems identified using data associated to the target input , PE, and CE for increasingly accurate statespace data compression (that is, for increasingly smaller amounts of compression). The statespace data dimensionality was reduced using the POD method with prescribed projection errors of . In this set of experiments, we did not use stabilization.
In Figs. 0(b) and 0(a), the ioDMD procedure was not regularized by truncating singular values, while regularization was used in Figs. 0(d) and 0(c). In Figs. 0(c) and 0(a), system identification was performed using zeromean Gaussian noise for the PE and an initial state sampled from a zeromean Gaussian distribution for CE, respectively. In Figs. 0(d) and 0(b), respectively, the identification was driven by a step input for the PE and a shifted initial state for CE.
For this set of experiments, using the target data only produced stable systems for large values of acceptable projection error while the PEderived models were always unstable (and thus had very poor performance regardless of the projection error). In contrast, the CE method produced stable systems for all levels of projection error tested. Performancewise, when comparing to the few targetdataderived models that also happened to be stable, the CEderived models had errors that were less than one order of magnitude higher (see Figs. 0(c) and 0(a)). On the other hand, when using the step input or shifted initial state, both PE and CE produced models with increasing accuracy as the level of acceptable projection error of the data was decreased, as seen in Figs. 0(d) and 0(b). In Fig. 0(d), we see that the regularization limited the attainable accuracy for both PE and CE. The targetdataderived system had a constant error independent from the projection error of the data.
5.1.2 ioDMD with Stabilization
In this second set of experiments, Fig. 2 still shows the relative output error for simulations of systems identified using the target data, PE, and CE for increasingly accurate statespace data compression, but now the systems have been postprocessed using our optimizationbased approach to enforce stability, as necessary. The statespace data dimensionality was again reduced using the POD method, with prescribed projection errors of . The subfigures are arranged as they were in Fig. 1.
The step function PE and shifted initial state CE are unaffected by our stabilization postprocessing phase, as these systems were already stable; thus their plots are the same in Figs. 1(d) and 1(b) as they were in Figs. 0(d) and 0(b). In the case of using Gaussian noise or randomly sampled initial state (Figs. 1(c) and 1(a)), which had not yielded stable systems for the target data or PE (either with or without regularization), our optimizationbased postprocessing procedure now enforced stability.
For the ioDMDderived models that were unstable, GRANSO was able to stabilize all 24 of them. The average number of iterations needed to find the first stabilized version was 13.5 while the maximum was just 61. Furthermore, for 12 of the problems, our full termination criteria were met in less than 20 iterations while the average and maximum iteration counts over all 24 problems were respectively 84.2 and 329. This demonstrates that our optimizationbased approach is indeed able to stabilize such models reliably and efficiently. Solving (9) via GRANSO also met our secondary goal, that stabilization is achieved without deviating too significantly from the original unstable models. The largest observed relative change between an initial unstable model and its corresponding GRANSOderived stabilized version was just 1.44% while the average observed relative change was merely 0.231%; the relative differences were calculated by comparing , where the matrices are GRANSO’s computed stabilized solution to (9), to a similar of the original (unstable) ioDMDderived model.
5.1.3 Reduced Orders and Runtimes
We now compare the order of the identified systems. The order of the identified system is determined by the projection error selected for the statespace compression in the POD. For each data set (Target, Gaussian noise PE, Gaussian sample CE, Step input PE, shifted initial state CE), the resulting reduced order is plotted for the different prescribed projection errors in Fig. 2(a). As we can see, the stepinput PE and shiftedinitialstate CE method behave similar in terms of system dimension while for the Gaussiannoise PE and Gaussiansampled initialstate CE, the CE variant resulted in smaller system dimensions.
In terms of computational cost, only the Target and Gaussiannoise PE variants required stabilization and as such, it is only for these that we see increased runtimes, as shown by the red and green plots (respectively) in Fig. 2(b). Otherwise, the runtimes were mostly identical for the other variants in the comparison.
5.2 LimitedMemory BFGS for Stabilization?
One potential downside to our optimizationbased stabilization procedure is that fullmemory BFGS inverse Hessian updating (GRANSO’s recommended setting for nonsmooth problems) requires periteration work and storage that is quadratic in the number of optimization variables. As the number of optimization variables is , the running time required to solve (9) could become unacceptably long as , the reduced order of the model, is increased. Thus, we now also consider whether or not limitedmemory BFGS updating can also be effective for solving (9).
Using limitedmemory BFGS has the benefit that the periteration work and storage is reduced to a linear amount (again in the number of optimization variables). However, one of the tradeoffs is that convergence can be quite slow in practice. For smooth problems, fullmemory BFGS converges superlinearly while limitedmemory BFGS only does so linearly; on nonsmooth problems, linear convergence is the best one can typically expect. Another potential issue is that while there has been much evidence supporting that fullmemory BFGS is very reliable for nonsmooth optimization, the case for using limitedmemory BFGS on nonsmooth problems is much less clear; for a good overview of the literature on this topic, see [18, Section 1].
To investigate this question, we reran the experiments from Section 5.1 a second time but where GRANSO’s limitedmemory BFGS mode was now enabled. Specifically, we configured GRANSO to approximate the inverse Hessian at each iteration using only the 10 most recently computed gradients, accomplished by setting opts.limited_mem_size = 10. All other parameters of our experimental setup were kept as they were described earlier.
In this limitedmemory configuration, GRANSO often required significantly more iterations, as one might expect. The average and max number of iterations to find the first stable version of a model were respectively 73.6 and 474, about an order of magnitude more iterations than incurred when using fullmemory BFGS. On the other hand, for 19 of the 24 problems, stable models were encountered within the first 20 iterations. To meet our full termination criteria, the average and max number of iterations were respectively 222.0 and 811, roughly about two and a half times more than incurred when using fullmemory BFGS. Nevertheless, 12 of the 24 problems were still satisfactorily solved in less than 20 iterations, matching the earlier result when using fullmemory BFGS. Despite the large increases in iteration counts, GRANSO’s overall runtime was on average 3.83 times faster when enabling limitedmemory BFGS.
With respect to output error in this limitedmemory evaluation, the resulting stabilized models still essentially matched the earlier results using fullmemory BFGS. There was one notable exception, for Target data using the smallest projection error of , where GRANSO remarkably found a betterperforming model when using limitedmemory BFGS. However, we did observe that the quality of the stabilized models appeared to be much more sensitive to changing GRANSO’s parameters than they were when using fullmemory BFGS. As a consequence, we still advocate that solving (9) with GRANSO is generally best done using its default fullmemory BFGS updating. Nonetheless, if this is simply not feasible computationally, one may still be able to obtain good results using limitedmemory BFGS but perhaps not as reliably or consistently.
As a final clarifying remark on this topic, we note that one cannot necessarily expect good performance on nonsmooth problems when using just any BFGSbased optimization code and that generally it is critical that the choice of software is one specifically designed for nonsmooth optimization. Indeed, this is highlighted in the evaluation done in [10, Section 6], where offtheshelf quasiNewtonbased codes built for smooth optimization perform much worse on a test set of nonsmooth optimization problems compared to the quasiNewtonbased codes specifically built with nonsmooth optimization in mind.
6 Conclusion
In this work, we evaluated the approximation quality of ioDMD system identification using a novel excitation scheme and a new optimizationbased, postprocessing procedure to ensure stability of the identified systems. Our new cross excitation strategy, particularly when used with random sampling, often produces better results than when using persistent excitation, and our experiments indicate that both excitation schemes are useful for efficiently obtaining good models for approximating the target data. Furthermore, we show that directly solving a nonsmooth constrained optimization problem can indeed be a viable approach for stabilizing ioDMDderived systems while retaining the salient properties for approximating the output response.
Code Availability
The source code of the presented numerical examples can be obtained from:
http://runmycode.org/companion/view/2902
and is authored by: Christian Himpe and Tim Mitchell.
References
 [1] A. Alla and J. N. Kutz. Nonlinear model order reduction via dynamic model decomposition. SIAM J. Sci. Comput., 39(5):B778–B796, 2017. doi:10.1137/16M1059308.
 [2] D. Amsallem and C. Farhat. Stabilization of projectionbased reducedorder models. Numerical Methods in Engineering, 91(4):358–377, 2012. doi:10.1002/nme.4274.
 [3] J. Annoni, P. Gebraad, and P. Seiler. Wind farm flow modeling using inputoutput dynamic mode decomposition. In American Control Conference (ACC), pages 506–512, 2016. doi:10.1109/ACC.2016.7524964.
 [4] J. Annoni and P. Seiler. A method to construct reducedorder parametervarying models. International Journal of Robust and Nonlinear Control, 27(4):582–597, 2017. doi:10.1002/rnc.3586.
 [5] A. C. Antoulas. Approximation of LargeScale Dynamical Systems, volume 6 of Adv. Des. Control. SIAM Publications, Philadelphia, PA, 2005. doi:10.1137/1.9780898718713.
 [6] K. J. Åström and P. Eykhoff. System identification – a survey. Automatica, 7(2):123–162, 1971. doi:10.1016/00051098(71)900598.
 [7] B. W. Brunton, L. A. Johnson, J. G. Ojemann, and J. N. Kutz. Extracting spatialtemporal coherent patterns in largescale neural recordings using dynamic mode decomposition. Journal of Neuroscience Methods, 258:1–15, 2016. doi:10.1016/j.jneumeth.2015.10.010.
 [8] J. V. Burke and M. L. Overton. Variational analysis of nonLipschitz spectral functions. Math. Program., 90(2, Ser. A):317–352, 2001. doi:10.1007/s102080010008.
 [9] K. K. Chen, J. H. Tu, and R. W. Rowley. Variants of dynamic mode decomposition: Boundary condition, Koopman, and Fourier analyses. Nonlinear Science, 22(6):887–915, 2012. doi:10.1007/s0033201291309.
 [10] F. E. Curtis, T. Mitchell, and M. L. Overton. A BFGSSQP method for nonsmooth, nonconvex, constrained optimization and its evaluation using relative minimization profiles. Optim. Methods Softw., 32(1):148–181, 2017. doi:10.1080/10556788.2016.1208749.
 [11] K. V. Fernando and H. Nicholson. On the structure of balanced and other principal representations of SISO systems. IEEE Trans. Autom. Control, 28(2):228–231, 1983. doi:10.1109/TAC.1983.1103195.
 [12] P. Holmes, J. L. Lumley, G. Berkooz, and C. W. Rowley. Turbulence, coherent structures, dynamical systems and symmetry. Cambridge Monographs on Mechanics. Cambridge University Press, Cambridge, 2012. doi:10.1017/CBO9780511919701.
 [13] T. C. Ionescu, K. Fujimoto, and J. M. A. Scherpen. The cross operator and the singular value analysis for nonlinear symmetric systems. In Proc. Eur. Control Conf., pages 1565–1570, 2009. URL: http://hdl.handle.net/11370/1bcfd3bec00649e0bbbe5c09ca055d87.
 [14] K. Katayama. Subspace Methods for System Identification. Communications and Control Engineering. Springer London, 2005. doi:10.1007/184628158X.
 [15] B. O. Koopman. Hamiltonian systems and transformation in Hilbert space. Proceedings of the National Academy of Sciences, 17(5):315–381, 1931. URL: http://www.pnas.org/content/17/5/315.full.pdf.
 [16] J. N. Kutz, S. L. Brunton, B. W. Brunton, and J. L. Proctor. Dynamic Mode Decomposition: DataDriven Modeling of Complex Systems. Society of Industrial and Applied Mathematics, Philadelphia, USA, 2016. doi:10.1137/1.9781611974508.
 [17] S. Lall, J. E. Marsden, and S. Glavaški. Empirical model reduction of controlled nonlinear systems. In Proc. of the IFAC World Congress, volume F, pages 473–478, 1999. URL: resolver.caltech.edu/CaltechAUTHORS:20101007154754737.
 [18] A. S. Lewis and M. L. Overton. Nonsmooth optimization via quasiNewton methods. Math. Program., 141(1–2, Ser. A):135–163, 2013. doi:10.1007/s1010701205142.
 [19] I. Mezic. Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear Dynamics, 41(1):309–325, 2005. doi:10.1007/s110710052824x.
 [20] T. Mitchell. GRANSO: GRadientbased Algorithm for NonSmooth Optimization. http://timmitchell.com/software/GRANSO. See also [10].
 [21] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, New York, 1999. doi:10.1007/b98874.
 [22] H. Oku and T. Fujii. Direct subspace model identification of LTI systems operating in closedloop. In 43rd IEEE Conference on Decision and Control, pages 2219–2224, 2004. doi:100.1109/CDC.2004.143378.
 [23] J. L. Proctor, S. L. Brunton, and J. N. Kutz. Dynamic mode decomposition with control. SIAM J. Applied Dynamical Systems, 15(1):142–161, 2016. doi:10.1137/15M1013857.
 [24] J. L. Proctor, S. L. Brunton, and J. N. Kutz. Generalizing Koopman theory to allow for inputs and control. arXiv eprints 1602.07647, Cornell University, 2016. math.OC. URL: https://arxiv.org/abs/1602.07647.
 [25] C. W. Rowley and S. T. M. Dawson. Model reduction for flow analysis and control. Annual Review of Fluid Mechanics, 49:387–417, 2017. doi:10.1146/annurevfluid010816060042.
 [26] C. W. Rowley, I. Mezic, S. Bagheri, P. Schlatter, and D. S. Henningson. Spectral analysis of nonlinear flows. Journal of Fluid Mechanics, 641:115–1127, 2009. doi:10.1017/S0022112009992059.
 [27] P. J. Schmid. Dynamic mode decomposition of numerical and experimental data. J. Fluid Mech., 656:5–28, 2010. doi:10.1017/S0022112010001217.
 [28] J. H. Tu, C. W. Rowley, D. M. Luchtenburg, S. L. Brunton, and J. N. Kutz. On dynamic mode decomposition: Theory and applications. Journal of Computational Dynamics, 1(2):391–421, 2014. doi:10.3934/jcd.2014.1.391.

[29]
P. M. J. Van Den Hof and R. J. P. Schrama.
An indirect method for transfer function estimation from closed loop data.
Automatica, 29(6):1523–1527, 1993. doi:10.1016/00051098(93)90015L.  [30] P. Van Overschee and B. de Moor. N4SID: numerical algorithms for state space subspace system identification. In IFAC Proceedings Volumes, volume 26, pages 55–58, 1993. doi:10.1016/S14746670(17)482218.
 [31] M. Viberg. Subspacebased methods for the identification of linear timeinvariant systems. Automatica, 31(12):1835–1851, 1995. doi:10.1016/00051098(95)001075.
Comments
There are no comments yet.