## 1 Introduction

Dynamical systems play a key role in shaping our understanding of the physical world and in defining our ability to predict the evolution of a given process. From the simple swinging motion of a clock pendulum to the complex flow around an airplane wing, the mathematical modeling of dynamical systems can yield a set of tools with which we can analyze the way the current state of the system depends on the past, and predict the possible states we may encounter in the future. Often such tools are precisely known, usually coming in the form of differential equations that are derived from first physical principles, such as the conservation of energy, mass, and momentum courant2008methods. However, in many cases, the sheer complexity of a system can prohibit our complete understanding and render a first principles approach infeasible. In this setting, one may be only able to postulate crude and potentially overly simplified models based on a given a set of empirical observations (see e.g., models for tumor growth michor2004dynamics, social dynamics castellano2009statistical, and the stock market gavin1989stock). In the present era of abundant data and advanced machine learning capabilities, a natural question arises: can we automatically discover sufficiently sophisticated and accurate mathematical models of complex dynamical systems directly from data?

The answer to this question pertains to the well-established field of systems identification ljung1998system. Discriminating between white-, gray-, and black-box

approaches, depending on whether a first principles modeling approach is fully, partially, or not admissible, systems identification aims to devise mathematical models for predicting a future state of a system, given the evolution of a set of previously observed or latent states. Specifically, in the context of identifying nonlinear dynamics, there exist several deterministic and probabilistic tools including radial basis functions

chen1990non, neural networks cochocki1993neural, Gaussian processes kocijan2005dynamic; raissi2017machine; raissi2017hidden; raissi2017inferring; raissi2017numerical, and nonlinear auto-regressive models such as NARMAX billings2013nonlineargoodfellow2016deep. A common theme among all such methods is the pursuit of learning a nonlinear and potentially multi-variate mapping that predicts the future system states given a set of data describing the present and past states. More recently, approaches based on symbolic regression schmidt2009distilling, sparse regression, and compressive sensing brunton2016discovering; rudy2017datawere able to go beyond estimating a

black-box approximation of the dynamics given by , and return more interpretable models that can uncover the full parametric form of an underlying governing equation. However, in order to obtain sparse representation of the dynamics, the aforementioned approaches have to rely upon the nontrivial task of choosing “appropriate” sets of basis functions. Consequently, investigating ways of incorporating broader function search spaces is an important area of current and future research.In this work, we introduce a novel approach to nonlinear systems identification that combines the classical multistep family of time-stepping schemes from numerical analysis iserles2009first with deep neural networks. Inspired by recent developments in

physics-informed deep learning

raissi2017physics_I; raissi2017physics_II, we construct structured nonlinear regression models that can discover the dynamic dependencies in a given set of temporal data-snapshots, and return a closed form model that can be subsequently used to forecast future states or identify basins of attraction. In contrast to recent approaches to systems identification brunton2016discovering; rudy2017data, here we do not have to have direct access or approximations to temporal gradients because the time derivatives are discretized using classical time-stepping rules. Moreover, we are using a richer family of function approximators and consequently we do not have to commit to a particular class of basis functions such as polynomials or sines and cosines. This comes at the cost of losing interpretability of the learned dynamics. However, there is nothing hindering the use of a particular class of basis functions and obtain more interpretable equations.This paper is structured as follows. In section 2 we provide a detailed overview of the proposed methodology.
In section 3.1, we investigate the performance of the proposed framework by applying our algorithm to the two-dimensional damped harmonic oscillator. We then explore the identification of chaotic dynamics of the Lorenz system in section 3.2. As an example of a high dimensional dynamical systems, in section 3.3, we study the Navier-Stokes equations describing the fluid flow behind a cylinder. To illustrate the ability of our method to identify parameterized dynamics, we consider the Hopf normal form in section 3.4. As an example of complicated nonlinear dynamics typical of biological systems, we explore the glycolytic oscillator model in section 3.5. It should be highlighted that all of the examples considered in this work are inspired by the pioneering work of Brunton *et. al.* brunton2016discovering. Moreover, all data and codes used in this manuscript are publicly available on GitHub at https://github.com/maziarraissi/MultistepNNs.

## 2 Problem setup and solution methodology

Let us consider nonlinear dynamical systems of the form^{1}^{1}1It is straightforward to generalize the dynamics to include
parameterization, time dependence, and forcing. In particular, parameterization, time dependence, and external forcing or feedback control

may be added to the vector field according to

(1) |

where the vector denotes the state of the system at time and the function describes the evolution of the system. Given noisy measurements of the state of the system at several time instances , our goal is to determine the function and consequently discover the underlying dynamical system (1) from data. We proceed by applying the general form of a linear multistep method with steps to equation (1) and obtain

(2) |

Here, denotes the state of the system at time . Different choices for the parameters and result in specific schemes. For instance, the trapezoidal rule

(3) |

corresponds to the case where , , , and . We proceed by placing a neural network prior on the function

. The parameters of this neural network can be learned by minimizing the mean squared error loss function

(4) |

where

(5) |

is obtained from the multistep scheme (2).

## 3 Results

### 3.1 Two-dimensional damped oscillator

As a first illustrative example, let us consider the two-dimensional damped harmonic oscillator with cubic dynamics; i.e.,

(6) |

We use as initial condition and collect data from to with a time-step size of . The data are plotted in figure 1. We employ a neural network with one hidden layer and neurons to represent the nonlinear dynamics. As for the multistep scheme (2) we use Adams-Moulton with steps (i.e., the trapezoidal rule). Upon training the neural network, we solve the identified system using the same initial condition as the one above. Figure 1 provides a qualitative assessment of the accuracy in identifying the correct nonlinear dynamics. Specifically, by comparing the exact and predicted trajectories of the system, as well as the resulting phase portraits, we observe that the algorithm can correctly capture the dynamic evolution of the system.

To investigate the performance of the proposed work-flow with respect to different linear multi-step methods, we have considered the three families that are most commonly used in practice: Adams-Bashforth (AB) methods, Adams-Moulton (AM) methods, and the backward differentiation formulas (BDFs). In tables 1 and 2, we report the relative error between trajectories of the exact and the identified systems for different members of the class of linear multi-step methods.
Interestingly, the Adams-Moulton scheme seems to consistently return more accurate results compared to the Adams-Bashforth and BDF approaches. One intuitive explanation for this behavior stems from a closer inspection of equation 2. Specifically, the arrangement of the resulting terms for the Adams-Moulton schemes leads to a higher throughput of training data flowing through the
neural network during model training as compared to the Adams-Bashforth and BDF cases. This
helps regularize the neural network and eventually achieve a better calibration during training. Also, out
of the Adams-Moulton family, the trapezoidal rule seems to work the best in practice perhaps due to its superior stability properties iserles2009first. These performance characteristics should be interpreted as product of empirical evidence, and not as concrete theoretical properties of the method. Identification of the latter requires more extensive systematic studies that go beyond the scope of this paper.

SchemeM | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|

Adams-Bashforth | 1.5e+00 | 3.1e-02 | 1.2e-01 | 4.3e-02 | 1.2e-02 |

Adams-Moulton | 8.8e-03 | 1.2e-02 | 1.6e-02 | 6.3e-03 | 1.1e-02 |

BDF | 1.3e+00 | 8.8e-03 | 1.3e-02 | 1.4e-02 | 1.7e-02 |

SchemeM | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|

Adams-Bashforth | 1.5e+00 | 3.0e-02 | 9.7e-02 | 3.5e-02 | 1.2e-02 |

Adams-Moulton | 8.8e-03 | 1.0e-02 | 1.6e-02 | 5.8e-03 | 1.1e-02 |

BDF | 1.3e+00 | 8.6e-03 | 9.9e-03 | 1.4e-02 | 1.5e-02 |

In tables 3 and 4, we study the robustness of our results with respect to the gap between pairs of data and with respect to noise in the observations of the system. These results fail to reveal a consistent pattern as larger time-step sizes and larger noise corruption levels sometimes lead to superior accuracy and other times to inferior. In the latter cases, the reasons are obvious, namely that a larger gap makes the approximation in time less accurate, while too much noise is devastating because it would be harder to distinguish between noise and the true dynamics. On the other hand, we do observe some cases in which larger

and noise levels may actually help. In these cases, we believe that input noise can act as a regularization mechanism that increases the robustness of the model training procedure, similar to how it has been previously proposed in the neural network literature (see for e.g., denoising autoencoders

vincent2008extracting). Along the same lines, a bigger temporal gap helps because it makes two consecutive time snapshots carry more information simply because they are more dissimilar to one another. On the contrary, if is too small, the importance of the neural network becomes less and less pronounced as seen in equation (2), hence model training becomes infeasible. These empirical results indicate that there exists a problem-dependent sweet spot for the admissible values of the time-step and noise levels that can lead to the best predictive accuracy. Although one may have no control over the noise corrupting the data, the temporal gap could be treated as another hyper-parameter like the number of neurons and hidden layers when setting up the neural network.noise | 0.00% | 0.01% | 0.02% |
---|---|---|---|

0.01 | 5.7e-03 | 2.4e-02 | 2.2e-01 |

0.02 | 1.8e-02 | 1.1e-01 | 1.3e-01 |

0.03 | 3.8e-02 | 9.2e-02 | 7.8e-01 |

0.04 | 5.4e-02 | 4.0e-02 | 8.7e-01 |

0.05 | 8.3e-02 | 2.9e-01 | 9.2e-02 |

noise | 0.00% | 0.01% | 0.02% |
---|---|---|---|

0.01 | 5.9e-03 | 2.1e-02 | 2.2e-01 |

0.02 | 1.7e-02 | 1.0e-01 | 1.1e-01 |

0.03 | 3.3e-02 | 9.0e-02 | 7.9e-01 |

0.04 | 5.2e-02 | 3.8e-02 | 8.7e-01 |

0.05 | 7.2e-02 | 2.6e-01 | 8.2e-02 |

Finally, tables 5 and 6 study the robustness of our results with respect to the neural network structure. For this case, more accurate results seem to be obtained with increasing network depth, although increasing the network width seems to have a negative affect for more than neurons per layer. To fully quantify sensitivity with respect to network architecture a more systematic study involving multiple data-sets is needed.

LayersNeurons | 64 | 128 | 256 |
---|---|---|---|

1 | 9.8e-03 | 6.1e-03 | 3.7e-02 |

2 | 3.6e-03 | 1.2e-02 | 2.4e-02 |

3 | 3.4e-03 | 1.6e-02 | 4.2e-02 |

LayersNeurons | 64 | 128 | 256 |
---|---|---|---|

1 | 7.2e-03 | 5.4e-03 | 3.5e-02 |

2 | 3.3e-03 | 9.1e-03 | 2.0e-02 |

3 | 3.0e-03 | 1.4e-02 | 3.7e-02 |

### 3.2 Lorenz system

To explore the identification of chaotic dynamics evolving on a finite dimensional attractor, we consider the nonlinear Lorenz system lorenz1963deterministic

(7) |

We use as initial condition and collect data from to with a time-step size of . The data are plotted in figures 2 and 3. We employ a neural network with one hidden layer and neurons to represent the nonlinear dynamics. As for the multistep scheme (2) we use Adams-Moulton with steps (i.e., the trapezoidal rule). Upon training the neural network, we solve the identified system using the same initial condition as the one above. As depicted in figure 2, the learned system correctly captures the form of the attractor.

The Lorenz system has a positive Lyapunov exponent, and small differences between the exact and learned models grow exponentially, even though the attractor remains intact. This behavior is evident in figure 3, as we compare the exact versus the predicted trajectories. Small discrepancies due to finite accuracy in the predicted dynamics lead to large errors in the forecasted time-series after , despite the fact that the bi-stable structure of the attractor is well captured (see figure 2).

### 3.3 Fluid flow behind a cylinder

In this example we collect data for the fluid flow past a cylinder (see figure 4) at Reynolds number 100 using direct numerical simulations of the two dimensional Navier-Stokes equations. In particular, following the problem setup presented in kutz2016dynamic and Rudye1602614, we simulate the Navier-Stokes equations describing the two-dimensional fluid flow past a circular cylinder at Reynolds number 100 using the Immersed Boundary Projection Method taira2007immersed; colonius2008fast. This approach utilizes a multi-domain scheme with four nested domains, each successive grid being twice as large as the previous one. Length and time are non-dimensionalized so that the cylinder has unit diameter and the flow has unit velocity. Data is collected on the finest domain with dimensions at a grid resolution of . The flow solver uses a 3rd-order Runge Kutta integration scheme with a time step of t = 0.02, which has been verified to yield well-resolved and converged flow fields. After simulations converge to steady periodic vortex shedding, flow snapshots are saved every . We then reduce the dimension of the system by proper orthogonal decomposition (POD) berkooz1993proper; brunton2016discovering. The POD results in a hierarchy of orthonormal modes that, when truncated, capture most of the energy of the original system for the given rank truncation. The first two most energetic POD modes capture a significant portion of the energy; the steady-state vortex shedding is a limit cycle in these coordinates brunton2016discovering. An additional mode, called the shift mode, is included to capture the transient dynamics connecting the unstable steady state with the mean of the limit cycle noack2003hierarchy. The resulting POD coefficients are depicted in figure 5.

We employ a neural network with one hidden layer and neurons to represent the nonlinear dynamics shown in figure 5. As for the linear multistep scheme (2) we use Adams-Moulton with steps (i.e., the trapezoidal rule). Upon training the neural network, we solve the identified system. As depicted in figure 5, the learned system correctly captures the form of the dynamics and accurately reproduces the phase portrait, including both the transient regime as well as the limit cycle attained once the flow dynamics converge to the well known Kárman vortex street.

### 3.4 Hopf bifurcation

Many real-world systems depend on parameters and, when the parameters are varied, they may go through bifurcations. To illustrate the ability of our method to identify parameterized dynamics, let us consider the Hopf normal form

(8) |

Our algorithm can be readily extended to encompass parameterized systems. In particular, the system (8) can be equivalently written as

(9) |

We collect data from the Hopf system (9) for various initial conditions corresponding to different parameter values for . The data is depicted in figure 6. The identified parameterized dynamics is shown in figure 6 for a set of parameter values different from the ones used during model training. The learned system correctly captures the transition from the fixed point for to the limit cycle for .

### 3.5 Glycolytic oscillator

As an example of complicated nonlinear dynamics typical of biological systems, we simulate the glycolytic oscillator model presented in daniels2015efficient and brunton2016discovering

. The model consists of ordinary differential equations for the concentrations of 7 biochemical species; i.e.,

(10) | |||

The parameters of the model are chosen according to table 1 of daniels2015efficient. As shown in figure 7, data from a simulation of equation (3.5) are collected from to with a time-step size of . We employ a neural network with one hidden layer and neurons to represent the nonlinear dynamics. As for the multi-step scheme (2) we use Adams-Moulton with steps (i.e., the trapezoidal rule). Upon training the neural network, we solve the identified system using the same initial condition as the ones used for the exact system. As depicted in figure 7, the learned system correctly captures the form of the dynamics.

## 4 Summary and Discussion

We have presented a machine learning approach for extracting nonlinear dynamical systems from time-series data. The proposed algorithm leverages the structure of well studied multi-step time-stepping schemes such as Adams-Bashforth, Adams Moulton, and BDF families, to construct efficient algorithms for learning dynamical systems using deep neural networks. A key property of the proposed approach is the use of multiple steps which enables us to incorporate memory effects in learning the temporal dynamics and tackle problems with a nonlinear and non-Markovian dynamical structure. Specifically, the use of steps allows us to decouple the regression complexity due to several temporal lags, ultimately leading to a simpler -dimensional regression problem, as opposed to an -dimensional problem in the case of a brute force NARMAX or recurrent neural network approaches. Although state-of-the-art results are presented for a diverse collection of benchmark problems, there exist a series of open questions mandating further investigation. How could one handle a variable temporal gap

, i.e., irregularly sampled data in time? How would common techniques such as batch normalization, drop out, and

/ regularization enhance the robustness of the proposed algorithm and mitigate the effects of over-fitting? How could one incorporate partial knowledge of the dynamical system in cases where certain interaction terms are already known? In terms of future work, interesting directions include the application of convolutional architectures goodfellow2016deep for mitigating the complexity associated with very high-dimensional inputs, as well as studying possible connections with recent studies linking deep neural networks with numerical methods and dynamical systems chang2017multi; lu2017beyond.## Acknowledgements

This work received support by the DARPA EQUiPS grant N66001-15-2-4055 and the AFOSR grant FA9550-17-1-0013. All data and codes used in this manuscript are publicly available on GitHub at https://github.com/maziarraissi/MultistepNNs.