## Summary

Automated data-driven modeling, the process of directly discovering the governing equations of a system from data, is increasingly being used across the scientific community. PySINDy is a Python package that provides tools for applying the sparse identification of nonlinear dynamics (SINDy) approach to data-driven model discovery. In this major update to PySINDy, we implement several advanced features that enable the discovery of more general differential equations from noisy and limited data. The library of candidate terms is extended for the identification of actuated systems, partial differential equations (PDEs), and implicit differential equations. Robust formulations, including the integral form of SINDy and ensembling techniques, are also implemented to improve performance for real-world data. Finally, we provide a range of new optimization algorithms, including several sparse regression techniques and algorithms to enforce and promote inequality constraints and stability. Together, these updates enable entirely new SINDy model discovery capabilities that have not been reported in the literature, such as constrained PDE identification and ensembling with different sparse regression optimizers.

## Statement of need

Traditionally, the governing laws and equations of nature have been derived from first principles and based on rigorous experimentation and expert intuition. In the modern era, cheap and efficient sensors have resulted in an unprecedented growth in the availability of measurement data, opening up the opportunity to perform automated model discovery using data-driven modeling. These data-driven approaches are also increasingly useful for processing and interpreting the information in these large datasets. A number of such approaches have been developed in recent years, including the dynamic mode decomposition [schmid2010dynamic, Kutz2016book], Koopman theory [Brunton2021koopman], nonlinear autoregressive algorithms [Billings2013book], neural networks [pathak2018model, vlachas2018data, Raissi2019jcp], Gaussian process regression [raissi2017machine], operator inference and reduced-order modeling [Benner2015siamreview, peherstorfer2016data, qian2020lift], genetic programming [Bongard2007pnas, schmidt_distilling_2009], and sparse regression [brunton2016pnas]. These approaches have seen many variants and improvements over the years, so data-driven modeling software must be regularly updated to remain useful to the scientific community. The SINDy approach has experienced particularly rapid development, motivating this major update to aggregate these innovations into a single open-source tool that is transparent and easy to use for non-experts or scientists from other fields.

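The original PySINDy code [de2020pysindy] provided an implementation of the traditional SINDy method [brunton2016pnas], which assumes that the dynamical evolution of a state variable $\mathbf{q}(t) \in \mathbb{R}^n$ follows an ODE described by a function $\mathbf{f}: \mathbb{R}^n \to \mathbb{R}^n$,

$$\frac{d}{dt}\mathbf{q} = \mathbf{f}(\mathbf{q}). \tag{1}$$

SINDy approximates the dynamical system $\mathbf{f}$ in Eq. (1) as a sparse combination of terms from a library of candidate basis functions $\boldsymbol{\theta}(\mathbf{q}) = [\theta_1(\mathbf{q}), \theta_2(\mathbf{q}), \dots, \theta_p(\mathbf{q})]$,

$$\mathbf{f}(\mathbf{q}) \approx \sum_{k=1}^{p} \theta_k(\mathbf{q})\,\boldsymbol{\xi}_k, \tag{2}$$

where the vectors $\boldsymbol{\xi}_k \in \mathbb{R}^n$ contain the sparse coefficients. In order for this strategy to be successful, a reasonably accurate approximation of $\mathbf{f}(\mathbf{q})$ should exist as a sparse expansion in the span of $\boldsymbol{\theta}$. Therefore, background scientific knowledge about expected terms in $\mathbf{f}(\mathbf{q})$ can be used to choose the library $\boldsymbol{\theta}$. To pose SINDy as a regression problem, we assume we have a set of state measurements sampled at time steps $t_1, \dots, t_m$ and rearrange the data into the data matrix $\mathbf{Q} \in \mathbb{R}^{m \times n}$,

$$\mathbf{Q} = \begin{bmatrix} q_1(t_1) & q_2(t_1) & \cdots & q_n(t_1) \\ q_1(t_2) & q_2(t_2) & \cdots & q_n(t_2) \\ \vdots & \vdots & \ddots & \vdots \\ q_1(t_m) & q_2(t_m) & \cdots & q_n(t_m) \end{bmatrix}. \tag{3}$$

A matrix of derivatives in time, $\mathbf{Q}_t \in \mathbb{R}^{m \times n}$, is defined similarly and can be numerically computed from $\mathbf{Q}$. In this case, Eq. (2) becomes $\mathbf{Q}_t \approx \boldsymbol{\Theta}(\mathbf{Q})\boldsymbol{\Xi}$, where $\boldsymbol{\Theta}(\mathbf{Q}) \in \mathbb{R}^{m \times p}$ is the library evaluated on the data and the rows of $\boldsymbol{\Xi} \in \mathbb{R}^{p \times n}$ are the vectors $\boldsymbol{\xi}_k$, and the goal of the SINDy sparse regression problem is to choose a sparse set of coefficients $\boldsymbol{\Xi}$ that accurately fits the measured data in $\mathbf{Q}_t$. We can promote sparsity in the identified coefficients via a sparse regularizer $R(\boldsymbol{\Xi})$, such as the $l_0$ or $l_1$ norm, and use a sparse regression algorithm such as SR3 [champion2020unified] to solve the resulting optimization problem,

$$\underset{\boldsymbol{\Xi}}{\operatorname{argmin}} \; \|\mathbf{Q}_t - \boldsymbol{\Theta}(\mathbf{Q})\boldsymbol{\Xi}\|^2 + R(\boldsymbol{\Xi}). \tag{4}$$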
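The regression described above can be illustrated with a minimal NumPy sketch. It is not PySINDy's implementation and it does not use SR3: it swaps in sequentially thresholded least squares, the simple hard-thresholding scheme from the original SINDy paper, as a stand-in for an $l_0$-type regularizer. The test system (a damped linear oscillator), the quadratic library, and the names `library`, `stlsq`, and `threshold` are assumptions chosen for illustration.

```python
import numpy as np

# Synthetic data from an assumed test system (damped linear oscillator):
#   dq1/dt = -0.1*q1 + 2*q2,   dq2/dt = -2*q1 - 0.1*q2
t = np.linspace(0.0, 10.0, 1001)
dt = t[1] - t[0]
Q = np.column_stack([
    np.exp(-0.1 * t) * np.cos(2.0 * t),
    -np.exp(-0.1 * t) * np.sin(2.0 * t),
])  # data matrix, shape (m, n)

# Derivative matrix Q_t, computed numerically from Q by finite differences
Q_t = np.gradient(Q, dt, axis=0, edge_order=2)


def library(Q):
    """Evaluate a quadratic polynomial library; columns are
    [1, q1, q2, q1^2, q1*q2, q2^2]."""
    q1, q2 = Q[:, 0], Q[:, 1]
    return np.column_stack([np.ones_like(q1), q1, q2, q1**2, q1 * q2, q2**2])


def stlsq(theta, Q_t, threshold=0.05, max_iter=10):
    """Sequentially thresholded least squares (illustrative helper)."""
    xi = np.linalg.lstsq(theta, Q_t, rcond=None)[0]  # shape (p, n)
    for _ in range(max_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        # Refit each state equation using only its surviving library terms
        for j in range(Q_t.shape[1]):
            keep = ~small[:, j]
            if keep.any():
                xi[keep, j] = np.linalg.lstsq(
                    theta[:, keep], Q_t[:, j], rcond=None
                )[0]
    return xi


Theta = library(Q)
Xi = stlsq(Theta, Q_t)
print(np.round(Xi, 3))
```

For this noise-free data, the four surviving entries of `Xi` should land close to the true coefficients -0.1, 2, -2, and -0.1, with all constant and quadratic terms thresholded to zero. With noisy measurements the threshold and the differentiation method would both need more care.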

The original PySINDy package was developed to identify a particular class of systems described by Eq. (1).
Recent variants of the SINDy method are available that address systems with control inputs and model predictive control (MPC) [Kaiser2018prsa, fasel2021sindy], systems with physical constraints [Loiseau2017jfm, kaptanoglu2020physics], implicit ODEs [mangan2016inferring, kaheman2020sindy], PDEs [Rudy2017sciadv, Schaeffer2017prsa], and weak form ODEs and PDEs [Schaeffer2017pre, Reinbold2020pre, messenger2021weakpde]. Other methods, such as ensembling and sub-sampling [maddu2019stability, reinbold2021robust], are often vital for making the identification of Eq. (1) more robust.
In order to incorporate these new developments and accommodate the wide variety of possible dynamical systems, we have extended PySINDy to a more general setting and added significant new functionality. Our code (https://github.com/dynamicslab/pysindy) is thoroughly documented, contains extensive examples, and integrates a wide range of functionality, some of which may be found in a number of other local SINDy implementations (https://github.com/snagcliffs/PDE-FIND, https://github.com/eurika-kaiser/SINDY-MPC, https://github.com/dynamicslab/SINDy-PI, https://github.com/SchatzLabGT/SymbolicRegression, https://github.com/dynamicslab/databook_python, https://github.com/sheadan/SINDy-BVP, https://github.com/sethhirsh/BayesianSindy, https://github.com/racdale/sindyr, https://github.com/SciML/DataDrivenDiffEq.jl, https://github.com/MathBioCU/WSINDy_PDE, https://github.com/pakreinbold/PDE_Discovery_Weak_Formulation, https://github.com/ZIB-IOL/CINDy). In contrast to some of these existing implementations, PySINDy is completely open-source, professionally-maintained (for instance, providing unit tests and adhering to PEP8 stylistic standards), and minimally dependent on non-standard Python packages.

## New Features

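Given spatiotemporal data $\mathbf{Q}(\mathbf{x}, t) \in \mathbb{R}^{m \times n}$, and optional control inputs $\mathbf{u} \in \mathbb{R}^{m \times r}$ (note $m$ has been redefined here to be the product of the number of spatial measurements and the number of time samples), PySINDy can now approximate algebraic systems of PDEs (and corresponding weak forms) in up to 3 spatial dimensions. Assuming the system is described by a function $\mathbf{g}$, we have

$$\mathbf{g}(\mathbf{q}, \mathbf{q}_t, \mathbf{q}_x, \mathbf{q}_y, \mathbf{q}_{xx}, \dots, \mathbf{u}) = 0. \tag{5}$$

ODEs, implicit ODEs, PDEs, and other dynamical systems are subsets of Eq. (5). We can accommodate control terms and partial derivatives in the SINDy library by adding them as columns in $\boldsymbol{\Theta}(\mathbf{Q})$, which becomes $\boldsymbol{\Theta}(\mathbf{Q}, \mathbf{Q}_t, \mathbf{Q}_x, \mathbf{Q}_y, \mathbf{Q}_{xx}, \dots, \mathbf{u})$.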
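As a rough illustration of adding partial derivatives as library columns, the sketch below builds a small PDE library by finite differences and regresses the time derivative onto it. It is a NumPy-only sketch under stated assumptions, not PySINDy's PDE library: the 1D diffusion equation $q_t = D q_{xx}$ with $D = 0.5$, the analytic three-mode solution used as data, the candidate terms, and the threshold are all hypothetical choices.

```python
import numpy as np

D = 0.5  # assumed diffusion coefficient used to generate synthetic data
x = np.linspace(0.0, 2.0 * np.pi, 256)
t = np.linspace(0.0, 1.0, 101)
dx, dt = x[1] - x[0], t[1] - t[0]
X, T = np.meshgrid(x, t, indexing="ij")  # arrays of shape (space, time)

# Analytic solution of q_t = D * q_xx built from three decaying Fourier modes
q = (np.exp(-D * T) * np.sin(X)
     + 0.5 * np.exp(-4.0 * D * T) * np.sin(2.0 * X)
     + 0.25 * np.exp(-9.0 * D * T) * np.cos(3.0 * X))

# Finite-difference derivatives along time (axis 1) and space (axis 0)
q_t = np.gradient(q, dt, axis=1, edge_order=2)
q_x = np.gradient(q, dx, axis=0, edge_order=2)
q_xx = np.gradient(q_x, dx, axis=0, edge_order=2)

# Drop boundary points, where one-sided stencils are less accurate
inner = (slice(2, -2), slice(1, -1))
names = ["q", "q_x", "q_xx", "q*q_x"]
theta = np.column_stack([
    q[inner].ravel(),
    q_x[inner].ravel(),
    q_xx[inner].ravel(),
    (q * q_x)[inner].ravel(),
])  # one row per space-time sample, one column per candidate term
target = q_t[inner].ravel()

# Sequentially thresholded least squares on the single target q_t
xi = np.linalg.lstsq(theta, target, rcond=None)[0]
for _ in range(10):
    small = np.abs(xi) < 0.05
    xi[small] = 0.0
    if (~small).any():
        xi[~small] = np.linalg.lstsq(theta[:, ~small], target, rcond=None)[0]

for name, coef in zip(names, xi):
    print(f"{name:6s} {coef: .4f}")
```

Only the `q_xx` column should survive, with a coefficient close to the assumed `D`. Control inputs could be appended in the same way, as extra columns of `theta`.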

In addition, we have extended PySINDy to handle more complex modeling scenarios, including trapping SINDy for provably stable ODE models for fluids [kaptanoglu2021promoting], models trained using multiple dynamic trajectories, and the generation of many models with sub-sampling and ensembling methods for cross-validation and probabilistic system identification. In order to solve Eq. (5), PySINDy implements several different sparse regression algorithms. Greedy sparse regression algorithms, including step-wise sparse regression (SSR) [boninsegna2018sparse] and forward regression orthogonal least squares (FROLS) [Billings2013book], are now available. Figure 1 illustrates the PySINDy code structure, changes, and high-level goals for future work.
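The idea behind sub-sampling and ensembling can be sketched generically as bootstrap aggregation: fit many sparse models on resampled rows of the data, then summarize the coefficients across models. The example below is an illustrative NumPy sketch, not PySINDy's ensembling interface. The logistic test system, the noise level, the number of models, and the median aggregation are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed test system: logistic growth dq/dt = q - q^2 with q(0) = 0.1
t = np.linspace(0.0, 10.0, 500)
q = 1.0 / (1.0 + 9.0 * np.exp(-t))
# Exact derivative plus synthetic measurement noise (an assumption)
q_t = q - q**2 + 0.01 * rng.standard_normal(t.size)

theta = np.column_stack([np.ones_like(q), q, q**2])  # library [1, q, q^2]


def stlsq(theta, target, threshold=0.1, max_iter=10):
    """Sequentially thresholded least squares (illustrative helper)."""
    xi = np.linalg.lstsq(theta, target, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        if (~small).any():
            xi[~small] = np.linalg.lstsq(
                theta[:, ~small], target, rcond=None
            )[0]
    return xi


# Fit one sparse model per bootstrap sample (rows drawn with replacement)
n_models = 50
coefs = np.empty((n_models, theta.shape[1]))
for i in range(n_models):
    rows = rng.integers(0, t.size, size=t.size)
    coefs[i] = stlsq(theta[rows], q_t[rows])

xi_median = np.median(coefs, axis=0)       # aggregated coefficients
inclusion = np.mean(coefs != 0.0, axis=0)  # fraction of models keeping each term

print("median coefficients:", np.round(xi_median, 3))
print("inclusion probability:", inclusion)
```

The inclusion fractions give a rough probabilistic view of which library terms are supported by the data, and the spread of `coefs` across models gives an informal uncertainty estimate for each coefficient.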

PySINDy includes extensive Jupyter notebook tutorials that demonstrate the usage of various features of the package and reproduce nearly the entirety of the examples from the original SINDy paper [brunton2016pnas], trapping SINDy paper [kaptanoglu2021promoting], and the PDE-FIND paper [Rudy2017sciadv]. We include an extended example for the quasiperiodic shear-driven cavity flow [callaham2021role]. As a simple illustration of the new functionality, we demonstrate how SINDy can be used to identify the Kuramoto-Sivashinsky (KS) PDE from data. We train the model on the first 60% of the data from Rudy et al. [Rudy2017sciadv], which in total contains 1024 spatial grid points and 251 time steps. The KS model is identified correctly and the prediction for the time derivative $\mathbf{q}_t$ on the remaining testing data indicates strong performance in Fig. 2.

Figure 1(b): Flow chart for organizing the SINDy variants and functionality in the literature. Bright color boxes indicate the features that have been implemented through this work, roughly organized by functionality. Semi-transparent boxes indicate features that have not yet been implemented.

## Conclusion

The goal of the PySINDy package is to enable anyone with access to measurement data to engage in scientific model discovery. The package is designed to be accessible to inexperienced users, adhere to scikit-learn standards, include most of the existing SINDy variations in the literature, and provide a large variety of functionality for more advanced users. We hope that researchers will use and contribute to the code in the future, pushing the boundaries of what is possible in system identification.

## Acknowledgments

PySINDy is a fork of sparsereg [markus_quade_sparsereg]. SLB, AAK, KK, and UF acknowledge support from the Army Research Office (ARO W911NF-19-1-0045). JLC acknowledges funding support from the Department of Defense (DoD) through the National Defense Science & Engineering Graduate (NDSEG) Fellowship Program.
