Introduction
A number of scientific machine learning (ML) tasks seek to discover a dynamical system whose solution is consistent with data (e.g., constitutive modeling
patel2020thermodynamically; karapiperis2021data; ghnatios2019data; masi2021thermodynamics, reduced-order modeling chen2021physics; lee2021deep; wan2018data, physics-informed machine learning karniadakis2021physics; wu2018physics, and surrogates for performing optimal control alexopoulos2020digital). A major challenge for this class of problems is the preservation of both numerical stability and physical realizability when performing out-of-distribution inference (i.e., extrapolation/forecasting). Unlike traditional ML for, e.g., image or language processing tasks, engineering and science models pose strict requirements on physical quantities to guarantee properties such as conservation, thermodynamic consistency, and well-posedness of the resulting models baker2019workshop. These structural constraints translate to desirable mathematical properties for simulation, such as improved numerical stability and accuracy, particularly for chaotic systems lee2021machine; trask2020enforcing. While so-called physics-informed ML (PIML) approaches seek to impose these properties via soft physics constraints in the ML process, many applications require structure preservation to hold exactly; PIML requires empirical tuning of weighting parameters, and physics properties hold only to within optimization error, which may be large wang2020understanding; rohrhofer2021pareto. Structure-preserving machine learning has emerged as a means of designing architectures such that physics constraints hold exactly by construction lee2021machine; trask2020enforcing. By parameterizing relevant geometric or topological structures, researchers obtain more data-efficient hybrid physics/ML architectures with guaranteed mathematical properties.
In this work we consider geometric structure preservation for dynamical systems hairer2006geometric; marsden1995introduction. Reversible bracket formalisms (e.g., Hamiltonian/Lagrangian mechanics) have been shown effective for learning reversible dynamics toth2019hamiltonian; cranmer2020lagrangian; lutter2018deep; chen2019symplectic; jin2020sympnets; tong2021symplectic; zhong2021benchmarking; chen2021data; bertalan2019learning, while dissipative metric bracket extensions provide generalizations to irreversible dynamics in the metriplectic formalism lee2021machine; desai2021port; hernandez2021structure; yu2020onsagernet; zhong2020dissipative. Accounting for dissipation in this manner provides important thermodynamic consistency guarantees: namely, discrete versions of the first and second laws of thermodynamics, along with a fluctuation-dissipation theorem for thermal systems at microscales. Training these models is, however, a challenge, and leads to the discovery of non-interpretable Casimirs (generalized entropy and energy functionals describing the system).
For $u(t)$ denoting the state of a system at time $t$, $f_\theta$ denoting a velocity with potentially exploitable mathematical structure, and $\theta$ denoting trainable parameters, we consider learning of dynamics of the form
(1) $\frac{\mathrm{d}u}{\mathrm{d}t} = f_\theta(u).$
We synthesize the Sparse Identification of Nonlinear Dynamics (SINDy) formalism brunton2016discovering with neural ordinary differential equations (NODEs) chen2018neural to obtain a framework for learning dynamics. This sparse dictionary-learning formalism allows learning either of interpretable dynamics when learning "black-box" ODEs, or of more complicated bracket dynamics describing structure-preserving reversible and irreversible systems. Our main technical contributions include:

a novel extension of SINDy into neural ODE settings, including a training strategy with L1 weight decay and pruning;

the first application of SINDy with structure preservation, including black-box, Hamiltonian, GENERIC, and port-Hamiltonian formulations;

an empirical demonstration of the effectiveness of the proposed algorithm for a wide array of dynamics spanning reversible, irreversible, and chaotic systems.
Sparse Identification of Nonlinear Dynamics (SINDy)
The Sparse Identification of Nonlinear Dynamics (SINDy) method aims to identify the dynamics of interest using a sparse set of dictionary functions such that
(2) $\frac{\mathrm{d}u}{\mathrm{d}t} \approx \Xi^{\mathsf{T}} \boldsymbol{\phi}(u),$
where $\Xi$ is the coefficient matrix and $\boldsymbol{\phi}(u)$ denotes a "library" vector consisting of candidate functions, e.g., constant, polynomials, and so on.
From measurements of states (and derivatives), we can construct a linear system of equations:
(3) $\dot{U} = \Phi(U)\,\Xi,$
where $U$ and $\dot{U}$ are collections of the states and the (numerically approximated) derivatives sampled at time indices $t_1, \ldots, t_N$ such that
(4) $U = \left[u(t_1),\; u(t_2),\; \ldots,\; u(t_N)\right]^{\mathsf{T}}, \qquad \dot{U} = \left[\dot{u}(t_1),\; \dot{u}(t_2),\; \ldots,\; \dot{u}(t_N)\right]^{\mathsf{T}}.$
A library, $\Phi(U)$, consists of candidate nonlinear functions, e.g., constant, polynomial, and trigonometric terms:
(5) $\Phi(U) = \left[\mathbf{1} \;\; U \;\; U^{P_2} \;\; U^{P_3} \;\; \cdots \;\; \sin(U) \;\; \cos(U) \;\; \cdots\right],$
where $U^{P_2}$ is a function of the quadratic nonlinearities such that the $i$-th row of $U^{P_2}$ is defined as, for example, with $n=2$,
(6) $\left[\,u_1^2(t_i) \;\; u_1(t_i)\,u_2(t_i) \;\; u_2^2(t_i)\,\right].$
Lastly, $\Xi = [\xi_1,\, \xi_2,\, \ldots,\, \xi_n]$ denotes the collection of the unknown coefficients, where the $\xi_k$, $k = 1, \ldots, n$, are sparse vectors.
Typically, only $U$ is available and $\dot{U}$ is approximated numerically. Thus, in SINDy, $\dot{U}$ is considered to be noisy, which leads to a new formulation:
(7) $\dot{U} = \Phi(U)\,\Xi + \eta Z,$
where $Z$ is a matrix of independent identically distributed unit normal entries and $\eta$ is the noise magnitude. To compute the sparse solution $\Xi$, SINDy employs either the least absolute shrinkage and selection operator (LASSO) tibshirani1996regression or the sequential threshold least-squares method brunton2016discovering. LASSO is an L1-regularized regression technique, and the sequential threshold least-squares method is an iterative algorithm which repeatedly zeroes out small entries of $\Xi$ and solves least-squares problems with the remaining entries.
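As an illustration, the sequential threshold least-squares procedure can be sketched as follows (a minimal numpy sketch, not the authors' implementation; the function name `stlsq` and the iteration count are illustrative choices):

```python
import numpy as np

def stlsq(Phi, dUdt, threshold=0.1, n_iters=10):
    """Sequentially thresholded least squares (in the spirit of Brunton et al., 2016).

    Phi   : (N, m) library matrix evaluated at the N samples
    dUdt  : (N, n) numerically approximated state derivatives
    Returns the sparse (m, n) coefficient matrix Xi.
    """
    Xi, *_ = np.linalg.lstsq(Phi, dUdt, rcond=None)   # initial dense fit
    for _ in range(n_iters):
        small = np.abs(Xi) < threshold                # entries to zero out
        Xi[small] = 0.0
        for k in range(dUdt.shape[1]):                # re-fit surviving entries per state
            big = ~small[:, k]
            if big.any():
                Xi[big, k], *_ = np.linalg.lstsq(Phi[:, big], dUdt[:, k], rcond=None)
    return Xi
```

The `threshold` plays the role of the sparsification parameter: entries below it in magnitude are zeroed, and the remaining entries are re-fit by ordinary least squares.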
Neural SINDy
One evident weakness of SINDy is that it requires the derivatives of the state variable to be either empirically measured or numerically approximated. To resolve this issue, we propose to use the training algorithm introduced in neural ordinary differential equations (NODEs) chen2018neural with L1 weight decay. In addition, we propose a magnitude-based pruning strategy to retain only the dictionary entries with significant contributions.
Neural Ordinary Differential Equations
NODEs chen2018neural are a family of deep neural network models that parameterize the time-continuous dynamics of hidden states using a system of ODEs:
(8) $\frac{\mathrm{d}u}{\mathrm{d}t} = f_\theta(u, t),$
where $u(t)$ is a time-continuous representation of a state, $f_\theta$ is a parameterized (trainable) velocity function, and $\theta$ is a set of neural network weights and biases.
In the forward pass, given the initial condition $u(t_0)$, a hidden state at any time index can be obtained by solving the initial value problem (IVP). A black-box time integrator can be employed to compute the hidden states with the desired accuracy:
(9) $\left[u(t_1),\, \ldots,\, u(t_N)\right] = \mathrm{ODESolve}\!\left(u(t_0),\, f_\theta,\, [t_1, \ldots, t_N]\right).$
The model parameters $\theta$ are then updated with an optimizer, which minimizes a loss function measuring the difference between the output and the target variables.
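The forward pass in Eq. (9) can be sketched with a fixed-step integrator standing in for the black-box ODESolve (classical RK4 here is an illustrative assumption; the paper uses the adaptive dopri5 solver):

```python
import numpy as np

def odesolve_rk4(f, u0, ts):
    """Solve du/dt = f(u) on the time grid ts with the classical RK4 scheme."""
    us = [np.asarray(u0, dtype=float)]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        h, u = t1 - t0, us[-1]
        k1 = f(u)
        k2 = f(u + 0.5 * h * k1)
        k3 = f(u + 0.5 * h * k2)
        k4 = f(u + h * k3)
        us.append(u + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4))
    return np.stack(us)  # shape (len(ts), state_dim)

# du/dt = -u has the exact solution u(t) = u0 * exp(-t)
ts = np.linspace(0.0, 1.0, 101)
us = odesolve_rk4(lambda u: -u, [1.0], ts)
assert abs(us[-1, 0] - np.exp(-1.0)) < 1e-8
```

In the NODE setting, gradients of the loss with respect to $\theta$ are obtained either by backpropagating through the solver steps or via the adjoint method of chen2018neural.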
Dictionarybased parameterization
As in the original SINDy formulation, we assume that there is a set of candidate nonlinear functions to represent the dynamics:
(10) $f_\theta(u) = \Xi^{\mathsf{T}} \boldsymbol{\phi}(u),$
where, again, $\Xi$ is a trainable coefficient matrix and $\boldsymbol{\phi}(u)$ denotes a library vector. In this setting, the learnable parameters of the NODE become $\theta = \{\Xi\}$.
Sparsity inducing loss: L1 weight decay (Lasso)
Following SINDy, to induce sparsity in $\Xi$ we add an L1 weight decay to the main loss objective for the mean absolute error:
(11) $\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \left\| \tilde{u}(t_i) - u(t_i) \right\|_1 + \lambda \|\Xi\|_1,$
where $\tilde{u}(t_i)$ denotes the state computed from the learned dynamics and $\lambda > 0$ is a penalty weight.
Pruning
Taking linear combinations of candidate functions in Eq. (10) admits implementation as a linear layer, specified by $\Xi$, which has no biases or nonlinear activation. To further sparsify the coefficient matrix $\Xi$, we employ a magnitude-based pruning strategy:
(12) $\Xi_{ij} \leftarrow 0 \quad \text{if} \quad |\Xi_{ij}| < \tau.$
We find that pruning is essential to finding a sparse representation of $\Xi$, and we provide empirical evidence from the numerical experiments. In the next section, we detail how we use the pruning strategy during the training process.
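A minimal sketch of the magnitude-based pruning step, with an explicit boolean mask so that entries pruned at an earlier step stay at zero (the mask bookkeeping is an assumed implementation detail, not taken from the paper):

```python
import numpy as np

def prune(Xi, mask, tau):
    """Magnitude-based pruning: zero out entries with |Xi_ij| < tau.
    `mask` records which entries are still active; once pruned, an entry stays zero."""
    mask = mask & (np.abs(Xi) >= tau)
    return Xi * mask, mask
```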
Training
We employ mini-batching to train the proposed neural network architecture. For each training step, we randomly sample trajectories from the training set and then randomly sample initial points from the selected trajectories to assemble subsequences of a fixed length. We solve IVPs with the sampled initial points, measure the loss (Eq. (11)), and update the model parameters via Adamax. After the update, we prune the model parameters using magnitude-based pruning as shown in (12). Algorithm 1 summarizes the training procedure.
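The per-step subsequence sampling described above can be sketched as follows (array shapes and the function name are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_minibatch(trajs, batch_size, seq_len):
    """trajs: array of shape (num_traj, T, n).
    Returns (initial states, target subsequences) for one training step."""
    ti = rng.integers(0, trajs.shape[0], size=batch_size)                 # which trajectories
    si = rng.integers(0, trajs.shape[1] - seq_len + 1, size=batch_size)   # start indices
    batch = np.stack([trajs[a, b:b + seq_len] for a, b in zip(ti, si)])
    return batch[:, 0], batch  # IVP initial points and the targets to match
```

Each sampled initial state is then integrated forward with ODESolve over `seq_len` steps and compared against the corresponding target subsequence in the loss.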
Eq. name | Ground truth | Identified
Hyperbolic | |
Cubic oscillator | |
Van der Pol | |
Hopf bifurcation | |
Lorenz | |
Experimental results with neural SINDy
We implement the proposed method in Python 3.7.2 and PyTorch 1.9.0 paszke2019pytorch. In all training, we use Adamax kingma2015Adam with an initial learning rate of 0.01 and exponential learning rate decay with the multiplicative factor 0.9987. In the following experiments, we set the penalty weight as and the pruning threshold as . For ODESolve, we use the Dormand–Prince method (dopri5) dormand1980family with relative tolerance and absolute tolerance unless otherwise specified. All experiments are performed on a MacBook Pro with an M1 CPU and 16 GB of memory.
Dictionary construction
In the following experiments, we employ a set of polynomials, $\mathcal{P}^n_d$, where $n$ is the number of variables and $d$ is the maximal total degree of polynomials in the set. An example set, $\mathcal{P}^2_2$, is given as
(13) $\mathcal{P}^2_2 = \left\{1,\; u_1,\; u_2,\; u_1^2,\; u_1 u_2,\; u_2^2\right\}.$
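Constructing such a polynomial library numerically can be sketched as follows (the helper `poly_library` is illustrative, not the paper's code):

```python
from itertools import combinations_with_replacement
import numpy as np

def poly_library(U, degree):
    """Evaluate all monomials of total degree <= `degree` at the rows of U (shape (N, n))."""
    N, n = U.shape
    cols = [np.ones(N)]  # the constant term
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(n), d):
            cols.append(np.prod(U[:, list(idx)], axis=1))  # e.g. u1*u2 for idx=(0, 1)
    return np.column_stack(cols)
```

For $n = 2$ and `degree = 2` this produces the six columns of Eq. (13) in the order $1, u_1, u_2, u_1^2, u_1 u_2, u_2^2$.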
Experiment set
We examine the performance of the proposed method for a number of dynamical systems (see Table 1 for the equations):

Hyperbolic example,

Cubic oscillator,

Van der Pol oscillator,

Hopf bifurcation,

Lorenz system.
To generate data, we base our implementation on the code from liu2020hierarchical: generating 1600 training trajectories, 320 validation trajectories, and 320 test trajectories. For the first four example problems, we use time step size and total simulation time . For Lorenz, we use and . For training, we use and for the first four example problems and the last example problem, respectively. In the appendix, a comparison against a "black-box" feedforward neural network with fully-connected layers is presented; we report the performance of both the black-box neural network and the proposed network measured in terms of time-instantaneous mean-squared error (MSE).
Table 1 shows the ground truth equations and the equations identified by neural SINDy. We use , , and , respectively, for the first three example problems, the fourth example problem, and the last example problem. During training, coefficients with small magnitude are pruned (i.e., set to 0) and are not presented in the table. We also report the learned coefficients without the pruning strategy in the appendix, which highlights that pruning is required to zero out the coefficients of small magnitude.
Figure 1 depicts examples of reference trajectories and trajectories computed from the identified dynamics, where the trajectories are chosen from the test set. As reported in the appendix, time-instantaneous MSEs computed by nSINDy are several orders of magnitude smaller than those computed by the black-box NODEs.
Neural SINDy for structure preserving parameterization
GENERIC structurepreserving parameterization
In the following, we present neural SINDy as a structure-preserving approach for modeling reversible and irreversible dynamics. An alternative formalism for irreversible dynamics based on port-Hamiltonians is included in the appendix.
GENERIC formalism
We begin by reviewing the general equation for the nonequilibrium reversible–irreversible coupling (GENERIC) formalism.
Eq. name | Parameterization | Ground truth | Identified
DNO | nSINDy | |
DNO | nSINDy + GNN | |
In GENERIC, the evolution of an observable $A(u)$ is assumed to evolve under the gradient flow
(14) $\frac{\mathrm{d}A}{\mathrm{d}t} = \{A, E\} + [A, S],$
where $E$ and $S$ denote generalized energy and entropy, and $\{\cdot,\cdot\}$ and $[\cdot,\cdot]$ denote a Poisson bracket and an irreversible metric bracket, respectively. The Poisson bracket is given in terms of a skew-symmetric Poisson matrix $L$, and the irreversible bracket is given in terms of a symmetric positive semi-definite friction matrix $M$:
(15) $\{A, B\} = \frac{\partial A}{\partial u}^{\mathsf{T}} L\, \frac{\partial B}{\partial u}, \qquad [A, B] = \frac{\partial A}{\partial u}^{\mathsf{T}} M\, \frac{\partial B}{\partial u}.$
Lastly, the GENERIC formalism requires two degeneracy conditions defined as
(16) $L\, \frac{\partial S}{\partial u} = 0, \qquad M\, \frac{\partial E}{\partial u} = 0.$
With the state variables $u$, the GENERIC formalism defines the evolution of $u$ as
(17) $\frac{\mathrm{d}u}{\mathrm{d}t} = L\, \frac{\partial E}{\partial u} + M\, \frac{\partial S}{\partial u}.$
Parameterization for GENERIC
Here, we review the parameterization technique developed in oettinger2014irreversible and further extended into deep learning settings in lee2021machine. As in the Hamiltonian parameterization (Eq. (29)), we parameterize the total energy via the dictionary as
(18) 
with trainable coefficients. Then we parameterize the symmetric irreversible dynamics via the bracket
(19) 
where
(20) 
The two matrices are skew-symmetric and symmetric positive semi-definite, respectively, such that
(21) 
where the skew-symmetry and the symmetric positive semi-definiteness can be achieved by parameterization tricks
(22) 
in which the factors are matrices with learnable entries.
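The standard tricks for obtaining skew-symmetric and symmetric positive semi-definite matrices from unconstrained learnable matrices can be sketched as follows (a minimal numpy illustration assuming the common factorizations $(A - A^{\mathsf{T}})/2$ and $BB^{\mathsf{T}}$, which may differ in detail from the paper's exact parameterization):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))  # unconstrained learnable matrix
B = rng.standard_normal((4, 4))  # unconstrained learnable matrix

L = 0.5 * (A - A.T)  # skew-symmetric by construction
M = B @ B.T          # symmetric positive semi-definite by construction

assert np.allclose(L, -L.T)
assert np.min(np.linalg.eigvalsh(M)) >= -1e-12
```

Because these properties hold for any values of the unconstrained entries, gradient updates never leave the admissible set, which is what makes the trick convenient for training.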
With this parameterization, the irreversible part of the dynamics is given by
(23) 
and, as a result, is now defined as
(24) 
with .
Experiments
The example problem (with its reference mathematical formulation) considered in this paper is:

Damped nonlinear oscillator from shang2020structure: the ground truth equation can be written in the GENERIC formalism (Eq. (17)) with the following components:
(25) (26) Alternatively, the equation for the dynamics can be written as
(27)
Eq. name | Parameterization | Ground truth | Identified
Ideal mass-spring | nSINDy | |
Ideal mass-spring | nSINDy + HNN | |
Ideal pendulum | nSINDy | |
Ideal pendulum | nSINDy + HNN | |
In the experiment, we assume that we have knowledge of and that measurements of are available. That is, the nSINDy method with GENERIC structure preservation seeks the unknown and . For generating data, we base our implementation on the code from shang2020structure. We generate 800 training trajectories, 160 validation trajectories, and 160 test trajectories with and simulation time . We use a dictionary consisting of polynomials and trigonometric functions:
(28) 
We consider the same experimental settings used in the above experiments: the Adamax optimizer with initial learning rate 0.01, exponential learning rate decay with a factor of 0.9987, L1-penalty weight , pruning threshold , dopri5 for the ODE integrator with and for relative and absolute tolerances, and, finally, .
Table 2 reports the coefficients of the identified systems when nSINDy and nSINDy + GNN are used. With , nSINDy fails to correctly identify the system, whereas nSINDy + GNN identifies the exact terms correctly and computes the coefficients accurately. Figure 2 depicts the ground truth trajectory and the trajectory computed from the learned dynamics function (nSINDy + GNN). Figure 3 shows the advantages of using GENERIC structure preservation via plots of and . The difference between structure-preserving and non-structure-preserving parameterizations is shown dramatically in the plot of ; the structure-preserving parameterization produces , whereas the non-structure-preserving parameterization produces fluctuating values of .
Hamiltonian structurepreserving parameterization
In the following, we consider the Hamiltonian structure-preserving parameterization technique proposed in Hamiltonian neural networks (HNNs) Greydanus2019hnn: parameterizing the Hamiltonian function via the dictionary such that
(29) $H_\theta(u) = \Xi^{\mathsf{T}} \boldsymbol{\phi}(u).$
With the above parameterization and the canonical Poisson matrix $J$, the dynamics can be modeled as
(30) $\frac{\mathrm{d}u}{\mathrm{d}t} = J\, \frac{\partial H_\theta}{\partial u}, \qquad J = \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix},$
i.e., $f_\theta$ is now defined as $J\, \frac{\partial H_\theta}{\partial u}$. Then the loss objective can be written as
(31) $\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \left\| \tilde{u}(t_i) - u(t_i) \right\|_1 + \lambda \|\Xi\|_1.$
As noted in lee2021machine, with canonical coordinates $u = (q, p)$, the canonical Poisson matrix $L = J$, and $M = 0$, the GENERIC formalism in Eq. (17) recovers Hamiltonian dynamics.
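As a minimal numerical illustration of this reduction (assuming a unit mass-spring Hamiltonian $H = (q^2 + p^2)/2$, an illustrative choice rather than a learned one):

```python
import numpy as np

J = np.array([[0.0, 1.0],
              [-1.0, 0.0]])  # canonical Poisson matrix for u = (q, p)

def grad_H(u):
    # H(q, p) = (q**2 + p**2) / 2, so dH/du = u for the unit mass-spring
    return u

def f(u):
    # Eq. (17) with L = J and M = 0 reduces to du/dt = J dH/du,
    # i.e. Hamilton's equations dq/dt = p, dp/dt = -q
    return J @ grad_H(u)

u = np.array([1.0, 0.0])
assert np.allclose(f(u), [0.0, -1.0])
```

Skew-symmetry of $J$ implies $\mathrm{d}H/\mathrm{d}t = \nabla H^{\mathsf{T}} J \nabla H = 0$, which is the energy conservation property the HNN parameterization inherits by construction.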
Experiments
We examine the performance of the proposed method on a list of example dynamical systems:

Ideal massspring,

Ideal pendulum.
For generating data, we base our implementation on the code from Greydanus2019hnn. We generate 800 training trajectories, 160 validation trajectories, and 160 test trajectories. For the mass-spring and the pendulum problems, we use and , respectively, and the simulation time is set as and , respectively.
Table 3 shows the ground truth equations and the equations identified by neural SINDy. For the mass-spring problem, we use , and for the pendulum problem we use a dictionary consisting of polynomials and trigonometric functions:
(32) 
Again, we use the same experimental settings described above for the GENERIC parameterization. In Table 3, we present results obtained by two different parameterization techniques: 1) the "plain" dictionary approach (Eq. (10)), denoted by nSINDy, and 2) the Hamiltonian approach (Eq. (29)), denoted by nSINDy + HNN.
Figure 4 depicts examples of reference trajectories and trajectories computed from the identified dynamics, where the trajectories are chosen from the test set. Figure 5 depicts how the energy is conserved in the dynamics learned with nSINDy + HNN, whereas the plain approach (nSINDy) fails to conserve the energy.
Discussion
Limitation
The proposed method shares the same limitations as the original SINDy method: successful identification requires inclusion of the correct candidate functions in the library. Potential alternatives are well-studied in the literature and include either adding an extra "black-box" neural network to compensate for missing dictionary terms or designing a neural network that can learn dictionaries from data (such as in sahoo2018learning).
The gradient-based parameter update and the magnitude-based pruning could erroneously zero out needed coefficients in one special case: when the sign of a coefficient needs to change in a later stage of training. If the magnitude of the loss term (and hence the gradient) becomes very small, the updated coefficients may satisfy the pruning condition shown in (12).
Lastly, when adaptive step-size ODE solvers are used (e.g., dopri5), numerical underflow may occur. This can be mitigated by trying different initializations of the coefficients or a smaller batch length. However, further study regarding robust initialization is required.
Related work
System identification
In schmidt2009distilling, the proposed method uses genetic algorithms to identify governing physical laws (Lagrangian or Hamiltonian) from measurements of real experiments. A seminal work on sparse regression methods for system identification was proposed in brunton2016discovering. Sparse regression methods have since been extended to various settings, e.g., for model predictive control kaiser2018sparse, and for identifying dynamics in latent space using autoencoders hinton2006reducing and learning parsimonious representations of the latent dynamics champion2019data. Sparse regression methods have also been applied to identifying partial differential equations (PDEs) rudy2017data; rudy2019data. For identifying PDEs, deep-learning-based approaches such as physics-informed neural networks raissi2019physics and PDE-net long2018pde have been studied recently.
Structure-preserving neural networks
Designing neural network architectures that exactly enforce important physical properties has been an important topic and has been studied extensively. Parameterization techniques that preserve physical structure include Hamiltonian neural networks Greydanus2019hnn; toth2019hamiltonian, Lagrangian neural networks cranmer2020lagrangian; lutter2018deep, port-Hamiltonian neural networks desai2021port, and GENERIC neural networks hernandez2021structure; lee2021machine. Neural network architectures that mimic the action of symplectic integrators have been proposed in chen2019symplectic; jin2020sympnets; tong2021symplectic, and a training algorithm exploiting physical invariances for learning dynamical systems, e.g., time-reversal symmetry, has been studied in huh2020time.
Conclusion
We have proposed a simple and effective deep-learning-based training algorithm for a dictionary-based parameterization of nonlinear dynamics. The proposed algorithm is based on the training procedure introduced in neural ordinary differential equations (NODEs) chen2018neural and employs L1 weight decay of the model parameters and a magnitude-based pruning strategy, inspired by sparse identification of nonlinear dynamics (SINDy) brunton2016discovering. We have further extended the dictionary-based parameterization approach to structure-preserving parameterization techniques, such as Hamiltonian neural networks, GENERIC neural networks, and port-Hamiltonian networks. For a suite of benchmark problems, we have demonstrated that the proposed training algorithm is effective in identifying the underlying dynamics from data, with expected gains from imposing structure preservation.
Acknowledgments
N. Trask and P. Stinis acknowledge funding under the Collaboratory on Mathematics and PhysicsInformed Learning Machines for Multiscale and Multiphysics Problems (PhILMs) project funded by DOE Office of Science (Grant number DESC001924). N. Trask and K. Lee acknowledge funding from the DOE Early Career program. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DENA0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.
References
Appendix A Training without pruning
Tables 4 and 5 report the coefficients learned by training without the pruning strategy. Although the algorithm successfully finds the significant coefficients, it fails to zero out the coefficients with small contributions.
Hyperbolic  Cubic oscillator  Van der Pol  
4.0265e04  1.5801e04  1.1837e04  1.9252e04  7.6617e04  4.6053e04  
4.8321e02  6.7640e04  3.8775e04  2.1818e05  4.3242e04  9.9930e01  
2.2759e05  9.9818e01  3.4231e04  5.1497e04  1.0002e+00  1.9997e+00  
1.1598e04  1.0021e+00  8.5114e04  1.0675e04  4.3595e06  2.6983e04  
9.0811e04  1.3638e03  3.9183e04  1.0255e04  1.3413e03  1.3681e04  
9.9513e04  1.2584e03  4.7589e04  3.6600e04  8.9859e04  5.9787e04  
2.3237e03  1.3569e03  9.9985e02  2.0003e+00  6.0941e05  5.1159e05  
7.3889e04  1.1560e03  8.8203e04  8.3778e04  1.6378e04  2.0006e+00  
3.7055e04  3.2972e04  3.6603e04  6.3260e04  3.8560e04  3.5658e04  
2.8758e03  3.6145e04  2.0003e+00  1.0056e01  1.3837e04  6.8370e05 
Hopf bifurcation  Lorenz  
7.7859e05  5.4043e05  7.5873e05  5.1030e07  5.2767e07  8.4725e07  
9.0900e05  1.0005e+00  2.7873e04  5.6920e01  1.1720e+01  3.6492e07  
9.9948e01  1.6792e04  3.3130e04  4.5274e+00  7.2215e+00  1.4872e07  
2.5933e04  5.7727e04  4.2798e04  5.4791e07  3.4551e06  2.6812e+00  
3.8502e04  7.2785e04  2.8368e04  1.7573e06  1.7074e06  1.4821e02  
1.6046e03  4.7278e05  1.6167e03  1.4202e06  1.6468e06  9.6046e01  
9.9672e01  3.5927e04  5.1250e04  9.4857e01  4.6996e01  3.5173e06  
3.8864e04  3.1930e04  2.0894e04  1.5292e06  8.3760e06  1.6890e02  
4.5870e04  9.9483e01  8.0084e04  4.5105e01  6.7203e01  8.0195e06  
2.6971e04  5.6016e04  2.3888e04  8.3810e05  1.1768e04  2.0347e03  
9.9727e01  1.3899e05  1.2669e03  5.9848e02  9.4702e02  8.1650e07  
2.1682e03  9.9460e01  5.3697e04  6.0315e02  8.9793e02  4.1083e06  
2.4824e04  5.5955e04  2.7237e04  5.6009e06  5.7127e06  1.1296e04  
9.9607e01  1.2029e03  4.7030e04  1.9930e02  2.6786e02  1.4347e06  
5.4106e04  8.6669e04  8.0186e04  1.1614e05  1.3253e05  8.0456e04  
9.1186e04  1.8879e06  1.3931e04  2.1789e02  3.3973e02  3.2658e07  
7.2948e05  9.9499e01  1.5385e04  2.8419e03  3.5977e03  1.2999e07  
3.1669e04  2.1393e04  1.0944e03  4.1031e07  4.9989e07  3.6131e04  
4.0919e04  5.7416e04  7.1154e04  8.8740e03  1.2924e02  8.9630e07  
3.7226e04  1.8204e04  9.8719e05  2.7687e06  3.6289e06  6.3774e05 
Appendix B Comparison with MLP
The figures depict time-instantaneous mean-squared errors measured between the ground truth trajectories and the trajectories computed from the learned dynamics functions. We consider two models: the black-box neural ODE models and the proposed nSINDy models. For the nSINDy models, we use the same experimental settings considered in the experimental results section. For the black-box neural ODE models, we consider feedforward neural networks consisting of four layers with 100 neurons in each layer and the hyperbolic tangent (Tanh) nonlinear activation. For training both models, we use the same optimizer, Adamax, and the same initial learning rate 0.01 with the same decay strategy (i.e., exponential decay with the multiplicative factor 0.9987). The models are trained over 500 epochs for the first four benchmark problems and 2000 epochs for the last benchmark problem.
Appendix C PortHamiltonian structure preservation
Here, we consider the dictionary-based parameterization for port-Hamiltonian systems. We follow the port-Hamiltonian neural network formulation presented in desai2021port, which is written as:
(33) $\frac{\mathrm{d}q}{\mathrm{d}t} = \frac{\partial H_\theta}{\partial p}, \qquad \frac{\mathrm{d}p}{\mathrm{d}t} = -\frac{\partial H_\theta}{\partial q} - \rho\, \frac{\partial H_\theta}{\partial p} + F(t),$
where $H_\theta$ denotes the parameterized Hamiltonian function, $\rho$ denotes the nonzero coefficient of a damping term, and $F(t)$ denotes a time-dependent forcing. We consider the dictionary-type parameterization
(34) $H_\theta(u) = \Xi^{\mathsf{T}} \boldsymbol{\phi}(u),$
with $\rho$ set to be a trainable coefficient, and assume that $F(t)$ is known.
As in desai2021port, we consider the Duffing equation, where the Hamiltonian function is defined as
(35) $H(q, p) = \frac{p^2}{2} + \alpha \frac{q^2}{2} + \beta \frac{q^4}{4}.$
With this Hamiltonian, the Duffing equation with the damping term and the forcing can be written as:
(36) $\frac{\mathrm{d}q}{\mathrm{d}t} = p, \qquad \frac{\mathrm{d}p}{\mathrm{d}t} = -\alpha q - \beta q^3 - \rho p + F(t).$
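A sketch of the resulting right-hand side, assuming the standard Duffing Hamiltonian $H = p^2/2 + \alpha q^2/2 + \beta q^4/4$ with damping $\rho$ and forcing $F$ (the parameter values below are illustrative, not the experimental settings):

```python
import numpy as np

alpha, beta, rho = 1.0, 1.0, 0.1  # illustrative values, not the paper's settings

def duffing_rhs(u, F=0.0):
    """Damped, forced Duffing: dq/dt = p, dp/dt = -alpha*q - beta*q**3 - rho*p + F."""
    q, p = u
    return np.array([p, -alpha * q - beta * q**3 - rho * p + F])

# With zero forcing, dH/dt = -rho * p**2 <= 0, so the damping dissipates energy.
```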
In the following experiment, we consider for parameterizing the Hamiltonian function.
Non-chaotic case:
, , , , and . In the experiment, we assume that we know the values of and and identify the Hamiltonian function. The identified Hamiltonian function is and the identified value of .
Chaotic case
, , , , and . The identified Hamiltonian function is and the identified value of .