A number of scientific machine learning (ML) tasks seek to discover a dynamical system whose solution is consistent with data (e.g., constitutive modeling patel2020thermodynamically; karapiperis2021data; ghnatios2019data; masi2021thermodynamics, reduced-order modeling chen2021physics; lee2021deep; wan2018data, physics-informed machine learning karniadakis2021physics; wu2018physics, and surrogates for optimal control alexopoulos2020digital). A major challenge for this class of problems is preserving both numerical stability and physical realizability when performing out-of-distribution inference (i.e., extrapolation/forecasting). Unlike traditional ML tasks such as image or language processing, engineering and science models pose strict requirements on physical quantities to guarantee properties such as conservation, thermodynamic consistency, and well-posedness of the resulting models baker2019workshop. These structural constraints translate to desirable mathematical properties for simulation, such as improved numerical stability and accuracy, particularly for chaotic systems lee2021machine; trask2020enforcing.
While so-called physics-informed ML (PIML) approaches seek to promote these properties by adding soft physics constraints to the ML optimization process, many applications require structure preservation to hold exactly; PIML requires empirical tuning of weighting parameters, and physics properties hold only to within optimization error, which may be large wang2020understanding; rohrhofer2021pareto. Structure-preserving machine learning has emerged as a means of designing architectures such that physics constraints hold exactly by construction lee2021machine; trask2020enforcing. By parameterizing relevant geometric or topological structures, researchers obtain more data-efficient hybrid physics/ML architectures with guaranteed mathematical properties.
In this work we consider geometric structure preservation for dynamical systems hairer2006geometric; marsden1995introduction. Reversible bracket formalisms (e.g. Hamiltonian/Lagrangian mechanics) have been shown effective for learning reversible dynamics toth2019hamiltonian; cranmer2020lagrangian; lutter2018deep; chen2019symplectic; jin2020sympnets; tong2021symplectic; zhong2021benchmarking; chen2021data; bertalan2019learning, while dissipative metric bracket extensions provide generalizations to irreversible dynamics in the metriplectic formalism lee2021machine; desai2021port; hernandez2021structure; yu2020onsagernet; zhong2020dissipative. Accounting for dissipation in this manner provides important thermodynamic consistency guarantees: namely, discrete versions of the first and second laws of thermodynamics, along with a fluctuation-dissipation theorem for thermal systems at microscales. Training these models is, however, challenging and leads to the discovery of non-interpretable Casimirs (generalized energy and entropy functionals describing the system).
For denoting the state of a system at time , denoting a velocity with potentially exploitable mathematical structure, and denoting trainable parameters, we consider learning of dynamics of the form
We synthesize the Sparse Identification of Nonlinear Dynamics (SINDy) formalism brunton2016discovering with neural ordinary differential equations (NODEs) chen2018neural to obtain a framework for learning dynamics. This sparse dictionary-learning formalism allows learning either of interpretable "black-box" ODEs or of more complicated bracket dynamics that describe structure-preserving reversible and irreversible systems. Our main technical contributions include:
a novel extension of SINDy into neural ODE settings, including a training strategy with L1 weight decay and pruning,
the first application of SINDy with structure-preservation, including: black-box, Hamiltonian, GENERIC, and port-Hamiltonian formulations
empirical demonstration of the effectiveness of the proposed algorithm for a wide array of dynamics spanning reversible, irreversible, and chaotic systems.
Sparse Identification of Nonlinear Dynamics (SINDy)
The Sparse Identification of Nonlinear Dynamics (SINDy) method aims to identify the dynamics of interest using a sparse set of dictionaries such that
where is the coefficient matrix and
denotes a “library” vector consisting of candidate functions, e.g., constant, polynomials, and so on.
From measurements of states (and derivatives), we can construct a linear system of equations:
where and are collections of the states and the (numerically approximated) derivatives sampled at time indices such that
A library, , consists of candidate nonlinear functions, e.g., constant, polynomial, and trigonometric terms:
where is a function of the quadratic nonlinearities such that the th row of is defined as, for example, with ,
Lastly, denotes the collection of the unknown coefficients , where , are sparse vectors.
Typically, only is available and is approximated numerically. Thus, in SINDy, is considered to be noisy, which leads to a new formulation:
where is a matrix of independent identically distributed unit normal entries and is the noise magnitude. To compute the sparse solution , SINDy employs either the least absolute shrinkage and selection operator (LASSO) tibshirani1996regression or the sequential threshold least-squares method brunton2016discovering. LASSO is an L1-regularized regression technique; the sequential threshold least-squares method is an iterative algorithm that repeatedly zeroes out small entries of and solves least-squares problems over the remaining entries.
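As an illustration, the sequential threshold least-squares procedure can be sketched in a few lines (a minimal sketch assuming NumPy; the function and variable names are illustrative, not the paper's implementation):

```python
import numpy as np

def stlsq(Theta, Xdot, threshold=0.1, n_iters=10):
    """Sequential threshold least-squares (Brunton et al., 2016):
    alternate between a least-squares solve and zeroing out coefficients
    whose magnitude falls below the threshold."""
    Xi = np.linalg.lstsq(Theta, Xdot, rcond=None)[0]
    for _ in range(n_iters):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        # Refit each state dimension using only the surviving library terms.
        for j in range(Xdot.shape[1]):
            big = ~small[:, j]
            if big.any():
                Xi[big, j] = np.linalg.lstsq(Theta[:, big], Xdot[:, j], rcond=None)[0]
    return Xi
```

For example, given noise-free samples of the scalar dynamics with right-hand side -2x and a cubic polynomial library, the procedure recovers the single active term and zeroes out the rest.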
One evident weakness of SINDy is that it requires the derivatives of the state variable, either empirically measured or numerically computed. To resolve this issue, we propose to use the training algorithm introduced in neural ordinary differential equations (NODEs) chen2018neural with L1 weight decay. In addition, we propose a magnitude-based pruning strategy to retain only the dictionary terms with significant contributions.
Neural Ordinary Differential Equations
are a family of deep neural network models that parameterize the time-continuous dynamics of hidden states using a system of ODEs:
where is a time-continuous representation of a state, is a parameterized (trainable) velocity function, and is a set of neural network weights and biases.
In the forward pass, given the initial condition , a hidden state at any time index can be obtained by solving the initial value problem (IVP). A black-box time integrator can be employed to compute the hidden states with the desired accuracy:
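The forward pass can be sketched with a fixed-step integrator standing in for the black-box ODESolve (a minimal sketch; a classical fourth-order Runge–Kutta scheme is used here for illustration, whereas the experiments below use the adaptive dopri5 solver):

```python
import numpy as np

def rk4_solve(f, x0, ts):
    """Solve the IVP dx/dt = f(t, x), x(ts[0]) = x0, with classical RK4,
    returning the state at every requested time index."""
    xs = [np.asarray(x0, dtype=float)]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        h = t1 - t0
        x = xs[-1]
        k1 = f(t0, x)
        k2 = f(t0 + h / 2, x + h / 2 * k1)
        k3 = f(t0 + h / 2, x + h / 2 * k2)
        k4 = f(t1, x + h * k3)
        xs.append(x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4))
    return np.stack(xs)
```

On the linear test problem dx/dt = -x with x(0) = 1, the solver reproduces the exponential decay to high accuracy.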
The model parameters
are then updated with an optimizer, which minimizes a loss function measuring the difference between the output and the target variables.
As in the original SINDy formulation, we assume that there is a set of candidate nonlinear functions to represent the dynamics
where, again, is a trainable coefficient matrix and denotes a library vector. In this setting, the learnable parameters of NODE become .
Sparsity inducing loss: L1 weight decay (Lasso)
Following SINDy, to induce sparsity in we add an L1 weight-decay term to the main loss objective, the mean absolute error:
Taking linear combinations of candidate functions in Eq. (10) admits an implementation as a linear layer, specified by , with no bias or nonlinear activation. To further sparsify the coefficient matrix , we employ the magnitude-based pruning strategy:
We find that pruning is essential for finding a sparse representation of , and we provide empirical evidence from the numerical experiments. In the next section, we detail how the pruning strategy is used during training.
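A minimal sketch of the magnitude-based pruning step (illustrative names; a persistent mask keeps pruned entries at zero in subsequent updates, mirroring common pruning practice):

```python
import numpy as np

def prune(Xi, mask, threshold=1e-5):
    """Magnitude-based pruning: entries of Xi whose magnitude falls below
    the threshold are set to zero and recorded in a persistent boolean mask,
    so they remain pruned in later training steps."""
    mask &= (np.abs(Xi) >= threshold)
    return Xi * mask, mask
```

After each optimizer update, the coefficient matrix is multiplied by the mask, so a coefficient that has once been pruned stays at zero.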
We employ mini-batching to train the proposed neural network architecture. For each training step, we randomly sample trajectories from the training set and then randomly sample initial points from the selected trajectories to assemble sub-sequences of length . We solve IVPs with the sampled initial points, measure the loss (Eq. (11)), and update the model parameters via Adamax. After the update, we prune the model parameters using magnitude-based pruning as shown in (12). Algorithm 1 summarizes the training procedure.
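The per-step objective can be sketched as follows (a simplified illustration assuming NumPy: explicit Euler stands in for the adaptive solver, and the gradient update and trajectory sampling are omitted):

```python
import numpy as np

def euler_solve(f, x0, ts):
    """Explicit-Euler IVP solve (stand-in for the black-box ODESolve)."""
    xs = [np.asarray(x0, dtype=float)]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        xs.append(xs[-1] + (t1 - t0) * f(t0, xs[-1]))
    return np.stack(xs)

def nsindy_loss(Xi, library, x0, ts, target, l1_weight=1e-4):
    """Mean absolute error between the solved sub-sequence and the target
    trajectory, plus the L1 penalty on the coefficient matrix Xi."""
    f = lambda t, x: library(x) @ Xi
    pred = euler_solve(f, x0, ts)
    return np.mean(np.abs(pred - target)) + l1_weight * np.abs(Xi).sum()
```

When the predicted trajectory matches the target exactly, the loss reduces to the L1 penalty alone, which is what drives the coefficients toward sparsity.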
Table 1. Ground truth and identified equations for each example problem (e.g., the Van der Pol oscillator).
Experimental results with neural SINDy
We implement the proposed method in Python 3.7.2 and PyTorch 1.9.0 paszke2019pytorch. In all training, we use Adamax kingma2015Adam with an initial learning rate of 0.01 and exponential learning rate decay with a multiplicative factor of 0.9987. In the following experiments, we set the penalty weight as and the pruning threshold as . For ODESolve, we use the Dormand–Prince method (dopri5) dormand1980family with relative tolerance and absolute tolerance unless otherwise specified. All experiments are performed on a MacBook Pro with an M1 CPU and 16 GB memory.
In the following experiment, we employ a set of polynomials, , where is the number of variables and is the maximal total degree of polynomials in the set. An example set, is given as
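Such a polynomial library can be constructed mechanically from the state variables (a minimal sketch; the monomial ordering below is one possible convention, not necessarily the paper's):

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_library(X, degree):
    """Evaluate all monomials of total degree <= degree in the columns of X
    (constant term included), returning one column per monomial."""
    n_samples, n_vars = X.shape
    cols = [np.ones(n_samples)]  # constant term
    for d in range(1, degree + 1):
        # combinations_with_replacement enumerates each monomial of degree d once
        for idx in combinations_with_replacement(range(n_vars), d):
            cols.append(np.prod(X[:, idx], axis=1))
    return np.column_stack(cols)
```

For two variables and maximal degree 2, the library contains the six terms 1, x, y, x^2, xy, y^2.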
We examine the performance of the proposed method for a number of dynamical systems (see Table 1 for the equations):
Van der Pol oscillator,
To generate data, we base our implementation on the code from liu2020hierarchical, generating 1600 training trajectories, 320 validation trajectories, and 320 test trajectories. For the first four example problems, we use time step size and total simulation time . For Lorenz, we use and . For training, we use and for the first four example problems and the last example problem, respectively. In the Appendix, a comparison against a "black-box" feed-forward neural network with fully-connected layers is presented; we report the performance of both the black-box neural network and the proposed network measured in terms of time-instantaneous mean-squared error (MSE).
Table 1 shows the ground truth equations and the equations identified by neural SINDy. We use , , and , respectively, for the first three example problems, the fourth example problem, and the last example problem. During training, coefficients with small magnitude are pruned (i.e., set to 0) and are not presented in the table. We also report the learned coefficients without the pruning strategy in the Appendix, which highlights that pruning is required to zero out the coefficients of small magnitude.
Figure 1 depicts examples of reference trajectories and trajectories computed from identified dynamics, where the trajectories are chosen from the test set. As reported in the Appendix, time-instantaneous MSEs computed by nSINDy are several orders of magnitude smaller ( smaller) than those computed by the black-box NODEs.
Neural SINDy for structure preserving parameterization
GENERIC structure-preserving parameterization
In the following, we present neural SINDy for structure-preserving modeling of reversible and irreversible dynamics. An alternative formalism for irreversible dynamics based on port-Hamiltonian systems is included in the Appendix.
We begin by reviewing the general equation for the nonequilibrium reversible–irreversible coupling (GENERIC) formalism.
Table 2. Parameterization, ground truth, and identified equations (nSINDy - GNN).
In GENERIC, an observable is assumed to evolve under the gradient flow
where and denote generalized energy and entropy, and and
denote a Poisson bracket and an irreversible metric bracket. The Poisson bracket is given in terms of a skew-symmetric Poisson matrix and the irreversible bracket is given in terms of a symmetric positive semi-definite friction matrix ,
Lastly, the GENERIC formalism requires two degeneracy conditions defined as
With the state variables , the GENERIC formalism defines the evolution of as
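The GENERIC right-hand side can be sketched directly (illustrative names; constant L and M matrices are used here, whereas in general they may depend on the state):

```python
import numpy as np

def generic_rhs(L, M, grad_E, grad_S, x):
    """GENERIC evolution: dx/dt = L grad E(x) + M grad S(x), with L
    skew-symmetric (reversible part) and M symmetric positive semi-definite
    (irreversible part)."""
    return L @ grad_E(x) + M @ grad_S(x)
```

Note that skew-symmetry of L makes the reversible part orthogonal to the energy gradient, which is the mechanism behind the discrete first law.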
Parameterization for GENERIC
Here, we review the parameterization technique developed in oettinger2014irreversible and further extended into deep learning settings in lee2021machine. As for the Hamiltonian Eq. (29), we parameterize the total energy as
with . Then we parameterize the symmetric irreversible dynamics via the bracket
The matrices, and , are skew-symmetric and symmetric positive semi-definite matrices, respectively, such that
where the skew-symmetry and the symmetric positive semi-definiteness can be achieved by the parameterization tricks
where and are matrices with learnable entries and, thus, .
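These parameterization tricks can be sketched directly (illustrative helper names; any unconstrained square matrix is mapped to a skew-symmetric or a symmetric positive semi-definite one):

```python
import numpy as np

def skew(A):
    """Skew-symmetric matrix from an unconstrained square matrix A."""
    return (A - A.T) / 2.0

def spsd(B):
    """Symmetric positive semi-definite matrix from an unconstrained matrix B."""
    return B @ B.T
```

Because these maps are differentiable in the entries of the unconstrained matrices, the constraints hold exactly at every step of gradient-based training rather than being imposed through a penalty.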
With this parameterization, the irreversible part of the dynamics is given by
and, as a result, is now defined as
An example problem (with its reference mathematical formulation) considered in this paper is:
Damped nonlinear oscillator from shang2020structure: the ground truth equation can be written in the GENERIC formalism (Eq. (17)) with the following components:
Alternatively, the equation for the dynamics can be written as
Table 3. Parameterization, ground truth, and identified equations (nSINDy - HNN).
In the experiment, we assume that we have knowledge of and that measurements of are available. That is, the nSINDy method with GENERIC structure preservation seeks the unknown and . For generating data, we base our implementation on the code from shang2020structure. We generate 800 training trajectories, 160 validation trajectories, and 160 test trajectories with and simulation time . We use a dictionary consisting of polynomials and trigonometric functions:
We consider the same experimental settings that are used in the above experiments: Adamax optimizer with the initial learning rate 0.01, exponential learning rate decay with a factor 0.9987, L1-penalty weight , pruning threshold , dopri5 for the ODE integrator with and for relative and absolute tolerances, and, finally, .
Table 2 reports the coefficients of the identified systems when nSINDy and nSINDy - GNN are used. With , nSINDy fails to correctly identify the system, whereas nSINDy - GNN identifies the exact terms correctly and computes the coefficients accurately. Figure 2 depicts the ground truth trajectory and the trajectory computed from the learned dynamics function (nSINDy - GNN). Figure 3 shows the advantages of using GENERIC structure preservation via plots of and . The difference between structure-preserving and non-structure-preserving parameterization is shown dramatically in the plot of ; the structure-preserving parameterization produces , whereas the non-structure-preserving parameterization produces fluctuating values of .
Hamiltonian structure-preserving parameterization
In the following, we consider the Hamiltonian structure-preserving parameterization technique proposed in Hamiltonian neural networks (HNNs) Greydanus2019hnn: parameterizing the Hamiltonian function as such that
With the above parameterization and , the dynamics can be modeled as
i.e., is now defined as . Then the loss objective can be written as
As noted in lee2021machine, with canonical coordinates , and canonical Poisson matrix , and , the GENERIC formalism in Eq. (17) recovers Hamiltonian dynamics.
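The canonical Hamiltonian vector field underlying this parameterization can be sketched as follows (a minimal illustration for canonical coordinates x = (q, p); grad_H stands for the gradient of the parameterized Hamiltonian and is an assumed callable, not the paper's network):

```python
import numpy as np

def hamiltonian_field(grad_H, x):
    """Canonical Hamiltonian vector field dx/dt = J grad H(x), where J is
    the canonical symplectic matrix for x = (q, p)."""
    n = len(x) // 2
    J = np.block([[np.zeros((n, n)), np.eye(n)],
                  [-np.eye(n), np.zeros((n, n))]])
    return J @ grad_H(x)
```

For the harmonic oscillator H = (q^2 + p^2)/2, the field at (q, p) = (1, 0) is (0, -1), and it is always orthogonal to the energy gradient, which is why the exact dynamics conserve H.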
We examine the performance of the proposed method on a list of example dynamical systems:
For generating data, we base our implementation on the code from Greydanus2019hnn. We generate 800 training trajectories, 160 validation trajectories, and 160 test trajectories. For the mass spring and the pendulum problem, we use and , respectively, and the simulation time is set as and , respectively.
Table 3 shows the ground truth equations and equations identified by using neural SINDy. For the mass spring problem, we use and for the pendulum problem we use a dictionary consisting of polynomials and trigonometric functions:
Again, we use the same experimental settings described above for the GENERIC parameterization. In Table 3, we present results obtained by two different parameterization techniques: 1) the "plain" dictionary approach (Eq. (10)), denoted by nSINDy, and 2) the Hamiltonian approach (Eq. (29)), denoted by nSINDy - HNN.
Figure 4 depicts examples of reference trajectories and trajectories computed from the identified dynamics, where the trajectories are chosen from the test set. Figure 5 depicts how the energy is conserved in the dynamics learned with nSINDy-HNN, whereas the plain approach (nSINDy) fails to conserve the energy.
The proposed method shares the limitations of the original SINDy method: successful identification requires inclusion of the correct candidate terms in the library. Potential alternatives are well studied in the literature and include either adding an extra "black-box" neural network to compensate for missing dictionary terms or designing a neural network that can learn dictionary terms from data (such as in sahoo2018learning).
The gradient-based parameter update combined with magnitude-based pruning could erroneously zero out needed coefficients in one special case: when the sign of a coefficient must change in a later stage of training. If the magnitude of the loss term becomes very small (and likewise the gradient), the updated coefficients may satisfy the pruning condition shown in (12).
Lastly, when adaptive step-size ODE solvers are used (e.g., dopri5), numerical underflow may occur. This can be mitigated by trying different initializations of the coefficients or a smaller batch length . However, further study regarding robust initialization is required.
In , genetic algorithms are used to identify governing physical laws (Lagrangian or Hamiltonian) from measurements of real experiments. A seminal work on sparse regression methods for system identification was proposed in brunton2016discovering. The sparse regression approach has since been extended to various settings, e.g., for model predictive control kaiser2018sparse, and for identifying dynamics in a latent space using autoencoders hinton2006reducing and then learning parsimonious representations of the latent dynamics champion2019data. Sparse regression methods have also been applied to identifying partial differential equations (PDEs) rudy2017data; rudy2019data. For identifying PDEs, deep-learning-based approaches such as physics-informed neural networks raissi2019physics and PDE-Net long2018pde have been studied recently.
Structure-preserving neural networks
Designing neural network architectures that exactly enforce important physical properties is an important topic and has been studied extensively. Parameterization techniques that preserve physical structure include Hamiltonian neural networks Greydanus2019hnn; toth2019hamiltonian, Lagrangian neural networks cranmer2020lagrangian; lutter2018deep, port-Hamiltonian neural networks desai2021port, and GENERIC neural networks hernandez2021structure; lee2021machine. Neural network architectures that mimic the action of symplectic integrators have been proposed in chen2019symplectic; jin2020sympnets; tong2021symplectic, and a training algorithm exploiting physical invariances of dynamical systems, e.g., time-reversal symmetry, has been studied in huh2020time.
We have proposed a simple and effective deep-learning-based training algorithm for a dictionary-based parameterization of nonlinear dynamics. The proposed algorithm is based on the training procedure introduced in neural ordinary differential equations (NODEs) chen2018neural and employs L1 weight decay on the model parameters and a magnitude-based pruning strategy, inspired by the sparse identification of nonlinear dynamics (SINDy) brunton2016discovering. We have further extended the dictionary-based parameterization approach to structure-preserving parameterization techniques, such as Hamiltonian neural networks, GENERIC neural networks, and port-Hamiltonian networks. For a suite of benchmark problems, we have demonstrated that the proposed training algorithm is effective in identifying the underlying dynamics from data, with expected gains from imposing structure preservation.
N. Trask and P. Stinis acknowledge funding under the Collaboratory on Mathematics and Physics-Informed Learning Machines for Multiscale and Multiphysics Problems (PhILMs) project funded by DOE Office of Science (Grant number DE-SC001924). N. Trask and K. Lee acknowledge funding from the DOE Early Career program. Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.
Appendix A Training without pruning
Tables 4 and 5 report the coefficients learned by training without the pruning strategy. Although the algorithm successfully finds the significant coefficients, it fails to zero out the coefficients with small contributions.
Tables 4 and 5. Learned coefficients without pruning for the Hyperbolic, Cubic oscillator, and Van der Pol problems.
Appendix B Comparison with MLP
Figure depicts time-instantaneous mean-squared errors measured between the ground truth trajectories and the trajectories computed from the learned dynamics functions, . We consider two models: the black-box neural ODE models and the proposed SINDy models. For the SINDy models, we use the same experimental settings considered in the Experimental results section. For the black-box neural ODE models, we consider feed-forward neural networks consisting of four layers with 100 neurons in each layer and the hyperbolic tangent (Tanh) activation. For training both models, we use the same optimizer, Adamax, the same initial learning rate of 0.01, and the same decay strategy (i.e., exponential decay with multiplicative factor 0.9987). Both models are trained for 500 epochs on the first four benchmark problems and for 2000 epochs on the last benchmark problem.
Appendix C Port-Hamiltonian structure preservation
Here, we consider the dictionary-based parameterization for port-Hamiltonian systems. We follow the port-Hamiltonian neural network formulation presented in desai2021port, which is written as:
where denotes the parameterized Hamiltonian function, denotes the nonzero coefficient for a damping term, and denotes a time-dependent forcing. We consider the dictionary-type parameterization:
set to be a trainable coefficient, and assume that is known.
As in desai2021port, we consider the Duffing equation, where the Hamiltonian function is defined as
With the Hamiltonian, the Duffing equation considering the damping term and the forcing can be written as:
In the following experiment, we consider for parameterizing the Hamiltonian function.
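A sketch of the port-Hamiltonian vector field for a one-degree-of-freedom system follows (illustrative names; the Duffing-type Hamiltonian gradients, damping coefficient, and zero forcing in the usage below are assumptions for the example, not the experiment's values):

```python
import numpy as np

def port_hamiltonian_field(dH_dq, dH_dp, rho, forcing):
    """Port-Hamiltonian vector field for x = (q, p):
        dq/dt = dH/dp,
        dp/dt = -dH/dq - rho * dH/dp + f(t),
    with damping coefficient rho and time-dependent forcing f."""
    def f(t, x):
        q, p = x
        return np.array([dH_dp(q, p),
                         -dH_dq(q, p) - rho * dH_dp(q, p) + forcing(t)])
    return f
```

With a Duffing-type Hamiltonian whose gradients are dH/dq = -q + q^3 and dH/dp = p, damping rho = 0.1, and no forcing, the field at (q, p) = (1, 1) is (1, -0.1): the potential force vanishes at q = 1 and only the damping acts on p.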
, , , , and . In the experiment, we assume that we know the values of and and identify the Hamiltonian function. The identified Hamiltonian function is and the identified value of .
, , , , and . The identified Hamiltonian function is and the identified value of .