1. Problem Statement
Consider a dynamical system in $d$ dimensions of the form
(1.1) $\dot{x}(t) = f(x(t))$
and measurement data given at timepoints $\mathbf{t} = (t_0, \dots, t_M)$ by
$y_m = x(t_m) + \epsilon_m, \qquad m = 0, \dots, M.$
The matrix $\epsilon = (\epsilon_m)$ represents i.i.d. measurement noise. The focus of this article is the reconstruction of the dynamics (1.1) from the measurements y.
The SINDy algorithm (Sparse Identification of Nonlinear Dynamics [1]) has been shown to be successful in solving this problem for sparsely represented nonlinear dynamics when noise is small and dynamic scales do not vary across multiple orders of magnitude. This framework assumes that the function $f$ in (1.1) is given componentwise by
(1.2) $f_j(x) = \sum_{k=1}^{K} w_{kj}\,\theta_k(x), \qquad j = 1, \dots, d,$
for some known family of candidate functions $(\theta_k)_{k=1}^{K}$ and a sparse weight matrix $w \in \mathbb{R}^{K \times d}$. The problem is then transformed into solving for $w$ by building a data matrix
$\Theta(y) = \big[\,\theta_1(y)\;\;\theta_2(y)\;\;\cdots\;\;\theta_K(y)\,\big],$
so that the candidate functions are directly evaluated at the noisy data. Solving (1.1) for $w$ then reduces to solving
(1.3) $\Theta(y)\,w \approx \dot{y}$
for a sparse weight matrix w, where $\dot{y}$ is a numerical time derivative of y.
Sequentially thresholded least squares is then used to arrive at a sparse solution.
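The sequential thresholding step is simple to implement; the following is a minimal illustrative sketch (not the authors' implementation), assuming a single threshold `lam` applied uniformly to all coefficients:

```python
import numpy as np

def stls(Theta, dy, lam, max_iter=10):
    """Sequentially thresholded least squares (illustrative sketch).

    Theta : (M, K) matrix of candidate functions evaluated at the data
    dy    : (M, d) approximate time derivatives
    lam   : sparsity threshold; coefficients with |w| < lam are zeroed
    """
    w = np.linalg.lstsq(Theta, dy, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(w) < lam
        w[small] = 0.0
        # Refit each component on the surviving (non-thresholded) columns
        for j in range(dy.shape[1]):
            big = ~small[:, j]
            if big.any():
                w[big, j] = np.linalg.lstsq(Theta[:, big], dy[:, j],
                                            rcond=None)[0]
    return w

# Example: recover dx/dt = 2x - 0.5x^3 from candidates {x, x^2, x^3}
x = np.linspace(-2, 2, 100)
Theta = np.column_stack([x, x**2, x**3])
dy = (2*x - 0.5*x**3).reshape(-1, 1)
w = stls(Theta, dy, lam=0.1)
```

On noise-free data this returns the exact sparse coefficients; with a numerically differentiated noisy signal, the recovered weights degrade, which motivates the weak formulation below.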
The formulation of system discovery problems in terms of a candidate basis of nonlinear functions (1.2) and subsequent discretization (1.3) was first introduced in [7] in the context of catastrophe prediction, with compressed sensing techniques used to enforce sparsity. Since then there has been an explosion of interest in the problem of identifying nonlinear dynamical systems from data, with some of the primary techniques being Gaussian process regression, deep neural networks ([5]), Bayesian inference ([9], [10]), and a variety of methods from numerical analysis ([2], [3]). These techniques have been successfully applied to the discovery of both ordinary and partial differential equations. The various approaches differ qualitatively in the interpretability of the resulting data-driven dynamical system, the practicality of the algorithm, and the robustness to noise, scale separation, and so on. For instance, a neural-network-based data-driven dynamical system does not easily lend itself to physical interpretation. As well, certain sparsification techniques are not practical for the general scientific community, where the problem of system identification from data is ubiquitous. The SINDy algorithm allows for direct interpretation of the dynamics from identified differential equations and uses sequentially thresholded least squares to enforce sparsity, which is not nearly as robust as other approaches but is easy to implement and has been proven to converge to sparse local minimizers in [8]. For these reasons, we use sequential thresholding in this article to demonstrate the viability of the method, and note that improvement using a more robust sparsifying strategy is possible. The aim of the present article is to provide rigorous justification for using the weak formulation of the dynamics in place of local pointwise derivative approximations, as well as a robust algorithm for doing so. As such, we restrict numerical experiments to autonomous ordinary differential equations for their amenability to analysis. In future works, more robust sparsification measures will be explored. As well, adaptation to PDEs and nonautonomous systems is forthcoming. We note that the use of integral equations for system identification was introduced in [6], where compressed sensing techniques were used to enforce sparsity, and that this technique can be seen as a special case of the method introduced here. In section 2 we introduce the algorithm with analysis of the resulting error structure, and in section 3 we provide numerical experimentation with a range of nonlinear systems.
2. Weak SINDy
In this article, we approach the problem (1.3) from a different perspective, by utilizing the weak form of the differential equation. Recall that for any smooth test function $\phi$ (absolutely continuous is enough) and interval $(a, b) \subset (t_0, t_M)$, equation (1.1) admits the weak formulation
(2.1) $\phi(b)\,x(b) - \phi(a)\,x(a) = \int_a^b \phi'(t)\,x(t)\,dt + \int_a^b \phi(t)\,f(x(t))\,dt.$
With $\phi \equiv 1$, we arrive at the integral equation of the dynamics explored in [6]. If we instead take $\phi$ to be nonconstant and compactly supported in $(a, b)$, we arrive at
(2.2) $-\int_a^b \phi'(t)\,x(t)\,dt = \int_a^b \phi(t)\,f(x(t))\,dt.$
We then define the generalized residual $\mathcal{R}(w; \phi)$ for a given test function $\phi$ by replacing $f$ with a candidate element $\Theta\,w$ from the span of $(\theta_k)_{k=1}^K$ and x with y as follows:
(2.3) $\mathcal{R}(w; \phi) := \int_a^b \phi'(t)\,y(t)\,dt + \int_a^b \phi(t)\,\big(\Theta(y(t))\,w\big)\,dt.$
Clearly, with $w$ equal to the true weight matrix $w^\star$ and $y = x$, we have $\mathcal{R}(w^\star; \phi) = 0$ for all $\phi$ compactly supported in $(a, b)$; however, y is a discrete set of data, hence (2.3) can at best be approximated numerically, with measurement noise presenting a significant barrier to accurate quadrature.
2.1. Method Overview
For analogy with traditional Galerkin methods, consider the forward problem of solving a dynamical system such as (1.1) for x. The Galerkin approach is to seek a solution x represented in a chosen trial basis such that the residual $\mathcal{R} := \dot{x} - f(x)$ is minimized, in the sense that $\langle \mathcal{R}, \phi \rangle = 0$, over all test functions $\phi$ living in the span of a given test function basis $\{\phi_k\}$. If the trial and test function bases are known analytically, inner products of the form appearing in the residual can be computed exactly, so that computational errors result only from representing the solution in a finite-dimensional function space.
The method we present here can be considered a datadriven Galerkin method of solving for where the trial “basis” is given by the set of gridfunctions evaluated at the data and only the testfunction basis is known analytically. In this way, inner products appearing in must be approximated numerically, implying that the accuracy of the recovered weights w is ultimately limited by the quadrature scheme used to discretize inner products. Using Lemma 2.1 below, we show that the correct coefficients may be recovered to effective machine precision accuracy (given by the tolerance of the forward ODE solver) from noisefree trajectories y by discretizing (2.2) using the trapezoidal rule and choosing to decay smoothly to zero at the boundaries of its support.
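This claim is easy to check numerically. The following sketch (our own illustration, not taken from the article) verifies that for noise-free data from $\dot{x} = -x$, the trapezoidal discretization of the weak relation (2.2) holds to near machine precision when the test function vanishes smoothly at its support boundaries:

```python
import numpy as np

# Noise-free trajectory of x' = -x on a fine uniform grid
h = 1e-3
t = np.arange(0.0, 1.0 + h / 2, h)
x = np.exp(-t)

# Bump test function phi(t) = ((t-a)(b-t)/C)^p supported on [a, b]
a, b, p = 0.1, 0.9, 7
C = ((b - a) / 2)**2
mask = (t > a) & (t < b)
u = (t - a) * (b - t)
phi = np.where(mask, (u / C)**p, 0.0)
dphi = np.where(mask, p * (u / C)**(p - 1) * (a + b - 2*t) / C, 0.0)

# Weak form of x' = -x: int phi' x dt = int phi x dt (after parts)
lhs = h * np.sum(dphi * x)   # plain sum = trapezoid: integrand vanishes at ends
rhs = h * np.sum(phi * x)
residual = abs(lhs - rhs)
```

Here `residual` is at roundoff level, orders of magnitude below what a pointwise finite-difference derivative of noisy data could achieve.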
Having chosen a quadrature scheme, the next accuracy barrier is presented by measurement noise, which introduces a bias in the weights. Below we analyze the distribution of the residuals to arrive at a generalized least squares approach where the covariance matrix can be computed directly from the test functions. This analysis also shows that placing test functions near steep gradients in the dynamics improves recovery, hence we develop a self-consistent and stable algorithm for constructing a test function basis adaptively near these regions which also does not rely on pointwise approximation of derivatives. Overall, we show that when noise is present, our method produces a recovered weight matrix with the number of significant digits scaling optimally with the signal-to-noise ratio $\sigma_{NR}$ (defined below).
Remark. The weak formulation of the dynamics (2.2) introduces a wealth of information. Given $M+1$ timepoints, equation (2.2) affords residuals over all possible supports $(a, b) = (t_i, t_j)$. Using multiple families of test functions over these supports can be viewed as a type of data assimilation, with the action of each test function providing useful information for the covariance structure of the residuals. The information complexity of such an exhaustive approach quickly becomes computationally intractable; however, we show here that even with large noise, recovery of the true weights is possible using a comparatively small test function basis.
2.2. Algorithm: Weak SINDy
We state here the Weak SINDy algorithm in full generality, which uses a generalized least squares approach with regularization and sequential thresholding to enforce sparsity.
:
1. Construct the matrix $\Theta(y)$ of trial grid functions.
2. Construct integration matrices V, V' from the test functions $\{\phi_k\}$, such that the kth rows hold the quadrature weights for $\int \phi_k(\cdot)\,dt$ and $\int \phi_k'(\cdot)\,dt$, respectively.
3. Compute the Gram matrix $G = V\,\Theta(y)$ and right-hand side $b = -V'y$, so that $G_{kj} \approx \int \phi_k\,\theta_j(y)\,dt$ and $b_k \approx \int \phi_k\,\dot{y}\,dt$.
4. Solve the generalized least-squares problem $\min_w \|Gw - b\|$ with regularization, using SINDy to enforce sparsity.
Below we analyze the residual to arrive at near-optimal strategies for choosing the test functions and their supports directly from the data y. The test function basis is chosen in tandem with a quadrature rule to eliminate numerical errors. Methods of choosing the sparsity and regularization parameters a priori from the data exist [4]; however, in this article we do not optimize this process, in order to focus on errors resulting from the weak formulation of the dynamics.
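The assembly of the linear system above can be sketched as follows. This is an illustrative one-dimensional implementation under our own naming conventions, using the bump test functions and trapezoidal weights described later in this section:

```python
import numpy as np

def build_linear_system(y, t, centers, r, p, theta_funcs):
    """Assemble G w ~ b for Weak SINDy (illustrative sketch, 1D).

    y, t        : data and uniformly spaced timepoints
    centers, r  : test-function centers and common support radius
    p           : polynomial degree of the bump test functions
    theta_funcs : list of candidate functions theta_k
    """
    dt = t[1] - t[0]
    Theta = np.column_stack([th(y) for th in theta_funcs])
    V_rows, Vp_rows = [], []
    for c in centers:
        s = t - c
        mask = np.abs(s) < r
        phi = np.where(mask, (1 - (s / r)**2)**p, 0.0)
        dphi = np.where(mask, -2 * p * (s / r**2) * (1 - (s / r)**2)**(p - 1),
                        0.0)
        V_rows.append(phi * dt)    # trapezoid weights; phi vanishes at ends
        Vp_rows.append(dphi * dt)
    V, Vp = np.array(V_rows), np.array(Vp_rows)
    G = V @ Theta                  # G[k, j] ~ int phi_k theta_j(y) dt
    b = -Vp @ y                    # b[k] ~ -int phi_k' y dt = int phi_k y' dt
    return G, b

# Sanity check on x' = -x with the single candidate theta(x) = x
t = np.arange(0.0, 5.0, 0.001)
y = np.exp(-t)
G, b = build_linear_system(y, t, centers=[1.0, 2.0, 3.0, 4.0], r=0.8, p=7,
                           theta_funcs=[lambda z: z])
w = np.linalg.lstsq(G, b, rcond=None)[0]
```

On this noise-free example the least-squares solution recovers the coefficient $-1$ essentially exactly, with no pointwise differentiation of the data.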
2.3. Residual Analysis
Performance of the Weak SINDy method is determined by the behavior of the residuals $\mathcal{R}(w; \phi_k)$.
Expanding this out and linearizing around the true data x, the residual decomposes into a sum of error terms, which manifest in the following ways:
- the misfit between w and the true weights $w^\star$;
- the error resulting from measurement noise entering the trial grid functions;
- the error resulting from replacing $f(x)$ with $f(y)$ in the right-hand side of (1.1);
- the integration error (negligible by Lemma 2.1);
- the error resulting from truncating the Taylor expansion of $f$ around the exact data x.
Clearly, recovery of $w^\star$ when $\sigma = 0$ is straightforward: the misfit and the integration error are the only error terms, thus one only needs to select a quadrature scheme so that the integration error is negligible, and $w^\star$ will be the minimizer (Figure 2 demonstrates this fact).
For $\sigma > 0$, accurate recovery of $w^\star$ requires one to choose hyperparameters that exemplify the true misfit term by enforcing that the other error terms are of lower order. We look for hyperparameters that approximately enforce normality of the residuals, utilizing that minimizing the residual in the least-squares sense corresponds to normally-distributed errors.
Approximate Covariance
Neglecting the integration and Taylor-truncation errors, we can rewrite the dominant noise contribution to the residual as a linear transformation of the measurement errors, with weights determined by the values of the test functions and their derivatives at the timepoints. Assuming the measurement errors are i.i.d. with mean zero and variance $\sigma^2$, the transformed random variables are also independent, and so the Lyapunov central limit theorem asserts that each residual is asymptotically Gaussian. If we then consider the entire residual vector $\mathcal{R} = (\mathcal{R}(w; \phi_1), \dots, \mathcal{R}(w; \phi_K))$, this corresponds to an approximate multivariate normal distribution $\mathcal{R} \sim \mathcal{N}(\mu, \Sigma)$, with the covariance $\Sigma$ determined to leading order by the test-function derivative weights in V'.
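Under the leading-order approximation $\Sigma \approx \sigma^2\, V' (V')^T$ (our reading of the analysis above), the generalized least-squares solve can be sketched as follows; the function name and the small ridge jitter are our own:

```python
import numpy as np

def gls_weights(G, b, Vp, sigma):
    """Generalized least squares with covariance sigma^2 Vp Vp^T (sketch).

    Vp holds the test-function derivative quadrature weights row-wise,
    as in the assembly step above.
    """
    Sigma = sigma**2 * (Vp @ Vp.T)
    # Small jitter guards against a numerically singular covariance
    C = np.linalg.cholesky(Sigma + 1e-12 * np.eye(len(Sigma)))
    # Whiten both sides, then solve ordinary least squares
    Gw = np.linalg.solve(C, G)
    bw = np.linalg.solve(C, b)
    return np.linalg.lstsq(Gw, bw, rcond=None)[0]

# Consistency check: on an exactly consistent system, GLS returns the
# true weights regardless of the covariance used for whitening.
rng = np.random.default_rng(0)
G = rng.normal(size=(20, 3))
Vp = rng.normal(size=(20, 50))
w_true = np.array([1.5, 0.0, -2.0])
b = G @ w_true
w = gls_weights(G, b, Vp, sigma=0.1)
```

The whitening step is what reweights residuals from overlapping test functions; with noisy data it downweights the directions in which the noise-induced residual variance is largest.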
Adaptive Refinement
Before advocating for a particular type of test function basis, we show that by localizing $\phi$ around regions where $\dot{x}$ is large, an approximate cancellation of two of the error terms results. Consider the one-dimensional case ($d = 1$). When $\dot{x}$ is large compared to the noise, we approximately have
(2.4) $y_m = x(t_m) + \epsilon_m \approx x(t_m + \delta_m)$
for some small shift $\delta_m$, i.e. the perturbed value lands close to the true trajectory x at a nearby time.
Visually this is clear, and for a heuristic argument, let $t_m + \delta_m$ be the point of intersection between the tangent line to x at $t_m$ and the level $y_m$: when $\dot{x}(t_m)$ is large compared to $\epsilon_m$, such an intersection exists and $\delta_m$ will be very small, hence $y_m$ will approximately lie on the true trajectory.
If we linearize $f(y_m)$ using this approximation we get
(2.5) $f(y_m) \approx f(x(t_m + \delta_m)) = \dot{x}(t_m + \delta_m).$
Assuming $\phi$ is sufficiently localized around such points, (2.4) implies that the noise contribution to the left-hand side of the residual behaves like $\int \phi'(t)\,x(t + \delta)\,dt$, while (2.5) implies that the right-hand-side contribution behaves like $\int \phi(t)\,\dot{x}(t + \delta)\,dt$, which, having integrated by parts, matches the first term with opposite sign. Putting the pieces together, these two error contributions have effectively canceled. In higher dimensions this interpretation breaks down, but nevertheless, for any given coordinate, terms in the error expansion vanish around points where the velocity is large, precisely because the noise is then small relative to the local rate of change.
2.4. Test Function Basis
Here we arrive at a test function space and quadrature scheme to minimize integration errors and enact the heuristic arguments above, which rely on $\phi$ being sufficiently localized. To ensure that the integration error in approximating inner products is negligible, we rely on the following lemma, which provides a bound on the error in discretizing the weak derivative relation
(2.6) $\int_a^b \phi'(t)\,x(t)\,dt = -\int_a^b \phi(t)\,\dot{x}(t)\,dt$
for compactly supported $\phi$.
Lemma 2.1 (Numerical Error in Weak Derivatives).
Let x have continuous derivatives of sufficiently high order on $[a, b]$ and define the uniform grid $t_m = a + mh$ with $h = (b - a)/M$. Then if $\phi$ has roots of multiplicity $p$ at both $a$ and $b$, then
(2.7) $\left| \int_a^b \big(\phi'\,x + \phi\,\dot{x}\big)\,dt \;-\; h \sum_{m=0}^{M} \big(\phi'(t_m)\,x(t_m) + \phi(t_m)\,\dot{x}(t_m)\big) \right| \le C\,h^{p},$
where $C$ depends on higher derivatives of $\phi x$ and on the Bernoulli numbers. In other words, the composite trapezoidal rule discretizes the weak derivative relation (2.6) to order $p$ (one order higher when $p$ is odd).
Proof.
This is a simple consequence of the Euler–Maclaurin formula. If $g$ is a smooth function, then
$h \sum_{m=0}^{M}{}^{''}\, g(t_m) - \int_a^b g(t)\,dt \;=\; \sum_{k \ge 1} \frac{B_{2k}}{(2k)!}\, h^{2k} \left[ g^{(2k-1)}(b) - g^{(2k-1)}(a) \right],$
where the double prime indicates that the endpoint terms are halved and the $B_{2k}$ are the Bernoulli numbers. The asymptotic expansion provides corrections to the trapezoidal rule that realize machine precision accuracy up until a certain value of $k$, after which terms in the expansion grow and the series diverges.
In our case, $g = \phi' x + \phi \dot{x} = (\phi x)'$, where the root conditions on $\phi$ imply that the boundary terms $g^{(2k-1)}(a) = g^{(2k-1)}(b) = 0$ whenever $2k \le p - 1$.
So for $p$ odd, we have that the leading surviving error term is of order $h^{p+1}$.
For even $p$, the leading term is of order $h^{p}$ with a slightly different coefficient. ∎
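A quick numerical check of this superconvergence (our own sketch): for $\phi(t) = (t-a)^p (b-t)^p$ the integral has the closed Beta-function value $(b-a)^{2p+1}\,(p!)^2/(2p+1)!$, and because the Euler–Maclaurin boundary terms vanish to high order, the plain trapezoidal sum matches it to roundoff even on coarse grids:

```python
import numpy as np
from math import factorial

a, b, p = 0.0, 1.0, 7
exact = (b - a)**(2*p + 1) * factorial(p)**2 / factorial(2*p + 1)

errs = []
for N in (50, 100, 200):
    t = np.linspace(a, b, N + 1)
    h = t[1] - t[0]
    # Endpoint values vanish, so the plain sum is the trapezoidal rule
    approx = h * np.sum((t - a)**p * (b - t)**p)
    errs.append(abs(approx - exact))
```

For comparison, a generic smooth integrand would incur an $O(h^2)$ trapezoidal error of roughly $10^{-4}$ at these grid sizes; here the errors sit near machine epsilon.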
We now define the test function space to be unimodal piecewise polynomials of the form
$\phi(t) = C\,(t - a)^p\,(b - t)^q\,\mathbb{1}_{[a,b]}(t),$
where $p, q \ge 1$. Functions of this form are compactly supported in $[a, b]$ and nonnegative. The normalization constant $C$ ensures that $\|\phi\|_\infty = 1$. With this choice, the exact leading-order error term in (2.7) involves the Bernoulli numbers, which eventually grow factorially, but remain moderate for small indices; as a result the error term is negligible, remaining at or below machine precision for a wide range of reasonable degrees and grid sizes. For these reasons, in what follows we choose test functions of this form and discretize all integrals using the trapezoidal rule. Unless otherwise stated, each function satisfies $p = q$ and so is fully determined by the tuple $(p, a, b)$ indicating its polynomial degree and support.
Strategy 1: Uniform Grid
The simplest strategy for choosing a test function basis is to place K test functions $\phi_1, \dots, \phi_K$ uniformly on the interval $[t_0, t_M]$, each with the same support size and polynomial degree p. This is the uniform grid strategy; in light of the residual analysis above, we introduce two parameters below to pick the degree p and the number of basis functions K. We fix the support by
(2.8) setting the support size proportional to the period of the dominant Fourier mode of y minus its mean,
since large changes in the dynamics are most likely to occur within time intervals of that length. A first parameter then selects the polynomial degree p, by controlling the decay of $\phi$ within its support. A shift parameter is then used to determine K and the endpoints of the supports: it specifies the height of intersection between neighboring test functions, and thus measures the amount of overlap between successive test functions, which factors into the covariance matrix $\Sigma$. Larger overlap implies that neighboring basis functions share more timepoints, with the two coinciding in the limit of full overlap.
In Figures 3 and 4 we vary these parameters and observe that results agree with intuition: higher polynomial degree and smaller spacing between test functions (hence larger K) lead to better recovery of $w^\star$.
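The dominant-mode computation behind (2.8) can be sketched as follows (our own minimal version; the function name is an assumption):

```python
import numpy as np

def dominant_period(y, dt):
    """Period of the dominant Fourier mode of y minus its mean,
    used to set the test-function support size (sketch)."""
    Y = np.abs(np.fft.rfft(y - np.mean(y)))
    k = int(np.argmax(Y[1:])) + 1          # skip the zero (mean) mode
    freqs = np.fft.rfftfreq(len(y), dt)
    return 1.0 / freqs[k]

# A unit-period signal sampled at dt = 0.01 over ten periods
t = np.arange(0.0, 10.0, 0.01)
period = dominant_period(np.sin(2 * np.pi * t), 0.01)
```

The support of each test function would then be taken proportional to this period, so that a single test function spans roughly one oscillation of the data.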
:
1. Construct the matrix $\Theta(y)$ of trial grid functions.
2. Construct integration matrices V, V' from test functions parametrized according to the uniform grid strategy and (2.8).
3. Compute the Gram matrix $G = V\,\Theta(y)$ and right-hand side $b = -V'y$, so that $G_{kj} \approx \int \phi_k\,\theta_j(y)\,dt$ and $b_k \approx \int \phi_k\,\dot{y}\,dt$.
4. Compute the approximate covariance $\Sigma$ and its Cholesky factorization $\Sigma = C C^T$.
5. Solve the regularized weighted least-squares problem, using SINDy to enforce sparsity, with the weighting applied using C.
Strategy 2: Adaptive Grid
Motivated by the arguments above, we now introduce an algorithm for constructing a test function basis localized near points of large change in the dynamics. This occurs in 3 steps: (1) construct a weak approximation v to the derivative of the dynamics, (2) sample points c from a cumulative distribution with density proportional to the total variation of y, (3) construct test functions centered at c using a width-at-half-max parameter. Each of these steps is numerically stable and carried out independently along each dimension. A visual diagram is provided in Figure 1.
Step 1: Weak Derivative Approximation: Define v by applying to y a matrix that enacts a linear convolution with the derivative of a piecewise polynomial test function of specified degree and support size.
The degree and support parameters are chosen by the user, with the smallest admissible choices corresponding to taking a centered finite difference derivative with a 3-point stencil. A larger support results in more smoothing and minimizes the corruption from noise while still capturing the correct large deviations in the dynamics. We fix these parameters for all examples, and note that greater disparity between the smoothing support and the timestep results in more pronounced localization (a less uniform distribution).
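A smoothed weak derivative of this kind can be sketched via convolution (our own implementation; it rests only on the mollifier identity $\int \phi\,y'\,dt = -\int \phi'\,y\,dt$):

```python
import numpy as np

def weak_derivative(y, dt, p=4, m=30):
    """Smoothed derivative of y: convolve with -phi' for the bump
    phi(s) = (1 - (s/r)^2)^p of half-width r = m*dt (sketch)."""
    s = np.arange(-m, m + 1) * dt
    r = m * dt
    phi = (1 - (s / r)**2)**p
    dphi = -2 * p * (s / r**2) * (1 - (s / r)**2)**(p - 1)
    w = -dphi * dt                 # quadrature weights for -int phi' y
    norm = phi.sum() * dt          # int phi (endpoint values vanish)
    # 'valid' convolution: entry i estimates y'(t[i + m])
    return np.convolve(y, w[::-1], mode='valid') / norm

# Check against a known derivative: d/dt sin(t) = cos(t)
t = np.arange(0.0, 10.0, 0.01)
v = weak_derivative(np.sin(t), 0.01)
err = np.max(np.abs(v - np.cos(t[30:len(t) - 30])))
```

No pointwise finite differencing occurs; on noisy data the same kernel averages the noise away while preserving the large deviations used for localization.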
Step 2: Selecting c: Having computed v, define F to be the cumulative sum of $|v|$, normalized so that its final value is 1. In this way F is a valid cumulative distribution function with density proportional to the total variation of y. We then find c by sampling from F: with K being the size of the test function basis, the centers are defined by inverting F at K equispaced levels in $(0, 1)$. This stage requires the user to select the number of test functions K.
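The inverse-transform sampling of the centers can be sketched as follows (our own minimal version):

```python
import numpy as np

def adaptive_centers(v, K):
    """Indices of K test-function centers, distributed with density
    proportional to |v| (a proxy for the total variation of y)."""
    F = np.cumsum(np.abs(v))
    F = F / F[-1]                            # cumulative distribution
    levels = (np.arange(K) + 0.5) / K        # equispaced levels in (0, 1)
    return np.searchsorted(F, levels)        # invert F numerically

# All of the variation is confined to indices 400-499, so every
# sampled center should land inside that window.
v = np.zeros(1000)
v[400:500] = 1.0
c = adaptive_centers(v, 10)
```

Where the dynamics barely change, $|v|$ is small and few centers are placed; where the dynamics change rapidly, the centers cluster, enacting the error cancellation discussed in the adaptive refinement section.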
Step 3: Construction of Test Functions: Having chosen the location of the centerpoint for each test function, we are left to choose the degree of the polynomial and the supports. The degree is chosen according to the width-at-half-max parameter, which specifies the difference in timepoints between each center and the point where the test function falls to half its maximum, while the supports are chosen so that the test function has decayed to a negligible value by the gridpoints nearest the support boundary. This gives a nonlinear system of two equations in two unknowns which can easily be solved (e.g. using MATLAB's fzero). This can be done for one reference test function and the rest of the weights obtained by translation.
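As an illustration, the following sketch (our own; the specific decay condition, tolerance, and use of bisection in place of fzero are assumptions) solves for a support radius r and degree p such that the bump $(1 - (t/r)^2)^p$ equals 1/2 at the half-max offset and decays to a small tolerance one gridpoint inside the support:

```python
import math

def halfmax_params(dt, delta, tol=1e-10):
    """Solve for (r, p) such that phi(t) = (1 - (t/r)^2)^p satisfies
    phi(delta) = 1/2 and phi(r - dt) = tol. p may come out non-integer
    here; in practice it would be rounded. Bisection sketch."""
    def g(r):
        # p from the half-max condition, substituted into the decay one
        p = math.log(2) / (-math.log(1 - (delta / r)**2))
        return p * math.log(1 - ((r - dt) / r)**2) - math.log(tol)
    lo, hi = delta * 1.01, 100.0   # g(lo) > 0 > g(hi) for these inputs
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    r = 0.5 * (lo + hi)
    p = math.log(2) / (-math.log(1 - (delta / r)**2))
    return r, p

r, p = halfmax_params(dt=0.01, delta=0.1)
```

Since all test functions share the same shape, this solve is done once for a reference function and the result translated to each center.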
The adaptive grid Weak SINDy algorithm is summarized as follows:
:
1. Construct the matrix $\Theta(y)$ of trial grid functions.
2. Construct integration matrices V, V' from test functions determined adaptively according to the procedure above.
3. Compute the Gram matrix $G = V\,\Theta(y)$ and right-hand side $b = -V'y$, so that $G_{kj} \approx \int \phi_k\,\theta_j(y)\,dt$ and $b_k \approx \int \phi_k\,\dot{y}\,dt$.
4. Compute the approximate covariance $\Sigma$ and its Cholesky factorization $\Sigma = C C^T$.
5. Solve the regularized weighted least-squares problem, using SINDy to enforce sparsity, with the weighting applied using C.
The smoothing and width parameters play a role in determining how localized the test function basis is around steep gradients and ultimately depend on the timestep. As mentioned above, we fix their values throughout, as this produces sufficient localization for the examples featured in this article. For simplicity we fix the number of test functions K to be a multiple of the number of trial functions. For larger noise it is necessary to use a larger basis, while for small noise it is often sufficient to take K equal to the number of trial functions. The optimal width-at-half-max depends on the timescales of the dynamics and can be chosen from the data using the Fourier transform as in the uniform grid case; however, for simplicity we fix it throughout.
3. Numerical Experiments
We now show that the Weak SINDy approach recovers the correct dynamics to high accuracy over a range of signal-to-noise ratios. To generate exact data we use MATLAB's ode45 with stringent absolute and relative tolerances. Noise is then added to the exact trajectories by fixing the signal-to-noise ratio $\sigma_{NR}$ and adding i.i.d. Gaussian noise to each data point with mean zero and variance $\sigma^2$ computed from $\sigma_{NR}$ and the magnitude of the trajectory.
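A sketch of this noise model (our own convention: $\sigma$ equal to $\sigma_{NR}$ times the root-mean-square of the clean signal; the paper's exact normalization may differ):

```python
import numpy as np

def add_noise(x, sigma_nr, rng):
    """Add i.i.d. Gaussian noise scaled to the RMS of the clean signal."""
    sigma = sigma_nr * np.sqrt(np.mean(x**2))
    return x + rng.normal(0.0, sigma, size=x.shape)

# For a unit-RMS signal, sigma_nr = 0.1 yields noise of std 0.1
rng = np.random.default_rng(0)
x = np.ones(100000)
y = add_noise(x, 0.1, rng)
```

Tying $\sigma$ to the signal magnitude makes experiments across systems with very different scales directly comparable.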
We examine the following canonical nonlinear systems with variations in the specified parameters:
- Duffing
- Van der Pol
- Lotka–Volterra
- Lorenz
The Duffing equation and Van der Pol oscillator present cases of approximately linear systems with cubic nonlinearities. Solutions to the Van der Pol oscillator and Lotka–Volterra system exhibit orbits with variable speed of motion, in particular regions of rapid change between regions of very little variation. For the Lorenz system, we focus on recovering the system in the chaotic regime. For this reason we fix the parameters of the differential equation to lie in the region with large Lyapunov exponents and vary the initial conditions, which are chosen from a uniform distribution covering the strange attractor. In this case we see that if the initial conditions lead to trajectories which do not visit both sides of the strange attractor, then the system is not properly identified. This is to be expected for recovery in general: trajectories that do not sample important regions of phase space cannot be relied upon to provide an accurate representation of the dynamics at large.
In what follows the sparsity and regularization parameters are held fixed across experiments.
3.1. Noise-Free Data (Figure 2)
Here we show that in the zero-noise case ($\sigma_{NR} = 0$) we recover the correct weight matrix to within the tolerance of the ODE solver. We find that accurate recovery occurs regardless of sparsity enforcement or regularization, and so we set the threshold orders of magnitude below any of the coefficients and the regularization to zero. For the data-driven trial basis $\Theta(y)$, we include all polynomials up to degree 5 in the state variables, along with a further parametrized family of candidate functions over a range of indices. In addition, we find that recovery occurs with the minimal number of basis functions, K equal to the number of trial functions, making the Gram matrix square. We use the uniform grid approach above with support selected from the Fourier transform of y and the shift parameter fixed.
The main goal of these figures is to examine the effect of increasing the polynomial degree p of the test functions, to show that convergence to machine precision is realized in this limit (i.e. convergence to within the accuracy tolerance of the ODE solver). The only outlier in this regard is the Lorenz equation, where for some trajectories the system is not recovered. We find that recovery fails precisely for trajectories that do not visit both sides of the Lorenz attractor, and so inaccurate recovery can be attributed to not having visited a significant enough region of phase space.
[Table: recovered systems and errors for the Duffing, Van der Pol, Lotka–Volterra, and Lorenz equations across variants V1–V4; the Lorenz entries depend on the trajectory (see below).]
3.2. Small-Noise Regime (Figures 3 and 4)
We now turn to the case of low to moderate noise levels. We observe another nice property of the Weak SINDy method: the error in the coefficients scales with $\sigma_{NR}$, with the recovered coefficients having approximately $-\log_{10}(\sigma_{NR})$ significant digits.
We again use the uniform grid approach. We examine not only the polynomial degree p but also the number of basis functions K used in recovery. To reiterate the arguments above, the magnitude of $\phi'$ compared to $\phi$ affects the distribution of the residual, so we define the degree p by fixing a ratio parameter and calculating p from it; in this way, increasing the parameter corresponds to increasing p. This, together with the spacing parameter, determines the test function basis. We enforce that two neighboring basis functions intersect at a prescribed height, so that at one extreme the two functions perfectly overlap, while at the other their supports are disjoint. In this way, larger overlap corresponds to higher K.
We simulated 200 instantiations of noise for the Duffing equation and Van der Pol oscillator and for each noisy trajectory examined a range of the parameter values. As one might expect from the noise-free case above, increasing p leads monotonically to better recovery. In addition, increasing K also leads to better recovery. The mean and standard deviation of the coefficient error are also pictured, along with samples of the resulting data-driven dynamical systems.




3.3. Large-Noise Regime (Figures 5 to 8)
For the last set of experiments we show that recovery in the large-noise regime is satisfactory using strategy 2 (adaptive grid). In each case we use the weak differentiation and width-at-half-max parameters fixed as above; the 2D systems require fewer test basis functions than the Lorenz equation. In each case the correct terms were identified with small relative coefficient error, indicating approximately two significant digits. We plot the noisy data y, the true data x, and the simulated data-driven dynamical systems in dynamo view and in phase space to exemplify the separation of scales and the severity of the corruption from noise.
For the Duffing equation, the data-driven trajectory diverges slightly from the true data as the system relaxes to equilibrium but is otherwise qualitatively accurate. The Van der Pol oscillator exhibits a case of nearly the same limit cycle being identified, but the dominant timescale of the data-driven system is slightly different from that of x, so that the data-driven trajectories diverge over time while visiting the same region of phase space. This suggests the level of accuracy needed to accurately recover systems from trajectories with sharp gradients. The recovered Lotka–Volterra trajectory is nearly indistinguishable from the true data. The recovered Lorenz trajectory remains close to the true trajectory initially, before the two diverge as is expected from chaotic dynamics. The Lorenz attractor is captured.



